Because much of the earliest work in computing and networking occurred in the United States and Europe, the Latin alphabet and its conventions—such as a left-to-right ordering of characters—got baked into software and hardware. But after spending years as the general manager of a domain registry for the Asia Pacific region, Don Hollander became convinced that Internet applications should support as many languages and writing systems as possible.
Which is why Hollander is now the secretary general of the Universal Acceptance Steering Group (UASG), a group that champions the idea that all valid top-level domains (TLDs), such as .com, .tech, or .信息, should function in any Web or email application. In the process, not only would the Web become more globally accessible, but companies would also be able to make sales or capture customer information that they currently lose, with the UASG estimating that the economic benefits could be US $9.8 billion per annum.
“The domain name space has changed a lot in the last few years,” Hollander says. Originally, TLDs were either three letters long, such as .edu, or two letters long, for country codes like .de. But around 2010, things started changing. People were clamoring for more diversity in what could be used for a TLD.
That led to two big changes. First was the creation of extended gTLDs—generic TLDs that can be three letters or longer—which is why .law and .info are now valid options (the UASG website itself uses .tech). Second, TLDs could be set up in languages that don’t use the Latin alphabet, allowing general Unicode characters in email addresses and TLDs. By 2013, over 2,000 new TLDs had been established.
By 2015, Hollander says, the ability to handle these new and various TLDs had been largely sorted out at the Domain Name System (DNS) level—that is, at the level of the directories that manage TLDs and associate them with specific numeric Internet addresses. (There are still some problems, however. Emojis are fickle because from a code perspective, the same emoji can be composed in multiple ways. That’s why emoji-based URLs, while they do exist, are difficult to work with.)
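The multiple-composition problem Hollander describes isn’t unique to emojis—it affects any Unicode text. A minimal sketch in Python (using only the standard library) shows one visible glyph with two distinct code-point encodings, which compare as different strings until they are normalized; emoji sequences behave similarly, with base characters plus variation selectors:

```python
import unicodedata

# One visible glyph, two code-level representations:
composed = "\u00e9"     # "é" as a single precomposed code point
decomposed = "e\u0301"  # "é" as plain "e" plus a combining acute accent

# The raw strings differ even though they render identically.
print(composed == decomposed)  # False

# Unicode normalization (here, NFC) maps both to the same form.
print(unicodedata.normalize("NFC", composed) ==
      unicodedata.normalize("NFC", decomposed))  # True
```

This is why systems that compare identifiers byte-for-byte, without normalizing first, can treat two renderings of the “same” name as different addresses.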
The remaining challenge, according to Hollander, is spreading the word, because it doesn’t matter if everything works at the network level if the code driving specific applications still supports only two- or three-letter TLDs and Latin-character email addresses. And unfortunately, many application developers have not kept up with the times.
Creating a software routine to check if an email address or TLD was valid used to be pretty straightforward. Ten years ago, if an application asked a user for an email address, for example, the developer could check if the response was valid by testing it in a few simple ways: It should have the symbol @, and it should end in a period followed by two or three letters. If it didn’t pass those tests, it was garbage.
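The old-style check described above can be sketched in a few lines of Python. The pattern below is an illustration of the kind of test a developer might have written a decade ago, not a recommended validator—and it shows exactly why such code now rejects perfectly valid addresses:

```python
import re

# Circa-2009 heuristic: something, an "@", something, then a dot
# followed by exactly two or three Latin letters.
OLD_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,3}$")

def looks_valid_1999(address: str) -> bool:
    return OLD_PATTERN.fullmatch(address) is not None

print(looks_valid_1999("user@example.com"))   # True
print(looks_valid_1999("user@example.info"))  # False: four-letter TLD rejected
print(looks_valid_1999("用户@例子.信息"))      # False: non-Latin TLD rejected
```

The last two rejections are the universal-acceptance failures the UASG is trying to stamp out.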
When longer domain names and Unicode came along, those developers’ tests got more convoluted. “Now, I need to look for 2, 3, 4, 6, or 7 characters,” Hollander says. Nevertheless, it’s a largely solved problem: “It’s not a hard fix,” he says, adding that there is plenty of code available on GitHub and Stack Overflow for developers looking to make sure their applications are universal-acceptance compliant. For those looking to dig deeper into the issue, the UASG’s website offers documentation and links to relevant standards. UASG also has information about various languages and code libraries and which ones are up to date. (Hollander says, for example, that Python is currently not up to date.)
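As a rough sketch of what a more universal-acceptance-friendly check looks like, the hand-rolled function below replaces the hard-coded two-or-three-Latin-letter rule with a looser test: the final domain label must be alphabetic in any script and at least two characters long. This is an illustrative assumption, not a full validator—production code should defer to a maintained, UA-compliant library, as Hollander suggests:

```python
def looks_valid_today(address: str) -> bool:
    # Split on the last "@"; both sides must be non-empty.
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return False
    # The domain needs at least two non-empty dot-separated labels.
    labels = domain.split(".")
    if len(labels) < 2 or any(not label for label in labels):
        return False
    # Accept Latin and non-Latin TLDs alike (e.g. .info, .tech, .信息).
    tld = labels[-1]
    return len(tld) >= 2 and tld.isalpha()

print(looks_valid_today("user@example.info"))  # True
print(looks_valid_today("用户@例子.信息"))      # True
print(looks_valid_today("user@example"))       # False: no dot in domain
```

Even this sketch omits real-world concerns such as Unicode normalization and punycode (IDNA) conversion, which is precisely why the UASG points developers at vetted libraries instead.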
Ultimately, universal acceptance is an easy way to make the Internet more accessible for the billions of people whose first language is not written in Latin characters. Hollander wants developers to be mindful of that. “The world changed, and they should bring their systems up to date,” he says.
This article appears in the June 2019 print issue as “The Universal Internet.”