Focusing on Internationalized Domain Names
May 29, 2008
In SEO and web development, internationalized domain names are Internet domain names that may contain non-ASCII characters. These names can include letters with diacritics, as required by many non-English languages, as well as characters from non-Latin scripts such as those used for Hindi, Chinese, and Arabic. They serve many purposes, among them building good websites and generating leads efficiently.
However, the standard for domain names does not allow such characters. In recent years, a lot of work has gone into internationalizing domain names by representing them in standard ASCII form, so that the stability of the domain name system is maintained.
IDN was originally proposed in the mid-1990s and first implemented in 1998. Eventually, a system called Internationalizing Domain Names in Applications (IDNA) was adopted as the standard, and it has been rolled out in a handful of top-level domains.
The term internationalized domain name refers specifically to a domain name made up of labels to which the IDNA ToASCII algorithm can be successfully applied. In March 2008, work on updating the current IDNA protocol began with the formation of a new IDN working group.
Conversion between the ASCII and non-ASCII forms of a domain name is accomplished by two algorithms, ToASCII and ToUnicode. These algorithms are applied to individual labels rather than to the domain name as a whole. For example, the domain name www.website.com has the labels www, website, and com, and ToASCII or ToUnicode is applied to each of these labels separately.
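The per-label rule can be sketched in Python, assuming Python 3's standard `encodings.idna` module, whose `ToASCII` function implements the RFC 3490 algorithm for a single label (with UseSTD3ASCIIRules off) and returns bytes. The helper name `domain_to_ascii` and the example domain `www.bücher.example` are hypothetical, chosen only for illustration: the helper splits the name on dots, converts each label independently, and joins the results again.

```python
from encodings.idna import ToASCII  # stdlib: RFC 3490 ToASCII for ONE label


def domain_to_ascii(domain: str) -> str:
    """Hypothetical helper: apply ToASCII to each label, never to the whole name.

    Simplification: RFC 3490 also accepts some non-ASCII full stops
    (e.g. U+3002) as label separators; this sketch only splits on ".".
    """
    labels = domain.split(".")  # "www.website.com" -> ["www", "website", "com"]
    ascii_labels = [ToASCII(label).decode("ascii") for label in labels]
    return ".".join(ascii_labels)


print(domain_to_ascii("www.website.com"))     # www.website.com (ASCII labels unchanged)
print(domain_to_ascii("www.bücher.example"))  # www.xn--bcher-kva.example
```

Python's built-in `idna` codec performs the same label-by-label conversion, so `"www.bücher.example".encode("idna")` yields the same name as bytes.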
The details of these algorithms are complex; they are specified in RFC 3490 (IDNA), RFC 3491 (Nameprep), and RFC 3492 (Punycode).
ToASCII leaves any ASCII label unchanged, but it fails if the label is unsuitable for DNS. If it is given a label that contains at least one non-ASCII character, ToASCII applies the Nameprep algorithm, converts the result to ASCII using Punycode, and then prepends the four-character string “xn--”. This string, called the ACE prefix, distinguishes Punycode-encoded labels from ordinary ASCII labels. ToASCII can fail in several ways; for example, the final string may exceed the 63-character limit that DNS places on a label. Any label for which ToASCII fails cannot be used in an internationalized domain name.
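To make the steps concrete, here is a simplified ToASCII written out by hand in Python. It is a sketch under stated assumptions, not a complete RFC 3490 implementation: it reuses the standard library's `encodings.idna.nameprep` and the built-in `punycode` codec, skips the optional UseSTD3ASCIIRules checks, and the names `to_ascii_sketch`, `ACE_PREFIX`, and `MAX_LABEL_LEN` are hypothetical.

```python
from encodings.idna import nameprep  # stdlib RFC 3491 Nameprep

ACE_PREFIX = "xn--"
MAX_LABEL_LEN = 63  # DNS limit for a single label


def to_ascii_sketch(label: str) -> str:
    """Hypothetical simplified ToASCII for one label (STD3 rules not checked)."""
    if label.isascii():
        result = label  # ASCII labels are left unchanged
    else:
        prepped = nameprep(label)  # normalization: case folding, NFKC, etc.
        if prepped.isascii():
            result = prepped  # Nameprep may already yield pure ASCII
        elif prepped.startswith(ACE_PREFIX):
            raise ValueError("label already begins with the ACE prefix")
        else:
            # Punycode-encode, then prepend the ACE prefix "xn--".
            result = ACE_PREFIX + prepped.encode("punycode").decode("ascii")
    # Final check: the result must fit in a DNS label.
    if not 0 < len(result) <= MAX_LABEL_LEN:
        raise ValueError("label is empty or longer than 63 characters")
    return result


print(to_ascii_sketch("www"))     # www
print(to_ascii_sketch("Bücher"))  # xn--bcher-kva (Nameprep lowercased the "B")
try:
    to_ascii_sketch("a" * 64)
except ValueError as err:
    print("ToASCII failed:", err)
```

Note that Nameprep errors (for example, prohibited characters) surface as `UnicodeError`, which is a subclass of `ValueError`, so a caller can treat every failure the same way.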
ToUnicode reverses the action of ToASCII by stripping off the ACE prefix and then applying the Punycode decoding algorithm. It does not, however, reverse the Nameprep processing, because that step is merely a normalization and is irreversible by nature. Unlike ToASCII, ToUnicode never fails: if any of its steps goes wrong, it simply returns the original string unchanged.
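The decoding direction can be sketched the same way. This is a simplified, hypothetical `to_unicode_sketch`, not the standard library's `encodings.idna.ToUnicode` (which raises an exception on bad input instead of returning it). It assumes ASCII input only; non-ASCII labels are returned as-is rather than run through Nameprep first. It strips the ACE prefix, Punycode-decodes the rest, re-applies the standard library's `ToASCII` as a round-trip check, and falls back to the original label on any failure.

```python
from encodings.idna import ToASCII  # stdlib RFC 3490 ToASCII (returns bytes)

ACE_PREFIX = "xn--"


def to_unicode_sketch(label: str) -> str:
    """Hypothetical simplified ToUnicode: on any failure, return the input unchanged."""
    # Simplification: non-ASCII input is returned as-is (RFC 3490 would
    # run Nameprep on it first).
    if not label.isascii() or not label.lower().startswith(ACE_PREFIX):
        return label  # not an ACE label: nothing to decode
    try:
        decoded = label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
        # Round-trip check: ToASCII must reproduce the label (case-insensitively).
        if ToASCII(decoded).decode("ascii") != label.lower():
            return label
    except ValueError:  # UnicodeError is a subclass of ValueError
        return label
    return decoded


print(to_unicode_sketch("xn--bcher-kva"))  # bücher
print(to_unicode_sketch("www"))            # www (not an ACE label)
print(to_unicode_sketch("xn--abc-"))       # xn--abc- (fails the round trip, returned unchanged)

# Nameprep is not reversed: the capital "B" from the original input is gone.
print(to_unicode_sketch(ToASCII("Bücher").decode("ascii")))  # bücher
```

Returning the input on failure is what makes ToUnicode safe to call on any label: a malformed ACE label simply comes back as the ASCII string it already was.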