The reason behind the appearance of those strange symbols in internet links
While browsing the internet, you have almost certainly noticed something odd that happens to links, especially when you copy a page address and paste it elsewhere. A simple link such as “example.com/مصر” suddenly turns into a complex string of symbols when copied from the browser: “example.com/%D9%85%D8%B5%D8%B1”. You can try this yourself: copy a link containing Arabic characters and paste it elsewhere, and the letters will be replaced by sequences made up of the percent sign (%), digits, and English letters. This transformation may seem bewildering at first glance, but it is the result of a long history of internet evolution and the need for what we can call “global compatibility.” The rest of this article explains its causes and effects.
American Computing and the Need for Global Adaptation
It is no secret that the history of computers and computing is closely linked to the history of the United States. The vast majority of modern technologies were either developed in the United States or derived from technologies originally made there. This has had a deep impact on how computers handle different languages as they spread worldwide, and it is the root of the phenomenon this article addresses. To understand why link characters turn into strange symbols, we need to go back to the beginnings of the computing era, when a standard was needed to convert human language, made up of letters, numbers, and symbols, into a language that machines understand, consisting of the binary digits 0 and 1.
For that purpose, an encoding system called ASCII was developed in the early 1960s, with its first edition published in 1963. The system was revolutionary in its time: it converted letters, numbers, and some symbols into machine language so that each ASCII character occupies only 7 bits. Keep in mind that data storage and transfer technology was very limited when ASCII emerged, so a 7-bit system that could represent 128 different symbols (2^7 = 128) was sufficient for English letters, numbers, and essential symbols. It had an additional advantage: the eighth bit was left free for data integrity checking (as a parity bit) in many old communication systems, which improved the accuracy of information transfer.
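To make the 7-bit arithmetic and the parity idea concrete, here is a minimal Python sketch. The helper names `to_7bit` and `with_even_parity` are hypothetical, and placing an even-parity bit in front of the 7 data bits is only one assumed convention; real systems differed in the kind of parity they used and where the bit was placed.

```python
def to_7bit(ch: str) -> str:
    """Return the 7-bit binary form of a single ASCII character."""
    code = ord(ch)
    if code > 127:
        raise ValueError(f"{ch!r} is not an ASCII character")
    return format(code, "07b")


def with_even_parity(ch: str) -> str:
    """Prefix the 7 data bits with an even-parity bit, giving 8 bits.

    Assumption for illustration: the parity bit is the leading (eighth) bit.
    """
    bits = to_7bit(ch)
    parity = bits.count("1") % 2  # 1 only when the number of 1s is odd
    return str(parity) + bits


print(2 ** 7)                 # 128 possible 7-bit values
print(to_7bit("A"))           # 1000001  (code 65)
print(with_even_parity("A"))  # 01000001 (two 1s, so parity bit is 0)
print(with_even_parity("C"))  # 11000011 (three 1s, so parity bit is 1)
```

Calling `to_7bit("ص")` raises a `ValueError`, which mirrors the limitation the article goes on to describe: characters outside the 128 ASCII values simply have no 7-bit code.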
But ASCII had a major problem: it was designed with only the English language in mind. It contains the Latin letters from A to Z (in upper and lower case), numbers, and some basic symbols. It does not include the letters specific to many other European languages, let alone non-Latin scripts such as Arabic, Chinese, or Russian, whose characters cannot be written in ASCII at all. A solution therefore had to be found as computing spread outside the United States.
The Introduction of Percent Encoding
With the widespread use of computers globally, it became evident that there was an urgent need for a more inclusive encoding system than ASCII. Several solutions emerged, such as UTF-8, which can represent almost all of the world's writing systems. However, ASCII remained the foundation of many aspects of programming and the internet, owing to the dominance of English in the world of technology. When the web first appeared in the 1990s, ASCII was chosen as the basic encoding for links. This means that in the early years of the internet, all website addresses and links were written exclusively in English characters. This tradition continues to this day, even within our site. However, as the internet grew globally, it became necessary to find a way to include characters from other languages in links.
Web developers at that time had two options. The first was to completely change the encoding system used in web links, which is easier said than done given the size and complexity of the internet. The second, which is the one that was adopted, was to find a way to represent non-English characters using only the characters allowed in ASCII. This led to the emergence of what is known as “Percent Encoding.”
Percent Encoding is a simple and clever way to avoid adopting an encoding system other than ASCII. Its basic idea is to use the percent sign (%) as a marker: the browser understands that the two hexadecimal digits following the percent sign give the value of one byte of the intended character. Initially, this method was used for characters that do exist in ASCII but have a special meaning in web addresses. For example, the forward slash (/) separates the parts of a path and signals a subpage within the site, so a slash that is meant as ordinary text cannot be written as is and must be replaced with “%2F”. Similarly, an exclamation mark (!) can be written as “%21”, a space becomes “%20”, and if you want to include the percent sign itself, you must convert it to “%25”.
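The substitutions mentioned above can be reproduced with Python's standard `urllib.parse.quote` function. This is only an illustrative sketch: passing `safe=""` is a choice made here so that the slash is encoded too (by default `quote` leaves “/” untouched), and the variable name `encoded` is hypothetical.

```python
from urllib.parse import quote

# Characters that have a special role in URLs or are not allowed as is.
samples = ["/", "!", "%", " "]

# safe="" tells quote() to encode every character except ASCII letters,
# digits, and the four characters "_", ".", "-", "~".
encoded = {ch: quote(ch, safe="") for ch in samples}

for ch, code in encoded.items():
    print(repr(ch), "->", code)
# '/' -> %2F
# '!' -> %21
# '%' -> %25
# ' ' -> %20

# With the default safe="/", a slash is kept as a path separator.
print(quote("a/b"))           # a/b
print(quote("a/b", safe=""))  # a%2Fb
```

Each code is simply the character's ASCII value written in hexadecimal: the slash is 47 in decimal, which is 2F in hex.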
With the spread of the internet, the same method was extended to represent any character in any language using only the characters allowed in ASCII. Because each percent-encoded unit (a percent sign plus two hexadecimal digits) can express only a single byte, while non-Latin characters occupy two or more bytes in UTF-8, most non-Latin characters are encoded as several consecutive percent-encoded units.
In this way, links can contain text in any language while remaining compatible with ASCII. For example, the Arabic letter “ص” is encoded as “%D8%B5”. One consequence is that links containing Arabic text become considerably longer: a simple word like “مرحبا” turns into a long string of symbols in a link (“%D9%85%D8%B1%D8%AD%D8%A8%D8%A7”).
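The examples above can be checked with a short Python sketch. The helper `percent_encode` is hypothetical and deliberately naive: it encodes every UTF-8 byte, including ASCII letters that would not normally need encoding. It is compared against the standard library's `urllib.parse.quote`, which leaves safe ASCII characters alone.

```python
from urllib.parse import quote


def percent_encode(text: str) -> str:
    """Percent-encode every UTF-8 byte of text (naive illustrative helper)."""
    return "".join(f"%{byte:02X}" for byte in text.encode("utf-8"))


# One Arabic letter occupies two bytes in UTF-8, hence two %XX units.
print("ص".encode("utf-8"))      # b'\xd8\xb5'
print(percent_encode("ص"))      # %D8%B5

# Five letters become ten bytes, so thirty characters in the link.
word = "مرحبا"
print(percent_encode(word))     # %D9%85%D8%B1%D8%AD%D8%A8%D8%A7
print(len(word), len(percent_encode(word)))  # 5 30

# The standard library gives the same result for non-ASCII text.
print(quote(word) == percent_encode(word))   # True
```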
The Difference Between Displaying and Copying Links
In today's browsers, you may notice that links containing Arabic text appear normally in the address bar without any conversion. This is thanks to modern browsers, which automatically decode links to make them easier for users to read. However, this simple appearance hides a more complex process: when the browser actually requests a site or page, the link is transmitted in its percent-encoded form. The link you see as clear Arabic text in the address bar is therefore sent to the servers as a string of symbols and numbers.
This was not the case in the past. In the early days of the internet, browsers displayed links only in their percent-encoded form, which made links containing non-Latin languages very difficult to read. In fact, many old browsers did not support entering links with non-Latin characters at all and accepted only links consisting entirely of ASCII characters.
As browsers evolved and began displaying the original characters, new compatibility challenges arose. Links that appear correctly in modern browsers may not work in older versions, which is a barrier to a smooth and unified browsing experience. To overcome this, a clever compromise was found: browsers display links in their original characters for easy reading but keep the encoded version for everything they exchange over the web. Thus, when you copy a link containing Arabic text, you get the encoded version. This solution ensures compatibility with older systems and services that still rely on ASCII while giving users a smooth, easy-to-read experience.
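The relationship between the displayed form and the transmitted form can be sketched in Python with `urllib.parse.quote` and `unquote`. This is not how any particular browser is implemented; the variable names `displayed`, `transmitted`, and `decoded` are hypothetical and only illustrate the round trip.

```python
from urllib.parse import quote, unquote

# What the user sees in the address bar (decoded for readability).
displayed = "example.com/مصر"

# What is actually sent and what ends up on the clipboard.
# quote() keeps "/" and "." as they are and encodes the Arabic letters.
transmitted = quote(displayed)
print(transmitted)  # example.com/%D9%85%D8%B5%D8%B1

# Decoding the transmitted form recovers the readable text.
decoded = unquote(transmitted)
print(decoded == displayed)  # True

# The encoded form contains only ASCII characters.
print(transmitted.isascii())  # True
```

Because the encoded form is pure ASCII, it can pass through older systems unchanged, which is the compatibility benefit described above.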
The Impact of Percent Encoding on Link Appearance
The historical constraints on how links were written had a profound impact on how websites were named and organized. Initially, most websites, even those targeting non-English-speaking audiences, were forced to use English names in their addresses. Even today, most major websites use domain names in English or at least a transliteration of the original name written in English.
As the web evolved and came to support different languages, some sites began naming their pages in their local languages while the site address (domain name) remained in English, and this is still the case for most websites. Thanks to browsers’ ability to display links without encoding, websites around the world have also started using their own languages in the domain name itself. In the Arab world, however, this practice is still relatively rare. Nothing prevents naming both the site and its internal pages in Arabic, but a familiar domain extension such as “.com” stays in Latin letters. As a result, a site name written in Arabic can run into formatting and typing problems, because Arabic is written from right to left while English and most other languages are written from left to right.
In the end, the strange symbols we see in links are the result of a long history of internet evolution and the need for compatibility between old and new systems, and different languages around the world. They represent a practical solution to a complex problem, allowing the internet to be truly a global network while maintaining accessibility and compatibility for all.