Why is Arabic so difficult to localize?

“Arabic poses some of the greatest web localization challenges because of poor software support and an acute shortage of Arabic translators”
- John Yunker, Beyond Borders – Web Globalization Strategies

Arabic belongs to the Semitic language family, so its grammar is very different from that of English. Most Arabic words are built on a root of three consonants: words of every part of speech are formed by combining the root consonants with fixed vowel patterns and, sometimes, an affix. Arabic-speaking learners of English may be confused by the lack of comparable patterns in English that would let them distinguish nouns from verbs or adjectives.
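As a rough illustration of the root-and-pattern idea, the sketch below fills the three consonants of the root k-t-b into a few vowel patterns. It works on Latin transliterations rather than Arabic script, and the pattern notation (digits 1 to 3 standing for the root consonants) and the function name apply_pattern are assumptions made for this example only. Real Arabic morphology has many more patterns and irregularities than this shows.

```python
def apply_pattern(root, pattern):
    """Fill the digits 1-3 in `pattern` with the consonants of `root`."""
    if len(root) != 3:
        raise ValueError("expected a three-consonant root")
    # Digits are placeholders for root consonants; everything else is copied as-is.
    return "".join(root[int(ch) - 1] if ch in "123" else ch for ch in pattern)


root = "ktb"  # transliterated root associated with writing
patterns = {
    "1a2a3a": "he wrote",
    "1aa2i3": "writer",
    "ma12uu3": "written",
    "1i2aa3": "book",
}

for pattern, gloss in patterns.items():
    print(f"{apply_pattern(root, pattern):8} {gloss}")
```

The same three consonants appear in every derived word, which is why an Arabic reader can often guess the general meaning of an unfamiliar word from its root.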

Arabic has 28 consonants (English has 24) and 8 vowels and diphthongs (English has 22). Short vowels are usually omitted in writing; when they are marked, as in the Qur'an or in texts for learners, they appear as optional diacritics above or below the consonants. Texts are read from right to left and written in a cursive script. Arabic makes no distinction between upper and lower case, and its punctuation rules are much looser than those of English.
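Because short vowels are optional marks in writing, the same word can reach software with or without them, and a naive string comparison will treat the two spellings as different. One common workaround, sketched here under the assumption that stripping the marks in the range U+064B to U+0652 is acceptable for the application, is to remove them before comparing or indexing. The function name strip_harakat is hypothetical, and the range also covers shadda and sukun, which are not vowels.

```python
def strip_harakat(text):
    """Remove the common Arabic vowel and related marks (U+064B..U+0652)."""
    return "".join(ch for ch in text if not ("\u064B" <= ch <= "\u0652"))


vocalized = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kaf, ta, ba, each followed by fatha
plain = "\u0643\u062A\u0628"                        # the same word without marks

print(vocalized == plain)                 # the raw strings differ
print(strip_harakat(vocalized) == plain)  # they match once the marks are removed
```

Stripping the marks loses information, since they sometimes distinguish otherwise identical words, so this is better suited to search and matching than to storage.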

Arabic letter shapes:

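Arabic letters have two basic shapes, and the shape used depends on the letter's position in the word:

- Connected shape: used when the letter is joined to a neighbouring letter in the same word. In the middle of a word it connects to both the preceding letter and the following one; at the start or end of a word it connects on one side only.

- Separated shape: used when the letter is not joined to its neighbours, for example when it stands alone or when it ends a word after one of the few special letters (such as alef, dal, ra, or waw) that never connect to the letter that follows them.

In practice this gives most letters up to four forms: isolated, initial, medial, and final.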
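The positional shapes can be inspected through the Unicode character database, which records separate "presentation form" code points for them. The sketch below uses Python's standard unicodedata module to list the forms recorded for a given letter; the function name contextual_forms is an assumption of this example. Modern text is normally stored using the base letter only, and the rendering engine picks the shape, so these code points are shown for illustration rather than as something to type.

```python
import unicodedata


def contextual_forms(letter):
    """Map form name (isolated, final, initial, medial) to its presentation character."""
    target = f"{ord(letter):04X}"
    forms = {}
    # Arabic Presentation Forms-B block: U+FE70..U+FEFF
    for cp in range(0xFE70, 0xFF00):
        parts = unicodedata.decomposition(chr(cp)).split()
        # Entries look like "<initial> 0628"; skip ligatures and unassigned code points.
        if len(parts) == 2 and parts[1] == target:
            forms[parts[0].strip("<>")] = chr(cp)
    return forms


for name, letter in [("beh", "\u0628"), ("alef", "\u0627")]:
    forms = contextual_forms(letter)
    print(name, sorted(forms))
```

Beh, an ordinary connecting letter, has all four forms, while alef, one of the special letters that never joins the letter after it, has only an isolated and a final form.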

Arabic Layout:

Arabic is read from right to left, not left to right. This applies to:

- The direction of the letters within a word. Getting this wrong is like reading CAT as TAC in English.

- The direction of the words within a sentence. Getting this wrong is like reading "This is a good boy" as "Boy good a is this" in English.
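For software, the reading direction is a display concern: Unicode text is stored in logical order, with the first letter read coming first in memory, and the renderer lays it out from right to left. A simple way to guess the direction of a string is to look at its first strongly directional character, which is roughly what the Unicode Bidirectional Algorithm does when choosing a paragraph direction. The sketch below is a simplified version of that idea using the standard unicodedata module; the function name base_direction and the left-to-right fallback are assumptions of this example.

```python
import unicodedata


def base_direction(text):
    """Guess "rtl" or "ltr" from the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):  # Hebrew-type and Arabic-type letters
            return "rtl"
        if bidi == "L":          # Latin and other left-to-right letters
            return "ltr"
    return "ltr"                 # no strong character found: fall back to ltr


word = "\u0642\u0637"  # qaf, tah: an Arabic word for "cat"

print(base_direction(word))        # direction detected from the Arabic letters
print(base_direction("cat"))       # direction detected from the Latin letters
print(unicodedata.name(word[0]))   # the first stored letter is the first one read
```

Digits and spaces are not strongly directional, so a string such as "123 " followed by an Arabic word is still detected as right to left.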

Non-Arabic speakers need to get used to this right-to-left direction when learning Arabic, and they have to learn the two shapes of each letter and when to apply each one.

Developing automated localization applications raises another problem. Linguistic research on Arabic has not yet produced enough of the computational resources that a modern computing environment needs. Grammar checkers, OCR, and, most importantly, powerful linguistically aware search engines and string-processing utilities for Arabic are scarce or immature.

