Understanding localization traps

My English-speaking readers may think that localization is just one more step to deal with when you ship software, like burning the CDs, having promotional T-shirts printed or preparing visuals for the press and trade shows. That's actually a big mistake, and one that many projects have learned the hard way. For any big software project, localization is not one of the final steps of the production process; it has to be part of the whole process.

That's true for software and it's true for web content. Actually, I would compare it with designing standards-compliant web pages. Many web designers build their pages with old, crappy HTML hacks dating back to the late 90s, and only when the project is nearing completion do they decide to make it standards compliant. The result is that they spend far more time fiddling with their code to get it through the HTML validator than they would have spent learning how to produce well-structured HTML in the first place.

Of course, you always think that your framework is well designed, that your page template is flexible and that the text is pretty simple and therefore shouldn't be difficult to translate... Overconfidence is the biggest mistake, because it means forgetting Murphy's law: if things can go wrong, they will go wrong. Things that look simple and straightforward to you are likely to look far more complex through somebody else's eyes, especially if that person comes from a different culture.

A few common traps we have had to deal with at Mozilla Europe:

  • Translations are usually 30% longer than the original text, not only because English has a less verbose syntax but also (and probably mostly) because you have to transpose an English way of explaining things into your own language. That's usually not a problem, except when the text has to fit in a small graphic box such as the download box or a menu item: "Free Download" or "Press area" can turn into a long sentence in some languages.

  • What happens if there is no version of the software in the visitor's language because it hasn't been released yet? We chose to offer both the newer English version and the last version of the software that is available in the visitor's language. But wait, wouldn't a Czech be happier with a newer Slovak version than with an English one? Isn't a Catalan or Galician speaker more likely to prefer a Spanish version of Firefox or Thunderbird over an English one if it is not available in their own language? These are the kinds of problems that affect not only the visual design but also the script logic behind the scenes, and they are something you have to discuss with localizers because they know best.

  • Screenshots. Supporting 22 languages means supporting 22 different sets of screenshots. It also means showing web pages that are as neutral as possible, without references to a specific culture. The Firefox screenshots in the right-hand column of Mozilla.com are examples of something we can't really use since they are US-centric: a "way to san jose" text in the search bar or New York Times and CNN tab titles don't really cut it in a language written in Cyrillic...

  • Text rendered as images is usually bad practice, since the images can't be reused in other languages and are simply more difficult to update; see for example the "Firefox has been updated" page. This page isn't localized yet, but one thing is sure: either we remove these text-based images or we generate them automatically server-side, which can prove tricky. (Actually, there is a third solution, which is to use SVG or Canvas; I think that would make sense for a Firefox-only page.)

  • The original text may simply be irrelevant in the target language or need a total rewrite to make sense. Here is an example taken from the current mozilla.com download page that is not really relevant outside of the US: "Firefox 1.5 (Windows version) is also the first browser to meet US federal government requirements that software be easily accessible to users with physical impairments." If you want to reuse this point, you need to know what the current situation is under your own regional laws, but most of all, it raises the problem of geographic targeting vs. language targeting. This point is valid for a US citizen, who may speak Spanish as a first language, but not for a native English speaker living in Belfast. The fact that translations are planned may affect the very content of your original text.
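
To make the fallback question from the second trap more concrete, here is a minimal Python sketch of the kind of script logic involved. It is only an illustration under stated assumptions, not the code behind any Mozilla site: the function names, the table of related locales and the version numbers are all hypothetical, and the language pairs are simply the ones mentioned above. Which fallbacks are actually acceptable is exactly what you would need to ask the localizers.

```python
from typing import Dict, List, Tuple

DEFAULT_LOCALE = "en"

# Hypothetical table: locales a visitor is ASSUMED to prefer over English.
# The pairs come from the examples in the text; confirm them with localizers.
RELATED_LOCALES: Dict[str, List[str]] = {
    "cs": ["sk"],
    "ca": ["es"],
    "gl": ["es"],
}


def version_key(version: str) -> Tuple[int, ...]:
    """Turn '1.5.0.4' into (1, 5, 0, 4) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def download_offers(visitor_locale: str,
                    releases: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (locale, version) pairs to offer, most preferred first.

    `releases` maps a locale code to its latest released version and
    must contain an entry for DEFAULT_LOCALE.
    """
    newest = releases[DEFAULT_LOCALE]
    offers: List[Tuple[str, str]] = []

    own = releases.get(visitor_locale)
    own_key = version_key(own) if own is not None else ()
    if own is not None:
        offers.append((visitor_locale, own))
        if own_key >= version_key(newest):
            return offers  # the visitor's own build is current

    # Related languages are only worth offering if their build is newer
    # than what the visitor can already get in their own language.
    for related in RELATED_LOCALES.get(visitor_locale, []):
        candidate = releases.get(related)
        if candidate is not None and version_key(candidate) > own_key:
            offers.append((related, candidate))

    # English is always the last resort.
    offers.append((DEFAULT_LOCALE, newest))
    return offers


if __name__ == "__main__":
    # Made-up release data, for illustration only.
    releases = {"en": "1.5.0.4", "es": "1.5.0.4", "sk": "1.5.0.4", "cs": "1.0.7"}
    print(download_offers("cs", releases))
    print(download_offers("ca", releases))
```

With this made-up data, a Czech visitor would be offered the older Czech build, then the newer Slovak one, then English, while a Catalan visitor would get Spanish before English. Keeping the preferences in a plain table rather than in the page template means localizers can review and correct them without touching the script.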

The points above are just a few examples to show that the devil is in the details, and once you deal with many languages you have to deal with many details related to culture, fonts, visual design and even unexpected Gecko bugs, in right-to-left languages for instance. Of course, if it were as easy as it looks at first sight, all big projects would have multilingual websites.

In this article, I have only covered a few technical traps to show what kind of problems you are likely to meet when working on content meant to be internationalised, but let's not forget the human factor: localizing web content means dozens more people involved in the project, living in different countries and timezones. We certainly have to work on making this collaboration easier for web content, just as we have done for the product over the last two years. The English section of this blog is one of the tools I will use to get feedback from the community, but more things are coming.

If you want to follow only the English articles on this blog, use this RSS feed: http://chevrel.org/fr/carnet/rss.php?lang=en
