Understanding localization traps

Post

My English-speaking readers may think that localization is just one more step in making software, like burning the CDs, printing promotional T-shirts or preparing visuals for the press and trade shows. That is a big mistake, and one that many projects have learned the hard way. For any big software project, localization is not one of the final steps of production; it has to be part of the whole process.

That's true for software and it's true for web content. I would compare it to designing standards-compliant web pages. Many web designers build pages using old crappy HTML hacks dating back to the late 90s, and only when the project is nearing completion do they decide to make it standards compliant. The result is that they spend far more time fiddling with their code to get it through the HTML validator than they would have spent learning to produce well-structured HTML in the first place.

Of course, you always think that your framework is well designed, that your page template is flexible and that the text is pretty simple and therefore shouldn't be difficult to translate... Overconfidence is the biggest mistake; it means forgetting Murphy's law: if things can go wrong, they will. Things that look simple and straightforward to you are likely to look far more complex through somebody else's eyes, especially if that person comes from a different culture.

A few common traps we had to deal with at Mozilla Europe:

  • Translations are usually around 30% longer than the original text, not only because English has a less verbose syntax but also (and probably mostly) because you have to transpose an English way of explaining things into your own language. That's usually not a problem, except when the text has to fit in a small graphic box such as the download box or a menu item: "Free Download" or "Press area" can become a long sentence in some languages.

  • What happens if there is no version of the software in the visitor's language because it hasn't been released yet? We chose to offer both the newer English version and the latest version of the software available in the visitor's language. But wait, wouldn't a Czech be happier with a newer Slovak version than with an English one? Isn't a Catalan or Galician speaker more likely to prefer a Spanish version of Firefox or Thunderbird over an English one if it is not available in their own language? These are the kinds of problems that affect not only the visual design but also the script logic behind the scenes, and they are something you have to discuss with localizers because they know best.

  • Screenshots. Supporting 22 languages means supporting 22 different sets of screenshots, and it means using pictures of web pages that are as neutral as possible, without references to a specific culture. The Firefox screenshots in the right column of Mozilla.com are examples of something we can't really use since they are US-centric: a "way to san jose" text in the search bar or New York Times and CNN tab titles don't really cut it in languages written in Cyrillic...

  • Text as images is usually bad practice, since the images can't be reused in other languages and are simply more difficult to update; see for example the "Firefox has been updated" page. This page isn't localized yet, but one thing is sure: either we remove these text-based images or we generate them automatically server-side, which can prove tricky. (Actually, there is a third solution, which is to use SVG or Canvas; I think that would make sense for a Firefox-only page.)

  • The original text may simply be irrelevant in the target language or need a total rewrite to make sense. Here is an example taken from the current mozilla.com download page that is not really relevant outside the US: "Firefox 1.5 (Windows version) is also the first browser to meet US federal government requirements that software be easily accessible to users with physical impairments." If you want to reuse this point, you need to know what the current situation is under your own regional laws, but most of all, it raises the problem of geographic targeting vs. language targeting. This point is valid for a US citizen, who may speak Spanish as a first language, but not for a native English speaker living in Belfast. The fact that translations are planned may affect the very content of your original text.
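The language fallback question from the second trap above can be sketched in a few lines of Python. This is only a minimal illustration, not the actual script logic of any Mozilla site: the FALLBACKS table, the pick_downloads function, the locale codes and the version numbers are all assumptions made up for the example, and the real fallback chains are exactly what localizers should be asked about.

```python
# Hypothetical fallback table: for each locale, related locales to try
# before falling back to English. Real entries should come from localizers.
FALLBACKS = {
    "cs": ["sk"],
    "ca": ["es-ES"],
    "gl": ["es-ES"],
}

DEFAULT_LOCALE = "en-US"


def pick_downloads(visitor_locale, latest_version, available):
    """Return a list of (locale, version) offers, best match first.

    `available` maps a locale code to the newest version released in it.
    The visitor's own locale is offered first (even if outdated), then
    related locales, then English, stopping at the first up-to-date build.
    """
    chain = [visitor_locale] + FALLBACKS.get(visitor_locale, []) + [DEFAULT_LOCALE]
    offers = []
    for locale in chain:
        version = available.get(locale)
        # Skip locales with no release, and locales already offered.
        if version is None or any(locale == seen for seen, _ in offers):
            continue
        offers.append((locale, version))
        if version == latest_version:
            break  # an up-to-date build has been offered, stop here
    return offers


# Made-up release data for illustration only.
available = {"en-US": "1.5.0.6", "sk": "1.5.0.6", "cs": "1.5.0.4", "es-ES": "1.5.0.6"}
print(pick_downloads("cs", "1.5.0.6", available))
```

With this sample data, a Czech visitor is offered the older Czech build together with the newer Slovak one, and English is never reached. Whether that ordering is what Czech users actually want is a question for the Czech localizers, not for the code.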

The above points are just a few examples to show that the devil is in the details, and once you deal with many languages you have to deal with many details related to culture, fonts, visual design and even, for instance, unexpected Gecko bugs in right-to-left languages. Of course, if it were as easy as it looks at first sight, all big projects would have multilingual websites.

In this article, I have only talked about a few technical traps to highlight the kinds of problems you are likely to meet when working on content meant to be internationalised, but let's not forget the human factor: web content localization means dozens more people involved in the project, living in different countries and time zones. We certainly have to work on making this collaborative work easier for web content, just as we have done over the last two years on the product side. The English section of this blog is one of the tools I will use to get feedback from the community, but more things are coming.

If you want to follow only the English articles from the blog, use this RSS feed: http://chevrel.org/fr/carnet/rss.php?lang=en

Comments

1. Sunday 27 August 2006, 08:40, by Harold

Thank you for raising several important points, the most prominent one being that localization is not a single process step but part of the whole process. It is in the coding (just think of word endings that differ depending on case, grammatical number, et cetera). It is in the design, as you have shown. It is in all the tools and web sites that are not immediately part of a particular piece of software but that are nevertheless part of the overall project. And of course, it is in all the user documentation.

As for CNN: this is in fact sometimes annoying for non-Americans. I could only think of one thing being worse: a screenshot of "Fox News" with a headline about G. Double-U. Thanks for giving such issues a voice too.

2. Sunday 27 August 2006, 10:50, by Cameron

An interesting read.

The team behind addons.mozilla.org is planning on localising it and work is underway on the backend system. Someone with a background in l10n with any information or advice that would be of assistance to the project would be very welcome to contribute. Drop into #amo on irc.mozilla.org sometime - clouserw is the guy working on l10n, but others of us would be interested in chatting with you.

3. Sunday 27 August 2006, 13:32, by pascalc

->Cameron

Actually, I intended to write my next article about addons.mozilla.org to encourage people to participate ;) I will definitely pop into #amo.

To all: the comment spam filters tend to temporarily block most messages in English, hence the delay in seeing your messages. I will tune them to be more tolerant of English soon.

4. Sunday 27 August 2006, 16:04, by kourge

There are also contextual problems plaguing localization efforts. For example, you can see this kind of string from time to time: "Automatically update every". The better way to do this is "Automatically update every %s", then replace %s with something meaningful.
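As a rough illustration of the point above, here is a minimal Python sketch of a full-sentence string with a placeholder. The CATALOG dictionary and the update_label function are hypothetical names invented for this example, and the German string is only an illustrative translation; the sketch uses a named {interval} placeholder in place of %s, but the idea is the same: the translator sees the whole sentence and can move the placeholder wherever the target language needs it.

```python
# Hypothetical message catalog: one complete sentence per locale, with a
# named placeholder that each translation is free to reposition.
CATALOG = {
    "en": "Automatically update every {interval}",
    "de": "Automatisch alle {interval} aktualisieren",
}


def update_label(locale, interval):
    """Build the label for a locale, falling back to English if missing."""
    template = CATALOG.get(locale, CATALOG["en"])
    return template.format(interval=interval)


print(update_label("en", "30 minutes"))
print(update_label("de", "30 Minuten"))
```

In the German string the interval lands in the middle of the sentence, which a truncated "Automatically update every" string followed by a separately appended value could not express. The interval text itself ("30 minutes") still needs its own localization, including plural handling, which this sketch leaves out.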

5. Monday 4 September 2006, 20:17, by mauriz

Thanks for posting your experience with l10n. This is probably one of the most difficult parts for a growing project. I can't wait to read your article about addons.mozilla.org to see how MozFo will deal with l10n of content and interface.

6. Saturday 9 September 2006, 23:50, by Jonathan Quince

Much of the above advice is good, particularly with regard to the subtleties of translation problems. However, I have one little quibble: if a French-based project included screenshots of a browser opened to Agence France-Presse (or, for that matter, if a Chinese-based project had screenshots of 星島日報), would that be as bad as Firefox using U.S.-related text and CNN screenshots? Would you likewise advise that they create dozens of sets of screenshots, so as to avoid American or Persian users thinking their project was “France-centric” or “China-centric”?

Part of globalization is for end-users to accept that products are (partly or mostly) made in other places. The end-users benefit from this and should be grateful for it: I have certainly used plenty of German-based open-source software; I appreciate their efforts, and I do not begrudge them their Deutsch screenshots. Localization is to make the software accessible and usable on a practical level, not to impose some sort of local cultural homogeneity. This means that good translations and language handling (e.g., right-to-left/top-to-bottom, complex scripts) are of utmost priority; and the translation issues you mention are critical—but if anybody is so put off by screenshots of CNN (or Fox News), perhaps they *shouldn’t* be using the fruits of a project with headquarters and many developers in America.

As an American, I can say that if Chinese open-source takes off, I would not be offended by good products using screenshots with the above example. If they went to the effort to support a l10n project producing an English-language version, I would enjoy it, be thankful, and maybe pitch in to help polish it up; I wouldn’t worry too much about their screenshots, as long as the screenshots still conveyed the functionality I need to see without regard to the textual content (which could be “lorem ipsum” in this case). I challenge others to be likewise accepting—and to realize that a subtle part of i18n/l10n is internationalizing the users.