List of lame excuses for poor or no internationalization

Postby Wuzzy » 29 May 2016, 14:10

I have helped translate many free software games into German, including, but not limited to, Hedgewars, Voxelands, Minetest, WarMUX, Me & My Shadow, Cataclysm: Dark Days Ahead, The Butterfly Effect, Dust Racing 2D, Pingus, Xonotic, 0 A.D., Stunt Rally and Wildlands.
So I have a bit of experience in this field. :)

When working with developers, I have heard many excuses for either not doing internationalization (roughly, the process of making a program translatable) at all or for providing only incomplete support.

I will present a list of those lame excuses and explain why I think they are so lame. If you have ever used any of those excuses, read on.

List of lame excuses for poor or no internationalization:
  • “We have way too many strings.”
  • “Nobody can ever translate this.”
  • “Our strings change very often.”
  • “This particular string does not need to be translated.”
  • “Credits don't need to be translated.”
  • “Names do not need to be translated.”
  • “I don't know how this particular string can possibly be translated.”
  • “We don't accept your translation in XYZ because it is not a language.”

“We have way too many strings.” / “Nobody can ever translate this.”
This excuse often comes up when there is no support for translation yet, or when support is incomplete and a key part of the program is not yet available for translation.
This is wrong on so many levels.
First, remember: as a developer, you don't need to do all the translations by yourself, and in fact you probably can't. That's what translators are for. :)
Second: you never know when a horde of translators will show up and boost the translation effort.
Third: it is not up to you to decide how many strings are “too many”. If you are really concerned, talk to people who are willing to translate.
Fourth: translations may take their time, just like your software project. If you don't have a hard deadline, translation can be seen as a long-term goal which does not need to be met instantly. But this is no reason to avoid supporting it altogether.

Granted, some software projects have a scarily high number of strings. But that shouldn't stop people from trying. Humans have been able to translate entire books and movies, so the strings of a piece of software should be doable as well.

If you decide to add internationalization, your task as a developer is to make the entry barrier for translators as low as possible, which may attract more translators.

“Our strings change very often.”
This is another excuse for not providing internationalization at all. But it is also flawed: it is only because your software is still English-only that you can mess with the strings as often as you like, so naturally you don't bother about changes.

But if you add internationalization, you may want to change your habits.
But even if you can't or don't want to change your habits, the excuse is still not necessarily a good one.
If you use GNU gettext or Qt's translation system, frequently changing strings are not that big a deal. Gettext keeps the old translation of a changed string (marked as “fuzzy”), so translators see what they translated before and can reuse it; they only need to make some adjustments instead of writing the entire translation from scratch. Other tools have this feature too. Check out the features of your toolchain.
Another way to cope with rapidly changing strings is to keep every string at a reasonable length (about one paragraph), so that a minor change to a string does not have a major impact on the whole. (See also: https://www.gnu.org/software/gettext/ma ... ng-Strings). Extremely long strings (say, more than 1000 words) definitely need to be broken up into multiple strings.
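As a rough illustration of the one-paragraph-per-string advice, here is a minimal Python sketch. It assumes paragraphs are separated by blank lines and that translations live in a plain dict keyed by the English paragraph; both are assumptions for the example, not how any particular tool stores them. Editing one paragraph only invalidates that paragraph's translation, and the others carry over untouched.

```python
import re

def split_paragraphs(text):
    """Split a long text into paragraph-sized translation units (blank-line separated)."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def untranslated_units(text, translations):
    """Return the paragraphs of `text` that have no translation yet.

    `translations` maps English paragraph -> translated paragraph, so
    paragraphs that did not change keep their existing translation.
    """
    return [p for p in split_paragraphs(text) if p not in translations]

german = {
    "The forest is dark.": "Der Wald ist dunkel.",
    "Beware of wolves.": "Hüte dich vor Wölfen.",
}
old_text = "The forest is dark.\n\nBeware of wolves."
new_text = "The forest is dark.\n\nBeware of bears."  # only the second paragraph changed

print(untranslated_units(new_text, german))
```

With one huge string, the same one-word edit would have sent the whole text back to the translators.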

You may also consider doing string freezes, a period of time before release where no strings are added or changed so translators can translate without fear.

In general, listen and talk to the translators and work out what might be best for the project.

“This particular string does not need to be translated.”
It depends, but I'd say that in 95% of cases it is wrong.
There is a very simple rule I'd follow: if the string is exposed in the user interface, then it must be translated, unless it is an unchangeable string like a file name (see below). Strings which are only used internally and are never seen by the user of course do not need to be translated.

I have heard many reasons for not translating a particular string:
  • Too short
  • Too long
  • Only contains punctuation characters
  • Too much work for translators (see above)

Those and similar reasons are all invalid. At the end of the day, the user sees an English (untranslated) string in a supposedly localized application, and it will be cryptic to those who do not speak English.

So: normally, all visible strings must be translated, no excuses. I hope it is clear to everyone that intentionally leaving parts of the program out of translation is generally a bad idea.

“Credits don't need to be translated.”
This excuse seems to be fairly common, and I have no idea why. To me, it does not make any sense whatsoever. The credits are part of the program like everything else, so they need to be translated.

“Names do not need to be translated.”
This is probably based on the assumption that a name is just a string which can be dumped as-is into the text, no matter the language. But this assumption is wrong; let me explain why:

First, names can be handled very differently in different languages.
There are languages where names are written differently depending on the grammatical case; always writing the name as-is would be ungrammatical. There are also languages which do not use the Latin script (surprise!). Some of them transliterate names in this case, and it would be unusual to use Latin characters.

The only names which you don't need to (and shouldn't) make translatable are unchangeable strings (see below) like file names.

Long story short: make names translatable and let translators decide what to do. They probably know better. You may want to add a translator comment (if your translation system supports this) for complex or “difficult” names, explaining things like pronunciation, historical background (if it is important), etc.
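A minimal sketch of a translatable name with Python's standard gettext module (Python 3.8+ for pgettext). The domain "mygame", the "locale" directory and the character "Grimwald" are all hypothetical. pgettext attaches a context, so the name can be translated independently of an identical string used elsewhere, and a `# TRANSLATORS:` comment directly above the call ends up in the .po file when xgettext is run with something like `--add-comments=TRANSLATORS: --keyword=pgettext:1c,2`.

```python
import gettext

# "mygame" and the "locale" directory are hypothetical. With fallback=True a
# missing catalog yields NullTranslations, which returns the English text.
translations = gettext.translation("mygame", localedir="locale",
                                   languages=["de"], fallback=True)
pgettext = translations.pgettext  # Python 3.8+

# TRANSLATORS: "Grimwald" is the name of the troll king (a hypothetical
# character). Stress on the first syllable. Adapt or transliterate it as
# your language requires.
boss_name = pgettext("character name", "Grimwald")

# The whole sentence is translatable too, with a named placeholder, so a
# translator can put the name wherever their sentence structure needs it
# instead of the program gluing fragments together.
message = translations.gettext("{name} has been defeated!").format(name=boss_name)
print(message)
```

The placeholder only solves word order; a language that inflects names by case may need more than this, which is one more reason to let translators handle the name itself.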

Here is some interesting insight on names:
https://www.gnu.org/software/gettext/ma ... html#Names

“I don't know how this particular string can possibly be translated.”
(or a number of strings, of course)
That's fine, but it's no reason to either drop internationalization altogether or to not mark a particular string for translation. Generally, let the translators figure it out.
If you think the string itself is flawed, confusing or poor in its English form, then consider rewriting it; that way you improve the text quality of both the English and (indirectly) the other languages.
But even if some strings are flawed, that's still better than having no internationalization at all. Either fix them immediately or put them on your bug list to fix later.

Unless the string is an unchangeable string (see below) or not exposed to the user, you should probably mark it for translation anyway.
If you are really baffled by the string and can't figure out a better-worded alternative, you may also start an open discussion with translators to work out how to fix it.

“We don't accept your translation in XYZ because it is not a language.”
(or “(…) because it is not a real language.”, which is not much better)
This is actually related to localization, not internationalization, and I heard this crazy excuse only once, but it was too crazy for me not to include it. ;)
In that particular case, someone translated a piece of software into Esperanto, which is a constructed language. But the translation was intentionally not included in the software because the developers thought Esperanto is not a language.
Of course constructed languages like Esperanto are languages, and they are perfectly suited for communication between humans, as a quick look at Wikipedia would reveal. ;)

The only thing which should matter to you is: is this a human language, i.e. can it be used by humans for communication? If yes, then you probably should include it.

Definitions
Unchangeable strings: An unchangeable string is a string which can only be used as-is. Only the exact sequence of characters works; any change to it will break it, so it must be copied perfectly, like a code. Usually this is for technical reasons. Examples: file names, identifiers, passphrases, IP addresses, URLs. Counter-example: names of humans.
Developers should try to keep the number of unchangeable strings exposed to the user interface to a minimum. If it is possible to expose an alternative, maybe more language-friendly, representation of such an unchangeable string, then go for it. For example, instead of cryptic identifiers for maps, show the “real” map name, which can also be translated.
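The map example might look like this in a minimal Python sketch using the standard gettext module. The domain, directory, map id and map title are hypothetical. `N_` is the usual no-op marker idiom from the gettext manual: it lets `xgettext --keyword=N_` extract the title from a data table, while the actual translation happens later, at display time. The id itself is never passed through gettext.

```python
import gettext

# "mygame" and the "locale" directory are hypothetical; fallback=True means
# a missing catalog gives NullTranslations, which returns the English text.
translations = gettext.translation("mygame", localedir="locale", fallback=True)
_ = translations.gettext

def N_(message):
    """No-op marker: lets xgettext extract a string that is translated later."""
    return message

# The map id is an unchangeable string: it names a file on disk and may be
# stored in save games, so it must never be translated.
# The display name is only shown to the player, so it is marked for translation.
MAPS = {
    "map_03_swamp": N_("The Sunken Swamp"),
}

def map_file(map_id):
    # Internal path, never shown in the UI: used as-is.
    return f"maps/{map_id}.map"

def map_title(map_id):
    # Translated at display time, once the player's language is known.
    return _(MAPS[map_id])

print(map_title("map_03_swamp"), "->", map_file("map_03_swamp"))
```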

Conclusion
Many lame excuses probably stem from developers' unfamiliarity with languages in general. They make assumptions based on the languages they know and try to apply them to all languages. Most fallacies can be overcome by accepting that you don't know all languages and that you had better make fewer, not more, assumptions about them. Some other excuses seem to come from the fact that translations are only seen as a “bonus”, so incomplete translations are seen as no big deal. I have explained above why this is flawed.

In the hope of better internationalization in software, here are a few guidelines to follow:
  • Follow the hints for preparing strings in the GNU gettext manual; they are important to know even if you don't use Gettext
  • Read the manual of the translation system you're using; otherwise you may miss out on key features
  • Make all strings exposed to the UI translatable, unless they are unchangeable strings
  • Don't be afraid of a mountain of strings
  • Be prepared to change your habits if you add internationalization
  • Listen to your translators and work with them

Feel free to add your comments, questions, etc. :) Maybe you have heard a lame excuse I have not included; I would be glad to hear about it.
Wuzzy
 
Posts: 989
Joined: 28 May 2012, 23:13

Re: List of lame excuses for poor or no internationalization

Postby eugeneloza » 29 May 2016, 18:57

Well... It's not about excuses, it's about reasons.
It's very hard to find a free and dedicated collaborator for a free project, be it a translator, a programmer, or even just a tester. Only a few relatively famous game projects can afford that.
Sometimes it would require not just text translation, but also sounds and GUI elements. It's not impossible. It's just a lot of work, sometimes an unaffordable amount.
And another reason is... would there be any yield, and would it be worth the effort? It's fun to have the 1000 lines of the game text translated into Amharic. It's about a week or two of not extremely hard work. But... will there be even a single player in Amharic? And who would like to add right-to-left script support for Hebrew and Arabic? Or is anybody up for struggling with misplaced accent marks in Greek?

Everything is possible. But everything comes at a price.
eugeneloza
 
Posts: 500
Joined: 22 Aug 2014, 12:15
Location: Ukraine

Re: List of lame excuses for poor or no internationalization

Postby leilei » 30 May 2016, 03:56

My excuses are these:

- Compatibility
There's a potential risk of breaking support for third-party stuff, and of memory leaks, when expanding to Unicode/UTF-8.

- Video memory
Do you know how many glyphs you'd need to keep in VRAM at all times? Consider this for various fonts and their different sizes. This makes it a bit prohibitive for low-end targets like onboard video chips, Pis, 3dfx cards, etc. It's only feasible with a compressed texture format, which is even more prohibitive in free software because of the S3TC patents, which are set to finally expire in 2 years. Otherwise you'd need an 8 GB video card to support 72 pt RGBA8 Mandarin Chinese.

- id software is lazy with externalizing strings
Self-explanatory. :)
At least Raven Software made an effort to do so with Elite Force and JO/JA, to support multiple languages in the same release. Unfortunately, doing this doesn't guarantee localization in old mods with hardcoded English, etc.
There's also the possibility of iconifying some strings away; for example, frag messages could be just names with weapon symbols. The translation demand at that point would just be the menu interface and HUD (which already have externalized strings), and on-screen messages.
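On the video-memory point above, the arithmetic behind such estimates can be sketched in Python. Every parameter here is an assumption: 96 DPI, one square cell per glyph as wide as the font's pixel size, no padding, no mipmaps, no compression. Real atlases pack tighter or looser, and several fonts and sizes multiply the total, so treat the numbers as order-of-magnitude only. The glyph count for Chinese uses the 20,992 code points of the CJK Unified Ideographs block (U+4E00 to U+9FFF); a game would usually need far fewer, or more with extension blocks.

```python
def glyph_atlas_bytes(glyph_count, point_size, dpi=96, bytes_per_pixel=4):
    """Rough size of an uncompressed glyph atlas in bytes.

    Assumes every glyph gets a square cell as wide as the font's pixel size,
    with no padding, no mipmaps and no compression.
    """
    pixel_size = round(point_size * dpi / 72)  # 1 pt = 1/72 inch
    return glyph_count * pixel_size * pixel_size * bytes_per_pixel

# Printable ASCII vs. the CJK Unified Ideographs block, both at 72 pt RGBA8.
for label, count in [("ASCII", 95), ("CJK Unified Ideographs", 20_992)]:
    mib = glyph_atlas_bytes(count, 72) / 2**20
    print(f"{label}: {mib:.1f} MiB")
```

The result is very sensitive to point size and glyph count, which is why a single headline figure can vary so much between estimates.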


Other localization changes, on the other hand, I can do. :) Like no blood (for Germany), no gibbing and nudity (for Japan), no skeletons (for China), etc.
leilei
 
Posts: 154
Joined: 03 Apr 2012, 02:53

Re: List of lame excuses for poor or no internationalization

Postby Sauer2 » 30 May 2016, 14:22

like no blood (for germany)


Don't do that. Thanks.

Also: as Eugene said, it comes at a price. And given that even many AAA game and movie translations suck...
And I don't necessarily mean that the translation is incorrect, but that it feels lame or otherwise inappropriate.
Sauer2
 
Posts: 430
Joined: 19 Jan 2010, 14:02

Re: List of lame excuses for poor or no internationalization

Postby Duion » 30 May 2016, 23:41

Wuzzy, you can translate my game if you think it is important and there are no excuses.
Duion
 
Posts: 251
Joined: 16 Mar 2013, 20:33
Location: Germany

Re: List of lame excuses for poor or no internationalization

Postby Wuzzy » 31 May 2016, 21:42

It's very hard to find free and dedicated collaborator for the free project. Be it interpreter, or programmer, or even just a tester. Only few relatively famous games projects can afford that.
Sometimes it would even require not just text translation, but also sounds and GUI elements. It's not something impossible. It's just a lot of work, sometimes unaffordable amount of.
And another reason is... would there be any yield and would it worth the effort? It's fun to have the 1000 lines of the game text translated into Amharic language. It's somewhere about a week or two of not extremely hard work. But... will there be at least a single player in Amharic? And who would like to add right-to-left scripting support in favor of Hebrew and Arabic? Or anybody up for struggling with misplaced accent marks in Greek?

Everything is possible. But everything comes for a price.

I'd say you'd be surprised to see how many languages some projects get translated into.

Even Me & My Shadow (here on FreeGameDev) is internationalized, and it is not exactly a big project. That's actually a plus, since it means there are only a few strings to translate: anyone could translate Me & My Shadow in about an hour or so. Therefore, small projects have even less reason not to be internationalized, since they can expect translations into many different languages.
I'm not saying all small projects need this; for example, for a tiny test, hobby or Ludum Dare project which nobody will care about in a few months, you don't really need to bother about internationalization. But if the game is longer-lasting, internationalization should clearly be considered.

Teeworlds is another relatively small game (read: few strings to translate), and it has a healthy number of translations, see here: https://www.transifex.com/teeworlds/tee ... languages/

I think tools like web-based translation platforms (Transifex or Weblate, for example) can be very useful to overcome barriers, since they get out of the way of people who want to translate and let them just start contributing.
As I said, the less you get in the way of those who are already willing to contribute, the better.

Finding contributors is one thing, but enabling them to contribute in the first place is another.

Basically, you just restated the “Nobody is ever going to translate this!” excuse. As for the question of how many languages are “enough”: it all depends on the translators. If there are translators, there will be translations. Duh! Usually I expect translators to appear only if they are at least a bit familiar with the game. So if a translator translates a piece of software into an exotic language, he/she is at least doing it for him/herself, which means the translation has at least one player. ;)

Stuff like RTL support is a huge can of worms, and I have no experience in this field, so I can't really comment on it. My general thought would be: if there is large interest from the player base, it should definitely be considered.

As for the question of whether internationalization would be worth the effort: I think yes, it certainly would. Not everyone speaks English, so with internationalization and (eventually) the actual translations, you increase the potential audience. Young players in particular, who may not know more than their native language, will need this. By sticking to English only (no matter why), you automatically lose part of the audience.


- Compatibility
There's a potential risk of breaking support for 3rd party stuff and memory leakage by expanding to unicode/utf-8

This is only true if you don't implement UTF-8 support properly, so this is a lame excuse. Thanks. :)
But if you need to depend on low-quality legacy software (read: without UTF-8 support :P) for some reason, that's a problem, yes. But this raises the question of why your software has those dependencies in the first place.

- Video memory
Do you know how many glyphs you'd need to keep in the VRAM at all times? Consider this for various fonts and their different sizes. This makes it a bit prohibitive for low-end targets like onboard video chips, Pi's, 3dfx cards, etc. It's only feasible with a compressed texture format, which is even more prohibitive in Free Software because of the S3TC patents that are set to finally expire in 2 years. Otherwise you'd need a 8gb video card to support 72pt RGBA8 mandarin chinese.

Oh, man. I had not even thought of platforms other than PC, haha. But yeah, hardware and memory constraints in general are a valid excuse (sadly), if they really cannot be overcome. I am not familiar with concrete examples and also don't understand the patent stuff, so I can't add much here.
I guess a similar reason / valid excuse would apply to old or legacy software where you would have to rewrite half of the application from scratch to make internationalization possible. But I think that's a problem with legacy software in general.

- id software is lazy with externalizing strings

Haha, yeah, that's the lamest excuse of them all: “because others do it”. How could I forget?

Wuzzy you can translate my game, if you think it is important and there are no excuses.

First, this thread is about internationalization, not translation in itself. Internationalization is the process of making a program translatable in the first place. So the emphasis lies on the developers, not the translators.
Second, I have a “policy” of only considering translating software I use myself; that way I get a better “feeling” for things and it is less error-prone. I probably don't play your game.
Also, I don't translate everything in existence; my time is limited, but I think I have done a lot already. And I am not the only translator in the world. That's an excuse, but certainly not a lame one. :)

I am only against lame excuses, not all excuses, since some excuses are valid. :)
Wuzzy
 
Posts: 989
Joined: 28 May 2012, 23:13

Re: List of lame excuses for poor or no internationalization

Postby KroArtem » 02 Jun 2016, 23:04

As a former translator (including but not limited to SumWars, SuperTuxKart, SuperTux, Me&MyShadow, Widelands, TeXStudio and so on and so forth), I'm currently OK with English-only support. High-quality translation requires a lot of time and interaction with the story writer/developer. If a game is more or less stable, then it's probably OK to start translating. Just my 2 cents.
KroArtem
 
Posts: 375
Joined: 26 Aug 2010, 19:04

Re: List of lame excuses for poor or no internationalization

Postby onpon4 » 02 Jun 2016, 23:38

This is another important thing to point out: for some games, translation is entirely or mostly pointless. In particular, if there's no text beyond a menu, translating that text is not going to do much of anything useful.

Also, my excuse for not translating Project: Starfighter and ReTux is that it's technically not possible without substantial changes. Project: Starfighter uses ASCII for text, so that would have to be rewritten, and both of these games use sprite fonts containing only a limited set of characters. So Starfighter would have to be rewritten both to recognize Unicode and to use a regular font with Unicode support rather than a sprite font, and ReTux would have to abandon its nice hand-drawn-looking glyphs for a regular font, which would look uglier. This is in addition to adding the code that would enable translations (e.g. with gettext).
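Before deciding how big a sprite-font retrofit would be, it can help to check which characters a given set of translated strings actually needs. A minimal Python sketch; the glyph set and the German strings are hypothetical examples, not taken from Starfighter or ReTux.

```python
# Hypothetical sprite font: the set of characters that have a hand-drawn glyph.
SPRITE_FONT = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789 .,:;!?'\"-()"
)

def missing_glyphs(strings, font=SPRITE_FONT):
    """Return, sorted by code point, the characters `strings` use that `font` cannot draw."""
    used = set()
    for s in strings:
        used.update(s)
    return sorted(used - font)

german_menu = ["Spiel starten", "Einstellungen", "Beenden", "Größe"]
print(missing_glyphs(german_menu))
```

For a Latin-script language the answer may be a handful of extra glyphs to draw; for other scripts it quickly shows that a real font renderer is needed.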
onpon4
 
Posts: 596
Joined: 13 Mar 2014, 18:38

Re: List of lame excuses for poor or no internationalization

Postby Lyberta » 03 Jun 2016, 09:53

I honestly think not having Unicode support these days is a crime.
Lyberta
 
Posts: 765
Joined: 19 Jun 2013, 10:45

Re: List of lame excuses for poor or no internationalization

Postby mdwh » 03 Jun 2016, 10:22

eugeneloza: "It's very hard to find free and dedicated collaborator for the free project. ... would there be any yield and would it worth the effort? It's fun to have the 1000 lines of the game text translated into Amharic language."

I think the point is not that games should be translated up front, but that they should be translatable. That way you can leave the choice of when to translate until people offer to do the translations.

"Sometimes it would even require not just text translation, but also sounds and GUI elements."

That's a good point about sounds, something that would typically apply more to games than to other applications. One counter-argument is that a game with translated text but speech still in English is more useful than no translation at all (especially if the text can be displayed together with the speech; that would give you subtitles).

What do you mean by GUI elements? Text in a GUI should be translatable like anything else, though yes, I can see that a game might have more cases where text is stored as, say, a graphic (or even a 3D model).

My own excuse: for the most part, it was a lack of knowledge when I started writing them. Now for new projects I try to make them translatable, but shoehorning that into older projects is harder.

Other things that may affect game developers: to some degree it depends on the toolkit being used. These days anything used for non-game applications seems to have good support for translation, and actively encourages practices such as storing text and code separately. E.g., translating my Android applications has been no trouble: even when I was new to Android development, the SDK actively encouraged storing all strings separately, and it's easy to access those strings in code. Qt is a bit less nice IMO (it requires more manual interaction, e.g. to update translations), but it's still supported. But many games are written using lower-level game libraries that often have no support for translations, so you're left finding another library or writing your own. Admittedly, reading strings from a file shouldn't be that hard (and it's good practice in general to store data separately from code), but until people realise the importance of translations, it's easy to not even bother. And as people have noted, supporting non-ASCII or more obscure language features (like right-to-left) becomes a lot harder. These sorts of things are less of a problem if you're using standard OS GUI elements, fonts etc., but games typically do not.
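For illustration only, a minimal Python sketch of "reading strings from a file", assuming a hypothetical one-`key = value`-per-line UTF-8 format invented for this example (not any library's format). In practice an existing format such as gettext's .po/.mo or Qt's .ts/.qm usually brings better tooling for translators.

```python
def parse_strings(text):
    """Parse a hypothetical 'key = value' strings file into a dict.

    Blank lines and lines starting with '#' are ignored; the first '='
    separates the key from the value, so values may themselves contain '='.
    """
    table = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected 'key = value'")
        table[key.strip()] = value.strip()
    return table

def load_strings(path):
    # Read as UTF-8 explicitly so non-ASCII translations survive on every platform.
    with open(path, encoding="utf-8") as f:
        return parse_strings(f.read())

de = parse_strings("""
# German menu strings
menu.start = Spiel starten
menu.quit = Beenden
""")
print(de["menu.start"])
```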

But don't take this as disagreement; I think posts like this are good for making developers think about it and realise that it is important :)

“Credits don't need to be translated.”

For things like documentation, one risk is that an outdated translation can be more detrimental than no translation. For in-application text like GUI elements or messages, I can always assign a new "string ID" if a string has changed significantly (or is new anyway), so any language where it isn't yet translated will fall back to the English default. This gets harder if you have big chunks of text like in-game documentation. E.g., think of something like Civilization's Civilopedia: whilst the text should still be translatable (for other reasons, it's good practice not to store such huge amounts of text in game code), I'd be wary about accepting translations, at least while a game is still in active development. If I change a unit so that it no longer has some feature, then any text describing that feature is now out of date. I could simply mark all the translations for this unit as out of date, though that seems heavy-handed if only one thing is out of date. And I can't simply delete the relevant sentences if I don't speak the languages.

Credits are a particularly important example of this for free games: if I update the credits but the translations are out of date, then that potentially breaks licence requirements where attribution is required for using third-party material.

Though admittedly, with credits one could always have a "string ID" per item and fall back to English for new items, but it does mean one has to be careful.
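The per-item "string ID" approach with an English fallback can be sketched in Python like this. The IDs, names and dict-based tables are hypothetical stand-ins for whatever string storage a game uses. A credit line that is new, or was re-keyed because it changed materially, simply shows up in English until someone translates it, so attribution is never silently lost.

```python
ENGLISH = {
    "credits.music": "Music by Alice",
    "credits.sounds": "Sound effects by Bob (CC BY 3.0)",
}
# Hypothetical German table that predates the new sound credit.
GERMAN = {
    "credits.music": "Musik von Alice",
}

def lookup(string_id, *tables):
    """Return the text for string_id from the first table that has it.

    Tables are tried in order (e.g. player's language, then English), so a
    newly added or re-keyed string falls back to English instead of vanishing.
    """
    for table in tables:
        if string_id in table:
            return table[string_id]
    raise KeyError(string_id)

# Iterate over the English IDs so every credit is shown, translated or not.
for sid in ENGLISH:
    print(lookup(sid, GERMAN, ENGLISH))
```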

All this is solved if you have a team of translators who update everything before release, but in my experience, whilst it's easy to find people offering to translate once your application becomes sufficiently popular, it's still very hard to find people who are willing to translate continually, on demand. Whilst games should be translatable, it's not feasible to get into a situation where an update can't ship because someone hasn't updated a translation.

Another thing to note is that translations can add ongoing overhead in testing, due to having multiple versions. I have had a language-specific crash in an Android application (due to an unescaped ampersand character; I claim it's an Android bug that (a) this caused an exception in the GUI element and (b) it wasn't spotted by Android Lint, but nonetheless it's my application that users will blame, and it needs extra testing to avoid this). Another issue is dealing with different string lengths (I love that Google recommends testing translations in German, known for its very long words...), which is probably even harder for games, which typically use their own GUI routines and aren't as robust at handling changes in size. I'm not saying these are valid excuses, but they are less obvious points to bear in mind when taking on translations.
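One common way to catch length and rendering problems without waiting for a real German translation is pseudo-localization, a technique not mentioned in the thread: generate an obviously fake "translation" that is longer and full of accented characters, then look at every screen. A minimal Python sketch; the 40% expansion factor and the bracket markers are arbitrary assumptions.

```python
def pseudolocalize(text, expansion=0.4):
    """Return an obviously fake 'translation' of text for layout testing.

    Accents vowels (to exercise non-ASCII rendering), pads the string by
    `expansion` of its length (to mimic longer languages) and brackets it,
    so truncated or hard-coded strings stand out on screen.
    """
    accented = text.translate(str.maketrans("aeiouAEIOU", "áéíóúÁÉÍÓÚ"))
    padding = "~" * round(len(text) * expansion)
    return f"[{accented}{padding}]"

print(pseudolocalize("Start game"))
```

A missing closing bracket on screen means the string was cut off; an unaccented string means it bypassed the translation system.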
mdwh
 
Posts: 67
Joined: 13 Aug 2011, 01:53

Re: List of lame excuses for poor or no internationalization

Postby onpon4 » 03 Jun 2016, 11:02

FaTony wrote: I honestly think not having Unicode support these days is a crime.

Perhaps; I've never worked on Unicode support for a C program, so I don't know personally. But Project: Starfighter is an old project from 2003, and it's already designed to use plain ASCII. Unicode support would have to be retrofitted into it. I don't know what that would entail. (Unless just using SDL's text rendering functions would be sufficient. Would it?)
onpon4
 
Posts: 596
Joined: 13 Mar 2014, 18:38

Re: List of lame excuses for poor or no internationalization

Postby Lyberta » 04 Jun 2016, 15:36

onpon4 wrote: Perhaps; I've never worked on Unicode support for a C program, so I don't know personally. But Project: Starfighter is an old project from 2003, and it's already designed to use plain ASCII. Unicode support would have to be retrofitted into it. I don't know what that would entail. (Unless just using SDL's text rendering functions would be sufficient. Would it?)


Well, I'm not sure about C, but C++11 supports UTF-8 string literals for char, UTF-16 for char16_t and UTF-32 for char32_t. Some stuff like std::regex only works on char and wchar_t. I'm pretty sure any decent library supports UTF-8. For some trivial stuff that's not in the standard library, you can use my lib.
Lyberta
 
Posts: 765
Joined: 19 Jun 2013, 10:45

Re: List of lame excuses for poor or no internationalization

Postby onpon4 » 04 Jun 2016, 21:55

FaTony wrote: Well I'm not sure about C, but C++11 support UTF-8 for char, UTF-16 for char16_t and UTF-32 for char32_t.

Right, but that's not the issue. UTF-8 basically encodes special (non-ASCII) characters as sequences of multiple 8-bit bytes. The problem is that if you don't recognize those sequences as special, you'll just end up printing them byte by byte, so non-ASCII characters get rendered as a bunch of gibberish. Come to think of it, I would hazard a guess that SDL's (or rather, SDL_ttf's) text-drawing functions handle this properly, but Project: Starfighter uses its own crude text rendering system, which is based on pre-rendered monospace glyphs representing ASCII characters, and it always assumes that one character value corresponds to one glyph. That would have to be redesigned, probably replaced with SDL_ttf, for Starfighter to support UTF-8. It's something that would be worth doing, and I suspect I'll do it somewhere down the line, but it's not entirely trivial.
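For illustration, here is the decoding step a byte-per-glyph renderer skips, as a minimal Python sketch. It does no validation (overlong forms, surrogates, bad continuation bytes, truncated input), so real code should rely on a tested decoder; the point is only that the lead byte tells you how many bytes make up one character, and therefore one glyph lookup.

```python
def utf8_code_points(data):
    """Decode UTF-8 bytes into a list of code points by hand (no validation)."""
    points = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:              # 0xxxxxxx: ASCII, one byte
            cp, extra = b, 0
        elif b >> 5 == 0b110:     # 110xxxxx: two-byte sequence
            cp, extra = b & 0x1F, 1
        elif b >> 4 == 0b1110:    # 1110xxxx: three-byte sequence
            cp, extra = b & 0x0F, 2
        elif b >> 3 == 0b11110:   # 11110xxx: four-byte sequence
            cp, extra = b & 0x07, 3
        else:
            raise ValueError(f"invalid lead byte 0x{b:02X} at offset {i}")
        for cont in data[i + 1 : i + 1 + extra]:
            cp = (cp << 6) | (cont & 0x3F)  # append 6 bits from each 10xxxxxx byte
        points.append(cp)
        i += 1 + extra
    return points

text = "Größe"
data = text.encode("utf-8")
print(len(data), "bytes (glyphs a byte-per-glyph renderer would draw)")
print(len(utf8_code_points(data)), "characters")
```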

There's also the more minor issue of string length. If you use a crude ASCII-only method like Starfighter does, you assume that 8 chars (bytes) equal 8 rendered characters, which is not necessarily true with Unicode text. So there's a chance of some programmer thinking he's being clever by choosing a string size based on this assumption (like making a buffer 8 bytes long for a string that has to be 8 rendered characters or less to look right).
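A related trap with fixed-size char buffers is truncation: cutting UTF-8 text at an arbitrary byte can split a multi-byte character in half and leave an invalid sequence behind. The byte logic for a safe cut is the same in C or anywhere else; here it is as a minimal Python sketch. It only avoids splitting code points; it does not make the result fit a given number of rendered characters.

```python
def truncate_utf8(text, max_bytes):
    """Cut text to at most max_bytes of UTF-8 without splitting a code point.

    It can still separate a base letter from a following combining mark.
    """
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return text
    cut = max_bytes
    # Continuation bytes look like 10xxxxxx; back up until data[cut] starts a character.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut].decode("utf-8")

print(truncate_utf8("Größe", 3))  # a naive 3-byte cut would end inside "ö"
```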
onpon4
 
Posts: 596
Joined: 13 Mar 2014, 18:38

Re: List of lame excuses for poor or no internationalization

Postby leilei » 04 Jun 2016, 22:00

Targeting Windows 95 as a minimum requirement is also possibly an excuse :)
leilei
 
Posts: 154
Joined: 03 Apr 2012, 02:53

Re: List of lame excuses for poor or no internationalization

Postby Lyberta » 04 Jun 2016, 23:00

onpon4 wrote: Right, but that's not the issue. UTF-8 is basically just a way multiple 8-bit characters are sequenced together to produce special characters. The problem is if you don't recognize those sequences as special, you'll just end up printing them as-is, meaning non-ASCII characters just get rendered as a bunch of gibberish. Come to think of it, I would hazard to guess that SDL's (or rather, SDL-ttf's) text-drawing functions handle this properly, but Project: Starfighter uses its own crude text rendering system which is based on pre-rendered monospace glyphs representing ASCII characters, and it's always assumed that one character value corresponds to one glyph. That would have to be redesigned, probably replaced with use of SDL-ttf, for Starfighter to support UTF-8. It's something that would be worth doing, and I suspect I'll do it somewhere down the line, but not entirely trivial.

There's also the more minor issue of string length. If you use a crude ASCII-only method like Starfighter does, you make the assumption that 8 characters equals 8 rendered characters, which is not necessarily true with Unicode text. So there's a chance of some programmer thinking he's being clever by choosing a size for strings that is based on this assumption (like making it 8 for a string that has to be 8 rendered characters or less to look right).


Well, this is solved by using a Unicode-aware library to render text. In my own experience, I've written ncurses UTF-8 apps that assume 1 code point equals 1 character, so there's no support for combining code points.
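The combining-code-point issue can be shown with Python's standard unicodedata module. The counting function below is a deliberately rough approximation: it skips characters with a non-zero canonical combining class, but ignores double-width CJK characters, emoji sequences and the full grapheme rules of Unicode text segmentation (UAX #29).

```python
import unicodedata

def approx_display_len(text):
    """Approximate rendered length by skipping combining marks.

    Marks like U+0301 COMBINING ACUTE ACCENT draw on top of the previous
    character. Only an approximation: no wide characters, no emoji sequences.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)

decomposed = "Cafe\u0301"                            # 'e' + combining acute accent
composed = unicodedata.normalize("NFC", decomposed)  # single precomposed U+00E9
print(len(decomposed), approx_display_len(decomposed))
print(len(composed), approx_display_len(composed))
```

Normalizing to NFC first helps where a precomposed character exists, but not every combination has one, so a renderer still has to cope with combining marks.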
Lyberta
 
Posts: 765
Joined: 19 Jun 2013, 10:45

Re: List of lame excuses for poor or no internationalization

Postby onpon4 » 05 Jun 2016, 00:29

Right. But rewriting the code to use a library when it's already using some half-baked custom method does take work.
onpon4
 
Posts: 596
Joined: 13 Mar 2014, 18:38

Re: List of lame excuses for poor or no internationalization

Postby Wuzzy » 07 Jun 2016, 13:25

I'm going to reply to an interesting remark:

“Credits don't need to be translated.”

For things like documentation, one risk is that an outdated translation can become more detrimental than no translation at all. For in-application text like GUI elements or messages, I can always give a string a new "string ID" if it's significantly changed (or if it's new anyway), so any languages where it isn't yet translated will fall back to the English default. This gets harder if you have big chunks of text like in-game documentation. E.g., think of something like Civilization's Civilopedia - whilst the text should still be translatable (for other reasons, it's good practice not to store such huge amounts of text in game code), I'd be wary about accepting translations, at least while a game is still in ongoing development: if I change a unit so that it no longer has such and such feature, then any text saying so is now out of date. I could simply mark all the translations for this unit as out of date, though that seems heavy-handed if only one thing is out of date. And I can't simply delete the relevant sentences if I don't speak the languages.


Well, in both Gettext and the Qt translation system, all strings which have changed will automatically be counted as “untranslated” (Gettext marks them as “fuzzy”) until they are translated again.
Meaning, those simply fall back to English.
As for long strings: This can also be resolved by breaking the string into multiple chunks (for the translators), about 1 paragraph per string should be OK.

I think Gettext and Qt are both very good tools for translation already, if you can use them in your application, you should totally go for it.
Why? Because those are mature and proven systems with an extensive toolchain (this is important, too) to handle the translation files.

In many “hand-written” translation systems I've seen, I found either a lot of bugs or a lot of missing important features. They are especially lacking in the toolchain, e.g. it is not possible to automatically mark a string as “outdated”; in the worst case, each string's status has to be maintained manually. This is of course very error-prone.
If you still want to make your very own translation system for some reason, you had better make sure it has an extensive toolchain, especially the ability to update strings and automatically mark them as outdated; otherwise it will be a pain in the ass to maintain.
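To show what "automatically mark them as outdated" means in practice, here is a minimal Python sketch of the bookkeeping a home-made system would need. The record format is hypothetical. Gettext's msgmerge does this for you, plus fuzzy matching of near-identical strings.

```python
import hashlib


def fingerprint(source: str) -> str:
    """Hash of the English source string a translation was made from."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def translation_status(entry: dict, current_source: str) -> str:
    """Classify one catalog entry as untranslated, outdated or translated.

    entry is a hypothetical record: {"translation": str, "source_hash": str}.
    """
    if not entry.get("translation"):
        return "untranslated"
    if entry.get("source_hash") != fingerprint(current_source):
        return "outdated"  # the English text changed after translation
    return "translated"


if __name__ == "__main__":
    old = "Archers attack from range."
    entry = {"translation": "Bogenschützen greifen aus der Ferne an.",
             "source_hash": fingerprint(old)}
    print(translation_status(entry, old))                          # translated
    print(translation_status(entry, "Archers attack from afar."))  # outdated
    print(translation_status({}, old))                             # untranslated
```

That is only the status check. Extraction, merging, validation and editor support are still missing, which is the point of the post.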

Lame excuses extracted from your post ;) :
- The strings are too long → Break them into smaller chunks
- Strings could become outdated → Similar to the excuse “Our strings change very often”
Wuzzy
 
Posts: 989
Joined: 28 May 2012, 23:13

Re: List of lame excuses for poor or no internationalization

Postby Arthur » 07 Jun 2016, 15:39

Some languages are a real pain to support properly. In addition to RTL often wreaking havoc (you need to account for untranslated English strings in the middle of left-aligned paragraphs, for example), here's an example one of our developers explained to me:

There are at least two extra libraries, in addition to Tinygettext and Freetype, that are needed for languages with complex text layouts. For example, in Tamil, characters "a" and "b" written next to each other might combine into character "c". But "c" has no Unicode representation. So you need the extra library Harfbuzz, which finds the glyph for "c" in the font file, loaded via Freetype. But Harfbuzz doesn't do bidirectional reordering itself, so you need Fribidi to put the words in the correct order before calling hb_shape on them. There are probably more details to this, and the explanation I tried to write here after a brief chat might not be perfect, but that's the gist of it.
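The Tamil case from the HarfBuzz quote below can be checked with Python's standard unicodedata module. This only inspects code points and does no shaping. It shows that the sequence stays two code points even after NFC normalization, which is why glyph selection has to be done by a shaper. It also lists the bidi classes that a library like Fribidi works from.

```python
import unicodedata

# TTA (U+0B9F) + VOWEL SIGN U (U+0BC1): rendered as the single glyph டு
tamil_ttu = "\u0b9f\u0bc1"


def describe(text: str) -> None:
    """Print name, general category and bidi class of each code point."""
    for ch in text:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)} "
              f"category={unicodedata.category(ch)} "
              f"bidi={unicodedata.bidirectional(ch)}")


if __name__ == "__main__":
    describe(tamil_ttu)
    # NFC does not merge them: Unicode has no precomposed code point for டு,
    # so picking the combined glyph is left to a shaper such as Harfbuzz.
    print(len(unicodedata.normalize("NFC", tamil_ttu)))  # 2
    # L = left-to-right (Latin), R = right-to-left (Hebrew), AL = Arabic letter
    for ch in ("a", "\u05d0", "\u0628"):
        print(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch))
```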

What is Harfbuzz exactly? Well, I can quote what they say themselves, since they are way better at it than me:
https://github.com/behdad/harfbuzz/blob/master/docs/usermanual-what-is-harfbuzz.xml {l Wrote}:Harfbuzz is a text shaping engine. It solves
the problem of selecting and positioning glyphs from a font given a
Unicode string.

Why do I need it?

Text shaping is an integral part of preparing text for display. It
is a fairly low level operation; Harfbuzz is used directly by
graphic rendering libraries such as Pango, and the layout engines
in Firefox, LibreOffice and Chromium. Unless you are
writing one of these layout engines yourself,
you will probably not need to use Harfbuzz - normally higher level
libraries will turn text into glyphs for you.

However, if you are writing a layout engine
or graphics library yourself, you will need to perform text
shaping, and this is where Harfbuzz can help you. Here are some
reasons why you need it:

OpenType fonts contain a set of glyphs, indexed by glyph ID.
The glyph ID within the font does not necessarily relate to a
Unicode codepoint. For instance, some fonts have the letter
a as glyph ID 1. To pull the right glyph out of
the font in order to display it, you need to consult a table
within the font (the cmap table) which maps
Unicode codepoints to glyph IDs. Text shaping turns codepoints
into glyph IDs.

Many OpenType fonts contain ligatures: combinations of
characters which are rendered together. For instance, it's
common for the fi combination to appear in
print as the single ligature ﬁ. Whether you should
render text as fi or ﬁ does not
depend on the input text, but on the capabilities of the font
and the level of ligature application you wish to perform.
Text shaping involves querying the font's ligature tables and
determining what substitutions should be made.

While ligatures like ﬁ are typographic
refinements, some languages require such
substitutions to be made in order to display text correctly.
In Tamil, when the letter ட is
followed by உ, the combination should appear
as the single glyph டு. The sequence of Unicode
characters டஉ needs to be rendered as a single
glyph from the font - text shaping chooses the correct glyph
from the sequence of characters provided.

Similarly, each Arabic character has four different variants:
within a font, there will be glyphs for the initial, medial,
final, and isolated forms of each letter. Unicode only encodes
one codepoint per character, and so a Unicode string will not
tell you which glyph to use. Text shaping chooses the correct
form of the letter and returns the correct glyph from the font
that you need to render.

Other languages have marks and accents which need to be
rendered in certain positions around a base character. For
instance, the Moldovan language has the Cyrillic letter
ж with a breve accent, like so: ӂ. Some
fonts will contain this character as an individual glyph,
whereas other fonts will not contain a zhe-with-breve glyph
but expect the rendering engine to form the character by
overlaying the two glyphs ж and ˘. Where you should draw the
combining breve depends on the height of the preceding glyph.
Again, for Arabic, the correct positioning of vowel marks
depends on the height of the character on which you are
placing the mark. Text shaping tells you whether you have a
precomposed glyph within your font or if you need to compose a
glyph yourself out of combining marks, and if so, where to
position those marks.

If this is something that you need to do, then you need a text
shaping engine: you could use Uniscribe if you are using Windows;
you could use CoreText on OS X; or you could use Harfbuzz. In the
rest of this manual, we are going to assume that you are the
implementor of a text layout engine.

Note: after copying the XML for this quote, I found there was no proper way to display some of the examples, so I resorted to using italic BBCode for those. That ruins it a bit, but it shows that the world has not even standardized things well enough that we can copy and paste text anywhere without worry.
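The Arabic point in the quote ("Unicode only encodes one codepoint per character") can be seen in Python. Unicode does contain legacy presentation-form code points for the positional variants, but only for compatibility, and NFKC maps every one of them back to the single ordinary letter. A sketch using BEH as the example:

```python
import unicodedata

# Legacy "presentation form" code points for BEH (U+0628). Normal Arabic text
# stores only U+0628; the shaper picks the glyph for the right position.
BEH_FORMS = {
    "isolated": "\ufe8f",
    "final": "\ufe90",
    "initial": "\ufe91",
    "medial": "\ufe92",
}

if __name__ == "__main__":
    for position, ch in BEH_FORMS.items():
        base = unicodedata.normalize("NFKC", ch)
        print(f"{position:8} U+{ord(ch):04X} -> U+{ord(base):04X}")
```

Since real text carries only U+0628, nothing in the string says which form to draw. That is exactly the decision the text shaper has to make.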

So, all in all, to support and display all languages well, we would need Tinygettext, Freetype, Fribidi and Harfbuzz. That's FOUR libraries to cobble together, and then you need to find fonts, as said before. So yeah, supporting German or other languages using the Latin alphabet is relatively easy. Supporting the rest, not so much, and you are speaking out of ignorance if you believe it's a matter of a couple of hours for a developer to do.
Arthur
 
Posts: 1073
Joined: 06 Dec 2009, 00:49

Re: List of lame excuses for poor or no internationalization

Postby GunChleoc » 17 Mar 2017, 10:06

As a side note, the most common i18n issues I run into as a translator are:

  • No support for plural forms. Even in projects that already use gettext or Qt et al., strings often lack the correct markup (ngettext...), because the developers weren't aware that it's an issue.
  • Incomplete sentences presented to the translator. The worst offenders recently have been activity streams, e.g. "likes your". That is completely untranslatable, as "X likes your Y" is "'S toigh le X a' Y agad" in my language. So, the string to translate should be "%1$s likes your %2$s". Yes, some languages will also need to reverse the order here.
  • Not enough space on screen, resulting in truncation. Short strings in particular can easily need 300% of the original length. The best solution is dynamic widget sizing, but that's not always available in your toolkit, so element sizes will sometimes have to be fixed manually. I usually find that if both Scottish Gaelic and Hungarian fit, everything else should fit as well.
  • No context comments for translators. Is "View" a verb or a noun? I am capable of digging through the source code, but it's very time-consuming and doesn't always answer my question. Many translators can't read source code, because they are not programmers.
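Three of these points map directly onto standard gettext features. Below is a minimal Python sketch using the stdlib gettext module. NullTranslations stands in for a real catalog, so it returns the English text, and the function names and messages are made up. Python's named placeholders play the role of C's positional %1$s, so translators can move them around freely. Comments starting with TRANSLATORS: are copied into the .pot file if you run xgettext with --add-comments=TRANSLATORS.

```python
import gettext

# NullTranslations stands in for a real catalog here; it returns the English
# msgids, choosing singular or plural by the simple English rule n == 1.
translation = gettext.NullTranslations()
_ = translation.gettext
ngettext = translation.ngettext


def files_deleted(count: int) -> str:
    """Plural-aware message; a real catalog supplies each language's forms."""
    # TRANSLATORS: status bar message shown after deleting files.
    return ngettext("%(count)d file deleted",
                    "%(count)d files deleted", count) % {"count": count}


def activity(user: str, item: str) -> str:
    """Whole sentence as one msgid, so placeholders can be reordered."""
    # TRANSLATORS: {user} is a person's name, {item} is e.g. "photo".
    return _("{user} likes your {item}").format(user=user, item=item)


if __name__ == "__main__":
    print(files_deleted(1))           # 1 file deleted
    print(files_deleted(3))           # 3 files deleted
    print(activity("Ann", "photo"))   # Ann likes your photo
```

With a real catalog, the language's Plural-Forms header decides how many forms ngettext chooses between, so languages with three or more plural forms work too.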

eugeneloza {l Wrote}:And another reason is... would there be any yield, and would it be worth the effort? It's fun to have the 1000 lines of the game text translated into Amharic. It's somewhere around a week or two of not extremely hard work. But... will there be even a single player in Amharic?

The yield for your project may be small, but the yield for the target language might be huge, even if only a comparatively small number of people use the translation. There are many small languages out there that can't get any games at all apart from FLOSS games. Some people will be happy with any partial translation at all, because they're so starved for their language that any bits at all are appreciated. So, you can pat yourself on the back for making the world a better place as well as delivering a fun game :)


mdwh {l Wrote}:But many games will be written using lower-level game libraries that often have no support for translations, so you're left finding another library or writing your own - admittedly, reading strings from a file shouldn't be that hard (and it's good practice in general to store data separately from code), but until people realise the importance of translations, it's easy to not even bother.


If you can avoid it at all, do not write your own. It will lack essential features like plural support, disambiguation and tools for the translator to use. Having translators hack on plain-text files is the worst thing you can do for many reasons.

  • Source string changes can't be tracked by the translators
  • Translators can accidentally break the code in all kinds of ways, and there are no validation tools available - unless you want to reinvent the wheel on those too!
  • Translators can't see the source and target language side-by-side
  • Translators can't track which strings they have already translated and which strings still need translation. Not to mention marking strings as reviewed.

Credits are a particularly important example of this for free games: if I'm updating the credits but the translations are out of date, then that's potentially breaking licence requirements where attribution is required for using third party material.

Though admittedly, at least with credits one could always have a "string ID" per item, and fall back to the English for new items - but it does mean one has to be careful.

Exactly. If you run into trouble here, it's a sign of poor string markup. Separate the titles from the names and Bob's your uncle.
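Separating titles from names could look like this in Python. This is a sketch, and the roles and names are placeholders. N_ is the usual no-op marker for deferred translation: it lets xgettext find the string, provided you run it with --keyword=N_, and the actual lookup happens when the credits are displayed.

```python
import gettext

translation = gettext.NullTranslations()  # stands in for a loaded catalog
_ = translation.gettext


def N_(message: str) -> str:
    """No-op marker: lets xgettext find the string without translating it yet."""
    return message


# Role titles are translatable; contributor names are plain data and are
# never passed through gettext, so adding a name creates no new msgid.
CREDITS = [
    (N_("Programming"), ["Alice Example", "Bob Example"]),
    (N_("Music"), ["Carol Example"]),
]


def render_credits() -> str:
    lines = []
    for title, names in CREDITS:
        lines.append(_(title))  # translated at display time
        lines.extend("  " + name for name in names)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_credits())
```

Adding a new contributor under an existing role then never makes the credits "outdated" for translators, and the names always appear even in a stale translation.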


Whilst games should be translatable, it's not feasible to get into a situation where an update can't ship because someone hasn't updated a translation.

As I said above, partial translations are perfectly fine for FLOSS projects.

Regarding translations causing crashes, at least for Gettext there are validation tools that you can use and wrap in a script, for example like this. Another argument against inventing your own! So, it's another excuse, because problems like that can be avoided with the proper tools.
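For C-style strings, the real check is msgfmt --check-format, which compares the format directives of msgid and msgstr. The sketch below is not that tool and not the script linked above. It is a simplified Python illustration of what such a check looks for: positional directives like %1$s may be reordered, plain %s/%d must match in order and type, and the regular expression covers only common printf syntax.

```python
import re

# Simplified printf directive: optional position (1$), flags, width,
# precision, conversion. "%%" is matched too and then ignored.
PLACEHOLDER = re.compile(r"%(?:(\d+)\$)?[-+ #0]*\d*(?:\.\d+)?([sdifxXc%])")


def placeholders(text: str) -> list:
    """Return (position or None, conversion) for each directive except %%."""
    found = []
    for match in PLACEHOLDER.finditer(text):
        index, conv = match.groups()
        if conv != "%":
            found.append((index, conv))
    return found


def check_translation(source: str, translated: str) -> bool:
    src, dst = placeholders(source), placeholders(translated)
    if all(index for index, conv in src + dst):
        # Positional directives (%1$s) may appear in any order.
        return sorted(src) == sorted(dst)
    # Plain %s/%d are consumed in order, so order and types must match.
    return src == dst


if __name__ == "__main__":
    print(check_translation("%1$s likes your %2$s",
                            "'S toigh le %1$s a' %2$s agad"))  # True
    print(check_translation("%d files", "%s Dateien"))         # False
```

A mismatch like %d versus %s is exactly the kind of translator mistake that can crash a C program at runtime, which is why such checks belong in the build.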
GunChleoc
 
Posts: 502
Joined: 20 Sep 2012, 22:45

Re: List of lame excuses for poor or no internationalization

Postby onpon4 » 17 Mar 2017, 15:38

Incomplete sentences presented to the translator. The worst offenders recently have been activity streams, e.g. "likes your". That is completely untranslatable, as "X likes your Y" is "'S toigh le X a' Y agad" in my language. So, the string to translate should be "%1$s likes your %2$s". Yes, some languages will also need to reverse the order here.

Actually, I can think of another possible problem sort of related to this: one English word that covers meanings which another language has to translate differently. At least, that's a problem with gettext. For example, you could have two menu options that say "Cancel" in English, but mean totally different things.

My most recent run-in with this risk was when I added a confirmation before quitting Hexoshi. At first I was going to make it say "Abandon unsaved progress" and "Cancel", but it could be that some language's word for "cancel" means the same thing as "quit", which would be confusing if translated literally, and yet if I used "Cancel" in another menu, it could make no sense to translate it as "return to the game". So it's probably a good idea to never use one string in multiple places unless it means exactly the same thing in all contexts. If in doubt, write out exactly what you mean. In that example, I chose to make the English text "Return to game" just to be on the safe side.
onpon4
 
Posts: 596
Joined: 13 Mar 2014, 18:38

Re: List of lame excuses for poor or no internationalization

Postby GunChleoc » 18 Mar 2017, 11:39

Gettext offers the pgettext macro for these cases, but in C it requires an extra gettext.h header that you have to add to your project.

Another problem that we sometimes run into is that English "you" can need different translations depending on context: What is the relative status of the speaker/addressee? How many people are being addressed? This is where developer comments are super helpful.

And one final issue: Some languages don't have words for yes/no. Use ok/cancel, on/off and enabled/disabled instead. Or disambiguate with pgettext.
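In Python, the same mechanism is available as pgettext (Python 3.8 and later). This is a sketch with made-up context strings. Each distinct context becomes its own msgctxt entry in the .po file, so translators can render the two "Cancel" buttons differently. The toggle labels follow the advice above to avoid a bare yes/no.

```python
import gettext

translation = gettext.NullTranslations()  # stands in for a loaded catalog
pgettext = translation.pgettext            # available since Python 3.8


def cancel_labels() -> tuple[str, str]:
    """Two buttons that both read "Cancel" in English but mean different things."""
    return (
        pgettext("quit dialog: stay in the game", "Cancel"),
        pgettext("download dialog: stop the download", "Cancel"),
    )


def sound_toggle_label(enabled: bool) -> str:
    # Explicit state labels instead of a bare "Yes"/"No", which some
    # languages have no direct words for.
    if enabled:
        return pgettext("sound setting state", "On")
    return pgettext("sound setting state", "Off")


if __name__ == "__main__":
    print(cancel_labels())
    print(sound_toggle_label(True), sound_toggle_label(False))
```

Without a catalog, both calls return the English text. With one, the two "Cancel" entries are looked up separately by their context.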
GunChleoc
 
Posts: 502
Joined: 20 Sep 2012, 22:45

Re: List of lame excuses for poor or no internationalization

Postby Wuzzy » 18 Jun 2017, 17:26

GunChleoc {l Wrote}:As a side note, the most common i18n issues I run into as a translator are:

I agree with your post 100%. These are all valid points, and also often inexcusable from the software developer point of view.
Wuzzy
 
Posts: 989
Joined: 28 May 2012, 23:13

Re: List of lame excuses for poor or no internationalization

Postby eugeneloza » 18 Jun 2017, 19:34

Aww... I've missed all the fun :)
GunChleoc, well. I always prefer English to Ukrainian or Russian in a game. No, don't get me wrong, I love the Ukrainian language. It's awesome! But translated software (and even worse, games) is such a mess... even when the translation is done professionally.
At my office, many computers have been switched to Ukrainian Windows. And when I'm asked to help somebody out... not to mention Windows 10 being very stupid and inconvenient... I have an additional headache: "how on earth did they name this parameter in Ukrainian???"
However, yes, I know a few people who understand neither English nor even Russian, only Ukrainian. But... they don't actually have a computer/smartphone :)

mdwh, I was talking about pre-rendered images or textures containing text. E.g. if you have a button with a "MENU" image, or a sign that reads "SHOP", it's not so trivial to localize such a feature, especially when it's a texture or even a model in a 3D game.
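One common approach for textures with baked-in text is a per-language asset lookup with a fallback to the default image. It doesn't create the localized textures, but it means a translated texture can be dropped in without code changes. The following is a minimal Python sketch. The directory layout (textures/de/shop_sign.png next to textures/shop_sign.png) is hypothetical, and the exists predicate is passed in so the lookup can be tested without files.

```python
from pathlib import PurePosixPath
from typing import Callable


def localized_asset(path: str, lang: str, exists: Callable[[str], bool]) -> str:
    """Prefer <dir>/<lang>/<file>, then <dir>/<base lang>/<file>, else path.

    The layout is hypothetical; pass os.path.exists as exists() in a game.
    """
    p = PurePosixPath(path)
    candidates = [lang]
    if "_" in lang:                      # e.g. "pt_BR" also tries "pt"
        candidates.append(lang.split("_")[0])
    for code in candidates:
        candidate = str(p.parent / code / p.name)
        if exists(candidate):
            return candidate
    return path                          # untranslated: default texture


if __name__ == "__main__":
    available = {"textures/de/shop_sign.png", "textures/pt/shop_sign.png"}
    for lang in ("de", "pt_BR", "uk"):
        print(lang, localized_asset("textures/shop_sign.png", lang,
                                    available.__contains__))
```

The alternative is to render the text at runtime over a text-free texture, which avoids extra art but needs the font and shaping support discussed earlier.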

Wuzzy, an example: I've translated my https://github.com/eugeneloza/FireMadness game into English, Ukrainian and Russian. Based on my Project Helena statistics, I've realized that I had 300 downloads per year at most; I estimate there were 10 to 30 actual installs. Is it really worth asking anybody to spend 3 to 4 hours translating it into another language? Moreover, the Russian-speaking users who looked at my game didn't even notice that the game can switch to Russian in the main menu.
So, once more: if there will be zero players in Amharic, because they'll simply never hear of my game, why would I ask anyone to translate the game into Amharic (spending quite a lot of somebody's precious time)? Why would I spend a dozen hours trying to figure out how to make right-to-left script work for Hebrew support?

Ok. In https://github.com/eugeneloza/decoherence I try to make the game not just "translatable", but also to provide a powerful tool for the translators. Yep. I've spent 4 years on this game and it's not even alpha. I've spent 10-15% of this time developing data storage logic, linking media files with translated text, and support for lots of other multilingual stuff. I also try to write the scenario (which is huge) in both English and Russian. Maybe it would have been better to forget about multilingual support (which not only takes a lot of time, but also kills motivation due to lots of work with zero visible result) and have a playable alpha already?
eugeneloza
 
Posts: 500
Joined: 22 Aug 2014, 12:15
Location: Ukraine

Re: List of lame excuses for poor or no internationalization

Postby GunChleoc » 20 Jun 2017, 12:27

Actually, it is a lot easier to provide quality translations for FLOSS projects than for commercial software, because you can communicate with the developers directly in order to fix issues, and because you can fix any translation mistake that you find as you please without having to go through a chain of command. Also, because you can test it any time and look at stuff in context, especially when there are nightly builds available. Of course, it doesn't pay the bills.

I guess the problem you're having with Windows in Ukrainian is that it is fairly new and the terminology is not well established yet - it's a lot easier for big languages like German that have had translations for many decades. It also helps if one has control over an online dictionary to add new terms to. You might also find the following link useful for "unlocalizing" Windows: https://www.microsoft.com/Language/en-US/Search.aspx

Maybe you should stick to writing stuff in English for a bit and try to find other people to help with localizing into your own languages - there are only so many hours in the day.

I recently did an e-mail interview about my localization work: https://medium.com/r12n/conquering-digi ... eef9f3aade

@Wuzzy: I always cut the developers some slack, because more often than not bad string markup is simply a lack of knowledge. FLOSS projects have always been happy to fix any issues for me.
GunChleoc
 
Posts: 502
Joined: 20 Sep 2012, 22:45
