From Fedora Project Wiki

Latest revision as of 00:11, 8 August 2018

Making sure we have good translations takes a lot of work during development. There are several techniques to keep in mind. The most critical thing to remember is that each release of a product has a string change deadline. After this point, we should not make any changes to strings, so that the translators have time to work. We can make exceptions, but it takes coordination.

Note: This article is a guide for developers about how to handle translations in the Anaconda source code. If you want to help translate the texts in Anaconda into your language, see Translations and localization (https://fedoraproject.org/wiki/Anaconda/Contribute#Translations_and_localization) in the contribution guide.

What not to translate

We do not mark log messages for translation. We are the most frequent viewers of log files via bug reports, so translated logs just make our job harder.

Adding new files

If you add a new file with translatable text (source code or glade files), you need to add the file name to po/POTFILES.in in the correct place. This file gets passed to the tools that build the anaconda.pot file. Any strings in files not mentioned here will be left out, which means they do not get worked on by translators.

Marking strings for translation

If you have a user-visible string, it should be marked for translation. There are three commonly used functions to do this, which are conveniently defined in pyanaconda/i18n.py:

   _ = lambda x: gettext.ldgettext("anaconda", x)

This function both marks a string for translation so the gettext tools will include it in anaconda.pot, and causes the string argument to be replaced by its translation at this point in the code. This is the most commonly used translation marking function.

   N_ = lambda x: x

This function does nothing at runtime in anaconda: it returns its argument unchanged. However, it still marks the string for translation and inclusion in anaconda.pot. When you later want to display the translated text, you will need to pass the string to _. This function is seldom used.

   P_ = lambda x, y, z: gettext.ldngettext("anaconda", x, y, z)

Finally, this function works just like _ but correctly handles plurals. The rules for choosing the singular or plural form of a string vary wildly between languages, but gettext picks the right form using the plural rules recorded in each language's translation catalog. All you have to do is provide the singular string, the plural string, and the number. We use this one to display variations like "1 disk selected" or "2 disks selected" instead of the much lazier "1 disk(s) selected".
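
As a minimal sketch of how the three markers behave, the definitions below mirror the ones quoted above but call gettext.dgettext and gettext.dngettext instead. That substitution is an assumption made so the example runs on current Python releases, where the ldgettext family (deprecated in Python 3.8) is no longer available; it is not a copy of pyanaconda/i18n.py. The disks_selected helper is hypothetical. With no anaconda catalog installed, every lookup falls back to the English source string.

```python
import gettext

DOMAIN = "anaconda"  # translation domain named in the text above


def _(message):
    """Mark a string for extraction and translate it at call time."""
    return gettext.dgettext(DOMAIN, message)


def N_(message):
    """Mark a string for extraction only; translate it later with _()."""
    return message


def P_(singular, plural, n):
    """Translate a string that has singular and plural forms."""
    return gettext.dngettext(DOMAIN, singular, plural, n)


def disks_selected(count):
    # Hypothetical helper: the number is substituted after the lookup.
    return P_("%d disk selected", "%d disks selected", count) % count


for count in (1, 2):
    print(disks_selected(count))
```

Note that the count is passed twice: once to P_ so gettext can choose the form, and once to the % operator to fill in the placeholder.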

Global strings

Sometimes, you may want to use the same string in several places. Naturally, this means you will want to define it as a global variable outside of any class. If you do this, you must mark the string using the N_ function and then later display it with _.

This is because anaconda commonly starts up in English, but the user is free to change language at the welcome screen. We may import our python modules before this change occurs. If this happens, strings marked with _ outside of a class will be translated from English (since all our strings are in English in the source code) to English (the language in use at import time). The user then changes the runtime language but because the string has already been translated, it will appear in English.

A great example of avoiding this problem is in pyanaconda/ui/gui/spokes/custom.py. Note how unrecoverable_error_msg is defined with N_ at the top of the file, but is then wrapped as _(unrecoverable_error_msg) later on when it is displayed on the screen.
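
The import-time problem can be reproduced without any real catalogs. The sketch below replaces gettext with a toy dictionary lookup; the catalog contents, the active-language switch, and the variable names are all assumptions made for illustration, not anaconda code.

```python
# Toy stand-in for gettext so the example runs without installed catalogs.
# The German entry is illustrative only.
CATALOGS = {
    "en": {},
    "de": {"Storage error": "Speicherfehler"},
}
active = {"lang": "en"}  # the installer starts up in English


def _(message):
    """Translate into whatever language is active right now."""
    return CATALOGS[active["lang"]].get(message, message)


def N_(message):
    """Mark only; return the string unchanged."""
    return message


# Module level: these lines run at import time, before the language changes.
EAGER_MSG = _("Storage error")   # already "translated" to English here
LAZY_MSG = N_("Storage error")   # only marked, not yet translated

active["lang"] = "de"  # the user picks German at the welcome screen

eager_shown = EAGER_MSG       # still the English text
lazy_shown = _(LAZY_MSG)      # translated at display time

print(eager_shown)
print(lazy_shown)
```

The string marked with N_ and translated at display time follows the language change, while the one translated at module level does not.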

String substitutions

Substituting a value into a string is something we do a lot of. Consider the following:

   msg = "An error occurred when running lvm: %s" % e

At first, you may think the string should be marked for translation like this:

   msg = _("An error occurred when running lvm: %s" % e)

However, this won't do what you want. Remember how translations work. At package build time, gettext will build a database of all strings marked with one of the functions described above. So, this string would get entered into that database alongside its various translations. So far, so good.

Runtime is where the problem occurs. If "e" above is "I/O error", then that will get substituted in and gettext will look for the following string in its database:

   "An error occurred when running lvm: I/O error"

But this string isn't in there, because at build time we entered the string without the substitution into the gettext database. So, substitutions must always be done outside of the _ call, like so:

   msg = _("An error occurred when running lvm: %s") % e

This will substitute the error message into the translated string, not the source string.
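
The difference between the two forms is easy to see with a toy catalog. In this sketch the dictionary stands in for the gettext database, and its single German entry is a made-up translation used only for illustration.

```python
# Hypothetical one-entry catalog standing in for the gettext database.
CATALOG = {
    "An error occurred when running lvm: %s": "Fehler beim Aufruf von lvm: %s",
}


def _(message):
    """Return the translation, or the original string when none is found."""
    return CATALOG.get(message, message)


e = "I/O error"

# Wrong: the substitution happens first, so the lookup key is not in the catalog.
wrong = _("An error occurred when running lvm: %s" % e)

# Right: the lookup happens first, then the value goes into the translation.
right = _("An error occurred when running lvm: %s") % e

print(wrong)
print(right)
```

The wrong form silently falls back to English, which is why this mistake tends to go unnoticed until someone tests a translated installation.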

Complex string substitutions

Consider the following string:

   msg = _("Device %s (%s) needs %s more space to install %s") % (dev, fs, needed, productName)

This is really hard to translate because the translators do not see the context this string is defined in. All they see is the raw text inside the quotes. They are likely translating into a language where the words are in a completely different order than English. Thus, they need more help to be able to know how to deal with this string.

Luckily, Python allows us to do substitutions using a dictionary. Named placeholders make the string more self-documenting for the translators and let them reorder the values freely. Whenever you have a string with more than one substitution, it makes sense to use this format:

   msg = _("Device %(deviceName)s (%(fileSystemType)s) needs %(neededSpace)s more space to "
           "install %(productName)s") % {"deviceName": dev, "fileSystemType": fs, "neededSpace": needed,
                                         "productName": productName}

Again, watch to make sure you do the substitution outside of the _ call.
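
To illustrate why named placeholders help, the sketch below uses a toy catalog whose "translation" is simply the English sentence with its parts reordered, standing in for a language with a different word order. The catalog, the reordered sentence, and the sample values are assumptions for illustration.

```python
SOURCE = ("Device %(deviceName)s (%(fileSystemType)s) needs %(neededSpace)s more space to "
          "install %(productName)s")

# Stand-in translation: same placeholders, different order.
CATALOG = {
    SOURCE: ("To install %(productName)s, %(neededSpace)s more space is needed on "
             "%(deviceName)s (%(fileSystemType)s)"),
}


def _(message):
    """Return the translation, or the original string when none is found."""
    return CATALOG.get(message, message)


# Illustrative values only.
values = {
    "deviceName": "sda1",
    "fileSystemType": "ext4",
    "neededSpace": "2 GiB",
    "productName": "Fedora",
}

msg = _(SOURCE) % values
print(msg)
```

With positional %s placeholders the reordered sentence would receive its values in the wrong slots; with named placeholders the order in the translation does not matter.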

If a string displayed by a glade widget needs substitutions, include the string in the Python sources rather than in the glade file. This way gettext can detect problems with format string translations.

In other words, do this:

  myWidget.set_label(_("WELCOME TO %(name)s %(version)s") % {"name": productName, "version": productVersion})

DO NOT do this:

  myWidget.set_text(myWidget.get_label() % {"name": productName, "version": productVersion})

You can still include a copy of the string in glade as a placeholder, but do not mark it as translatable.

Word wrapping

Whenever possible, do not hardcode newlines into any string that will be displayed in the GUI. Instead, leave it as one big line and set word wrap mode on the widget. Certain languages (like German) tend to have much longer strings. Hardcoding newlines makes it more difficult for GTK+ to get the word wrapping right. The failure mode for this is pretty weird: You will commonly see the window displayed in the upper right-hand corner of the screen, with various widgets going off the right side.

Exceptions

One of the more confusing areas is what to do with exception text.

First, you need to decide whether the exception text is ever going to be handled and displayed to the user or not. Note that "handled and displayed" does not mean the top-level all-else-has-failed exception handler we've got that displays a traceback to the user. If the text is not user-visible, then nothing needs to be done.

Then you need to decide what to do based upon where the text came from:

If it came from inside anaconda (when we raise our own exception with our own text) then the text should be marked for translation following the above rules. Note that for most exceptions we raise ourselves, the text is typically not important. We have defined a lot of very specific exception classes and handle them in such a way that the text commonly does not need to be translated.

If it came from some program we ran, we do not have much control over the translation. Because errors from programs we ran typically fall into the same category as log messages (more useful for us than for users), we do not want them translated. We enforce this with the augmentEnv function in pyanaconda/iutil.py.

But even if we did want the error translated, there's little we can do about it. Not all programs are translated fully, and for space reasons, we do not include the translations for every program we use on the installation media.

Finally, the text could have come from some other library we call. We more commonly do want to display these errors (especially when they come from pykickstart or cracklib, for instance), and we have better control over making sure they are translated. Before doing any special work, try just displaying the error and seeing what happens. Many are already correctly translated.

If not, however, you need to use gettext directly to look up the translation. Our _ functions will not work here because of translation domains. Basically, each program has its own database of translated text. Strings contained in libparted, for instance, will not be in anaconda's database - they will be in parted's. You must therefore look them up in the right place, like so:

   msg = gettext.ldgettext("parted", e)
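
On Python releases where gettext.ldgettext is no longer available, the same domain lookup can be sketched with gettext.dgettext. The helper name and the error text below are hypothetical; the error is a made-up stand-in for whatever message the library actually raised.

```python
import gettext


def translate_library_error(domain, error):
    """Look up an error message in the catalog of the library that raised it."""
    return gettext.dgettext(domain, str(error))


try:
    # Made-up message standing in for an error raised by libparted.
    raise RuntimeError("Unable to open device")
except RuntimeError as e:
    msg = translate_library_error("parted", e)

print(msg)
```

If no catalog for the domain and current language is installed, dgettext returns the message unchanged, so the fallback is the untranslated library text.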

Markup

The GUI uses Pango markup to modify how certain strings are displayed. Markup that is applied to an entire string should not be included in the string used for translation. For example:

   <b>Error adding FCoE SAN.</b>

In this case, the <b> tags are not useful to the translator. Instead of using <b> to mark the string as bold, set the weight to "bold" in the Pango attribute list in the glade file. For strings in Python where setting Pango attributes is not convenient, translate the content and add the surrounding markup to the result. Something like this example from pyanaconda/ui/gui/spokes/lib/resize.py:

   freeSpaceString = "<span foreground='grey' style='italic'>%s</span>" % escape_markup(_("Free space"))

escape_markup is defined in pyanaconda.ui.gui.utils and is used to ensure that the string being inserted into markup doesn't contain any markup itself.

Markup that is applied to only part of the string, on the other hand, should be translated.

   "Your current <a href=\"\"><b>%(product)s</b> software selection</a> requires <b>%(total)s</b> of available space"

In this example, only certain parts of the sentence are contained in <b> and <a> tags, so it's up to the translator to decide where those should go.

Unfortunately, we can't mix Pango attribute lists and markup, so there are some cases where unnecessary markup has to be included in the translatable string.

   <span size=\"small\"><b>Example:</b> squid.mysite.org:3128</span>

In this case, <b> is applied only to "Example:", so it must be specified with markup and included in the translatable string, but the <span> applies to the entire string and is not useful to the translator. The <span> is included in this case because we have no other way of setting the size="small" attribute on the string.
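
The translate-then-wrap pattern for whole-string markup can be sketched as follows. Here escape_markup is a stand-in built on xml.sax.saxutils.escape, not the real function from pyanaconda.ui.gui.utils, and _ is a placeholder that returns its argument; both are assumptions so the example runs outside anaconda. The grey_italic helper is hypothetical.

```python
from xml.sax.saxutils import escape


def _(message):
    """Placeholder for the real translation lookup."""
    return message


def escape_markup(text):
    """Stand-in for pyanaconda's escape_markup: escape &, < and >."""
    return escape(text)


def grey_italic(text):
    """Wrap already-translated text in markup that translators never see."""
    return "<span foreground='grey' style='italic'>%s</span>" % escape_markup(text)


label = grey_italic(_("Free space"))

# A translation containing markup characters is neutralised before wrapping.
tricky = grey_italic("Free <space> & more")

print(label)
print(tricky)
```

Escaping after translation means a stray angle bracket or ampersand in a translated string cannot break the surrounding markup.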

Contexts

A particular string has the same translation everywhere it is used in Anaconda. This can be problematic for strings that are used to control the UI. Any string with context-sensitive content, such as a keyboard accelerator underline, needs to include a translation context.

For example, a window in the GUI may have a button with a "_Select" label, where "_S" indicates that Alt-S can be used to press the button. In Czech this might be translated as "_Vybrat". Now suppose that elsewhere in the GUI there is a different window with a "_Select" label and a "_Forward" label, translated as "Vpřed". The Czech translator may prefer to use 'V' as the keyboard accelerator on "Vpřed", but will be unable to change the accelerator for "Vybrat" without changing the accelerator for every instance of "Vybrat" throughout the program. The solution is to add a context to the string so that instances of the same string with different contexts can be given different translations.

A context can be any string, but contexts are usually arranged hierarchically, using pipes to separate components. For example, "GUI|Date and Time|NTP" is used as a context within the NTP dialog of the Date and Time spoke in the GUI.

Contexts in glade can be added as part of the label's text properties. In Python, use C_(context, message) instead of _(message), CN_(context, message) instead of N_(message), and CP_(context, message, message_plural, n) instead of P_(message, message_plural, n).
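
A toy model shows how contexts let the same source string carry different accelerators. In this sketch the catalog is keyed by (context, message) pairs; the context names and the Czech accelerator placements are assumptions chosen for illustration, and C_ here is a simplified stand-in rather than the definition in pyanaconda/i18n.py. In real code, Python 3.8 and later provide gettext.pgettext and gettext.dpgettext for context-aware lookups.

```python
# Toy catalog keyed by (context, message); entries are illustrative only.
CATALOG = {
    ("GUI|Software Selection", "_Select"): "_Vybrat",
    ("GUI|Storage", "_Select"): "V_ybrat",
    ("GUI|Storage", "_Forward"): "_Vpřed",
}


def C_(context, message):
    """Translate a message within a context; fall back to the message itself."""
    return CATALOG.get((context, message), message)


# Same source string, two contexts, two different accelerators.
select_in_software = C_("GUI|Software Selection", "_Select")
select_in_storage = C_("GUI|Storage", "_Select")
forward_in_storage = C_("GUI|Storage", "_Forward")

print(select_in_software, select_in_storage, forward_in_storage)
```

Because the storage window has its own context, the translator can move the accelerator on "Vybrat" there without touching any other window.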

Providing additional information to translators

Comments that begin with "TRANSLATORS:" will be included in the template file used by translators. In glade, these comments can be specified in the comments attribute of the label property. In Python, the comment should appear directly above the string being translated.

Adding a comment can be helpful for strings that need additional information, such as single-letter abbreviations.

An example in Python:

   # TRANSLATORS: 'c' to continue
   if key == _('c'):