Latest revision as of 03:25, 3 June 2016
Updated for
This page has been updated for Fedora 23.
Context
While reviewing translations on Zanata, a reviewer may find faults that the translator has made repeatedly. This can happen for various reasons, such as:
- Lack of attention by the translator to details that are not always visible while editing (e.g. double spaces)
- Ignorance by the translator of a grammar or punctuation rule, which leads to the same error being repeated (e.g. in French, the double punctuation marks : ; ! ? must be preceded by a narrow non-breaking space, unlike in English)
- A plain translation error in a word that occurs many times
- Any other rule unknown to the translator
In such a situation, Zanata's search and replace functionality is not of great help. To search for and replace repetitive faults, the reviewer has to pull the translated files from Zanata, fix them with tools available on the operating system and, finally, push the modified files back to Zanata.
This page describes a tool that checks the typographic correctness of .po files.
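To make the kind of fault listed above concrete, here is a minimal Python sketch that flags double spaces and double punctuation marks that are not preceded by a narrow no-break space (U+202F). It is only an illustration: the function name `find_faults` and both regular expressions are assumptions made for this example, not the rules shipped with the tool.

```python
import re

NNBSP = "\u202f"  # NARROW NO-BREAK SPACE

# Two or more consecutive ordinary spaces.
DOUBLE_SPACE = re.compile(r" {2,}")
# A double punctuation mark not immediately preceded by a narrow no-break space.
MISSING_NNBSP = re.compile(rf"(?<!{NNBSP})[:;!?]")


def find_faults(text):
    """Return sorted (offset, description) pairs for each suspected fault."""
    faults = [(m.start(), "double space") for m in DOUBLE_SPACE.finditer(text)]
    faults += [
        (m.start(), "no narrow no-break space before " + m.group())
        for m in MISSING_NNBSP.finditer(text)
    ]
    return sorted(faults)


# Hypothetical message with a double space and an ordinary space before "?".
print(find_faults("Voulez-vous  continuer ?"))
```

A rule this naive also flags colons in URLs or in command output, which is one reason an interactive tool that lets the reviewer decide case by case is useful.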
Description of the tool
User Interface
This tool is a Python script that can be found at https://github.com/jaaf/po_purifier. It scans a directory for .po files. For each file, it checks the translated messages against typographic rules that reside in a configuration file named typorules.py. Each time a typographic rule is not satisfied, the program stops and asks the user what to do. Figure 1 below shows what this looks like:
- The question put to the user, shown in English here, normally appears in the user's language, provided that the program has been localized. It has two parts:
  - The first part tells the user that a typo rule is infringed and that they must decide whether to accept the change (this message is part of the program and has to be localized)
  - The second part is the typo rule itself (it belongs to the typorules.py file)
- In this case, the French typo rule requires a narrow no-break space between a value and its unit, and the location of the fault is highlighted in green.
- To help the user, the message is shown twice: first with the various spaces colorized according to their type, then with the typo fault highlighted in green.
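The two displays described above (spaces colorized by type, then the fault highlighted in green) can be approximated in a terminal with ANSI escape sequences. The sketch below is an assumption about how such output could be produced; the color assigned to each kind of space and the function names are hypothetical and are not taken from the program.

```python
# ANSI background colors, one per kind of space (the choice is arbitrary).
SPACE_COLORS = {
    " ": "\033[44m",       # ordinary space: blue background
    "\u00a0": "\033[43m",  # no-break space: yellow background
    "\u202f": "\033[45m",  # narrow no-break space: magenta background
}
GREEN = "\033[42m"
RESET = "\033[0m"


def colorize_spaces(text):
    """Wrap every known space character in its background color."""
    return "".join(
        f"{SPACE_COLORS[ch]}{ch}{RESET}" if ch in SPACE_COLORS else ch
        for ch in text
    )


def highlight(text, start, end):
    """Show text[start:end] on a green background, as for a detected fault."""
    return f"{text[:start]}{GREEN}{text[start:end]}{RESET}{text[end:]}"


# Hypothetical message: an ordinary space (offset 11) separates value and unit.
message = "Taille\u202f: 10 Mo"
print(colorize_spaces(message))
print(highlight(message, 11, 12))
```

Coloring the background rather than the character itself matters here, because a space has no visible glyph to color.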
Figure 2 below shows what happens after the user has accepted the change.
- The message Change accepted is displayed.
- The corrected message is displayed in blue.
- The program then informs the user that it has left some messages unchanged because no typo faults were detected in them. It continues to do so until the next fault is detected.
Figure 3 below shows a case where the user could use the c (for prior change) option. Here a hyphen has been used in place of a semi-em dash (en dash). In French, the spacing rules for the hyphen and the semi-em dash are different: a hyphen takes no space between the preceding and the following word, while a semi-em dash requires a space on both sides. Replacing the hyphen with a semi-em dash appears to be the best solution here.
The following figures show how the process occurs.
Frequently, a message is not really a text in the target language but a very long command output, or the description of a command with many options. In this case, the typo rules are infringed very often and the program stops many times. To avoid this, the user can choose 's' to skip the message and prevent repeated false detections. See figure 8 below.
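The hyphen replacement discussed above can also be expressed as a regular expression. The following Python sketch assumes that a hyphen with whitespace on both sides stands in for a dash, while a hyphen inside a compound word is left alone; the function name `hyphen_to_dash` is hypothetical, and the kind of space to keep around the dash is left exactly as it was typed.

```python
import re

EN_DASH = "\u2013"  # the semi-em dash mentioned in the text

# A hyphen with whitespace on both sides; compound words such as
# "peut-être" and options such as "-t" do not match.
SPACED_HYPHEN = re.compile(r"(?<=\s)-(?=\s)")


def hyphen_to_dash(text):
    """Replace each spaced hyphen with an en dash, keeping the spaces."""
    return SPACED_HYPHEN.sub(EN_DASH, text)


print(hyphen_to_dash("Fedora - peut-être"))
```

Because the replacement changes the character but not the surrounding spaces, the result can then be checked against the spacing rule for dashes instead of the rule for hyphens.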
Getting the tool
If you don't plan to improve the program, just go to the GitHub page and use the green Clone or download button to download the po_purifier-master.zip file. Once unzipped, you should have a po_purifier-master folder. Inside this folder is a po_purifier folder that contains the program po_typo_purifier.py.
Next to this po_purifier folder is a fr folder. It contains a set of .po files provided for testing purposes. The simplest setup is to place the po_purifier folder next to the folder that contains the translated files (.po) you have already pulled from Zanata (see L10N/Zanata Reviewing .po Files Locally).
Configuring the tool
The tool is configured for your language in the typorules.py file. You should set:
- the locale variable, to your country code as Zanata names it, e.g. 'fr', 'de', etc.
- the language code, e.g. 'fr_FR.UTF-8', which indicates where the localized messages of the program are found under the locale folder. Note that 'locale' here is the name of a folder, not the locale variable above.
- the sr variable, which contains the list of typographic rules for your language. Please take the French example as a guide for formatting these rules.
Using the tool
If the translation folder sits next to the program's parent folder po_purifier and contains the .po files at its first level, just use:
python3 <path_to_program>/po_typo_purifier.py
Otherwise use:
python3 <path_to_program>/po_typo_purifier.py -t <path_to_po_file>
where <path_to_po_file> is the path to the .po files, relative to the location of the program's parent folder po_purifier, e.g.
python3 <path_to_program>/po_typo_purifier.py -t de/pot
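The path handling described above can be sketched in Python with argparse and pathlib. Only the -t option (long form --trans-dir) comes from the tool's documentation; the function names, the fallback to the locale code, and the assumed layout (the translation folder living in the directory that contains the po_purifier folder) are hypotheses made for illustration, not the program's actual code.

```python
import argparse
from pathlib import Path


def resolve_trans_dir(program_path, locale, argv):
    """Return the folder that would be searched for .po files."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-t", "--trans-dir", default=None)
    args = parser.parse_args(argv)
    # Directory that contains the po_purifier folder (assumed layout).
    base = Path(program_path).resolve().parent.parent
    # Without -t, fall back to a folder named after the locale, e.g. "fr".
    return base / (args.trans_dir or locale)


def list_po_files(trans_dir):
    """List the .po files at the first level of trans_dir, sorted by name."""
    return sorted(Path(trans_dir).glob("*.po"))


# Hypothetical call mirroring: python3 .../po_typo_purifier.py -t de/pot
print(resolve_trans_dir("po_purifier/po_typo_purifier.py", "fr", ["-t", "de/pot"]))
```

Resolving the path from the program's own location, rather than from the current working directory, is what lets the command be run from anywhere.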
Finding the results
Normally the original files remain unchanged, and the output files (corrected or, for some of them, left as they were) are placed in a folder named purified next to the program's parent folder po_purifier.
If you agree with the changes, replace the original files in your Zanata pull, modify and push workflow with these modified files, then continue your modifications manually or with other tools.
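Replacing the originals with the purified files can be done by hand or with a few lines of Python. The sketch below assumes that each purified file has the same name as its original and that both folders are flat; the function name `copy_back` is hypothetical and the script is not part of the tool.

```python
import shutil
from pathlib import Path


def copy_back(purified_dir, originals_dir):
    """Overwrite each original .po file with its purified counterpart."""
    copied = []
    for purified in sorted(Path(purified_dir).glob("*.po")):
        target = Path(originals_dir) / purified.name
        # Only overwrite files that already exist among the originals,
        # so nothing that was not pulled from Zanata gets pushed back.
        if target.exists():
            shutil.copy2(purified, target)
            copied.append(target.name)
    return copied
```

For example, `copy_back("purified", "fr")` would return the names of the files it replaced. Reviewing the differences first, for instance with a diff tool, is advisable because the copy overwrites the originals.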