Localization measurement and tooling
Summary
Provide a public website for end users and contributors, containing Fedora Workstation translation progress and useful files for translators (as an example: translation memories).
Owner
- Name: Jean-Baptiste Holcroft, Francois Andrieu
- Email: <jean-baptiste@holcroft.fr>
Current status
- Targeted release: Fedora 34
- Last updated: 2021-01-27
- FESCo issue: #2545
- Tracker bug: #1921178
- Release notes tracker: #645
Detailed Description
Language support is a cross-cutting activity, and there is currently no way to know the actual language support provided by Fedora as an operating system.
Because language support and translations are part of each upstream project, the Linux localization community is as dispersed as the Free/Libre and Open Source community itself.
The ability to share efforts (data, tools, etc.) is limited:
- because it is complex to get an overview of the current localization status of the Linux community,
- because translators often have a low level of technical knowledge,
- because development experts tend to use English by default and don't know much about language support requirements.
Debian did something similar about 20 years ago: https://www.debian.org/international/l10n/ . But this work:
- is limited in terms of features (no translation memories there)
- is too deeply integrated with Debian infrastructure (data extraction, computation and website generation are 100% Debian-specific)
- is written in a programming language that makes it hard to reuse existing i18n/l10n libraries (which did not exist 20 years ago)
Feedback
Wouldn't it be better to e.g. enhance Weblate to report stats for projects which are externally translated through some different project?
https://translate.fedoraproject.org/ contains what is specific to the Fedora project (documentation, websites, FAS, etc.), but most of what is contained in the operating system we build is not specific to the Fedora project.
Each upstream project decides on its own translation process, for example:
- GNOME: https://l10n.gnome.org/
- KDE: https://l10n.kde.org
- Mozilla: https://pontoon.mozilla.org/
- LibreOffice: https://translations.documentfoundation.org/
What we can measure in https://translate.fedoraproject.org is the health of the Fedora community.
What we will measure with this change is what the Linux ecosystem delivers to end users, which should help make the Linux community more effective.
Weblate is a translation platform. Using it to display translations of projects that did not choose to be part of our translation platform would be equivalent to forking what upstream does.
Ubuntu does this with Launchpad, but the boundary between upstream work and distribution work isn't clear enough. Translators do some work in Launchpad, but the translations don't go upstream automatically (which means most of the time they never go upstream).
This would probably be really confusing for end users, and the Fedora community would find it incompatible with Fedora values.
In addition, Weblate is a great tool, but it is really complex and moving quite fast. We do share technical components (Translate Toolkit and language lists), but sharing more would not make sense for our use case.
Translations should be controlled by upstream projects, not distributions. Fedora should stay closer to the upstream projects, not drift away from them.
This change is aligned with this. One idea would be to help contributors understand where to go to translate each project upstream.
The Language-Team attribute is probably good enough to lead a contributor to the right place. Here are a few examples for the French language:
- 0ad: "Language-Team: French (http://www.transifex.com/wildfire-games/0ad/language/fr/)\n"
- ABRT: "Language-Team: French <https://translate.fedoraproject.org/projects/abrt/"
- AppStream: "Language-Team: French <https://hosted.weblate.org/projects/appstream/"
- Audacious: "Language-Team: French (http://www.transifex.com/audacious/audacious/language/fr/)\n"
- Gnome shell: "Language-Team: GNOME French Team <gnomefr(a)traduc.org"
- Krita: "Language-Team: French <kde-francophone(a)kde.org"
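As an illustration of how such values could point a contributor to the right place, here is a minimal Python sketch that extracts either a URL or a mailing-list address from a Language-Team value. The function name `contact_from_language_team` and both regular expressions are assumptions made for this example and are not taken from the project's code. The sketch also assumes that `(a)` is an obfuscated `@`, as in the GNOME and KDE examples above.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns, written for this example only.
# A URL stops at whitespace, brackets, quotes or a backslash.
URL_RE = re.compile(r"https?://[^\s<>()\"\\]+")
# An address may use "@" or the obfuscated "(a)" form.
EMAIL_RE = re.compile(r"[\w.+-]+(?:@|\(a\))[\w.-]+\.\w+")


def contact_from_language_team(value: str) -> Optional[Tuple[str, str]]:
    """Return ("url", ...) or ("email", ...) found in a Language-Team value."""
    url = URL_RE.search(value)
    if url:
        return ("url", url.group(0))
    email = EMAIL_RE.search(value)
    if email:
        # Assumption: "(a)" stands for "@" in obfuscated addresses.
        return ("email", email.group(0).replace("(a)", "@"))
    return None


if __name__ == "__main__":
    print(contact_from_language_team("French <kde-francophone(a)kde.org"))
```

A URL is preferred over an address when both are present, on the assumption that a translation platform link is the more direct entry point for a new contributor.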
Why not use Transtats? What's the future of Transtats?
Transtats covers 100 manually configured packages, while this change does the following (stats are for F33):
- use dnf to download all SRPMs for a Fedora release (21330 packages)
- detect po files (2230 packages have at least one po file; more file formats exist, but they will be for the future ;))
- extract all po files (200 337 po files)
- deduce the language list (344 languages)
- produce stats and consolidated files (16GB of files before compression)
- publish a website (2 GB once files are compressed)
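The detection and language-list steps above can be sketched in Python as follows. This is only a sketch under stated assumptions, not the project's actual scripts: the function names are hypothetical, the sources are assumed to be already extracted under one directory, and the language code is assumed to be the po file name without its extension (for example `fr.po` or `pt_BR.po`), which will not hold for every package.

```python
from collections import Counter
from pathlib import Path
from typing import List


def find_po_files(root: Path) -> List[Path]:
    """Return every *.po file found below root, in sorted order."""
    return sorted(p for p in root.rglob("*.po") if p.is_file())


def language_of(po_file: Path) -> str:
    """Guess the language code from the file name.

    Assumption: the file stem is the language code (fr.po -> "fr").
    """
    return po_file.stem


def summarize(root: Path) -> Counter:
    """Count po files per guessed language code."""
    return Counter(language_of(p) for p in find_po_files(root))


if __name__ == "__main__":
    # "extracted-srpms" is a hypothetical directory name.
    for language, count in sorted(summarize(Path("extracted-srpms")).items()):
        print(language, count)
```

The keys of the resulting counter give a candidate language list, and the counts give a first rough measure of how many po files exist per language.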
The Transtats UI is good, but it is really focused on translation propagation across systems, which brings huge complexity.
We could probably try to merge both tools by writing down the goals each tool wants to achieve. Measuring the usage of Transtats would help to identify which features should be preserved.
A workshop is to be organized to build a plan. The proposed date is before Flock (if this is a physical Flock).
Benefit to Fedora
It is progress for the project: it provides a new tool to the translator community.
It helps the Linux community to better understand language support challenges.
It increases contributors' effectiveness by providing translation memories and other tools.
These translation memories open new possibilities:
- to train machines to suggest new translations?
- to detect quality issues (spellcheck, linters, etc)?
- to change the way we ship translations to users? (Ubuntu does it, but never brings translations back to the main project)
- to advertise to users that Linux is available in many languages?
Scope
All of the work is isolated: as long as dnf works, the automation works. The closer the execution is to a mirror, the lower the network cost (all of Fedora is downloaded at each execution).
- Proposal owners:
- Francois Andrieu integrates the existing scripts into containers to allow execution in OpenShift
- Infra team:
- provide some space for script execution (50 GB per release)
- provide the languages.fedoraproject.org domain name
- provide a location for static website (about 2 GB per release, may increase over time)
- Other developers: N/A (not a System Wide Change)
- Release engineering: #Releng issue number (a check of an impact with Release Engineering is needed)
- Policies and guidelines: N/A (not a System Wide Change)
- Trademark approval: N/A (not needed for this Change)
- Alignment with mission: In our community, contributors of all kinds come together to advance the ecosystem for the benefit of everyone.
Upgrade/compatibility impact
N/A (not a System Wide Change)
How To Test
N/A (not a System Wide Change)
User Experience
Dependencies
N/A (not a System Wide Change)
Contingency Plan
- Contingency mechanism: N/A (not a System Wide Change)
- Contingency deadline: N/A (not a System Wide Change)
- Blocks release? N/A (not a System Wide Change)
- Blocks product? product
Documentation
A draft with a simplistic template is available here: https://jibecfed.fedorapeople.org/partage/fedora-localization-statistics/f32/language/fr/
Code and "documentation" are available here: https://pagure.io/fedora-localization-statistics
About other projects:
- Debian's code to build website with language progress: https://salsa.debian.org/webmaster-team/webwml/-/commits/master/english/international/l10n/scripts/transmonitor-check
- Ubuntu's code to build langpacks: https://bazaar.launchpad.net/~ubuntu-langpack/langpack-o-matic/main/files
- Note: Ubuntu does provide language progress in Launchpad: https://translations.launchpad.net/ubuntu and some useful documentation is available here: https://dev.launchpad.net/Translations