Glibc Unicode 7.0
Summary
We are updating Glibc's Unicode data from Unicode 5.1 to Unicode 7.0. The update took a long time because there was little documentation on how to update the Unicode data and because of the risk of losing backward compatibility. Most of the issues are now resolved and patches are ready for inclusion. This update adds around 8000 supported characters to Glibc and corrects the Unicode data of many characters per the latest Unicode standard.
Owner
- Name: Mike Fabian, Pravin Satpute, Siddhesh Poyarekar
- Email: mfabian At redhat DOT com, pravins At fedoraproject DOT org, spoyarek AT redhat DOT com
- Release notes owner:
Current status
Detailed Description
In this update we plan to bring Glibc's Unicode locale data (the character map and the LC_CTYPE information) up to Unicode version 7.0. This data is used in almost all locales and will affect all applications that use those locales. It is a system-wide change, since it impacts Glibc and every application that depends on it. Glibc provides two files for Unicode data, UTF-8 and i18n. The UTF-8 file provides the CHARMAP and WIDTH information for Unicode characters. The i18n file provides CTYPE information (uppercase, lowercase, punctuation, etc.) for all Unicode characters. This data has not been updated for a long time because of incomplete documentation and the risk of losing backward compatibility. Work on this started five to six months ago, and most of the issues are now resolved.
See the relevant upstream bugs for more information.
The scripts are available in a GitHub repo.
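To illustrate the split between the two files, here is a minimal Python sketch that derives CHARMAP-style entries and simple CTYPE information from records in the UnicodeData.txt format. It is not the script used for this change. The function names (parse_unicode_data, charmap_line, ctype_classes, toupper_pairs) are hypothetical, the column layout of the output only approximates the UTF-8 file, and the rule "general category Lu is upper, Ll is lower" is a simplifying assumption; the real generation rules are more involved.

```python
# Four sample records in UnicodeData.txt format (semicolon-separated, 15 fields).
SAMPLE = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
00E9;LATIN SMALL LETTER E WITH ACUTE;Ll;0;L;0065 0301;;;;N;LATIN SMALL LETTER E ACUTE;;00C9;;00C9
20AC;EURO SIGN;Sc;0;ET;;;;;N;;;;;
"""


def parse_unicode_data(text):
    """Return one dict per record with the fields this sketch needs."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(";")
        records.append({
            "cp": int(fields[0], 16),   # code point
            "name": fields[1],          # character name
            "category": fields[2],      # general category, e.g. Lu, Ll, Sc
            "upper": fields[12],        # simple uppercase mapping, may be empty
        })
    return records


def charmap_line(rec):
    """Format a CHARMAP-style entry: symbolic name, UTF-8 bytes, character name."""
    utf8 = "".join("/x%02x" % b for b in chr(rec["cp"]).encode("utf-8"))
    return "<U%04X>     %-12s %s" % (rec["cp"], utf8, rec["name"])


def ctype_classes(records):
    """Group code points into 'upper' and 'lower'.

    Assumption for illustration only: Lu means upper, Ll means lower.
    """
    classes = {"upper": [], "lower": []}
    for rec in records:
        if rec["category"] == "Lu":
            classes["upper"].append(rec["cp"])
        elif rec["category"] == "Ll":
            classes["lower"].append(rec["cp"])
    return classes


def toupper_pairs(records):
    """Return (code point, uppercase code point) pairs for records that have one."""
    return [(rec["cp"], int(rec["upper"], 16)) for rec in records if rec["upper"]]


records = parse_unicode_data(SAMPLE)
for rec in records:
    print(charmap_line(rec))
print(ctype_classes(records))
print(toupper_pairs(records))
```

The first function pair corresponds to what the UTF-8 file carries (code point, byte sequence, name), while the class and case-mapping helpers correspond to the kind of information kept in the i18n file.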
Benefit to Fedora
With this change, users and developers of Fedora will get Unicode 7.0 support through Glibc. The upgrade from Unicode 5.1 to 7.0 only updates the Unicode data in the Glibc locales; no new functionality is added. Fedora is the leading distribution when it comes to internationalization, and including this change gives Fedora users and developers the latest Unicode locale data.
Scope
- Proposal owners:
1. Writing scripts for generating the UTF-8 and i18n files from the Unicode Character Database.
2. Preparing patches for the UTF-8 and i18n files.
3. Preparing a backward compatibility report.
4. Applying the patches to Fedora.
5. Testing whether the change breaks anything.
- Other developers: This change will impact Glibc and all applications that use locales. Other developers do not need to make any changes on their end, but they should watch how their applications behave with the improved locale data. We need proper testing to confirm that it does not break any application.
- Release engineering: No work required from Release engineering.
- Policies and guidelines: No, this change does not require any updates to policies or packaging guidelines.
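The backward compatibility report listed in the proposal owners' steps is essentially a diff of character class membership between the old and the new data. The following Python sketch shows one way such a diff could be computed. It is not the script that produced the actual reports; compare_classes and format_report are hypothetical names, and the old and new memberships are invented for illustration and do not reflect real differences between Unicode 5.1 and 7.0.

```python
def compare_classes(old, new):
    """Compare two mappings of class name -> set of code points.

    Returns, per class, the code points that were added and removed.
    Removed code points are the ones that may affect backward compatibility.
    """
    report = {}
    for cls in sorted(set(old) | set(new)):
        before = old.get(cls, set())
        after = new.get(cls, set())
        report[cls] = {
            "added": sorted(after - before),
            "removed": sorted(before - after),
        }
    return report


def format_report(report):
    """Render the comparison as one line per changed code point."""
    lines = []
    for cls, diff in report.items():
        for cp in diff["added"]:
            lines.append("%s: added U+%04X" % (cls, cp))
        for cp in diff["removed"]:
            lines.append("%s: removed U+%04X" % (cls, cp))
    return lines


# Invented sample memberships, for illustration only.
old_classes = {"upper": {0x41, 0x42}, "punct": {0x21, 0x5F}}
new_classes = {"upper": {0x41, 0x42, 0x43}, "punct": {0x21}, "alpha": {0x5F}}

report = compare_classes(old_classes, new_classes)
for line in format_report(report):
    print(line)
```

Additions are usually harmless, so a reviewer of such a report would mainly look at the "removed" lines, where a character no longer belongs to a class that applications may have relied on.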
Upgrade/compatibility impact
The upgrade will be smooth. Users will get the same functionality as before, with updated Unicode data.
How To Test
- Glibc includes extensive test-case coverage to test localedata changes.
- This change affects Unicode character data, so users should notice little effect on rendering, if any.
- Rendering engines use Glibc to determine the type of characters, so observe rendering and report any issues.
- The Documentation section provides a detailed report of the changes.
User Experience
- Users and developers will get support for the Unicode 7.0 standard through locales.
Dependencies
- No other RPM packages depend on this.
- This patch won't be able to make it into glibc upstream release 2.21, so we're going to maintain it as a patch for Fedora 22.
Contingency Plan
- Contingency mechanism: Drop the patches from the Glibc build.
- Contingency deadline: Before the F22 Beta release, e.g. at Beta freeze.
- Blocks release? No
- Blocks product? No
Documentation
- This will be available with the backward compatibility report.
- Backward compatibility report for the UTF-8 file.
- i18n CTYPE file additions report: https://raw.githubusercontent.com/pravins/glibc-i18n/master/report-ctype (2.1 MB in size).