Introduction
Voice data packages enable applications to pronounce words of human languages. Such voice data usually comes from various sources and serves various purposes. For example, dictionaries include voice files to demonstrate pronunciation; web browser readers include synthesizers to read web pages; input methods with voice support help users confirm what they typed. Consequently, voice files end up in various locations and formats, which makes it hard for other applications that want to share the data.
The common voice data packaging guide addresses this problem by providing an application-independent common interface for voice files. Applications can still install voice data in various locations, yet other applications can easily locate those voice files and reuse them.
For example, Edward Liu from gcin provides Chinese voice data for input methods under gcin. The directory structure of those voice data files is, of course, specific to gcin. Project Ekho also has its own directory structure and file naming scheme for Cantonese, Mandarin, and Korean. If IBus wants to use these voice data files for its speak-as-you-type function, there should be a common interface for voice data from various sources.
Voice data license
According to Tom Callaway, voice data should be released under one of our Good Free Software licenses or approved content licenses, or at least be freely redistributable without limitation.
Voice file format
Voice data can be stored as individual files, packed in an archived file, or produced by synthesizers.
Voice data should be stored in an open audio format such as ogg. If archive files are used, they should be in an open format such as 7z or tar. 7z is recommended for its ability to handle Unicode filenames across platforms.
Package naming guide
voicedata-<locale>-<source>-<variant>
where
- locale is the locale string.
- source is the upstream or project name that provides the voice. default should be a reserved word, as it will be a link to the preferred voice from org/variant.
- variant is an optional field for notable information, such as the person who provided the voice or the algorithm name.
For example, gcin voice data in this scheme should be named as:
voicedata-zh_TW-gcin-EdwardLiu
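The naming scheme above can be illustrated with a small Python sketch. The helper names `build_package_name` and `parse_package_name` are hypothetical and not part of this guide, and the parser assumes that neither the locale nor the source contains a hyphen (only the variant may).

```python
from typing import NamedTuple, Optional


class VoicePackageName(NamedTuple):
    """Fields of a voicedata-<locale>-<source>-<variant> package name."""
    locale: str
    source: str
    variant: Optional[str]


def build_package_name(locale: str, source: str,
                       variant: Optional[str] = None) -> str:
    """Join the fields into a package name; the variant is optional."""
    parts = ["voicedata", locale, source]
    if variant:
        parts.append(variant)
    return "-".join(parts)


def parse_package_name(name: str) -> VoicePackageName:
    """Split a package name back into its fields.

    Assumption: locale and source contain no hyphen, so everything after
    the third hyphen is treated as the variant.
    """
    fields = name.split("-", 3)
    if len(fields) < 3 or fields[0] != "voicedata" or not all(fields):
        raise ValueError(f"not a voice data package name: {name!r}")
    variant = fields[3] if len(fields) == 4 else None
    return VoicePackageName(fields[1], fields[2], variant)
```

For instance, `build_package_name("zh_TW", "gcin", "EdwardLiu")` produces the gcin package name shown above, and `parse_package_name` recovers the three fields from it.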
A voice data preference setting can then show something like:
Locale | Voice |
---|---|
en_US | UncleSam |
en_GB | QueenElizabeth |
....... | ....... |
zh_TW | EdwardLiu |
How to Pack
Theoretically, the voice data files can be shipped with the main package (for example, gcin-voice can ship with gcin). However, it is recommended to put voice data files under the voicedata directory, such as:
%{_datadir}/voicedata/locale/source-variant
In the following, %{pkg_data_dir} refers to %{_datadir}/voicedata/locale/source-variant.
Under %{pkg_data_dir} there should be a sqlite3 database named voicedata.sqlite. It contains a table, phonemes_table, which records each phoneme together with its corresponding voice data file, archive file, or the command that pronounces the phoneme.
The content of phonemes_table looks like:
phonemes | file | archive | command |
---|---|---|---|
a | a/a.ogg | a.7z | null |
ai | a/ai.ogg | a.7z | null |
b | b/b.ogg | b.7z | null |
where
- phonemes: Phoneme or word to be pronounced. Cannot be null.
- file: Filename of the voice file, relative to %{pkg_data_dir}. Can be null if a voice synthesizer is used.
- archive: Archive file that holds the voice file. Can be null if archive files are not used.
- command: Command that produces the voice. This field is mainly for synthesizers. Can be null.
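As a rough illustration of how an application might consume phonemes_table, the following Python sketch builds the table in memory and resolves one phoneme to its source. The function names are hypothetical. The resolution order (a recorded file, possibly inside an archive, is preferred over a synthesizer command) is an assumption; this guide does not define a priority.

```python
import sqlite3
from typing import Iterable, Optional, Tuple

Row = Tuple[str, Optional[str], Optional[str], Optional[str]]


def create_phonemes_db(rows: Iterable[Row]) -> sqlite3.Connection:
    """Create an in-memory database with the phonemes_table layout."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE phonemes_table "
        "(phonemes PRIMARY KEY NOT NULL, file, archive, command)"
    )
    conn.executemany("INSERT INTO phonemes_table VALUES (?, ?, ?, ?)", rows)
    return conn


def resolve_phoneme(conn: sqlite3.Connection,
                    phoneme: str) -> Optional[Tuple[str, str, Optional[str]]]:
    """Return how to obtain the voice for a phoneme, or None if unknown.

    Result is one of:
      ("archive", archive_name, member_file)
      ("file", relative_file, None)
      ("command", command_line, None)
    Assumed priority: recorded file first, synthesizer command second.
    """
    row = conn.execute(
        "SELECT file, archive, command FROM phonemes_table WHERE phonemes = ?",
        (phoneme,),
    ).fetchone()
    if row is None:
        return None
    file, archive, command = row
    if file is not None:
        if archive is not None:
            return ("archive", archive, file)
        return ("file", file, None)
    if command is not None:
        return ("command", command, None)
    return None
```

With the rows from the table above, resolving `a` yields the archive `a.7z` and the member file `a/a.ogg`; the caller still has to extract the member and play it, which is outside this sketch.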
Post processing
Post install
Voice data packages must register themselves in the package_table of the voice data package database, voicedata-package.sqlite. The content of package_table looks like:
package | locale | path | voice_format | archive_format | synthesizer |
---|---|---|---|---|---|
UncleSam | en_US | en_US/UncleSam | wav | 7z | null |
QueenElizabeth | en_GB | en_CommonWealth/QueenElizabeth | wav | null | some_synthesizer |
gcin-EdwardLiu | zh_TW | zh_TW/gcin-EdwardLiu | ogg | null | null |
where
- package: In the form of source-variant. Cannot be null.
- locale: Locale that the package supports. Cannot be null.
- path: Relative path from %{_datadir}/voicedata to %{pkg_data_dir}. Cannot be null.
- voice_format: The voice format this package uses. Can be null.
- archive_format: The format of the archive file, such as tar or 7z. Can be null.
- synthesizer: Synthesizer to be used with this package. Can be null.
Example post install script:
%post
# This command will be moved to voice data main package.
sqlite3 voicedata-package.sqlite "INSERT INTO 'package_table' ('package', 'locale', 'path', 'voice_format', 'archive_format','synthesizer') VALUES ('gcin-EdwardLiu','zh_TW', 'zh_TW/gcin-EdwardLiu', 'ogg', NULL, NULL);"
Post uninstall
Voice data packages must also unregister themselves from voicedata-package.sqlite by removing the corresponding record from package_table.
Example post uninstall script:
%postun
sqlite3 voicedata-package.sqlite "DELETE FROM 'package_table' WHERE package='gcin-EdwardLiu';"
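The registration cycle, and the lookup it enables for other applications, can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, `/usr/share/voicedata` is an assumed expansion of %{_datadir}/voicedata, and the registry is opened in memory by default so that the sketch does not touch a real voicedata-package.sqlite.

```python
import sqlite3
from pathlib import PurePosixPath
from typing import List, Optional, Tuple

# Assumed expansion of %{_datadir}/voicedata on a typical system.
VOICEDATA_ROOT = PurePosixPath("/usr/share/voicedata")


def open_registry(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open the package registry, creating package_table if needed."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS package_table "
        "(package, locale, path, voice_format, archive_format, synthesizer)"
    )
    return conn


def register(conn: sqlite3.Connection, package: str, locale: str,
             voice_format: Optional[str] = None,
             archive_format: Optional[str] = None,
             synthesizer: Optional[str] = None) -> None:
    """Mirror the post install step: add one row for the package."""
    conn.execute(
        "INSERT INTO package_table VALUES (?, ?, ?, ?, ?, ?)",
        (package, locale, f"{locale}/{package}",
         voice_format, archive_format, synthesizer),
    )
    conn.commit()


def unregister(conn: sqlite3.Connection, package: str) -> None:
    """Mirror the post uninstall step: remove the package's row."""
    conn.execute("DELETE FROM package_table WHERE package = ?", (package,))
    conn.commit()


def find_packages(conn: sqlite3.Connection,
                  locale: str) -> List[Tuple[str, str]]:
    """List (package, path to its voicedata.sqlite) for a locale."""
    rows = conn.execute(
        "SELECT package, path FROM package_table "
        "WHERE locale = ? ORDER BY package",
        (locale,),
    ).fetchall()
    return [(package, str(VOICEDATA_ROOT / path / "voicedata.sqlite"))
            for package, path in rows]
```

An application such as an input method would call `find_packages` with the user's locale and then open the returned voicedata.sqlite to look up individual phonemes.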
Dialect handling
If the dialect already has a system locale, use that locale.
If the dialect uses a phoneme set identical to that of the main language but with different pronunciation, the recommended way is to establish an upstream project and record the voice of the dialect.
If the dialect uses phonemes different from those of the main language, make use of the locale's modifier.
For example, the Shanghai dialect can be expressed as:
zh_CN@shanghai
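A minimal Python sketch of handling such locale strings is shown below. The function names are hypothetical. Falling back from the dialect locale to the main locale when no dialect package is installed is an assumption made for illustration; this guide does not prescribe a fallback policy.

```python
from typing import List, Optional, Tuple


def split_locale(locale: str) -> Tuple[str, Optional[str]]:
    """Split 'zh_CN@shanghai' into ('zh_CN', 'shanghai').

    The modifier is None when the locale has no '@' part.
    """
    base, sep, modifier = locale.partition("@")
    return base, (modifier if sep and modifier else None)


def locale_candidates(locale: str) -> List[str]:
    """Locales to try in order when searching for voice data.

    Assumed policy: try the dialect first, then the main language.
    """
    base, modifier = split_locale(locale)
    if modifier is None:
        return [base]
    return [f"{base}@{modifier}", base]
```

Each candidate could then be matched against the locale column of package_table until a registered package is found.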
Full example
%define pkg_data_name gcin-EdwardLiu
%define voice_pkg_db %{_datadir}/voicedata/voicedata-package.sqlite

Name:           voicedata-zh_TW-gcin-EdwardLiu
Version:        20090221
Release:        3%{?dist}
License:        GPLv3
Group:          Applications/Multimedia
URL:            http://cle.linux.org.tw/trac/wiki/GcinDistros#gcinvoicedata
Source0:        http://www.calno.com/moto/gcin/ogg-gpl3-%{version}.tar.gz
Summary:        Chinese voice data from gcin project, recorded by Edward Der-Hua Liu
Summary(zh_TW): Gcin 中文語音檔,由劉德華 (Edward Der-Hua Liu) 錄製
BuildRoot:      %{_tmppath}/%{name}-%{version}-%{release}-root-%(%{__id_u} -n)
BuildRequires:  sqlite >= 3.0
BuildArch:      noarch

%description
This voice data records voice of Edward Der-Hua Liu, the gcin's author,
for enabling "speak-as-you-type" functionality of gcin.
The voice data is now released under GPLv3+.

%description -l zh_TW
本中文語音檔由劉德華 (Edward Der-Hua Liu),gcin的作者錄製,
以實現「輸入時唸出發音」功能。
本語音檔現已 GPLv3 釋出。

%prep
%setup -q -n ogg

%build
ls -d [^A-Za-z]* | sed -e 's/^.*$/&,&\/3.ogg,,/' | sed -e 's/2,/ˊ,/' | sed -e 's/3,/ˇ,/' | sed -e 's/4,/ˋ,/' > phonemes.csv
sqlite3 voicedata.sqlite "CREATE TABLE 'phonemes_table' ('phonemes' PRIMARY KEY, 'file', 'archive', 'command');"
sqlite3 -separator , voicedata.sqlite ".import phonemes.csv phonemes_table"
rm phonemes.csv

%install
rm -rf $RPM_BUILD_ROOT
mkdir -p ${RPM_BUILD_ROOT}/%{_datadir}/voicedata/zh_TW/%{pkg_data_name}
cp -R * ${RPM_BUILD_ROOT}/%{_datadir}/voicedata/zh_TW/%{pkg_data_name}

%post
# This command will be moved to voice data main package.
# IF NOT EXISTS keeps it from failing when another package already created the table.
sqlite3 %{voice_pkg_db} "CREATE TABLE IF NOT EXISTS 'package_table' ('package', 'locale', 'path', 'voice_format', 'archive_format', 'synthesizer');"
sqlite3 %{voice_pkg_db} "INSERT INTO 'package_table' ('package', 'locale', 'path', 'voice_format', 'archive_format','synthesizer') VALUES ('%{pkg_data_name}','zh_TW', 'zh_TW/%{pkg_data_name}', 'ogg', NULL, NULL);"

%postun
sqlite3 %{voice_pkg_db} "DELETE FROM 'package_table' WHERE package='%{pkg_data_name}';"

%clean
rm -rf $RPM_BUILD_ROOT

%files
%defattr(-,root,root,-)
%doc LICENSE gpl-3.0.txt
%{_datadir}/voicedata/zh_TW/%{pkg_data_name}

%changelog
....