From Fedora Project Wiki
This is a draft document

Introduction

Voice data packages enable applications to speak words of human languages. Such voice data usually comes from various sources and serves various purposes. For example, dictionaries contain voice files to demonstrate pronunciation; web browser readers contain synthesizers to read web pages aloud; input methods with voice help users confirm what they typed. Consequently, voice files end up in various locations and formats, which makes it hard for other applications to share the data.

The common voice data packaging guide addresses this problem by providing an application-independent common interface for voice files. Applications can still install voice data in various locations, yet other applications can easily locate those voice files and reuse them.

For example, Edward Liu from gcin provides Chinese voice data for the input methods under gcin. The directory structure of those voice data files is, of course, specific to gcin. Project Ekho also has its own directory structure and file naming scheme for Cantonese, Mandarin, and Korean. If IBus wants to use these voice data files for its speak-as-you-type function, there should be a common interface for voice data from various sources.

Voice data license

According to Tom Callaway, the voice data should be under one of our Good Free Software licenses or Approved Content licenses, or at least be freely redistributable without limitation.

Voice file format

Voice data can be stored as individual files, packed in an archived file, or produced by synthesizers.

Voice data should be stored in an open audio format such as ogg. If archive files are used, they should be in an open format such as 7z or tar. 7z is recommended for its ability to handle Unicode filenames across platforms.

Package naming guide

voicedata-<locale>-<source>-<variant>

where

  • locale is the locale string.
  • source is the upstream or project name that provides the voice. default should be a reserved word, as it will be a link to the preferred voice from org/variant.
  • variant is an optional field for notable information, such as the person who provided the voice or the algorithm name.

For example, gcin voice data in this scheme should be named as: voicedata-zh_TW-gcin-EdwardLiu
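
The naming scheme can be sketched in a few lines of Python. This is only an illustration: the helper names build_package_name and split_package_name are hypothetical, and the sketch assumes that neither the locale nor the source contains a hyphen, which the guide does not state.

```python
def build_package_name(locale, source, variant=None):
    """Compose voicedata-<locale>-<source>[-<variant>]."""
    parts = ["voicedata", locale, source]
    if variant:
        parts.append(variant)
    return "-".join(parts)


def split_package_name(name):
    """Split a package name back into (locale, source, variant).

    Assumes locale and source contain no hyphen; everything after the
    source is treated as the optional variant.
    """
    prefix, locale, rest = name.split("-", 2)
    if prefix != "voicedata":
        raise ValueError("not a voice data package: " + name)
    source, _, variant = rest.partition("-")
    return locale, source, variant or None


print(build_package_name("zh_TW", "gcin", "EdwardLiu"))
print(split_package_name("voicedata-zh_TW-gcin-EdwardLiu"))
```
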

Voice data preference can show something like:

Locale   Voice
en_US    UncleSam
en_GB    QueenElizabeth
.......  .......
zh_TW    EdwardLiu

How to Pack

In theory, voice data files can be shipped with the main package (for example, gcin-voice with gcin). However, it is recommended to put voice data files under the voicedata directory, such as:

%{_datadir}/voicedata/<locale>/<source>-<variant>

The rest of this guide refers to this directory as %{pkg_data_dir}.

Under %{pkg_data_dir} there should be a sqlite3 database named voicedata.sqlite. It has a table, phonemes_table, which records each phoneme together with the corresponding voice data file, archive file, or the command that pronounces the phoneme.

The content of phonemes_table looks like:

phonemes  file      archive  command
a         a/a.ogg   a.7z     null
ai        a/ai.ogg  a.7z     null
b         b/b.ogg   b.7z     null

where

  • phonemes: Phonemes or word to be pronounced. Cannot be null.
  • file: Filename of the voice file, relative to %{pkg_data_dir}. Can be null if a voice synthesizer is used.
  • archive: Archive file that holds the voice file. Can be null if archive files are not used.
  • command: Command that produces the voice. This field is mainly for synthesizers. Can be null.
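
As a rough sketch of how an application might consume phonemes_table, the Python below builds the sample rows in an in-memory database and decides how to obtain the sound for one phoneme. The resolve helper is hypothetical, and the order in which it checks the columns (command first, then archive, then plain file) is an assumption; the guide does not define a precedence.

```python
import sqlite3

# In-memory stand-in for %{pkg_data_dir}/voicedata.sqlite.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE phonemes_table ("
    "phonemes TEXT PRIMARY KEY NOT NULL, file TEXT, archive TEXT, command TEXT)"
)
conn.executemany(
    "INSERT INTO phonemes_table VALUES (?, ?, ?, ?)",
    [
        ("a", "a/a.ogg", "a.7z", None),
        ("ai", "a/ai.ogg", "a.7z", None),
        ("b", "b/b.ogg", "b.7z", None),
    ],
)


def resolve(conn, phoneme):
    """Return a (kind, detail) pair describing how to obtain the sound.

    The precedence command > archive > file is an assumption of this sketch.
    Returns None when the phoneme is not in the table.
    """
    row = conn.execute(
        "SELECT file, archive, command FROM phonemes_table WHERE phonemes = ?",
        (phoneme,),
    ).fetchone()
    if row is None:
        return None
    file, archive, command = row
    if command:
        return ("command", command)
    if archive:
        return ("archive", (archive, file))
    return ("file", file)


print(resolve(conn, "ai"))
```
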


Post processing

Post Install

Voice data packages must register themselves in the package_table of the voice data package database, voicedata-package.sqlite. The content of package_table looks like:

package         locale  path                            voice_format  archive_format  synthesizer
UncleSam        en_US   en_US/UncleSam                  wav           7z              null
QueenElizabeth  en_GB   en_CommonWealth/QueenElizabeth  wav           null            some_synthesizer
gcin-EdwardLiu  zh_TW   zh_TW/gcin-EdwardLiu            ogg           null            null

where

  • package: In the form of source-variant. Cannot be null.
  • locale: Locale that the package supports. Cannot be null.
  • path: Relative path from %{_datadir}/voicedata to %{pkg_data_dir}. Cannot be null.
  • voice_format: The voice format this package uses. Can be null.
  • archive_format: The format of the archive file, such as tar or 7z. Can be null.
  • synthesizer: Synthesizer to be used with this package. Can be null.
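
To illustrate how another application could locate voice data through package_table, here is a minimal Python sketch. It assumes %{_datadir} expands to /usr/share, uses an in-memory database in place of voicedata-package.sqlite, and the helper names open_demo_db and packages_for_locale are hypothetical.

```python
import posixpath
import sqlite3

# Assumed expansion of %{_datadir}/voicedata.
VOICEDATA_ROOT = "/usr/share/voicedata"


def open_demo_db():
    """Build an in-memory stand-in for voicedata-package.sqlite."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE package_table "
        "(package, locale, path, voice_format, archive_format, synthesizer)"
    )
    conn.executemany(
        "INSERT INTO package_table VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("UncleSam", "en_US", "en_US/UncleSam", "wav", "7z", None),
            ("gcin-EdwardLiu", "zh_TW", "zh_TW/gcin-EdwardLiu", "ogg", None, None),
        ],
    )
    return conn


def packages_for_locale(conn, locale):
    """Return (package, absolute data directory) pairs for one locale."""
    rows = conn.execute(
        "SELECT package, path FROM package_table WHERE locale = ? ORDER BY package",
        (locale,),
    )
    return [(package, posixpath.join(VOICEDATA_ROOT, path)) for package, path in rows]


conn = open_demo_db()
print(packages_for_locale(conn, "zh_TW"))
```
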

Example post install script:

%post
# This command will be moved to voice data main package.
# Use the full path: scriptlets do not run inside the voicedata directory.
sqlite3 %{_datadir}/voicedata/voicedata-package.sqlite "INSERT INTO 'package_table' ('package', 'locale', 'path', 'voice_format', 'archive_format','synthesizer') VALUES ('gcin-EdwardLiu','zh_TW', 'zh_TW/gcin-EdwardLiu', 'ogg', NULL, NULL);"


Post uninstall

Voice data packages must also unregister themselves from voicedata-package.sqlite by removing the corresponding record from the package_table.

Example post uninstall script:

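%postun
if [ "$1" -eq 0 ]; then  # final removal only; on upgrade the record must stay
    sqlite3 %{_datadir}/voicedata/voicedata-package.sqlite "DELETE FROM 'package_table' WHERE package='gcin-EdwardLiu';"
fi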
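
The registration and removal SQL can be rehearsed against an in-memory database before it goes into scriptlets. The Python sketch below is an illustration only: the register, unregister, and count_packages helpers are hypothetical, and deleting any stale record before inserting (so that a reinstall does not leave duplicates) is an assumption of this sketch, not something the guide requires.

```python
import sqlite3


def register(conn, package, locale, path,
             voice_format=None, archive_format=None, synthesizer=None):
    """Insert one package record, replacing any stale record of the same name."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS package_table "
        "(package, locale, path, voice_format, archive_format, synthesizer)"
    )
    conn.execute("DELETE FROM package_table WHERE package = ?", (package,))
    conn.execute(
        "INSERT INTO package_table VALUES (?, ?, ?, ?, ?, ?)",
        (package, locale, path, voice_format, archive_format, synthesizer),
    )


def unregister(conn, package):
    """Remove the record of one package."""
    conn.execute("DELETE FROM package_table WHERE package = ?", (package,))


def count_packages(conn):
    """Return the number of registered packages."""
    return conn.execute("SELECT COUNT(*) FROM package_table").fetchone()[0]


conn = sqlite3.connect(":memory:")
register(conn, "gcin-EdwardLiu", "zh_TW", "zh_TW/gcin-EdwardLiu", "ogg")
register(conn, "gcin-EdwardLiu", "zh_TW", "zh_TW/gcin-EdwardLiu", "ogg")  # reinstall
print(count_packages(conn))
```
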


Dialect handling

If the dialect already has a system locale, use that locale.

If the dialect uses a phoneme set identical to that of the main language but with different pronunciation, the recommended way is to establish an upstream project and record the voice of the dialect.

If the dialect uses phonemes different from those of the main language, make use of the locale's modifier.

For example, the Shanghai dialect can be expressed as:

zh_CN@shanghai
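
A locale string with a modifier is easy to take apart in code. The Python sketch below is only an illustration with hypothetical helper names; it ignores any codeset part (such as .UTF-8), and the fallback from the dialect locale to the main locale in candidate_locales is an assumption, since the guide does not say what to do when no dialect voice is installed.

```python
def split_locale(locale):
    """Split language_TERRITORY@modifier into its three parts.

    Missing parts are returned as None. Codesets such as .UTF-8 are not handled.
    """
    base, _, modifier = locale.partition("@")
    language, _, territory = base.partition("_")
    return language, territory or None, modifier or None


def candidate_locales(locale):
    """List locales to try: the dialect first, then the main locale (assumed fallback)."""
    base, sep, _ = locale.partition("@")
    return [locale, base] if sep else [locale]


print(split_locale("zh_CN@shanghai"))
print(candidate_locales("zh_CN@shanghai"))
```
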

Full example

%define pkg_data_name gcin-EdwardLiu
%define voice_pkg_db %{_datadir}/voicedata/voicedata-package.sqlite 
Name:       voicedata-zh_TW-gcin-EdwardLiu
Version:    20090221
Release:    3%{?dist}
License:    GPLv3
Group:      Applications/Multimedia
URL:        http://cle.linux.org.tw/trac/wiki/GcinDistros#gcinvoicedata
Source0:    http://www.calno.com/moto/gcin/ogg-gpl3-%{version}.tar.gz
Summary:    Chinese voice data from gcin project, recorded by Edward Der-Hua Liu
Summary(zh_TW): Gcin 中文語音檔,由劉德華 (Edward Der-Hua Liu) 錄製
BuildRoot:  %{_tmppath}/%{name}-%{version}-%{release}-root-%(%{__id_u} -n)
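BuildRequires: sqlite >= 3.0
# The scriptlets below call sqlite3 at install and removal time.
Requires(post): sqlite
Requires(postun): sqlite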
BuildArch:  noarch

%description
This voice data records voice of Edward Der-Hua Liu, the gcin's author,
for enabling "speak-as-you-type" functionality of gcin.

The voice data is now released under GPLv3+.

%description -l zh_TW
本中文語音檔由劉德華 (Edward Der-Hua Liu),gcin的作者錄製,
以實現「輸入時唸出發音」功能。

本語音檔現已 GPLv3 釋出。

%prep
%setup -q -n ogg

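%build
# Build a CSV with one row per phoneme directory, in the column order
# phonemes,file,archive,command (archive and command are left empty).
# The sed calls turn a tone digit 2, 3, or 4 at the end of the phonemes
# column into the tone mark ˊ, ˇ, or ˋ.
ls -d [^A-Za-z]* | sed -e 's/^.*$/&,&\/3.ogg,,/' | sed -e 's/2,/ˊ,/' | sed -e 's/3,/ˇ,/' | sed -e 's/4,/ˋ,/' > phonemes.csv
sqlite3 voicedata.sqlite "CREATE TABLE 'phonemes_table' ('phonemes' PRIMARY KEY, 'file',  'archive', 'command');"
sqlite3 -separator , voicedata.sqlite ".import phonemes.csv phonemes_table"
rm phonemes.csv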

%install
rm -rf $RPM_BUILD_ROOT
mkdir -p ${RPM_BUILD_ROOT}/%{_datadir}/voicedata/zh_TW/%{pkg_data_name}
cp -R * ${RPM_BUILD_ROOT}/%{_datadir}/voicedata/zh_TW/%{pkg_data_name}

%post
# This command will be moved to voice data main package.
# IF NOT EXISTS keeps the scriptlet from failing when another voice data package created the table first.
sqlite3 %{voice_pkg_db} "CREATE TABLE IF NOT EXISTS 'package_table' ('package', 'locale', 'path',  'voice_format', 'archive_format', 'synthesizer');"

sqlite3 %{voice_pkg_db} "INSERT INTO 'package_table' ('package', 'locale', 'path', 'voice_format', 'archive_format','synthesizer') VALUES ('%{pkg_data_name}','zh_TW', 'zh_TW/%{pkg_data_name}', 'ogg', NULL, NULL);"

%postun
if [ "$1" -eq 0 ]; then  # final removal only; on upgrade the record must stay
    sqlite3 %{voice_pkg_db} "DELETE FROM 'package_table' WHERE package='%{pkg_data_name}';"
fi


%clean
rm -rf $RPM_BUILD_ROOT


%files 
%defattr(-,root,root,-)
%doc LICENSE gpl-3.0.txt
%{_datadir}/voicedata/zh_TW/%{pkg_data_name}


%changelog
....