Concatenative Speech Synthesis for Amharic Using Unit Selection Method
Date
2011-06
Publisher
Addis Ababa University
Abstract
Speech synthesis takes text as input and generates an acoustic signal as output. In the process, the
input text is first preprocessed: it is tokenized into words or other meaningful tokens, and numbers,
abbreviations and acronyms are transliterated into words. Text analysis follows preprocessing to
identify grammatical structures and context. Once text analysis is complete, the next step is to
convert the graphemic (written) representation of sounds into their phonetic representation. A
phoneme is usually realized as several phones, each used in a different context.
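The preprocessing step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation: the tokenizer splits on whitespace and on the Ethiopic word separator ፡ (U+1361), keeps sentence-final marks (። U+1362, ፧ U+1367 and ASCII ?) as separate tokens, and drops the Ethiopic comma ፣ (U+1363). The names `tokenize`, `normalize`, `ABBREVIATIONS` and `DIGIT_NAMES` are hypothetical, the abbreviation lexicon holds a single example entry, and numbers are read digit by digit, which is a simplification of real Amharic numeral expansion.

```python
import re
from typing import Dict, List

# Tokens are: runs of ASCII digits, single sentence-final marks (። ፧ ?),
# or runs of any other characters that are not whitespace, digits, the word
# separator ፡ (U+1361), or the punctuation marks ። ፣ ፧ ?.
TOKEN_RE = re.compile(r"[0-9]+|[\u1362\u1367?]|[^\s0-9\u1361\u1362\u1363\u1367?]+")

# Hypothetical lexicons, for illustration only.
ABBREVIATIONS: Dict[str, str] = {"ት/ቤት": "ትምህርት ቤት"}
DIGIT_NAMES: Dict[str, str] = {
    "0": "ዜሮ", "1": "አንድ", "2": "ሁለት", "3": "ሦስት", "4": "አራት",
    "5": "አምስት", "6": "ስድስት", "7": "ሰባት", "8": "ስምንት", "9": "ዘጠኝ",
}


def tokenize(text: str) -> List[str]:
    """Split raw text into word, number and sentence-final tokens."""
    return TOKEN_RE.findall(text)


def normalize(tokens: List[str]) -> List[str]:
    """Expand known abbreviations and read ASCII digit strings digit by digit."""
    out: List[str] = []
    for tok in tokens:
        if tok in ABBREVIATIONS:
            out.extend(ABBREVIATIONS[tok].split())
        elif tok.isascii() and tok.isdigit():
            out.extend(DIGIT_NAMES[d] for d in tok)  # simplification: no place values
        else:
            out.append(tok)
    return out
```

For example, `normalize(tokenize("ት/ቤት 25 ነው።"))` yields `["ትምህርት", "ቤት", "ሁለት", "አምስት", "ነው", "።"]`. The sentence-final token is kept so that later stages can tell declarative from interrogative sentences.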
Amharic orthography is phonemic in the sense that each grapheme represents exactly one
phoneme. However, this holds only as long as epenthesis and gemination are not
considered. The orthography also does not show the suprasegmental information that is
required to model speaking styles properly. Even though grapheme-to-phoneme conversion is
easy in Amharic, phoneme-to-phone conversion is very difficult because of two necessary
yet orthographically unrepresented features of the language: epenthesis and gemination.
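Because each Ethiopic grapheme (fidel) pairs a consonant with one of seven vowel orders, and the Unicode Ethiopic block lays out most consonants in regular rows of eight code points, grapheme-to-phoneme conversion can be sketched as arithmetic on code points. The sketch below makes several assumptions: it covers only a small, illustrative subset of regular consonant rows; it maps the sixth order to a bare consonant; and it then applies a deliberately simplified, hypothetical epenthesis rule (insert ɨ to break a word-initial two-consonant cluster or any run of three consonants). It is not the thesis's rule set, and it ignores gemination, which the orthography does not mark.

```python
from typing import List, Optional

# Illustrative subset of regular consonant rows in the Unicode Ethiopic block.
# Each regular row occupies 8 code points; offsets 0-6 are the seven vowel orders.
ROW_CONSONANT = {
    0x1200: "h", 0x1208: "l", 0x1218: "m", 0x1228: "r", 0x1230: "s",
    0x1260: "b", 0x1270: "t", 0x1290: "n", 0x12A8: "k", 0x12C8: "w",
    0x12F0: "d", 0x1308: "g",
}
# Vowel of each order. The sixth order (offset 5) is mapped to no vowel here;
# whether an epenthetic vowel is pronounced is decided by a later rule.
ORDER_VOWEL: List[Optional[str]] = ["ə", "u", "i", "a", "e", None, "o"]
VOWELS = {"ə", "u", "i", "a", "e", "ɨ", "o"}
EPENTHETIC = "ɨ"


def g2p(word: str) -> List[str]:
    """Convert a word made of supported fidels into a phoneme list."""
    phonemes: List[str] = []
    for ch in word:
        offset = (ord(ch) - 0x1200) % 8   # vowel order within the row
        base = ord(ch) - offset           # first code point of the row
        if base not in ROW_CONSONANT or offset > 6:
            raise ValueError(f"unsupported grapheme {ch!r}")
        phonemes.append(ROW_CONSONANT[base])
        vowel = ORDER_VOWEL[offset]
        if vowel is not None:
            phonemes.append(vowel)
    return phonemes


def epenthesize(phonemes: List[str]) -> List[str]:
    """Hypothetical simplified rule: insert ɨ to break a word-initial CC cluster
    or any run of three consonants (CCC -> CCɨC)."""
    def is_cons(p: str) -> bool:
        return p not in VOWELS

    out: List[str] = []
    for p in phonemes:
        if is_cons(p):
            initial_cc = len(out) == 1 and is_cons(out[0])
            triple = len(out) >= 2 and is_cons(out[-1]) and is_cons(out[-2])
            if initial_cc or triple:
                out.append(EPENTHETIC)
        out.append(p)
    return out
```

Under these assumptions, `epenthesize(g2p("ልብስ"))` gives `['l', 'ɨ', 'b', 's']` and `epenthesize(g2p("ሰብስብ"))` gives `['s', 'ə', 'b', 's', 'ɨ', 'b']`. A real system needs a complete row table, including the irregular labialized rows, and a fuller set of epenthesis rules.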
Modeling the prosodic features of various speaking styles is another challenging task in
developing an Amharic TTS system. This is because, on the one hand, modeling human
speech is very challenging in itself and, on the other hand, relatively little research has been
done on the Amharic language.
This project addresses epenthesis and gemination, which are phonologically very
important features of the language, by studying and implementing techniques found in the
literature. Making use of an orthographic property of verbs in their perfect form, this work
introduces rules that can be used to locate phones that need to be stressed. The
grapheme-to-phoneme conversion algorithm also addresses epenthesis. Prosodic differences between
declarative and interrogative utterances are represented using unique sentence-final phones
recorded and segmented for this purpose. Transliteration of numerals and abbreviations is also
handled in the text preprocessing phase of the system.
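The two rules in this paragraph can be illustrated with phone sequences of the kind a grapheme-to-phoneme step produces. The sketch below is hypothetical and simplified. It assumes that the caller already knows a word is a three-radical verb in the perfect form, for example ሰበረ (säbbärä, "he broke"), in which the second radical is geminated, and it marks that penultimate radical with the IPA length mark ː. It then tags the last phone of a sentence with `_Q` or `_D` so that a unit-selection back end could look up sentence-final units recorded with interrogative or declarative intonation. The function names and tag labels are assumptions, not the thesis's notation.

```python
from typing import List

VOWELS = {"ə", "u", "i", "a", "e", "ɨ", "o"}
LENGTH_MARK = "ː"                 # IPA length mark, used here to label gemination
QUESTION_MARKS = {"?", "\u1367"}  # ASCII "?" and Ethiopic question mark "፧"


def geminate_perfect_verb(phones: List[str]) -> List[str]:
    """Mark the penultimate radical of a three-radical perfect verb as geminated.

    Assumes the word was already identified as a perfect-form verb; other
    radical counts are returned unchanged in this sketch.
    """
    consonant_positions = [i for i, p in enumerate(phones) if p not in VOWELS]
    out = list(phones)
    if len(consonant_positions) == 3:
        out[consonant_positions[1]] += LENGTH_MARK
    return out


def mark_sentence_final(words: List[List[str]], final_mark: str) -> List[List[str]]:
    """Tag the sentence's last phone with _Q (interrogative) or _D (declarative)
    so that a matching sentence-final unit can be selected; input is not mutated."""
    tag = "_Q" if final_mark in QUESTION_MARKS else "_D"
    out = [list(w) for w in words]
    if out and out[-1]:
        out[-1][-1] += tag
    return out
```

For instance, `geminate_perfect_verb(["s", "ə", "b", "ə", "r", "ə"])` returns `['s', 'ə', 'bː', 'ə', 'r', 'ə']`, and passing `"?"` as the final mark to `mark_sentence_final` turns the sentence's last `'ə'` into `'ə_Q'`.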
The results of an evaluation by ten fluent speakers of the language are encouraging.
Keywords
Concatenative; Speech Synthesis