Diphone-Based Amharic Speech Synthesis System Using MaryTTS
Date
2019-06-05
Publisher
Addis Ababa University
Abstract
This thesis explores the possibility of designing an Amharic text-to-speech (TTS) system that uses diphones and phones as synthesis units. Concatenative speech synthesis is now widely adopted because it performs comparatively well when a large corpus is available. In this work, Amharic Wikipedia articles were used as the corpus both to train letter-to-sound rules and to prepare the pronunciation dictionary.
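To illustrate the diphone unit: a diphone spans the transition from the middle of one phone to the middle of the next, so a phone sequence padded with silence maps to its consecutive phone pairs. The following is a minimal sketch, assuming a simplified romanized phone list; the phone symbols, the `_` silence marker, and the function name `to_diphones` are hypothetical and are not taken from the thesis's phone set or from MaryTTS.

```python
def to_diphones(phones, silence="_"):
    """Convert a phone sequence into the diphone units that cover it.

    The sequence is padded with a (hypothetical) silence marker so that the
    first and last phones also get transition units.
    """
    padded = [silence] + list(phones) + [silence]
    # Each diphone joins two adjacent phones: "left-right".
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


# Hypothetical romanized phones for the Amharic greeting "selam".
print(to_diphones(["s", "e", "l", "a", "m"]))
```

A word of n phones needs n + 1 diphones once silence is included, which is why a diphone inventory must cover every phone pair that can occur in the language.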
To generate the acoustic units and features, 120 sentences were selected for maximally balanced phone coverage and then recorded. The resulting Amharic synthesizers, one diphone-based and one based on unit selection, were then evaluated for performance, naturalness, and intelligibility by five native speakers of the language.
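The abstract does not say how the 120 sentences were chosen. A common approach to coverage-driven sentence selection is greedy set cover: repeatedly pick the candidate sentence that adds the most units not yet covered. The following is a minimal sketch of that general technique, not the thesis's method; it assumes each candidate sentence has already been converted to a list of units (for example by the letter-to-sound step), and the function name `greedy_select` and the toy corpus are hypothetical.

```python
def greedy_select(sentences, budget):
    """Greedily pick up to `budget` sentences that maximize new unit coverage.

    `sentences` maps a sentence id to its list of units (phones or diphones).
    Returns the chosen ids (in selection order) and the set of covered units.
    """
    covered = set()
    chosen = []
    remaining = dict(sentences)
    while remaining and len(chosen) < budget:
        # Score each candidate by how many units it would newly cover;
        # ties keep the earliest candidate.
        best_id, best_gain = None, 0
        for sid, units in remaining.items():
            gain = len(set(units) - covered)
            if gain > best_gain:
                best_id, best_gain = sid, gain
        if best_id is None:  # no remaining sentence adds coverage
            break
        chosen.append(best_id)
        covered |= set(remaining.pop(best_id))
    return chosen, covered


# Toy corpus of hypothetical romanized phone sequences.
corpus = {
    "s1": ["s", "e", "l", "a", "m"],
    "s2": ["b", "e", "t"],
    "s3": ["s", "a", "l", "a", "m"],
    "s4": ["b", "u", "n", "a"],
}
chosen, covered = greedy_select(corpus, budget=3)
print(chosen, sorted(covered))
```

This sketch only counts distinct units; a selection aiming at "balanced" coverage may additionally weight rare units or penalize long sentences.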
On the Diagnostic Rhyme Test (DRT), 94% of words were correctly identified for the diphone-based synthesizer and 84% for the unit selection synthesizer. On Mean Opinion Score (MOS) ratings, the diphone-based synthesizer scored 3.6 for intelligibility and 2.65 for naturalness, while the unit selection synthesizer scored 2.68 for intelligibility and 2.55 for naturalness. These results show that the diphone-based synthesizer outperforms the unit selection synthesizer. The diphone-based results are promising, and performance might improve further with the addition of non-standard word handling and professional studio-quality voice recordings.
Keywords
Diphone-Based Amharic, Speech Synthesis, System Using MaryTTS