Diphone-Based Amharic Speech Synthesis System Using MaryTTS

Date

2019-06-05

Publisher

Addis Ababa University

Abstract

This thesis explores the possibility of designing an Amharic text-to-speech (TTS) system using diphones and phones as the synthesis units. Concatenative speech synthesis is now widely adopted because of its comparatively good performance when a large corpus is available. Accordingly, Amharic Wikipedia articles were used as the corpus for training the letter-to-sound rules and for preparing the pronunciation dictionary. To generate the acoustic units and features, 120 sentences were selected for maximally balanced phone coverage and then recorded. The resulting diphone and unit selection synthesizers were evaluated for performance, naturalness, and intelligibility by five native speakers of the language. On the Diagnostic Rhyme Test (DRT), 94% of words were correctly identified for the diphone-based synthesizer and 84% for the unit selection synthesizer. On the Mean Opinion Score (MOS) test, the diphone-based synthesizer scored 3.6 for intelligibility and 2.65 for naturalness, while the unit selection synthesizer scored 2.68 for intelligibility and 2.55 for naturalness. The diphone-based synthesizer therefore outperforms the unit selection synthesizer, and its results look promising; handling non-standard words and using professional, studio-quality voice recordings might improve performance further.

Keywords

Amharic, Diphone-Based Speech Synthesis, MaryTTS
