Text to Speech Synthesizer for Afaan Oromoo Using Statistical Parametric Speech Synthesis






Addis Ababa University


Speech synthesis systems generate natural-sounding, intelligible speech from text input. Speech synthesizers are valuable as assistive tools for people with impairments, in teaching and learning, and in telecommunications and industry. Nevertheless, building them has long posed challenges such as text processing, grapheme-to-phoneme conversion, and prosody modeling. Text preprocessing includes tokenization and normalization, after which the graphemic representation of sounds is converted to its phonetic representation and the prosodic features of various speaking styles are modeled. To address these challenges, different techniques have been studied and implemented. Statistical parametric speech synthesizers based on hidden Markov models (HMMs) have been built for foreign languages, but they are not applicable to Afaan Oromoo because they do not account for the language's special characteristics. Statistical parametric speech synthesis based on HMMs was chosen for this research because it is model-based: it learns the properties of the data rather than storing the speech itself, so it requires less storage, has a small run time, and is easy to integrate into small handheld devices. The Afaan Oromoo text-to-speech synthesis system was developed using statistical parametric speech synthesis based on a hidden Markov model. The synthesizer has two main phases: training and testing. In the training phase, source and excitation parameters are extracted from the speech database. The speech and phonetic transcriptions are automatically segmented using EHMM labeling. During the testing phase, the input text is processed into phonetic strings, which are used together with the trained models. Finally, the synthesized speech is generated from the speech parameters. To train the system, we collected four hundred sentences with their corresponding speech recordings. Additionally, we used ten sentences to test the performance of the system.
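The text front end described above (tokenization, normalization, then grapheme-to-phoneme conversion) can be illustrated with a minimal Python sketch. It is not the thesis implementation. The digraph list, the `:` marker for doubled letters, and the function names `normalize`, `tokenize`, and `to_phones` are assumptions made for illustration, and the phone labels are simply the letters themselves.

```python
import re

# Assumption: letter pairs treated as single sounds in the Latin-based
# Afaan Oromoo orthography. This list is illustrative, not from the thesis.
DIGRAPHS = ("ch", "dh", "ny", "ph", "sh")


def normalize(text):
    """Lowercase the text, keep only letters, apostrophes and spaces,
    and collapse runs of whitespace. Digits and symbols are dropped here;
    a full system would expand them into words instead."""
    text = text.lower().replace("\u2019", "'")
    text = re.sub(r"[^a-z' ]+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Split normalized text into word tokens."""
    return normalize(text).split()


def to_phones(word):
    """Map a word to a list of hypothetical phone labels.

    Digraphs become one unit, and a doubled unit (long vowel or
    geminate consonant) is merged into one label marked with ':'.
    """
    phones = []
    i = 0
    while i < len(word):
        if word[i:i + 2] in DIGRAPHS:
            unit = word[i:i + 2]
            i += 2
        else:
            unit = word[i]
            i += 1
        if phones and phones[-1] == unit:
            phones[-1] = unit + ":"  # doubled letter: mark as long
        else:
            phones.append(unit)
    return phones


if __name__ == "__main__":
    for token in tokenize("Afaan Oromoo, dhufe!"):
        print(token, to_phones(token))
```

A rule-based mapping like this is only a starting point: it cannot tell a true digraph from two adjacent letters that happen to match one, which is one reason grapheme-to-phoneme conversion is listed among the challenges.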
In this study, the system is evaluated subjectively with the Mean Opinion Score (MOS) and objectively with Mel Cepstral Distortion (MCD). The subjective MOS results are 4.3 for intelligibility and 4.1 for naturalness of the synthesized speech. The objective evaluation, for which the study uses MCD, yields 6.8 out of 8, which is encouraging.
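For readers unfamiliar with the two measures, the sketch below shows how they are commonly computed. It assumes the widely used MCD definition, (10 / ln 10) · sqrt(2 · Σ_d (c_d − c'_d)²) averaged over time-aligned frames, and a MOS taken as the plain mean of listener ratings on a 1 to 5 scale. The function names and the sample values are hypothetical and are not data from this study.

```python
import math


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Average MCD in dB over time-aligned frames.

    Each argument is a list of frames, each frame a list of mel-cepstral
    coefficients (the energy term c0 is assumed to be excluded already).
    """
    if not ref_frames or len(ref_frames) != len(syn_frames):
        raise ValueError("frame sequences must be non-empty and aligned")
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)
    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        # Euclidean distance between the two coefficient vectors
        total += math.sqrt(sum((r - s) ** 2 for r, s in zip(ref, syn)))
    return scale * total / len(ref_frames)


def mean_opinion_score(ratings):
    """Mean of listener ratings (assumed to be on a 1-5 scale)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return sum(ratings) / len(ratings)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    reference = [[1.0, 0.5], [0.8, 0.4]]
    synthesized = [[0.9, 0.5], [0.8, 0.3]]
    print(mel_cepstral_distortion(reference, synthesized))
    print(mean_opinion_score([5, 4, 4, 4]))
```

Lower MCD indicates synthesized speech that is spectrally closer to the reference recording, whereas a higher MOS indicates better perceived quality.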



Statistical Parametric Speech Synthesis, Text To Speech, Afaan Oromoo, Hidden Markov Model Based Speech Synthesis