Text to Speech Synthesizer for Afaan Oromoo Using Statistical Parametric Speech Synthesis

dc.contributor.advisor: Midekso, Dida (PhD)
dc.contributor.author: Kedir, Muhidin
dc.date.accessioned: 2020-09-14T06:29:35Z
dc.date.accessioned: 2023-11-29T04:06:21Z
dc.date.available: 2020-09-14T06:29:35Z
dc.date.available: 2023-11-29T04:06:21Z
dc.date.issued: 2020-06-06
dc.description.abstract: Speech synthesis systems generate natural-sounding, intelligible speech from text input. Speech synthesizers are essential for assisting people with impairments, for teaching and learning, and for telecommunications and industry. Nevertheless, speech synthesis has long faced challenges such as text processing, grapheme-to-phoneme conversion, and prosody modeling. Text preprocessing includes tokenization and normalization, followed by converting the grapheme representation of sounds to their phonetic representation and modeling the prosodic features of various speaking styles. To address these challenges, different techniques have been studied and implemented. Statistical parametric speech synthesizers based on hidden Markov models (HMMs) have been built for other languages, but they are not applicable to Afaan Oromoo because they do not account for the language's special characteristics. Statistical parametric speech synthesis based on HMM techniques is chosen for this research because it is model based: it requires less storage, learns the properties of the data rather than storing the speech, has a small run time, and is easy to integrate with small handheld devices. The Afaan Oromoo text-to-speech synthesis system has been developed using statistical parametric speech synthesis based on a hidden Markov model. The synthesizer has two main phases: training and testing. In the training phase, source and excitation parameters of the speech are extracted from the speech database. The speech and phonetic transcriptions are automatically segmented using EHMM labeling. During the testing phase, the input text is processed to form phonetic strings, which are used along with the trained models. Finally, the synthesized speech is generated from the speech parameters. To train the system, we collected four hundred sentences and their corresponding speech recordings. Additionally, we used ten sentences to test the performance of the system. In this study, the subjective Mean Opinion Score (MOS) and objective Mel Cepstral Distortion (MCD) evaluation techniques are used. The subjective MOS results are 4.3 and 4.1 for the intelligibility and naturalness of the synthesized speech, respectively. The objective result obtained using Mel Cepstral Distortion is 6.8 out of 8, which is encouraging. [en_US]
dc.identifier.uri: http://etd.aau.edu.et/handle/123456789/22310
dc.language.iso: en [en_US]
dc.publisher: Addis Ababa University [en_US]
dc.subject: Statistical Parameter Speech Synthesis [en_US]
dc.subject: Text To Speech [en_US]
dc.subject: Afaan Oromoo [en_US]
dc.subject: Hidden Markov Model Based Speech Synthesis [en_US]
dc.title: Text to Speech Synthesizer for Afaan Oromoo Using Statistical Parametric Speech Synthesis [en_US]
dc.type: Thesis [en_US]
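
Note on the evaluation measures named in the abstract: the sketch below shows how a Mean Opinion Score and a Mel Cepstral Distortion value are commonly computed. It is an illustration under stated assumptions, not the thesis's own evaluation code. The function names are hypothetical, the frames are assumed to be already time-aligned, and the 0th (energy) coefficient is assumed to be excluded, as is conventional for MCD.

```python
import math


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Mean MCD in dB over time-aligned frames of mel-cepstral coefficients.

    Assumption: both inputs are equal-length lists of frames, and index 0 of
    each frame is the energy term, which is skipped.
    """
    if not ref_frames or len(ref_frames) != len(syn_frames):
        raise ValueError("inputs must be non-empty and time-aligned")
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        # Squared Euclidean distance over coefficients 1..D
        squared = sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:]))
        total += scale * math.sqrt(2.0 * squared)
    return total / len(ref_frames)


def mean_opinion_score(ratings):
    """Arithmetic mean of listener ratings (typically on a 1 to 5 scale)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return sum(ratings) / len(ratings)
```

A lower MCD means the synthesized spectrum is closer to the reference recording, while a higher MOS means listeners rated the speech more favorably.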

Files

Original bundle
Name: Muhidin Kedir 2020.pdf
Size: 686.03 KB
Format: Adobe Portable Document Format