Title: Syllable-Based Amharic Speech Synthesis (TTS) Using HMM
Contributors: Dr. Solomon Teferra; Demessie, Bahiru
Keywords: ASR Corpus; Syllable; Speech Synthesis; Hidden Markov Model
Issue Date: Jun-2017
Publisher: Addis Ababa University
Abstract: Speech synthesis systems have developed gradually over the last few decades and have been integrated into several new applications. Much work remains to be done in prosody, text preprocessing, and pronunciation to achieve more natural-sounding speech. In this thesis, an ASR corpus is used to develop a syllable-based speech synthesis system for the Amharic language using a Hidden Markov Model (HMM). The data were randomly selected from the ASR corpus, with the corpora of six female speakers used as training data. Both text and speech data, 600 of each, were used. The corpus was split into two sets: 90% for training and 10% for testing. The components of the Hidden Markov Model and the features of the Amharic language were studied, though not every feature of Amharic was considered, since that would require a great deal of time and deep linguistic knowledge. The utterance structure generated by Festival and Festvox, together with the parameters extracted from the raw waveform data, was used to train the model. Then, the speech parameter sequence generated from the predicted models is passed to a vocoder to synthesize the speech waveform. In this research, the text to be synthesized was assumed to be already transcribed. Finally, the synthesized speech is generated from the trained model based on the labeled input text. Evaluation was done in two ways. First, in the researcher's own evaluation, the overall performance was 75.56% for the syllable-based system and 77.78% for the phone-based system; the preference evaluation shows that syllable-based synthesis performs better in naturalness than in intelligibility, while phone-based TTS performs better in intelligibility, with 550 sentences of training data. Second, the average MOS given by eight listeners for five Amharic sentences was 2.94 for the phone-based system and 3.02 for the syllable-based system.
This shows that the syllable-based TTS system outperforms the system that uses the phone as its basic unit. According to the MOS results, the synthesis system is categorized as good in terms of both intelligibility and naturalness. The result looks encouraging, and further improvement depends on proper work in areas such as phoneme coverage, the lexicon, and the question set.
Description: A thesis submitted to the School of Graduate Studies of Addis Ababa University in partial fulfillment of the requirements for the Degree of Master of Science in Information Science
Appears in Collections: Thesis - Information Science

Files in This Item:
File: Bahiru Demessie Dubie.pdf
Size: 2.67 MB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.