Bidirectional Long-short Term Memory Based Text to Speech Synthesis for Amharic Language

Awel, Mahlet

dc.contributor.advisor	Assabie, Yaregal (PhD)
dc.contributor.author	Awel, Mahlet
dc.date.accessioned	2021-04-13T08:38:11Z
dc.date.accessioned	2023-11-29T04:06:27Z
dc.date.available	2021-04-13T08:38:11Z
dc.date.available	2023-11-29T04:06:27Z
dc.date.issued	2020-12-07
dc.description.abstract	Text-to-speech (TTS) synthesis is the automatic conversion of written text to spoken language. TTS systems play an important role in natural human-computer interaction. The aim of this work is to develop a bidirectional long short-term memory (BLSTM) based TTS system for the Amharic language. The system has two phases: training and synthesis. In the training phase, the text is first normalized; linguistic features are then extracted from the normalized text using the Festival tool and used as input to the BLSTM-based duration model. The duration model is then trained, and it adds duration information to the extracted linguistic features and feeds them to the BLSTM-based acoustic model. The WORLD vocoder extracts acoustic frames composed of features that describe the signal in a more convenient form, and these are used as input for the acoustic model. The acoustic model is trained to map the input linguistic features and the associated duration features to acoustic features. We prepared 600 speech recordings and their corresponding text transcriptions from an Amharic audio Bible read by a male speaker. This work uses the open-source Merlin speech synthesis toolkit, the Festival speech synthesis tool as a frontend, and the WORLD vocoder. We also prepared a pronunciation dictionary (lexicon) of 2500 words, a phone set, letter-to-sound rules, and a question file set for frontend text processing, based on the phonetic structure of the Amharic language. To test the performance of the system, we performed subjective and objective evaluations. In a listening test with 10 volunteers, the BLSTM model received mean opinion scores (MOS) of 3.8 for intelligibility and 3.9 for naturalness, and the DNN model received 3.65 for intelligibility and 3.7 for naturalness. The mel-cepstral distortion (MCD) of the BLSTM and DNN models is 4.68 and 4.7, respectively.	en_US
dc.identifier.uri	http://etd.aau.edu.et/handle/123456789/26113
dc.language.iso	en	en_US
dc.publisher	Addis Ababa University	en_US
dc.subject	Deep Learning	en_US
dc.subject	Recurrent Neural Networks	en_US
dc.subject	Long-Short Term Memory	en_US
dc.subject	Duration Model	en_US
dc.subject	Acoustic Model	en_US
dc.subject	Vocoder	en_US
dc.subject	Linguistic Features	en_US
dc.subject	Acoustic Features	en_US
dc.title	Bidirectional Long-short Term Memory Based Text to Speech Synthesis for Amharic Language	en_US
dc.type	Thesis	en_US
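
The abstract reports MCD as its objective measure but does not say how it was computed. As a rough illustration only, the sketch below uses the commonly cited definition of mel-cepstral distortion: (10 / ln 10) * sqrt(2 * sum of squared coefficient differences), averaged over frames, with coefficient 0 (energy) skipped. The function name, the choice to skip coefficient 0, and the assumption that the two frame sequences are already time-aligned are illustrative assumptions, not details taken from the thesis.

```python
import math


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Mean MCD in dB over time-aligned frames of mel-cepstral coefficients.

    Assumptions (not from the thesis): frames are already aligned one to one,
    and coefficient 0 (energy) is excluded from the distance.
    """
    if len(ref_frames) != len(syn_frames) or not ref_frames:
        raise ValueError("frame sequences must be non-empty and equally long")

    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)
    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        # Squared Euclidean distance over coefficients 1..D of this frame.
        squared = sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:]))
        total += scale * math.sqrt(squared)
    return total / len(ref_frames)
```

Lower values indicate synthesized speech that is spectrally closer to the reference recording, which is consistent with the abstract reporting a slightly lower MCD for the BLSTM model than for the DNN model.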

Files

Original bundle

Name: Mahlet Awel 2020.pdf
Size: 1.79 MB
Format: Adobe Portable Document Format

Collections

Environmental Science