Syllable-based Text-to-Speech Synthesis (TTS) for Amharic

dc.contributor.advisor: Teferi, Dereje (PhD)
dc.contributor.author: Shiferaw, Mulat
dc.date.accessioned: 2018-11-28T06:01:22Z
dc.date.accessioned: 2023-11-18T12:44:06Z
dc.date.available: 2018-11-28T06:01:22Z
dc.date.available: 2023-11-18T12:44:06Z
dc.date.issued: 2012-06
dc.description.abstract [en_US]: The goal of text-to-speech (TTS) synthesis is to convert arbitrary input text into intelligible, natural-sounding speech so as to transmit information from a machine to a person. In speech synthesis, the capability of information extraction is crucial in producing high-quality synthesized speech. This paper describes the design of a syllable-based concatenative speech waveform synthesizer for the Amharic language that uses the TD-PSOLA algorithm for prosodic modification and for speech waveform analysis and synthesis. This approach is based on decomposing the signal into overlapping frames synchronized with the pitch period. In concatenative corpus-based TTS systems, acoustic units of varying sizes are selected from a large speech corpus and then concatenated to produce speech waveforms. The speech corpus contains more than one instance of each unit to capture the prosodic and spectral variability found in natural speech; hence the signal modifications needed on the selected units are minimized if an appropriate unit is found in the unit inventory. The syllable is chosen as the unit primarily because Amharic is a syllable-centred, Consonant-Vowel (CV) assimilated language. The unique syllable units are then added to a syllable repository. Further, concatenation at syllable boundaries can lead to smaller error, owing to the spectrum being similar across different syllable boundaries. A syllable-based approach to speech processing is an interesting alternative to the diphone (triphone)-based approach, especially for syllable-timed languages such as Amharic. The system was implemented and tested using selected Amharic texts.
Automatic syllabification achieves a word accuracy rate of 97.8%, which helps improve the prosody and synthesis models as well as speech waveform generation. In the subjective assessment by users, the synthesized speech obtains an average ORT score of 89.58% for intelligibility and an average MOS of 3.45 for naturalness. Subjective listening tests performed on the synthesized speech indicate an improvement in the quality of the synthesized speech.
dc.identifier.uri: http://etd.aau.edu.et/handle/12345678/14577
dc.language.iso [en_US]: en
dc.publisher [en_US]: Addis Ababa University
dc.subject [en_US]: Text-to-speech
dc.subject [en_US]: concatenative synthesis
dc.subject [en_US]: syllable
dc.subject [en_US]: TD-PSOLA
dc.subject [en_US]: CV-assimilated
dc.subject [en_US]: prosodic modification
dc.subject [en_US]: unit selection
dc.title [en_US]: Syllable-based Text-to-Speech Synthesis (TTS) for Amharic
dc.type [en_US]: Thesis

Files

Original bundle
Name: Mulat Shiferaw.pdf
Size: 3 MB
Format: Adobe Portable Document Format