Text to Speech Synthesizer for Afaan Oromoo Using Statistical Parametric Speech Synthesis
Date
2020-06-06
Publisher
Addis Ababa University
Abstract
Speech synthesis systems generate natural-sounding, intelligible speech from text input.
Speech synthesizers are essential for assisting people with impairments, for teaching and
learning, and for telecommunications and industry.
Nevertheless, text processing, grapheme-to-phoneme conversion, and prosody modeling have
remained challenging for years. Text preprocessing includes tokenization and normalization;
the grapheme representation of sounds is then converted to its phonetic representation, and
the prosodic features of various speaking styles are modeled. To address these challenges,
different techniques have been studied and implemented. Statistical parametric speech
synthesizers based on the hidden Markov model (HMM) have been built for foreign languages,
but they are not applicable to Afaan Oromoo because they do not account for the language's
special characteristics. Statistical parametric speech synthesis based on HMM techniques is
chosen for this research because it is model based: it requires less storage, it learns the
properties of the data rather than storing the speech, it has a small run time, and it is
easy to integrate with small handheld devices. The Afaan Oromoo text-to-speech synthesis
system has been developed using statistical parametric speech synthesis based on a hidden
Markov model. The synthesizer has two main phases: training and testing. In the training
phase, source and excitation parameters of the speech are extracted from the speech
database. The speech and phonetic transcriptions are automatically segmented using EHMM
labeling. During the testing phase, the input text is processed into phonetic strings, which
are used together with the trained models. Finally, the synthesized speech is generated from
the speech parameters. To train the system, we collected four hundred sentences and their
corresponding speech recordings. Additionally, we used ten sentences to test the performance
of the system. In this study, the subjective Mean Opinion Score (MOS) and the objective Mel
Cepstral Distortion (MCD) evaluation techniques are used. The subjective MOS results are 4.3
and 4.1 for the intelligibility and naturalness of the synthesized speech, respectively. The
objective result obtained using Mel Cepstral Distortion is 6.8 out of 8, which is
encouraging.
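For readers unfamiliar with the objective measure named in the abstract, the following is a minimal sketch of how Mel Cepstral Distortion is commonly computed between a reference utterance and a synthesized one. It is not the evaluation script used in this work. The function name `mel_cepstral_distortion` is hypothetical, the two inputs are assumed to be already time-aligned frame by frame (for example by dynamic time warping), and coefficient 0 is skipped because it mainly carries overall energy, which is a common convention rather than something stated in the abstract.

```python
import math


def mel_cepstral_distortion(reference, synthesized):
    """Return the mean MCD in dB between two aligned mel-cepstral sequences.

    Each argument is a list of frames; each frame is a list of mel-cepstral
    coefficients. Both sequences are assumed to be time-aligned already.
    """
    if not reference or len(reference) != len(synthesized):
        raise ValueError("sequences must be non-empty and have equal frame counts")

    scale = 10.0 / math.log(10.0)  # converts the distance to decibels
    total = 0.0
    for ref_frame, syn_frame in zip(reference, synthesized):
        # Skip coefficient 0 (overall energy); an assumed, conventional choice.
        squared = sum((r - s) ** 2 for r, s in zip(ref_frame[1:], syn_frame[1:]))
        total += scale * math.sqrt(2.0 * squared)
    return total / len(reference)


# Toy frames with made-up values, only to show the call shape.
natural = [[1.0, 0.50, -0.20], [0.9, 0.40, -0.10]]
generated = [[1.1, 0.45, -0.25], [0.8, 0.42, -0.05]]
print(mel_cepstral_distortion(natural, generated))
```

Identical inputs give a distortion of 0.0, and lower values indicate that the synthesized spectrum is closer to the natural one.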
Keywords
Statistical Parametric Speech Synthesis, Text To Speech, Afaan Oromoo, Hidden Markov Model Based Speech Synthesis