A Speaker Independent Text-To-Speech Synthesis (TTS) For Amharic Language Using Hidden Markov Model

Date

2018-01-04

Publisher

Addis Ababa University

Abstract

In this paper we present a Hidden Markov Model (HMM)-based speaker-independent Amharic text-to-speech system. For Amharic, the commonly spoken language in Ethiopia and the federal working language, speaker-independent (SI) modeling methods were employed using the HMM-based text-to-speech technique on a read-speech database of 726 sentences uttered by 3 female and 3 male speakers with various speaking styles. Speech signals were sampled at a rate of 16 kHz and windowed using a 25-ms Blackman window with a 5-ms shift. Then 25 Mel-cepstral coefficients including the zeroth coefficient, the logarithm of the fundamental frequency, and the dynamic (delta and delta-delta) coefficients were obtained using the Mel-cepstral analysis technique to model context-dependent phoneme HMMs. The speech information is first modeled by context-dependent HMMs, including: (1) spectral envelope and gain; (2) voiced/unvoiced and fundamental frequency; and (3) duration. The corresponding 3-state left-to-right HMMs were automatically trained by constructing a speaker-independent model as the initial model; a speaker-adaptive model was then estimated from the speaker-independent model using the speech data of one male speaker. A decision-based clustering technique was applied separately to the distributions of Mel-cepstral coefficients, log F0, and state durations of the context-dependent phoneme HMMs. Finally, to improve the voice quality, a trajectory HMM and a mixed excitation model were included by applying a parameter generation algorithm based on maximum likelihood (ML) using dynamic features to the Gaussian Mixture Model (GMM). Objective evaluation was conducted to evaluate the speaker-independent or speaker adaptation training (SAT) demo using spectral analysis and preference scores. This training treats the training data, which consists of several speakers' speech, as that of one speaker and makes no distinction among the training speakers of the average voice model. In addition, the voice conversion technique was evaluated using subjective evaluation.
In a subjective evaluation test, more than 60% of the speech generated from the voice conversion models using the first 30 sentences was judged to have almost the same score as that of the SI models. Finally, a subjective mean opinion score (MOS) evaluation was conducted to evaluate the overall performance of the adapted models and the developed system. The developed Amharic speech synthesizer attains 74% intelligibility and 70% naturalness in MOS results from fifty subjects who are first-language Amharic speakers. Besides the intelligibility test, we performed a unit test on the text normalizer. The performance of the text normalizer is 85% for Amharic numbers, punctuation marks, and abbreviations.
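
The analysis settings quoted in the abstract (16 kHz sampling, a 25-ms Blackman window, a 5-ms shift, and delta and delta-delta features) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names are hypothetical, the classic Blackman coefficients (0.42, 0.5, 0.08) are assumed, and the delta formula is one common convention, since the abstract does not state which regression windows were used.

```python
import math

SAMPLE_RATE_HZ = 16_000   # from the abstract
WINDOW_MS = 25            # from the abstract
SHIFT_MS = 5              # from the abstract


def blackman_window(length):
    """Classic Blackman window (assumed coefficients 0.42, 0.5, 0.08)."""
    if length == 1:
        return [1.0]
    return [
        0.42
        - 0.5 * math.cos(2 * math.pi * n / (length - 1))
        + 0.08 * math.cos(4 * math.pi * n / (length - 1))
        for n in range(length)
    ]


def frame_signal(samples, sample_rate=SAMPLE_RATE_HZ,
                 window_ms=WINDOW_MS, shift_ms=SHIFT_MS):
    """Split samples into overlapping, Blackman-windowed frames."""
    win_len = sample_rate * window_ms // 1000   # 400 samples at 16 kHz
    shift = sample_rate * shift_ms // 1000      # 80 samples at 16 kHz
    window = blackman_window(win_len)
    frames = []
    for start in range(0, len(samples) - win_len + 1, shift):
        chunk = samples[start:start + win_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames


def delta(track):
    """Assumed delta: 0.5 * (c[t+1] - c[t-1]), with edge values repeated."""
    padded = [track[0]] + list(track) + [track[-1]]
    return [0.5 * (padded[t + 2] - padded[t]) for t in range(len(track))]


def add_dynamic_features(track):
    """Return (static, delta, delta-delta) triples for a 1-D feature track.

    Delta-delta is computed here as the delta of the delta track, which is
    an assumption made for illustration.
    """
    d1 = delta(track)
    d2 = delta(d1)
    return list(zip(track, d1, d2))
```

In a full system the framed signal would go through Mel-cepstral and F0 analysis before the dynamic features are appended. Here `add_dynamic_features` is applied to any per-frame track, such as a single cepstral coefficient or log F0 over time.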

Keywords

Language Using Hidden Markov Model
