dc.description.abstract |
A text-to-speech (TTS) synthesis system converts natural language text into speech.
However, the written text of a language contains both standard words (SWs) and
nonstandard words (NSWs) such as numbers, abbreviations, synonyms, currency, and
dates. These NSWs cannot be detected by applying “letter-to-sound” rules.
This study describes a generalized Amharic text-to-speech (TTS) synthesis system, which
attempts to handle both Amharic SWs and NSWs. The system is developed using
the Festival speech synthesis framework, based on diphone-unit concatenative
synthesis with the RELP coding technique. The model described in this work has
two major parts: natural language processing (NLP) and digital signal processing
(DSP). The NLP part handles text analysis (transcription of the input SWs and NSWs)
and extraction of the speech parameters. The DSP part then generates the
artificial speech. In evaluation, the system correctly pronounced 73.35% of words
on average, counting both SWs and NSWs. In addition, an assessment of the
intelligibility and naturalness of the synthesized speech using mean opinion score
(MOS) testing yielded scores of 3 and 2.83, respectively. The experiment shows
a promising result toward designing an applicable system that synthesizes both SWs and
NSWs for unrestricted text of a language. However, some areas still need further
investigation. These include covering all types of NSWs and resolving the ambiguities
found in NSWs within the text analysis block, using statistical techniques that handle
them based on their context. In addition, the construction of part-of-speech (POS)
tag sets, a tagger, and a tagged corpus for prosody analysis also needs further work.
Keywords: Diphone concatenation, Speech synthesis, Nonstandard words (NSWs), RELP coding |
en_US |