Concatenative Text-To-Speech System for Afaan Oromo Language
Date
2011-05
Authors
Publisher
Addis Ababa University
Abstract
This paper explores the possibility of developing a concatenative text-to-speech (TTS)
system for the Afaan Oromo language, focusing on diphones and triphones as the speech units.
Most modern TTS systems use the concatenative method to produce synthesized speech.
In this method, however, selecting an appropriate unit for building the database is a
challenging task. In the proposed approach, databases are created with speech units of
different sizes, namely diphones and triphones, and are used to produce speech utterances.
Because diphones and triphones are small speech units, using them for synthesis makes
an unlimited vocabulary of speech possible.
During the process, a diphone database of 800 entries and a triphone database of
1,982 entries were constructed. The synthesizer was then evaluated for its performance,
naturalness, and intelligibility by six individuals from the language domain.
The experimental results show that 75% and 54% of the words in the data set were correctly
pronounced with the diphone and triphone speech units, respectively. The mean opinion
score (MOS) for the intelligibility of the system was 3.03 for diphones and 2.2 for
triphones, whereas the MOS for naturalness was 2.65 and 2.02, respectively. The main
reason for the lower triphone results is the removal of many triphone speech units:
those that would increase the time complexity of the system and those that do not
represent the language.
In fact, when some of the removed entries were added back to the database, the triphone
scores rose from 2.2 to 2.23 and then to 2.27 for intelligibility, and from 2.02 to 2.05
and then to 2.08 for naturalness.
The results obtained are promising, and future research directions are proposed
accordingly to improve the performance of the system.
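As an illustration of the diphone and triphone units discussed in the abstract, the following minimal Python sketch splits a phoneme sequence into overlapping two-phoneme and three-phoneme units, which is the general idea behind building a unit inventory for concatenative synthesis. It is not taken from the thesis or from Festival: the function name `to_units`, the silence marker `_`, the `-` separator, and the example phoneme sequence are all assumptions made for this sketch.

```python
def to_units(phonemes, n, silence="_"):
    """Split a phoneme sequence into overlapping units of n phonemes.

    The sequence is padded with a silence marker on both sides so that
    word-initial and word-final transitions are also covered.
    The marker and the '-' separator are hypothetical conventions.
    """
    padded = [silence] + list(phonemes) + [silence]
    return ["-".join(padded[i:i + n]) for i in range(len(padded) - n + 1)]


# Illustrative phoneme sequence (assumed, one symbol per phoneme).
phonemes = ["n", "a", "m", "a"]

diphones = to_units(phonemes, 2)   # units spanning two phonemes
triphones = to_units(phonemes, 3)  # units spanning three phonemes

print(diphones)
print(triphones)
```

A word of k phonemes yields k + 1 diphones and k triphones under this padding scheme, which hints at why a triphone inventory grows much faster than a diphone inventory as more phoneme contexts must be covered.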
Key words: Speech Synthesis, Concatenative methods, Festival, Afaan Oromo