Concatenative Text-To-Speech System for Afaan Oromo Language

Addis Ababa University


This paper explores the possibility of developing a concatenative text-to-speech (TTS) system for the Afaan Oromo language, with diphones and triphones as the speech units of interest. Most modern TTS systems use the concatenative method to produce synthesized speech, but selecting an appropriate unit for building the speech database remains a challenging task. In the proposed approach, databases are built from speech units of two sizes, diphones and triphones, and are used to produce speech utterances. These smaller speech units are chosen for synthesis so that an unlimited vocabulary can be covered. A diphone database of 800 entries and a triphone database of 1982 entries were constructed. The synthesizer was then evaluated for its performance, naturalness, and intelligibility by six individuals from the language domain. The experimental results show that 75% of the words in the data set are pronounced correctly with diphone units and 54% with triphone units. The mean opinion score (MOS) for intelligibility was 3.03 for diphones and 2.2 for triphones, while the MOS for naturalness was 2.65 and 2.02, respectively. The lower triphone results are attributed mainly to the removal of many triphone units from the database: those that would increase the time complexity of the system and those that do not represent the language. Indeed, when some of the removed entries were added back to the database, the triphone scores rose from 2.2 to 2.23 and then to 2.27 for intelligibility, and from 2.02 to 2.05 and then to 2.08 for naturalness. The results are promising, and future research directions are proposed to improve the performance of the system.
Keywords: Speech Synthesis, Concatenative Methods, Festival, Afaan Oromo
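
To make the two unit sizes concrete, the following minimal Python sketch splits a phoneme sequence into overlapping diphone-style and triphone-style unit names. It is an illustration only, not the system's implementation: the function name `to_units`, the silence marker `_`, the hyphenated naming scheme, and the example phoneme sequence are all assumptions, and an actual diphone inventory would additionally define where each recorded unit is cut (typically at the mid-point of each phone).

```python
from typing import List


def to_units(phones: List[str], n: int, pad: str = "_") -> List[str]:
    """Split a phoneme sequence into overlapping n-phone unit names.

    n=2 gives diphone-style names and n=3 gives triphone-style names.
    The sequence is padded with a silence marker on both sides
    (the marker "_" is a hypothetical choice for this sketch).
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    padded = [pad] + list(phones) + [pad]
    # Slide a window of width n over the padded sequence.
    return ["-".join(padded[i:i + n]) for i in range(len(padded) - n + 1)]


# Illustrative phoneme sequence (assumed, not taken from the paper's data).
phones = ["n", "a", "m", "a"]
diphones = to_units(phones, 2)
triphones = to_units(phones, 3)
print(diphones)
print(triphones)
```

Each name produced this way would serve as a lookup key into the corresponding unit database. A sequence can only be synthesized if every key is present, which is one way to see why removing entries from the triphone database can lower the measured scores.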



