A Large Vocabulary, Speaker-Independent, Continuous Speech Recognition System for Afaan Oromo: Using Broadcast News Speech Corpus

Date

2016-10

Publisher

A.A.U

Abstract

Speech is a common mode of communication among human beings. Because people also want to communicate with machines through speech, automatic speech recognition has become an active research area. The main objective of this study was to explore the possibility of developing a large-vocabulary, speaker-independent, continuous speech recognition system for Afaan Oromo using a broadcast news speech corpus. The study used a statistical (stochastic) approach with Hidden Markov Model (HMM) modelling, together with the HTK, Audacity, and SRILM tools. The speech corpus was collected from different sources, including Oromia Radio and Television Organization (ORTO), the Voice of America Afaan Oromo program (VOA), and Fana Broadcasting Corporate (FBC). In total, a broadcast speech corpus of 2953 utterances (about 6 hours of speech) was prepared from 57 speakers (42 male and 15 female). Of these, 2653 utterances were used for training, and the remaining 300 utterances, about 40 minutes of speech from 12 speakers (9 male and 3 female), were used for testing the recognizer. Because the recognizer is speaker independent, none of the test speakers were involved in training. In addition, a text corpus for language modelling was collected from the Bariisaa Afaan Oromo newspaper, and a bigram language model was built with the SRILM language modelling tool. Both context-independent (monophone-based) and context-dependent (triphone-based) acoustic models were developed. The best performance obtained, reported in terms of word error rate, was 91.46% WER for the context-independent model and 89.84% WER for the context-dependent model.
Based on the findings and lessons learnt, the researcher concluded that increasing the number of Gaussian mixtures to 12, setting the word insertion penalty to 1.0, and setting the grammar scale factor to 15.0 can improve the performance of the system.
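
For readers unfamiliar with the metric, word error rate is conventionally computed as the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The Python sketch below illustrates that standard definition only; it is not the scoring procedure used in the thesis, and the function name `word_error_rate` and the placeholder tokens are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent, using word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr

    return 100.0 * prev[-1] / len(ref)


# Placeholder tokens, not real Afaan Oromo data:
# one substitution (w2 -> wX) and one deletion (w4) over 4 reference words.
example_wer = word_error_rate("w1 w2 w3 w4", "w1 wX w3")
```

Because insertions count as errors, WER computed this way can exceed 100% when the hypothesis is much longer than the reference.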

Description

A Thesis Submitted to the School of Graduate Studies of Addis Ababa University in Partial Fulfillment of the Requirements for the Degree of Master of Science in Information Science

Keywords

Afaan Oromo, Automatic speech recognition, Broadcast news
