Speaker Dependent Speech Recognition for Sidaama Language
Date
2010-07
Publisher
Addis Ababa University
Abstract
Speech recognition systems have found application in a wide range of areas as various speech
recognition methodologies, techniques and tools have been developed and implemented to generate
natural and intelligible speech. The main objective of this thesis is to explore the possibility of
developing a prototype speaker-dependent speech recognizer for the Sidaama language using the
Hidden Markov Model (HMM). To arrive at a working prototype model, an extensive study of the
language was conducted to identify the language features needed to build the model.
Additionally, the components and techniques used in HMM-based speech recognition design were
studied and analyzed to identify those components that depend on the characteristics of the
language. The most commonly used speech recognition tools were also critically reviewed and, as a
result, the most widely used Java-based speech recognizer tool, the Sphinx System, was used to
build the acoustic models and to test recognition performance.
This research attempted to build both a context-dependent, triphone-based and a
context-independent, monophone-based isolated-word speech recognizer model for the Sidaama
language. A total of 450 unique words were selected and recorded in consultation with a
linguistics domain expert. Of these, 300 recorded words were used for training the acoustic
models, whereas the remaining 150 words were used for testing the performance of the constructed
acoustic models. In addition, 100 of the 300 words used for training the HMM acoustic models were
randomly selected and also used for testing the constructed models.
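The data partition described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract says the 100-word subset was drawn at random from the training words, but it does not say how the 300/150 division was made, so the shuffle used here is an assumption. The function name split_words and the placeholder word list are hypothetical and do not come from the thesis.

```python
import random


def split_words(words, n_train=300, n_seen_test=100, seed=0):
    """Return (train, unseen_test, seen_test) lists drawn from `words`.

    train       : words used to build the acoustic models
    unseen_test : remaining words, never used in training
    seen_test   : random subset of `train`, reused for testing
    """
    if len(set(words)) != len(words):
        raise ValueError("words must be unique")
    if not (0 < n_seen_test <= n_train <= len(words)):
        raise ValueError("inconsistent split sizes")

    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(words)
    rng.shuffle(shuffled)      # assumption: the 300/150 division is random

    train = shuffled[:n_train]
    unseen_test = shuffled[n_train:]
    seen_test = rng.sample(train, n_seen_test)
    return train, unseen_test, seen_test


# Placeholder vocabulary standing in for the 450 recorded words.
words = [f"word_{i:03d}" for i in range(450)]
train, unseen_test, seen_test = split_words(words)
```

Keeping the 150 held-out words disjoint from the training set is what allows the two reported test conditions (words seen in training versus unseen words) to be compared.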
The context-dependent, triphone-based model achieved 73% accuracy on the 100 words selected from
the 300 words used for building the acoustic models, whereas 68% accuracy was obtained on the 150
words that were not included in building the recognizer model. Similarly, the context-independent,
word-based model achieved 69% accuracy on the 100 words selected from the 300 words used for
building the context-dependent acoustic model, whereas 58% accuracy was achieved on the 150 words
that were not used for building the acoustic model. As a result, the context-dependent,
triphone-based model is suggested as appropriate for building a speech recognizer for the Sidaama
language. In conclusion, the results obtained were encouraging, and further optimization should be
done in the future to improve recognition performance.
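For reference, the accuracy figures above can be read as the percentage of test words recognized correctly. The sketch below assumes exact-match scoring of isolated words, which the abstract does not state explicitly. The helper names word_accuracy and implied_correct are hypothetical, and the toy tokens are placeholders rather than thesis data.

```python
def word_accuracy(references, hypotheses):
    """Percentage of isolated words whose recognized label matches the reference."""
    if not references or len(references) != len(hypotheses):
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(ref == hyp for ref, hyp in zip(references, hypotheses))
    return 100.0 * correct / len(references)


def implied_correct(accuracy_percent, n_words):
    """Approximate number of correct words implied by a reported percentage."""
    return round(accuracy_percent * n_words / 100)


# Toy check with placeholder tokens: 3 of 4 words match.
toy_accuracy = word_accuracy(["a", "b", "c", "d"], ["a", "b", "x", "d"])

# Percentages and test-set sizes as reported in the abstract.
reported = {
    ("context-dependent", "seen"): (73, 100),
    ("context-dependent", "unseen"): (68, 150),
    ("context-independent", "seen"): (69, 100),
    ("context-independent", "unseen"): (58, 150),
}
implied = {key: implied_correct(acc, n) for key, (acc, n) in reported.items()}
```

Under that assumption, 68% of 150 unseen words corresponds to about 102 correctly recognized words for the context-dependent model, against about 87 for the context-independent model. Because the reported percentages may be rounded, these counts are approximate.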
Keywords: Automatic Speech Recognition, Sidaama Language, Sphinx System, HMM