Audio-Visual Speech Recognition Using Lip Movement for Amharic Language

dc.contributor.advisor: Assabie, Yaregal (PhD)
dc.contributor.author: Belete, Befkadu
dc.date.accessioned: 2019-08-19T11:51:43Z
dc.date.accessioned: 2023-11-29T04:06:01Z
dc.date.available: 2019-08-19T11:51:43Z
dc.date.available: 2023-11-29T04:06:01Z
dc.date.issued: 2017-10-05
dc.description.abstract: Automatic Speech Recognition (ASR) is a technology that allows a computer to identify the words a person speaks into a microphone or telephone and convert them into written text. In recent years, there have been many advances in automatic speech-reading systems, which incorporate visual speech features to improve recognition accuracy under noisy conditions. By identifying lip movements and characterizing their associations with speech sounds, the performance of speech recognition systems can be improved, particularly when operating in noisy environments. The aim of this study is to design and develop an automatic audio-visual Amharic speech recognition system using lip reading. For face and mouth detection we use the Viola-Jones object detector with Haar cascade classifiers for the face and the mouth, respectively; after mouth detection, the region of interest (ROI) is extracted. The extracted ROI serves as input for visual feature extraction: the Discrete Wavelet Transform (DWT) is used to extract visual features, and Linear Discriminant Analysis (LDA) is used to reduce the visual feature vector. For audio feature extraction, we use Mel-Frequency Cepstral Coefficients (MFCC). Audio and visual features are integrated by decision fusion, for which we use three classifiers: an HMM classifier for audio-only speech recognition, an HMM classifier for visual-only speech recognition, and a Coupled HMM (CHMM) for audio-visual integration. In this study, we used our own data corpus, called AAVC. We evaluated the audio-visual recognition system on two different sets, speaker dependent and speaker independent, for both phone (vowel) and isolated word recognition. On the speaker-dependent dataset, we found an overall word recognition of 60.42% for visual only, 65.31% for audio only and 70.1% for audio-visual, and an overall vowel (phone) recognition of 71.45% for visual only, 76.34% for audio only and 83.92% for audio-visual speech.
On the speaker-independent dataset, we obtained an overall word recognition of 61% for visual only, 63.54% for audio only and 67.08% for audio-visual. The overall vowel (phone) recognition on the speaker-independent dataset is 68.04% for visual only, 71.96% for audio only and 76.79% for audio-visual speech.
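Two of the processing steps named in the abstract, DWT-based visual feature extraction from the mouth ROI and decision-level fusion of the audio and visual classifier outputs, can be sketched in a few lines of NumPy. This is a minimal illustration rather than the thesis code: a single-level Haar transform stands in for the DWT step, a weighted log-likelihood sum stands in for the CHMM fusion, and the stream weight `alpha` is an assumed illustrative value.

```python
import numpy as np

def haar_dwt2(roi):
    """One level of a 2-D Haar DWT on a grayscale mouth ROI.
    Returns the (LL, LH, HL, HH) subbands; the low-pass LL band
    is a typical source of appearance-based visual features."""
    x = roi.astype(float)
    # Horizontal (row-wise) averaging and differencing.
    a = (x[:, 0::2] + x[:, 1::2]) / 2.0
    d = (x[:, 0::2] - x[:, 1::2]) / 2.0
    # Vertical (column-wise) averaging and differencing.
    LL = (a[0::2, :] + a[1::2, :]) / 2.0
    HL = (a[0::2, :] - a[1::2, :]) / 2.0
    LH = (d[0::2, :] + d[1::2, :]) / 2.0
    HH = (d[0::2, :] - d[1::2, :]) / 2.0
    return LL, LH, HL, HH

def fuse_decisions(audio_loglik, visual_loglik, alpha=0.7):
    """Decision-level fusion: weighted sum of the per-class
    log-likelihoods from the audio-only and visual-only classifiers
    (alpha weights the audio stream). Returns the winning class index."""
    fused = alpha * np.asarray(audio_loglik) + (1 - alpha) * np.asarray(visual_loglik)
    return int(np.argmax(fused))
```

For example, a 32x32 ROI yields 16x16 subbands; the flattened LL band would then be reduced with LDA before being passed to the visual-only HMM classifier.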
dc.identifier.uri: http://etd.aau.edu.et/handle/123456789/18803
dc.language.iso: en
dc.publisher: Addis Ababa University
dc.subject: Amharic
dc.subject: Lip-Reading
dc.subject: Visemes
dc.subject: Appearance-Based Feature
dc.subject: DWT
dc.subject: AAVC
dc.title: Audio-Visual Speech Recognition Using Lip Movement for Amharic Language
dc.type: Thesis

Files

Original bundle
Name: Befkadu Belete 2017.pdf
Size: 3.26 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.71 KB
Format: Plain Text