Non-Uniform Sampling based Feature Extraction for Automatic Speech Recognition
Date
2012-02
Publisher
Addis Ababa University
Abstract
In automatic speech recognition (ASR), robustness to additive noise remains a major unsolved problem. As a result, selecting a suitable feature extraction method has been a key research area. Many feature extraction algorithms have been proposed that are designed specifically to have low sensitivity to background noise. However, performance problems remain in noisy environments. This thesis attempts to develop a new feature extraction method that combines non-uniform sampling with the mel-frequency cepstral coefficient (MFCC) method, since MFCCs work very well in clean environments.
Non-uniform sampling is used when fluctuations in the sampling instants cannot be ignored, or when signal samples can be obtained only at irregular or even random time intervals. It is also sometimes introduced deliberately to obtain useful effects, such as suppressing aliasing and reducing quantization noise, which in turn improve the performance of the analog-to-digital converter (ADC). Several properties motivate its use here. First, improving the ADC through non-uniform sampling gives a better digital representation of the original signal. Second, compared with uniform sampling, non-uniform sampling improves the spectrogram, making it possible to determine true signal components at frequencies exceeding half the mean sampling rate. Third, the spectra of non-uniformly sampled signals are not uniform in the frequency domain, which helps represent the non-uniform spectral sensitivity of human hearing and might help the recognizer focus automatically on the most reliable part of the spectrum in noisy cases. In this thesis, we therefore deliberately introduce non-uniform sampling to modify the front-end analyzer so that it better captures the speech information and incorporates temporal characteristics in the feature set.
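The claim about components above half the mean sampling rate can be illustrated with a small sketch. It is not the thesis's method: it assumes purely random sampling instants (rather than sine-wave crossings), a single 70 Hz test tone, and a mean rate of 100 samples per second, and the function names are hypothetical. With uniform sampling the 70 Hz tone is indistinguishable from its 30 Hz alias, whereas with random instants the alias should be strongly attenuated.

```python
import cmath
import math
import random


def spectrum_magnitude(times, values, freq_hz):
    """Magnitude of the discrete Fourier sum at one frequency.

    Works for arbitrary (uniform or non-uniform) sampling instants.
    """
    total = sum(v * cmath.exp(-2j * math.pi * freq_hz * t)
                for t, v in zip(times, values))
    return abs(total)


def tone(times, freq_hz):
    """Unit-amplitude cosine evaluated at arbitrary sampling instants."""
    return [math.cos(2 * math.pi * freq_hz * t) for t in times]


# Assumed test setup: 200 samples over 2 s, i.e. a mean rate of 100 Hz,
# so half the mean sampling rate is 50 Hz and a 70 Hz tone lies above it.
N, DURATION, TONE_HZ, ALIAS_HZ = 200, 2.0, 70.0, 30.0

uniform_times = [n * DURATION / N for n in range(N)]
rng = random.Random(0)  # fixed seed so the sketch is repeatable
random_times = sorted(rng.uniform(0.0, DURATION) for _ in range(N))

uniform_samples = tone(uniform_times, TONE_HZ)
random_samples = tone(random_times, TONE_HZ)

# Uniform sampling: the true tone and its alias have the same magnitude.
uniform_true = spectrum_magnitude(uniform_times, uniform_samples, TONE_HZ)
uniform_alias = spectrum_magnitude(uniform_times, uniform_samples, ALIAS_HZ)

# Random sampling: the true tone dominates and the alias is smeared out.
random_true = spectrum_magnitude(random_times, random_samples, TONE_HZ)
random_alias = spectrum_magnitude(random_times, random_samples, ALIAS_HZ)
```

Comparing `uniform_true` with `uniform_alias` and `random_true` with `random_alias` shows the effect: the first pair is equal, while in the second pair the energy at the alias frequency is spread into a low noise floor.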
The implementation of non-uniform sampling based speech recognition can be summarized in the following steps. The first step performs oversampling and endpoint detection. The second step includes non-uniform sampling of the speech signal using the sine-wave crossing method and speech segmentation using short-term temporal analysis. The third step includes computing the feature vectors using the NU-MFCC method and vector quantization (VQ) of the speech features.
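The sine-wave crossing step can be sketched as follows. The abstract does not give the details of the procedure, so this is a minimal sketch under assumptions: the sampling instants are taken where the oversampled signal crosses a reference sinusoid, and each instant is located by linear interpolation between adjacent samples. The function name `sine_wave_crossings` and the reference frequency and amplitude parameters are hypothetical.

```python
import math


def sine_wave_crossings(signal, sample_rate, ref_freq_hz, ref_amplitude=1.0):
    """Return (time, value) pairs where `signal` crosses a reference sinusoid.

    `signal` is assumed to be uniformly oversampled at `sample_rate` Hz.
    A crossing is detected as a sign change of (signal - reference) between
    adjacent samples; its instant is refined by linear interpolation.
    """
    diffs = []
    for n, x in enumerate(signal):
        ref = ref_amplitude * math.sin(2 * math.pi * ref_freq_hz * n / sample_rate)
        diffs.append(x - ref)

    crossings = []
    for n in range(len(signal) - 1):
        d0, d1 = diffs[n], diffs[n + 1]
        if d0 * d1 < 0:  # sign change between samples n and n + 1
            frac = d0 / (d0 - d1)  # fractional position of the crossing
            t = (n + frac) / sample_rate
            value = signal[n] + frac * (signal[n + 1] - signal[n])
            crossings.append((t, value))
    return crossings


# Toy input (assumed): a constant level of 0.5 against a 1 Hz reference
# sinusoid, observed for one second at 1000 samples per second.
toy_signal = [0.5] * 1000
toy_crossings = sine_wave_crossings(toy_signal, sample_rate=1000, ref_freq_hz=1.0)
```

In this toy case the reference equals 0.5 twice per period, so the two crossings fall near t = 1/12 s and t = 5/12 s. For real speech the resulting instants are irregularly spaced, which is what makes the sampling non-uniform.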
Finally, Gaussian mixture models (GMMs) are used for classification and modeling, with the means and variances of the features calculated in VQ as input. Experimental results show that the average recognition performance of the NU-MFCC based system is around 92.18% under normal ambient conditions (>35 dB) and 51.27% under additive white Gaussian noise (AWGN) conditions (between -5 and 35 dB), compared with 92.36% and 42%, respectively, for MFCC. When the system was trained with a mixture of normal ambient conditions and 10 dB SNR (AWGN), the average performance of NU-MFCCs rose to 74.84%, and that of MFCC rose to 65.87%.
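The GMM classification stage described above can be sketched as scoring a sequence of feature vectors against one GMM per class and picking the most likely class. This is a minimal sketch, not the thesis's implementation: it assumes diagonal covariances, already-trained parameters, and toy two-dimensional features, and all names and values are hypothetical.

```python
import math


def gmm_log_likelihood(frame, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for x, m, v in zip(frame, mu, var):
            log_p += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        component_logs.append(log_p)
    # Log-sum-exp over components for numerical stability.
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(c - peak) for c in component_logs))


def classify(frames, models):
    """Return the label whose GMM gives the highest total log-likelihood."""
    scores = {
        label: sum(gmm_log_likelihood(f, *params) for f in frames)
        for label, params in models.items()
    }
    return max(scores, key=scores.get)


# Hypothetical toy models: (weights, means, variances) per class.
toy_models = {
    "word_a": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "word_b": ([1.0], [[5.0, 5.0]], [[1.0, 1.0]]),
}
toy_frames = [[4.8, 5.1], [5.2, 4.9]]
predicted = classify(toy_frames, toy_models)
```

Summing per-frame log-likelihoods treats the frames as independent, which is a common simplification for GMM-based recognizers; in the toy data the frames lie near the mean of `word_b`, so that label is selected.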
Keywords: Non-Uniform Sampling, MFCC, ASR, GMM