Amharic Text-Prompted Speaker Verification Model
Date
2011-05
Publisher
Addis Ababa University
Abstract
Speaker verification is a biometric authentication model that takes a speech
signal as input to verify a claimed speaker. A speaker verification model
extracts speaker-dependent characteristics from the speech wave signal to
create the voiceprint of the speaker. The researcher implemented the logic of
an English speaker verification model [45] for Amharic, but the average accuracy
obtained for ten Amharic words was only 92.93%. This research work was initiated
by that implementation and its poor accuracy. In this thesis, the Amharic
Text-Prompted Speaker Verification Model (ATPSVM) is designed and implemented.
The ATPSVM model applies frame-based processing to the speech wave signals so
that all samples in a frame are processed simultaneously. It extracts speaker
feature vectors as mel-frequency cepstral coefficients (MFCCs) for use in
speaker model construction. It then applies parameter-domain (spectral)
normalization followed by min-max normalization to the speaker feature vectors,
scaling the feature values to [0, 1]. Finally, it applies support vector machine
(SVM) kernel functions to model each speaker. For a specific prompted Amharic
word, it uses a one-against-each SVM speaker modeling strategy to maintain the
balance of the test speaker's feature vectors in the mixed features.
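
The min-max step described above can be sketched as follows. This is a minimal
illustration, not the thesis implementation: the function name
`min_max_normalize`, the per-dimension scaling across frames, the handling of
constant dimensions, and the sample values are all assumptions made for the
example.

```python
from typing import List


def min_max_normalize(vectors: List[List[float]]) -> List[List[float]]:
    """Scale each feature dimension (column) to [0, 1] across all frames.

    Assumption: scaling is done per dimension; the abstract does not say
    whether the minimum and maximum are taken per dimension or globally.
    """
    if not vectors:
        return []
    dims = len(vectors[0])
    lows = [min(v[d] for v in vectors) for d in range(dims)]
    highs = [max(v[d] for v in vectors) for d in range(dims)]
    scaled = []
    for v in vectors:
        row = []
        for d in range(dims):
            span = highs[d] - lows[d]
            # A constant dimension has zero range; map it to 0.0 (assumption).
            row.append((v[d] - lows[d]) / span if span > 0 else 0.0)
        scaled.append(row)
    return scaled


# Hypothetical feature vectors: three frames, three coefficients each.
frames = [[-12.0, 3.5, 0.2], [-8.0, 1.5, 0.2], [-10.0, 2.5, 0.2]]
normalized = min_max_normalize(frames)
print(normalized)
```

Keeping every coefficient in the same range prevents large-valued dimensions
from dominating the kernel computation in the SVM.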
The ATPSVM model prototype is evaluated using ten Amharic words. Each Amharic
word is uttered ten times by each of 5 men and 5 women, so that a total of 100
speech wave files are recorded from each speaker.
One utterance is iteratively taken for testing while the remaining 9 are used
for training the speaker, on a leave-one-out basis. The model iteratively takes
one utterance of each speaker for testing against each of the other 9 speakers.
The corresponding utterance of the other speaker is taken as the impostor data
set for the same test. The remaining 9 utterances from each of the two speakers
are taken as training data sets.
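
One possible reading of this leave-one-out, one-against-each protocol is
sketched below. The function name `trials_for_word` and the dictionary layout
are hypothetical; only the counts (10 speakers, 10 utterances per word, 10
words) come from the abstract. Under this reading, each ordered pair of client
and other speaker yields ten trials per word.

```python
from itertools import permutations

NUM_SPEAKERS = 10    # 5 men and 5 women
NUM_UTTERANCES = 10  # repetitions of each word per speaker
NUM_WORDS = 10


def trials_for_word(num_speakers=NUM_SPEAKERS, num_utterances=NUM_UTTERANCES):
    """Yield one trial per (client, impostor, held-out utterance index)."""
    for client, impostor in permutations(range(num_speakers), 2):
        for held_out in range(num_utterances):
            train_idx = [u for u in range(num_utterances) if u != held_out]
            yield {
                "client_test": (client, held_out),
                "impostor_test": (impostor, held_out),
                # Remaining 9 utterances from each of the two speakers.
                "train": [(client, u) for u in train_idx]
                + [(impostor, u) for u in train_idx],
            }


per_word = list(trials_for_word())
print(len(per_word))              # trials per word
print(len(per_word) * NUM_WORDS)  # trials over all ten words
```

With 10 speakers there are 90 ordered speaker pairs, so the enumeration gives
900 trials per word and 9,000 over the ten words.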
Thus, for each Amharic word, the model is evaluated using 900 training data
sets, 900 client testing data sets, and another 900 impostor testing data sets.
In total, the ATPSVM model is evaluated using 9,000 training data sets, 9,000
client (target) testing data sets, and another 9,000 impostor testing data sets.
The evaluation is done by varying the threshold theta between 0.0 and 1.0 in
steps of 0.1. Performance is measured in terms of false acceptance, false
rejection, true acceptance, and true rejection, and is then reported as
precision, accuracy, recall, false acceptance rate, false rejection rate, and
equal error rate (EER). The ATPSVM model is evaluated using the SVM linear,
Gaussian radial basis function (RBF), multilayer perceptron (MLP), and
degree-3 polynomial kernel functions. The best performance is obtained when the
SVM degree-3 polynomial kernel function is applied. The performance differences
between the kernel functions follow from their algorithmic definitions.
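
The threshold sweep and the reported metrics can be illustrated with a short
sketch. It assumes that each trial produces a score in [0, 1] and that a claim
is accepted when the score is at least theta; the abstract does not state the
decision rule. The function names and the scores are invented for the example
and are not results from the thesis.

```python
def confusion_counts(client_scores, impostor_scores, theta):
    """Count true/false acceptances and rejections at threshold theta."""
    ta = sum(s >= theta for s in client_scores)    # true acceptance
    fr = len(client_scores) - ta                   # false rejection
    fa = sum(s >= theta for s in impostor_scores)  # false acceptance
    tr = len(impostor_scores) - fa                 # true rejection
    return ta, fr, fa, tr


def metrics(ta, fr, fa, tr):
    """Derive the reported rates from the four counts."""
    return {
        "accuracy": (ta + tr) / (ta + fr + fa + tr),
        "precision": ta / (ta + fa) if ta + fa else 0.0,
        "recall": ta / (ta + fr) if ta + fr else 0.0,
        "far": fa / (fa + tr) if fa + tr else 0.0,
        "frr": fr / (ta + fr) if ta + fr else 0.0,
    }


def sweep(client_scores, impostor_scores):
    """Evaluate thresholds 0.0, 0.1, ..., 1.0."""
    rows = []
    for i in range(11):
        theta = i / 10
        row = metrics(*confusion_counts(client_scores, impostor_scores, theta))
        row["theta"] = theta
        rows.append(row)
    return rows


def approximate_eer(rows):
    """Pick the threshold where FAR and FRR are closest; average them."""
    best = min(rows, key=lambda r: abs(r["far"] - r["frr"]))
    return best["theta"], (best["far"] + best["frr"]) / 2


# Hypothetical scores, for illustration only.
client_scores = [0.95, 0.90, 0.85, 0.70, 0.45]
impostor_scores = [0.05, 0.10, 0.20, 0.35, 0.55]
rows = sweep(client_scores, impostor_scores)
theta, eer = approximate_eer(rows)
print(theta, eer)
```

Because the thresholds are sampled in steps of 0.1, the EER found this way is
an approximation: it is read off at the sampled threshold where the false
acceptance and false rejection rates are closest.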
Using the SVM degree-3 polynomial kernel function, an average performance of
0.25% EER, 99.7% accuracy, 99.8% recall, and 99.7% precision is obtained over
the ten Amharic words tested. For the same kernel, the performance of the
ATPSVM model is also evaluated separately for each Amharic word. For seven of
the ten words (70%), the model achieves 0.00% EER with 100% accuracy, 100%
precision, and 100% recall. For the remaining three words (30%), its
performance is slightly lower than these values. By selecting more
discriminative Amharic words similar in nature to these seven, it is possible
to obtain the desired highest performance from the ATPSVM model.
Keywords: Speaker Verification, Speaker Recognition, Biometric Authentication,
Amharic Text-Prompted Speaker Verification, Text-Prompted Speaker Verification