Amharic Text-Prompted Speaker Verification Model

Date

2011-05

Publisher

Addis Ababa University

Abstract

Speaker verification is a biometric authentication technique that takes a speech signal as input to verify a claimed speaker identity. A speaker verification model extracts speaker-dependent characteristics from the speech waveform to create the speaker's voiceprint. The researcher first implemented the logic of an English speaker verification model [45] for Amharic, but the average accuracy obtained for ten Amharic words was only 92.93%. This research was motivated by that implementation and its poor accuracy.

In this thesis, an Amharic Text-Prompted Speaker Verification Model (ATPSVM) is designed and implemented. The ATPSVM applies frame-based processing to the speech signals so that all samples in a frame are processed simultaneously. It extracts speaker feature vectors as Mel-frequency cepstral coefficients (MFCCs) for use in speaker model construction. It then applies parameter-domain (spectral) normalization followed by min-max normalization to scale the feature vector values into [0, 1]. Finally, it applies support vector machine (SVM) kernel functions to model each speaker. For a given prompted Amharic word, it uses a one-against-each SVM speaker modeling strategy to keep the test speaker's feature vectors balanced within the mixed features.

The ATPSVM prototype is evaluated using ten Amharic words. Each word is uttered ten times by 5 men and 5 women, so a total of 100 speech wave files are recorded from each speaker. Evaluation follows a leave-one-out scheme: each utterance is taken in turn for testing while the remaining 9 are used for training. Each speaker is tested against each of the other 9 speakers. The corresponding utterance of the other speaker is taken as the impostor test data, and the remaining 9 utterances from each of the two speakers are taken as the training data. Thus, for each Amharic word, the model is evaluated using 900 training data sets, 900 client testing data sets, and 900 impostor testing data sets. In total, the ATPSVM is evaluated using 9,000 training data sets, 9,000 client (target) testing data sets, and 9,000 impostor testing data sets.

The evaluation varies the threshold theta from 0.0 to 1.0 in steps of 0.1. Performance is measured in terms of false acceptance, false rejection, true acceptance, and true rejection, and is reported as precision, accuracy, recall, false acceptance rate, false rejection rate, and equal error rate (EER).

The ATPSVM is evaluated using the SVM linear, Gaussian radial basis function (RBF), multilayer perceptron (MLP), and polynomial (power 3) kernel functions. The best performance is obtained with the polynomial (power 3) kernel. The performance differences between the kernel functions follow from their algorithmic definitions. Using the polynomial (power 3) kernel, the average performance over the ten Amharic words is 0.25% EER, 99.7% accuracy, 99.8% recall, and 99.7% precision. With the same kernel, the performance of the ATPSVM is also evaluated for each Amharic word separately. For 70% of the words, the model achieves 0.00% EER with 100% accuracy, 100% precision, and 100% recall. For the remaining 30%, its performance is slightly lower than these values. By selecting more discriminative Amharic words similar in nature to those seven words, it should be possible to obtain the highest performance from the ATPSVM.
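The evaluation arithmetic and the threshold sweep described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The function names, the convention that a claim is accepted when the score is at least theta, the assumption that verification scores already lie in [0, 1], and the nearest-point EER estimate are all assumptions made for illustration. The sample scores in the usage lines are invented and are not results from the thesis.

```python
from itertools import permutations

SPEAKERS = 10     # 5 men and 5 women, as stated in the abstract
UTTERANCES = 10   # repetitions of each word per speaker
WORDS = 10


def count_trials_per_word(speakers=SPEAKERS, utterances=UTTERANCES):
    """Count leave-one-out trials for one word under one-against-each pairing."""
    trials = 0
    for _client, _impostor in permutations(range(speakers), 2):
        for _held_out in range(utterances):
            # Each (ordered pair, held-out utterance) gives one client test,
            # one impostor test, and one training set of 9 + 9 utterances.
            trials += 1
    return trials


def min_max(values):
    """Scale values linearly into [0, 1]; a constant list maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rates(client_scores, impostor_scores, theta):
    """Confusion counts and rates, accepting a claim when score >= theta (assumed)."""
    ta = sum(s >= theta for s in client_scores)     # true acceptances
    fr = len(client_scores) - ta                    # false rejections
    fa = sum(s >= theta for s in impostor_scores)   # false acceptances
    tr = len(impostor_scores) - fa                  # true rejections
    return {
        "FAR": fa / (fa + tr),
        "FRR": fr / (fr + ta),
        "accuracy": (ta + tr) / (ta + tr + fa + fr),
        "precision": ta / (ta + fa) if ta + fa else 0.0,
        "recall": ta / (ta + fr),
    }


def sweep(client_scores, impostor_scores):
    """Evaluate theta = 0.0, 0.1, ..., 1.0 and estimate the EER."""
    table = {step / 10: rates(client_scores, impostor_scores, step / 10)
             for step in range(11)}
    # EER estimate: the sweep point where FAR and FRR are closest.
    best = min(table, key=lambda t: abs(table[t]["FAR"] - table[t]["FRR"]))
    eer = (table[best]["FAR"] + table[best]["FRR"]) / 2
    return table, best, eer


# Usage with invented scores (not data from the thesis).
print(count_trials_per_word() * WORDS)  # 9000, matching the totals in the abstract
table, best_theta, eer = sweep([0.9, 0.8, 0.75, 0.7], [0.1, 0.2, 0.35, 0.4])
print(best_theta, eer)
```

With a step of 0.1 the sweep only approximates the EER. A finer grid, or interpolation between the two thresholds where the FAR and FRR curves cross, would give a closer estimate.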

Keywords

Speaker Verification, Speaker Recognition, Biometric Authentication, Amharic Text-Prompted Speaker Verification, Text-Prompted Speaker Verification
