Triple Point Geometric Hashing based Audio Fingerprinting

dc.contributor.advisor: Surafel, Lemma (PhD)
dc.contributor.author: Efriem, Desalew
dc.date.accessioned: 2020-10-09T09:54:57Z
dc.date.accessioned: 2023-11-04T15:14:42Z
dc.date.available: 2020-10-09T09:54:57Z
dc.date.available: 2023-11-04T15:14:42Z
dc.date.issued: 2020-06-09
dc.description.abstract: Audio fingerprinting is a technique for exact identification of an audio recording: it extracts perceptually relevant audio features and transforms them into a condensed, reproducible format. Several approaches to building audio fingerprinting systems have been proposed. Based on their baseline assumptions, they fall into three categories: the Philips, image processing, and Shazam approaches. These systems, however, are usually not effective when the audio is distorted. Distortion can come from modifications such as additive noise, speed change, pitch shifting, and time stretching. Of these, this thesis focuses on linear speed change in Shazam-based audio fingerprinting systems. Linear speed change is a common modification that occurs when audio is played faster or slower at a constant rate. This thesis proposes a Shazam-based audio fingerprinting system that is robust to linear speed change; it employs triple point geometric hashing to handle the effect of linear speed change on audio fingerprints. The proposed approach is evaluated using 29,600 query audios and compared with the baseline work, Shazam, and a recent Shazam-based work, Panako. Evaluation results show that the proposed approach is robust to linear speed change in a range from 30% to 22%. This is a significant improvement over Panako, which is robust to linear speed change between -12% and 6%, and over Shazam, which failed to handle a 2% linear speed change. In addition to speed change, the proposed approach is evaluated for robustness to additive noise, time stretching, and pitch shifting. The results show that it is robust to: i) additive noise in a range from -5 dB to 20 dB, with comparable robustness exhibited by Shazam and Panako; ii) time stretching in a range from -10% to 8%, an improvement over Shazam and Panako, which are robust to time stretching between -4% and 4%; and iii) pitch shifting in a range from -4% to 4%, comparable to Panako, whereas Shazam failed to handle 2% pitch shifting. (en_US)
dc.identifier.uri: http://etd.aau.edu.et/handle/123456789/22636
dc.language.iso: en_US (en_US)
dc.publisher: Addis Ababa University (en_US)
dc.subject: Audio Fingerprinting (en_US)
dc.subject: Audio Identification (en_US)
dc.subject: Geometric Hashing (en_US)
dc.subject: Linear Speed Change (en_US)
dc.title: Triple Point Geometric Hashing based Audio Fingerprinting (en_US)
dc.type: Thesis (en_US)
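
The abstract names triple point geometric hashing as the mechanism for tolerating linear speed change, but this record does not spell out the hash itself. As a rough illustration only, and not the thesis's actual scheme, the sketch below assumes that spectral peaks are (time, frequency) points and that a linear speed change by a factor s divides every time by s and multiplies every frequency by s. Under that assumption, ratios built from three peaks are unchanged, which is the kind of invariance a triple-point hash can exploit. The names `speed_change` and `triple_invariants` and the sample peak values are hypothetical.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (time in seconds, frequency in Hz)


def speed_change(peaks: List[Peak], factor: float) -> List[Peak]:
    """Model a linear speed change (assumed model, not from the thesis):
    playing `factor` times faster compresses time and raises frequency."""
    return [(t / factor, f * factor) for t, f in peaks]


def triple_invariants(p1: Peak, p2: Peak, p3: Peak) -> Tuple[float, float, float]:
    """Hypothetical hash components for three peaks ordered in time.

    The time-difference ratio and the frequency ratios cancel the speed
    factor, so they stay the same before and after the change.
    """
    (t1, f1), (t2, f2), (t3, f3) = p1, p2, p3
    time_ratio = (t2 - t1) / (t3 - t1)
    # Rounding absorbs floating-point noise so the tuple can act as a key.
    return (round(time_ratio, 6), round(f2 / f1, 6), round(f3 / f1, 6))


original = [(1.0, 440.0), (1.5, 660.0), (3.0, 550.0)]  # made-up peaks
faster = speed_change(original, 1.2)  # played 20% faster

print(triple_invariants(*original))
print(triple_invariants(*faster))
```

The raw timestamps and frequencies of the two peak lists differ, yet both calls yield the same tuple. A hash built from two peaks and absolute time or frequency differences would not have this property under the same model.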

Files

Original bundle
Name: Efriem Desalew.pdf
Size: 2.28 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.71 KB
Format: Plain Text