Triple Point Geometric Hashing based Audio Fingerprinting

Efriem, Desalew


Files

Efriem Desalew.pdf (2.28 MB)

Date

2020-06-09

Authors

Efriem, Desalew

Publisher

Addis Ababa University

Abstract

Audio fingerprinting is a technique for the exact identification of an audio recording. It extracts perceptually relevant audio features and transforms them into compact, reproducible representations. Different approaches have been proposed for building audio fingerprinting systems. Based on their baseline assumptions, these approaches can be grouped into three categories: the Philips, image processing, and Shazam approaches. However, these systems are often not effective when the audio is distorted. Distortion can come from modifications such as additive noise, speed change, pitch shifting, and time stretching. Of these, this thesis focuses on linear speed change in Shazam-based audio fingerprinting systems. Linear speed change is a common audio modification that occurs when audio is played faster or slower at a constant rate. This thesis proposes a Shazam-based audio fingerprinting system that is robust to linear speed change. The proposed approach uses triple point geometric hashing to handle the effect of linear speed change on audio fingerprints. It is evaluated using 29,600 query audio clips and compared with the baseline, Shazam, and with a recent Shazam-based system, Panako. Evaluation results show that the proposed approach is robust to linear speed change in a range from -30% to 22%. This is a significant improvement over Panako, which is robust to linear speed change between -12% and 6%, and over Shazam, which failed to handle a 2% linear speed change. In addition to speed change, the proposed approach is evaluated for robustness to additive noise, time stretching, and pitch shifting. The results show that it is robust to: i) additive noise in a range from -5 dB to 20 dB, with comparable robustness exhibited by Shazam and Panako; ii) time stretching in a range from -10% to 8%, an improvement over Shazam and Panako, which are robust to time stretching between -4% and 4%; and iii) pitch shifting in a range from -4% to 4%, comparable to Panako, whereas Shazam failed to handle 2% pitch shifting.
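The abstract does not spell out the hash construction. The following is therefore only a minimal sketch of why a three-point hash can survive linear speed change when a two-point (Shazam-style) hash cannot. It assumes an idealized model in which playing audio s times faster divides every peak time by s and multiplies every peak frequency by s. It also uses ratio features (one ratio of two time intervals and two frequency ratios) as a hypothetical stand-in for the thesis's actual hash. The names Peak, apply_speed_change, pair_hash and triple_hash, and the quantization steps, are illustrative assumptions and do not come from the thesis.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Peak:
    """A spectrogram peak: time t in seconds, frequency f in Hz."""
    t: float
    f: float


def apply_speed_change(peaks: List[Peak], s: float) -> List[Peak]:
    """Idealized linear speed change by factor s (s > 1 plays faster):
    every time is divided by s and every frequency is multiplied by s."""
    return [Peak(p.t / s, p.f * s) for p in peaks]


def pair_hash(a: Peak, b: Peak) -> Tuple[int, int, int]:
    """Shazam-style landmark (f1, f2, dt) built from absolute values,
    quantized to 1 Hz and 10 ms bins. Every component shifts under speed change."""
    p1, p2 = sorted((a, b), key=lambda p: p.t)
    return (round(p1.f), round(p2.f), round((p2.t - p1.t) * 100))


def triple_hash(a: Peak, b: Peak, c: Peak, q: int = 100) -> Tuple[int, int, int]:
    """Hypothetical speed-invariant hash over three peaks: one ratio of time
    intervals and two frequency ratios, each quantized to steps of 1/q."""
    p1, p2, p3 = sorted((a, b, c), key=lambda p: p.t)
    span = p3.t - p1.t
    if span <= 0 or p1.f <= 0:
        raise ValueError("peaks need distinct times and a positive anchor frequency")
    time_ratio = (p2.t - p1.t) / span  # the 1/s factor cancels in the ratio
    return (
        round(time_ratio * q),
        round(p2.f / p1.f * q),  # the s factor cancels in the ratio
        round(p3.f / p1.f * q),
    )


if __name__ == "__main__":
    peaks = [Peak(1.0, 440.0), Peak(1.5, 660.0), Peak(2.0, 880.0)]
    fast = apply_speed_change(peaks, 1.1)  # 10% faster playback
    print(pair_hash(peaks[0], peaks[1]), pair_hash(fast[0], fast[1]))
    # (440, 660, 50) (484, 726, 45)
    print(triple_hash(*peaks), triple_hash(*fast))
    # (50, 150, 200) (50, 150, 200)
```

Two points yield one frequency ratio but only a single time interval, and a single interval is not scale-invariant. A third point supplies a second interval, so the time scale cancels in their ratio. A real system would also need tolerant quantization and matching, because peaks taken from a sped-up recording land on spectrogram bins only approximately.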

Keywords

Audio Fingerprinting, Audio Identification, Geometric Hashing, Linear Speed Change

URI

http://etd.aau.edu.et/handle/123456789/22636

Collections

Computer Engineering
