Triple Point Geometric Hashing based Audio Fingerprinting


Date

2020-06-09

Publisher

Addis Ababa University

Abstract

Audio fingerprinting is a technique for the exact identification of an audio recording. It works by extracting perceptually relevant audio features and transforming them into a condensed, reproducible representation. Several approaches have been proposed for building audio fingerprinting systems. Based on their underlying assumptions, these approaches can be grouped into three categories: the Philips, image processing, and Shazam approaches. However, these systems are usually not effective when the audio is distorted. Distortion can come from various modifications, such as additive noise, speed change, pitch shifting and time stretching. Of these modifications, this thesis focuses on linear speed change in Shazam-based audio fingerprinting systems. Linear speed change is a common audio modification that occurs when audio is played faster or slower at a constant rate.

In this thesis, a Shazam-based audio fingerprinting system that is robust to linear speed change is proposed. The proposed approach employs triple point geometric hashing to handle the effect of linear speed change on audio fingerprints. It is evaluated using 29,600 query audio clips and compared with the baseline, Shazam, and with a recent Shazam-based system, Panako. Evaluation results show that the proposed approach is robust to linear speed change in the range from -30% to 22%. This is a significant improvement over Panako, which is robust to linear speed change between -12% and 6%, and over Shazam, which fails to handle even a 2% linear speed change.

In addition to speed change, the proposed approach is evaluated for robustness to additive noise, time stretching and pitch shifting. The results show that it is robust to: i) additive noise in the range from -5 dB to 20 dB, comparable to the robustness of Shazam and Panako; ii) time stretching in the range from -10% to 8%, an improvement over Shazam and Panako, which are robust to time stretching between -4% and 4%; and iii) pitch shifting in the range from -4% to 4%, comparable to Panako, whereas Shazam fails to handle a 2% pitch shift.
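The abstract does not spell out the hash itself, so what follows is only a minimal sketch, under stated assumptions, of why point triplets can survive a linear speed change when Shazam-style point pairs cannot. It assumes the common model in which playing audio at rate s divides every spectral peak's time by s and multiplies its frequency by s. A pair hash built from (f1, f2, dt) changes under this transform. For three time-ordered peaks, however, the time ratio (t3 - t1) / (t2 - t1) and the frequency ratios f2 / f1 and f3 / f1 stay the same. The function names (`speed_change`, `pair_hashes`, `triplet_hashes`), the bin sizes and the toy peak list are hypothetical illustrations. They are not the thesis's actual triple point geometric hashing scheme.

```python
from itertools import combinations

# A spectral peak: (time in seconds, frequency in Hz).
Peak = tuple[float, float]


def speed_change(peaks: list[Peak], s: float, offset: float = 0.0) -> list[Peak]:
    """Simulate a linear speed change by rate s (s > 1 is faster).

    Times are divided by s and frequencies multiplied by s; `offset`
    models a query excerpt that starts later than the reference.
    """
    return [(t / s + offset, f * s) for t, f in peaks]


def pair_hashes(peaks: list[Peak], dt_bin: float = 0.01, f_bin: float = 10.0) -> set:
    """Shazam-style hashes (f1, f2, dt) for every time-ordered pair of peaks."""
    hashes = set()
    for (t1, f1), (t2, f2) in combinations(sorted(peaks), 2):
        hashes.add((round(f1 / f_bin), round(f2 / f_bin), round((t2 - t1) / dt_bin)))
    return hashes


def triplet_hashes(peaks: list[Peak], bins: int = 100) -> set:
    """Speed-invariant hashes for every time-ordered triplet of peaks.

    Dividing times by s leaves (t3 - t1) / (t2 - t1) unchanged, and
    multiplying frequencies by s leaves f2 / f1 and f3 / f1 unchanged.
    """
    hashes = set()
    for (t1, f1), (t2, f2), (t3, f3) in combinations(sorted(peaks), 3):
        if t2 == t1 or f1 <= 0:
            continue  # degenerate triplet: the ratios are undefined
        time_ratio = (t3 - t1) / (t2 - t1)
        hashes.add((round(time_ratio * bins), round(f2 / f1 * bins), round(f3 / f1 * bins)))
    return hashes


if __name__ == "__main__":
    ref = [(0.10, 400.0), (0.35, 600.0), (0.70, 1000.0), (1.30, 800.0)]
    query = speed_change(ref, s=1.2, offset=2.0)  # 20% faster, starts 2 s later
    print(len(pair_hashes(ref) & pair_hashes(query)))        # 0 shared pair hashes
    print(len(triplet_hashes(ref) & triplet_hashes(query)))  # 4 shared triplet hashes
```

The trade-off of this design is that ratio-based hashes discard absolute time and frequency. As a result, they are less discriminative than pair hashes and may collide more often across different recordings. Quantizing the ratios also means that a value close to a bin boundary can land in a neighbouring bin once other distortions are added.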

Keywords

Audio Fingerprinting, Audio Identification, Geometric Hashing, Linear Speed Change
