Triple Point Geometric Hashing based Audio Fingerprinting
Date
2020-06-09
Publisher
Addis Ababa University
Abstract
Audio fingerprinting is a technique for exact identification of an audio recording by extracting
perceptually relevant audio features and transforming them into condensed, reproducible
formats. Several approaches have been proposed for developing audio fingerprinting
systems. Based on their baseline assumptions, these approaches can be grouped into
three categories: the Philips, Image Processing and Shazam approaches. These audio fingerprinting
systems, however, are usually not effective when the audio is distorted.
Distortion in audio may come from modifications such as additive noise,
speed change, pitch shifting, time stretching and others. Of these modifications, this
thesis focuses on the problem of linear speed change in Shazam-based audio
fingerprinting systems. Linear speed change is a common audio modification that
occurs when the audio is played faster or slower at a constant rate. In this thesis,
a Shazam-based audio fingerprinting system that is robust to linear speed change is
proposed. The proposed approach employs triple point geometric hashing to handle
the effect of linear speed change on audio fingerprints.
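The abstract does not spell out the hash construction, so the following is only a minimal sketch of why a hash over three spectral peaks can survive linear speed change. It assumes that a speed change by some factor divides every peak time by that factor and multiplies every peak frequency by it, so ratios of time differences and ratios of frequencies stay fixed. The names `speed_change` and `triple_hash`, the `bins` parameter and the sample peaks are hypothetical and are not taken from the thesis.

```python
import math
from typing import Tuple

Peak = Tuple[float, float]  # (time in seconds, frequency in Hz)


def speed_change(peak: Peak, factor: float) -> Peak:
    """Model a linear speed change (assumption): playing at `factor` times
    the original speed divides times by `factor` and multiplies frequencies by it."""
    t, f = peak
    return (t / factor, f * factor)


def triple_hash(p1: Peak, p2: Peak, p3: Peak, bins: int = 64) -> Tuple[int, int, int]:
    """Hypothetical hash built only from ratios; assumes t1 < t2 < t3."""
    (t1, f1), (t2, f2), (t3, f3) = p1, p2, p3
    # Ratio of time differences: the speed factor cancels out.
    time_ratio = (t2 - t1) / (t3 - t1)
    # Log frequency ratios: the speed factor cancels out here as well.
    log_f21 = math.log2(f2 / f1)
    log_f31 = math.log2(f3 / f1)
    # Coarse quantization so that small numerical differences share a bin.
    return (round(time_ratio * bins), round(log_f21 * bins), round(log_f31 * bins))


# Illustrative peaks only, not data from the thesis.
peaks = [(1.0, 440.0), (1.5, 660.0), (3.0, 550.0)]
original = triple_hash(*peaks)
faster = triple_hash(*(speed_change(p, 1.2) for p in peaks))
slower = triple_hash(*(speed_change(p, 0.8) for p in peaks))
print(original == faster == slower)
```

By contrast, a hash that stores the absolute time difference between two peaks together with their absolute frequencies would change under the same transformation, which is consistent with the reported failure of the baseline Shazam approach at a 2% speed change.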
The proposed approach is evaluated using 29,600 audio queries and compared with
the baseline work, Shazam, and a recent Shazam-based work, Panako. Evaluation results
show that the proposed approach is robust to linear speed change in a range from -30%
to 22%. This is a significant improvement over Panako, which is robust
to linear speed change between -12% and 6%, and Shazam, which failed to handle 2%
linear speed change. In addition to speed change, the proposed approach is evaluated
in terms of robustness to additive noise, time stretching and pitch shifting. The results
show that the proposed approach is robust to: i) additive noise in a range from -5 dB to
20 dB, with comparable robustness also exhibited by Shazam and Panako; ii) time stretching
in a range from -10% to 8%, an improvement over Shazam and
Panako, which are robust to time stretching between -4% and 4%; and iii) pitch shifting
in a range from -4% to 4%, which is comparable to Panako, whereas
Shazam failed to handle 2% pitch shifting.
Keywords
Audio Fingerprinting, Audio Identification, Geometric Hashing, Linear Speed Change