Amharic Language Criminals Keyword Spotting Using Deep Learning Model
Date
2023-11
Publisher
Addis Ababa University
Abstract
Automatic speech recognition (ASR) remains a significant challenge for many low-resource languages. Amharic, an Afro-Asiatic language spoken in Ethiopia, is one such language and requires particular attention in speech recognition research. Its phonetic and suprasegmental features set it apart from other languages, so developing an accurate and efficient speech recognition system for Amharic requires specialized techniques.
In this thesis, we propose an Amharic keyword spotting system that integrates the Gramian Angular Field (GAF) representation with a Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) architecture. We first prepared an Amharic Speech Commands (ASC) dataset recorded by 30 speakers. A diverse set of 40 crime-related keywords was selected, and each speaker recorded each keyword 10 times, yielding a dataset of 12,000 audio files (30 speakers × 40 keywords × 10 repetitions). The GAF transformation converts raw Amharic audio signals into image representations that capture both temporal and spatial patterns in the data. The CNN-LSTM architecture combines the ability of convolutional layers to learn spatial features from GAF images with the ability of LSTM layers to capture temporal dependencies in sequential data. To evaluate the proposed model, we compiled a carefully annotated Amharic keyword dataset and conducted a systematic search for optimal hyperparameters. The model achieved a training accuracy of 96.73% and a testing accuracy of 89.98%.
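The abstract does not spell out how a GAF image is computed from audio. As a hedged illustration, the standard Gramian Angular Field construction rescales a series to [-1, 1], maps each value x to an angle φ = arccos(x), and builds a matrix of cos(φ_i + φ_j) (the summation field) or sin(φ_i - φ_j) (the difference field). Because a one-second clip at a typical 16 kHz sampling rate would produce a 16,000 × 16,000 matrix, the sketch below first shrinks the waveform with Piecewise Aggregate Approximation (PAA). The PAA step, the 64-point target size, the synthetic test tone, and the function names `paa`, `rescale`, and `gramian_angular_field` are assumptions for illustration, not details taken from the thesis.

```python
import math


def paa(signal, segments):
    """Piecewise Aggregate Approximation: mean of `segments` near-equal windows."""
    n = len(signal)
    if not 0 < segments <= n:
        raise ValueError("segments must be between 1 and len(signal)")
    out = []
    for k in range(segments):
        start, end = k * n // segments, (k + 1) * n // segments
        window = signal[start:end]
        out.append(sum(window) / len(window))
    return out


def rescale(values):
    """Min-max rescale to [-1, 1] so arccos is defined for every value."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant input: map everything to 0
        return [0.0] * len(values)
    # clamp guards against tiny floating-point overshoot outside [-1, 1]
    return [max(-1.0, min(1.0, (2 * v - hi - lo) / (hi - lo))) for v in values]


def gramian_angular_field(signal, size, kind="summation"):
    """Return a size x size GAF image (list of rows) for a 1-D signal."""
    x = rescale(paa(signal, size))
    phi = [math.acos(v) for v in x]  # polar-coordinate angle of each point
    if kind == "summation":  # GASF: cos(phi_i + phi_j)
        return [[math.cos(a + b) for b in phi] for a in phi]
    if kind == "difference":  # GADF: sin(phi_i - phi_j)
        return [[math.sin(a - b) for b in phi] for a in phi]
    raise ValueError("kind must be 'summation' or 'difference'")


if __name__ == "__main__":
    # Hypothetical stand-in for a keyword clip: a 440 Hz tone, 1 s at 16 kHz.
    clip = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    image = gramian_angular_field(clip, 64)
    print(len(image), len(image[0]))  # prints: 64 64
```

Both axes of the resulting image index (downsampled) time, which is why a GAF image carries temporal structure that a CNN can treat as spatial texture. The summation field is symmetric, and its diagonal equals 2x_i² - 1, so the rescaled signal itself can be recovered from it.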
In conclusion, combining the GAF-based representation with the CNN-LSTM model demonstrates the potential of deep learning techniques for addressing the unique challenges of Amharic keyword spotting. However, this study does not use transfer learning or pre-training on larger multilingual datasets. Future research can build on this work to advance speech technology, particularly keyword spotting for large and diverse linguistic communities.
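The abstract describes the GAF-plus-CNN-LSTM combination only at a high level. The stdlib-only sketch below traces output shapes and parameter counts through one hypothetical configuration: a 64 × 64 single-channel GAF image, two 3 × 3 "same"-padded convolution blocks (32 and 64 filters) each followed by 2 × 2 max pooling, a reshape that turns each row of the final feature map into one LSTM time step, a 128-unit LSTM, and a 40-way softmax for the 40 keywords. Every layer size here except the 40 output classes is an assumption, and the helper names are hypothetical; the thesis's actual architecture and hyperparameters may differ.

```python
def conv2d_same(shape, filters, kernel=3):
    """'Same'-padded k x k convolution: keeps height/width, sets channels."""
    h, w, c = shape
    params = kernel * kernel * c * filters + filters  # weights + one bias per filter
    return (h, w, filters), params


def max_pool2d(shape, size=2):
    """Non-overlapping max pooling: shrinks height and width, no parameters."""
    h, w, c = shape
    return (h // size, w // size, c), 0


def lstm(seq_shape, units):
    """LSTM that returns only its last hidden state."""
    _, features = seq_shape
    # 4 gates, each with input weights, recurrent weights and a bias vector
    params = 4 * units * (features + units + 1)
    return (units,), params


def dense(in_features, units):
    """Fully connected layer (used here as the softmax classifier)."""
    return (units,), in_features * units + units


def trace_cnn_lstm(image_size=64, conv_filters=(32, 64), lstm_units=128, num_keywords=40):
    """List (layer, output_shape, params) for a hypothetical GAF -> CNN-LSTM model."""
    layers = []
    shape = (image_size, image_size, 1)  # one-channel GAF image
    for f in conv_filters:
        shape, p = conv2d_same(shape, f)
        layers.append(("conv", shape, p))
        shape, p = max_pool2d(shape)
        layers.append(("pool", shape, p))
    h, w, c = shape
    seq = (h, w * c)  # each feature-map row becomes one LSTM time step
    layers.append(("reshape", seq, 0))
    shape, p = lstm(seq, lstm_units)
    layers.append(("lstm", shape, p))
    shape, p = dense(lstm_units, num_keywords)
    layers.append(("softmax", shape, p))
    return layers


if __name__ == "__main__":
    for name, shape, params in trace_cnn_lstm():
        print(f"{name:8s} {str(shape):15s} {params}")
```

Treating rows as time steps is one common way to hand a CNN feature map to an LSTM; because both axes of a GAF image index time, the row axis keeps a coarse temporal order after pooling. In this particular configuration the LSTM accounts for 590,336 of the 614,312 parameters, which matters when tuning hyperparameters on a 12,000-clip dataset.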
Keywords
Amharic Language, Keyword Spotting, Gramian Angular Field (GAF), Convolutional Neural Network (CNN), Long Short-Term Memory Network (LSTM)