Amharic Speech Search Using Text Word Query

Date

2022-01-20

Publisher

Addis Ababa University

Abstract

More than 7,000 languages are spoken worldwide, and as the processing and storage capacity of machines grows, large amounts of speech data are produced every day in many of them. Amharic is one such language, spoken in the East African country of Ethiopia. Locating a particular spoken word, together with the time at which it occurs, inside a given audio file remains a challenge. The main objective of this research was to investigate the development of a system that searches an Amharic audio file for an Amharic text word query and returns each matching utterance with its time position. The study followed an experimental research methodology. To meet the research objective, we conducted an experiment to find the segmentation that gives the lowest word error rate (WER) for the automatic speech recognition (ASR) system that decodes the segmented speech. We also experimented with using a previously developed speech corpus in the broadcast domain together with an in-domain Bible speech corpus that we developed for this research. The ASR obtained by combining the two domains achieved a better WER than the one using only large vocabulary continuous speech recognition (LVCSR). Among the automatic segmentations compared, automatic sentence-like segmentation gave a WER closest to that of manually segmented speech. Using the optimal automatic segmentation and LVCSR, we developed a text-based spoken term detection (STD) system that locates the time interval in which the query term occurs. The text-based STD was developed with an ASR having a WER of 53% using LVCSR alone and 46% using LVCSR combined with the Bible speech. The STD system has a graphical user interface (GUI) intended to make searching easy and user-friendly. We found that the performance of the ASR affects the performance of the STD, because query terms that the recognizer does not transcribe fully cannot be found by a text search.
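The text-based search step described in the abstract can be pictured with a minimal sketch. It assumes the ASR output is already available as a list of segments, each carrying a start time, an end time, and a transcript; the names `Segment` and `search_term`, and the sample transcripts and timings, are hypothetical and chosen only for illustration, not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One automatically segmented stretch of audio (hypothetical structure)."""
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds
    text: str     # ASR hypothesis for this segment


def search_term(segments, query):
    """Return (start, end) for every segment whose transcript
    contains the query as a whole whitespace-delimited word."""
    hits = []
    for seg in segments:
        if query in seg.text.split():
            hits.append((seg.start, seg.end))
    return hits


# Illustrative data only: invented timings, sample Amharic transcripts.
segments = [
    Segment(0.0, 4.2, "በመጀመሪያ እግዚአብሔር ሰማይንና ምድርን ፈጠረ"),
    Segment(4.2, 7.9, "ምድርም ባዶ ነበረች"),
]

found = search_term(segments, "ምድርን")    # term present in the first segment
missing = search_term(segments, "ብርሃን")  # term absent from every transcript
```

In this sketch a query that the recognizer never transcribed simply returns an empty list, which mirrors the observation that ASR errors limit STD performance.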

Keywords

Speech Segmentation, Spoken Term Detection, Automatic Speech Recognition, Manual Speech Segmentation
