Develop an Audio Search Engine for Amharic Speech Web Resources

Date

10/10/2019

Publisher

Addis Ababa University

Abstract

Most general-purpose search engines, such as Google and Yahoo, are designed with the English language in mind. As under-resourced languages have grown on the web, the number of online speakers of these languages has grown rapidly as well. Amharic is one such language, with rapidly growing web content in all forms of media, including text, speech, and video. It is also morphologically rich, a property that strongly affects the effectiveness of information retrieval. With the increasing number of online radio stations, speech-based reports, and news, retrieving Amharic speech from the web is becoming a challenge that needs attention. As a result, the need for a speech search engine that handles the specific characteristics of users' Amharic-language queries and retrieves Amharic speech web documents has become more apparent. In this research work, we develop an Audio Search Engine for Amharic Speech Web Resources that enables web users to find the speech information they need in Amharic. In doing so, we enhanced an existing crawler to collect Amharic speech web resources, transcribed the Amharic speech, indexed the transcribed speech, and developed query preprocessing components for users' text-based queries. As baseline tools, we used the open source tools JSpider and Datafari for web document crawling, parsing, indexing, ranking, and retrieval, and Sphinx for speech recognition and transcription. To evaluate the effectiveness of our Amharic speech search engine, precision and recall were measured on the retrieved speech web documents. The experimental results showed that the Amharic speech retrieval engine achieved 80% precision on the top 10 results and a recall of 92% relative to its corresponding retrieval engine. Overall, the evaluation results of the system are promising.
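
The abstract reports precision on the top 10 results and recall but does not say how they were computed. The following is a minimal sketch of the standard definitions of both measures, not the authors' evaluation code. The function names, the document identifiers, and the relevance judgments are all hypothetical and chosen only for illustration.

```python
from typing import Iterable, Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k ranked results that are relevant.

    Divides by k, so a result list shorter than k is penalized
    (an assumption; some evaluations divide by the number returned).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / k


def recall(retrieved: Iterable[str], relevant: Set[str]) -> float:
    """Fraction of all relevant documents that were retrieved."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)


# Hypothetical ranked result list for one query (identifiers are made up).
ranked_results = ["audio_03", "audio_11", "audio_07", "audio_20"]
# Hypothetical set of documents judged relevant for that query.
judged_relevant = {"audio_03", "audio_07", "audio_15"}

p_at_4 = precision_at_k(ranked_results, judged_relevant, k=4)
r = recall(ranked_results, judged_relevant)
```

In this made-up example, two of the four returned documents are relevant and two of the three relevant documents are returned. In practice, such per-query values are usually averaged over a set of test queries.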

Keywords

Audio Search Engines, Audio Information Retrieval, Information Retrieval in Amharic Language, Speech Crawler, Amharic Speech Identification
