Develop an Audio Search Engine for Amharic Speech Web Resources
Date
2019-10-10
Publisher
Addis Ababa University
Abstract
Most general-purpose search engines, such as Google and Yahoo, are designed with the English language in mind. As content in under-resourced languages grows on the web, the number of online speakers of these languages is also growing rapidly. Amharic is one such language: it is morphologically rich, which strongly affects the effectiveness of information retrieval, and its web content is growing rapidly in all forms of media, including text, speech, and video. With the increasing number of online radio stations and speech-based reports and news, retrieving Amharic speech from the web is becoming a challenge that needs attention. As a result, the need for a speech search engine that handles the specific characteristics of users’ Amharic-language queries and retrieves Amharic-language speech web documents is becoming more apparent.
In this research work, we develop an audio search engine for Amharic speech web resources that enables web users to find the speech information they need in the Amharic language. To do so, we enhanced an existing crawler to handle Amharic speech web resources, transcribed the Amharic speech, indexed the transcribed speech, and developed query preprocessing components for users’ text-based queries. As baseline tools, we used the open-source tools JSpider and Datafari for web document crawling, parsing, indexing, ranking, and retrieval, and Sphinx for speech recognition and transcription.
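As a rough illustration of the indexing and query preprocessing steps described above, the following minimal Python sketch builds an inverted index over transcribed speech and answers text queries against it. It is not the thesis implementation, which relies on Datafari for indexing and retrieval: the function names, the example URLs, the sample transcripts, and the preprocessing choices (Unicode NFC normalization and stripping of Ethiopic and ASCII punctuation) are all assumptions made for illustration.

```python
import unicodedata
from collections import defaultdict

# Ethiopic punctuation marks plus common ASCII ones (an illustrative set).
PUNCTUATION = "።፣፤፥፦፧፨፡.,;:!?"


def preprocess(text):
    """Normalize Unicode, replace punctuation with spaces, split on whitespace."""
    text = unicodedata.normalize("NFC", text)
    table = str.maketrans({ch: " " for ch in PUNCTUATION})
    return text.translate(table).split()


def build_index(transcripts):
    """Map each token to the set of audio URLs whose transcript contains it."""
    index = defaultdict(set)
    for url, transcript in transcripts.items():
        for token in preprocess(transcript):
            index[token].add(url)
    return index


def search(index, query):
    """Return URLs whose transcripts contain every query token (boolean AND)."""
    tokens = preprocess(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


# Hypothetical transcripts keyed by hypothetical audio URLs.
transcripts = {
    "http://example.com/a.mp3": "ሰላም ኢትዮጵያ።",
    "http://example.com/b.mp3": "ሰላም ዜና",
}
index = build_index(transcripts)
```

With this toy data, `search(index, "ሰላም")` returns both URLs, while `search(index, "ሰላም ዜና")` returns only the second. A real system for a morphologically rich language such as Amharic would likely also need stemming or other morphological analysis, which this sketch omits.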
To evaluate the effectiveness of our Amharic speech search engine, we measured precision and recall on the retrieved speech web documents. The experimental results showed that the Amharic speech retrieval engine achieved 80% precision on the top 10 results and a recall of 92% relative to its corresponding retrieval engine. The overall evaluation results of the system are promising.
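For readers unfamiliar with the two measures, the sketch below shows how precision on the top k results and recall are commonly computed from a ranked result list and a set of relevance judgments. The function names and the sample data are hypothetical and are not taken from the thesis experiments.

```python
def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k ranked results that are relevant (divides by k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k


def recall(retrieved, relevant):
    """Fraction of all relevant documents that appear in the retrieved set."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)


# Hypothetical ranked results and relevance judgments for one query.
ranked = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "f"}

p_at_5 = precision_at_k(ranked, relevant, k=5)  # 2 relevant among 5 results
r = recall(ranked, relevant)                    # 2 of 3 relevant documents found
```

Here precision at 5 is 2/5 and recall is 2/3. In practice these values are averaged over a set of test queries.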
Keywords
Audio Search Engines, Audio Information Retrieval, Information Retrieval in Amharic Language, Speech Crawler, Amharic Speech Identification