Amharic-English Bilingual Search Engine
Date
2010-11
Publisher
Addis Ababa University
Abstract
As content in non-English languages has been growing exponentially on the Web with the expansion of
the multilingual World Wide Web, the number of online non-English speakers who realize the
importance of finding information in different languages is growing enormously. However, the
major general-purpose search engines such as Google and Yahoo have been lagging behind in
providing indexes and search features to handle non-English languages. Hence, documents that
are published in non-English languages are more likely to be missed or improperly indexed by
major search engines. Amharic, which belongs to the Semitic language family and is the official
working language of the federal government of Ethiopia, is one of these languages with rapidly
growing content on the Web. As a result, the need to develop a bilingual search engine that
handles the specific characteristics of the users’ native language query (Amharic) and retrieves
documents in both Amharic and English languages becomes more apparent.
In this research work, we designed a model for an Amharic-English Search Engine and
developed a bilingual Web search engine based on the model that enables Web users to find
the information they need in Amharic and English. In doing so, we have identified
different language-dependent query preprocessing components for query translation. We have
also developed a bidirectional dictionary-based translation system which incorporates a
transliteration component to handle proper names which are often missing in bilingual lexicons.
We have used an Amharic search engine and an open-source English search engine (Nutch) as
our underlying search engines for Web document crawling, indexing, searching, ranking and
retrieving.
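The dictionary-based translation with a transliteration fallback described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the lexicon entries, the character table, and the function names (`LEXICON`, `TRANSLIT`, `transliterate`, `translate_query`) are hypothetical toy stand-ins, not the lexicon, transliteration rules, or preprocessing used in this work.

```python
# Toy Amharic-to-English lexicon (hypothetical entries, for illustration only).
LEXICON = {
    "ታሪክ": "history",
    "ዩኒቨርሲቲ": "university",
}

# Toy Ethiopic-to-Latin character table (hypothetical and far from complete).
TRANSLIT = {
    "አ": "a",
    "በ": "be",
    "ባ": "ba",
}


def transliterate(word: str) -> str:
    """Map each character through the toy table; unknown characters pass through unchanged."""
    return "".join(TRANSLIT.get(ch, ch) for ch in word)


def translate_query(query: str) -> list[str]:
    """Translate each whitespace-separated term by dictionary lookup.

    Terms missing from the lexicon (typically proper names) fall back
    to transliteration instead of being dropped.
    """
    translated = []
    for term in query.split():
        if term in LEXICON:
            translated.append(LEXICON[term])
        else:
            translated.append(transliterate(term))
    return translated


if __name__ == "__main__":
    # The proper name is not in the lexicon, so it is transliterated.
    print(translate_query("አበበ ታሪክ"))
```

The fallback keeps out-of-vocabulary terms in the translated query, which matters most for proper names because they rarely appear in bilingual lexicons.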
To evaluate the effectiveness of our Amharic-English bilingual search engine, precision
was measured on the top 10 retrieved Web documents. The experimental results
showed that the Amharic-English cross-lingual retrieval engine achieved 74.12% of the performance of its
corresponding English monolingual retrieval engine, and the English-Amharic cross-lingual
retrieval engine achieved 78.82% of the performance of its corresponding Amharic monolingual retrieval engine.
The advantage of bilingual retrieval was also evaluated by comparing the system's results with those of general-
purpose search engines. The overall evaluation results of the system are promising.
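As a rough illustration of this kind of evaluation, the sketch below computes precision on the top 10 results and expresses a cross-lingual score as a percentage of a monolingual baseline. The abstract does not give the exact formula, so the ratio used here is an assumption, and the relevance judgments are made-up placeholders rather than data from this work.

```python
def precision_at_k(relevance: list[bool], k: int = 10) -> float:
    """Fraction of the top-k retrieved documents judged relevant."""
    return sum(relevance[:k]) / k


def relative_performance(cross_lingual: float, monolingual: float) -> float:
    """Cross-lingual precision as a percentage of the monolingual baseline (assumed formula)."""
    return 100.0 * cross_lingual / monolingual


if __name__ == "__main__":
    # Hypothetical relevance judgments for one query (True = relevant).
    cross_judgments = [True] * 6 + [False] * 4
    mono_judgments = [True] * 8 + [False] * 2

    cross_p10 = precision_at_k(cross_judgments)
    mono_p10 = precision_at_k(mono_judgments)
    print(f"cross-lingual P@10: {cross_p10:.2f}")
    print(f"monolingual P@10:   {mono_p10:.2f}")
    print(f"relative:           {relative_performance(cross_p10, mono_p10):.2f}%")
```

In practice the precision values would be averaged over a set of test queries before taking the ratio.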
Key Words: Bilingual search engines, cross-lingual information retrieval, query preprocessing,
query translation, transliteration.