Teteyeq (ተጠየቅ): Amharic Question Answering System for Factoid Questions

Date

2009-06

Publisher

Addis Ababa University

Abstract

The number of Amharic documents on the Web is increasing as many newspaper publishers have started offering their services electronically. People have relied on information retrieval (IR) systems to satisfy their information needs, but such systems have been criticized for failing to deliver “readymade” information to the user. Question answering (QA) systems have therefore emerged as a better way to deliver the required information, with the help of information extraction techniques. QA systems for other languages have been extensively researched and have shown reasonable results, whereas this is the first such work for Amharic, a less-resourced language for which no QA system had previously been developed. A number of techniques and approaches were used in developing the Amharic QA system. Language-specific issues in Amharic were studied extensively, and document normalization was found to be crucial for the performance of our QA system: experiments showed that normalized documents yield higher performance than un-normalized ones. Based on our investigation of language-specific issues, a distinct technique was used to determine question types, possible question focuses, and expected answer types, and to generate a proper IR query. Our document retrieval approach retrieves three types of documents (sentence, paragraph, and file). File-based retrieval was found to be more important than the other two, because it takes advantage of the distribution of concepts over sentences and of the less populous answer particles found in file-based retrieval. An algorithm was developed for sentence/paragraph re-ranking and answer selection. The named entity (gazetteer) based and pattern-based answer pinpointing algorithms help locate possible answer particles in a document. The evaluation of our system, the first Amharic QA system, has shown promising performance. The rule-based question classification module classified about 89% of the questions correctly.
The document retrieval component showed greater coverage of relevant documents (97%), while sentence-based retrieval had the lowest coverage (93%); this contributed to the better recall of our system. Gazetteer-based answer selection using the paragraph-based technique answered 72% of the questions correctly, which can be considered promising. The file-based answer selection technique exhibited better recall (0.909), which indicates that most of the relevant documents thought to contain the correct answer are returned. Pattern-based answer selection was more accurate for person names when using the paragraph-based technique, while the sentence-based technique performed better on numeric and date question types. In general, our algorithms and tools have shown good performance compared with QA systems for high-resourced languages such as English.
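
The abstract reports that document normalization was crucial but does not say what the normalization consists of. As an illustration only, the sketch below shows one common form of Amharic normalization: collapsing homophone character series (for example ሠ and ሰ) into a single canonical series, so that spelling variants of the same word match at retrieval time. The choice of series, the canonical targets, and the names `HOMOPHONE_SERIES` and `normalize` are assumptions made for this example, not details taken from the thesis.

```python
# Hypothetical sketch of homophone normalization for Amharic text.
# Assumption: the series pairs and canonical forms below are illustrative
# and are not confirmed by the abstract.

# First-order code point of each variant series -> canonical series.
HOMOPHONE_SERIES = {
    0x1210: 0x1200,  # ሐ series -> ሀ series
    0x1280: 0x1200,  # ኀ series -> ሀ series
    0x1220: 0x1230,  # ሠ series -> ሰ series
    0x12D0: 0x12A0,  # ዐ series -> አ series
    0x1340: 0x1338,  # ፀ series -> ጸ series
}


def build_table():
    """Expand each series pair over the seven basic vowel orders."""
    table = {}
    for variant, canonical in HOMOPHONE_SERIES.items():
        for order in range(7):
            table[variant + order] = canonical + order
    return table


NORMALIZATION_TABLE = build_table()


def normalize(text: str) -> str:
    """Replace variant homophone characters with their canonical forms."""
    return text.translate(NORMALIZATION_TABLE)


if __name__ == "__main__":
    # Two spellings of the same word collapse to one form.
    print(normalize("ሠላም") == normalize("ሰላም"))
```

Applying the same function to both the indexed documents and the incoming question keeps the two sides consistent, which is one plausible reason normalization would raise retrieval performance.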

Keywords

Amharic, Question
