Teteyeq (ተጠየቅ): Amharic Question Answering System for Factoid Questions
Date
2009-06
Publisher
Addis Ababa University
Abstract
The number of Amharic documents on the Web is increasing as many newspaper publishers have started
offering their services electronically. People have relied on information retrieval (IR) systems to
satisfy their information needs, but such systems have been criticized for failing to deliver
“readymade” information to the user. Question answering (QA) systems have therefore emerged as the
best solution for delivering the required information to the user with the help of information
extraction techniques. QA systems for other languages have been extensively researched and have shown
reasonable outcomes, whereas this is the first such work for Amharic, a less-resourced language for
which no QA system had been developed before. A number of techniques and approaches were used in
developing the Amharic QA system. Language-specific issues in Amharic were studied extensively, and
document normalization was found to be crucial for the performance of our QA system.
Experiments showed that normalized documents yield higher performance than un-normalized ones.
Based on our investigation of language-specific issues, a distinct technique was used to determine
the question types, possible question focuses, and expected answer types, as well as to generate a
proper IR query. Document retrieval focused on three types of documents (sentence, paragraph, and
file). File-based document retrieval was found to be more important than the other two techniques,
because it takes advantage of the distribution of concepts over sentences and of the less populous
answer particles found in file-based retrieval. An algorithm was developed for sentence/paragraph
re-ranking and answer selection. The named entity (gazetteer) and pattern-based answer pinpointing
algorithms help locate possible answer particles in a document.
The evaluation of our system, the first Amharic QA system, has shown promising performance.
The rule-based question classification module classified about 89% of the questions correctly. The
document retrieval component showed high coverage of relevant documents (97%), while sentence-based
retrieval had the lowest coverage (93%), which contributed to the better recall of our system.
Gazetteer-based answer selection using the paragraph-based technique answers 72% of the questions
correctly, which can be considered promising. The file-based answer selection technique exhibits
better recall (0.909), which indicates that most of the relevant documents thought to contain the
correct answer are returned. Pattern-based answer selection has better accuracy for person names
when using the paragraph-based technique, while the sentence-based technique performs better on
numeric and date question types. In general, our algorithms and tools have shown good performance
compared with QA systems for high-resourced languages such as English.
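
The abstract does not spell out what document normalization involves. As an illustration only, the
sketch below assumes that one part of it is folding Amharic homophone characters (several letters
that are pronounced alike and used interchangeably in writing) onto a single canonical letter, so
that a question and a document using different spellings of the same word still match. The series
mapping, the `normalize` function, and the choice of canonical letters are hypothetical and are
not taken from the thesis.

```python
# Hypothetical sketch of Amharic homophone normalization (not the thesis's rules).
# In Unicode, each Ethiopic consonant series occupies consecutive code points,
# one per vowel order, so a redundant series can be folded onto a canonical
# series by applying the same offset to each of its basic orders.

# Assumed mapping: start of redundant series -> start of canonical series.
_SERIES_MAP = {
    0x1210: 0x1200,  # ሐ-series -> ሀ-series
    0x1280: 0x1200,  # ኀ-series -> ሀ-series
    0x1220: 0x1230,  # ሠ-series -> ሰ-series
    0x12D0: 0x12A0,  # ዐ-series -> አ-series
    0x1340: 0x1338,  # ፀ-series -> ጸ-series
}
_BASIC_ORDERS = 7  # only the seven basic vowel orders are folded here

# Translation table usable with str.translate (code point -> code point).
_TABLE = {
    src + order: dst + order
    for src, dst in _SERIES_MAP.items()
    for order in range(_BASIC_ORDERS)
}


def normalize(text: str) -> str:
    """Return text with the assumed homophone series folded to canonical letters."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    # Two spellings of the same word compare equal after normalization.
    print(normalize("ሠላም") == normalize("ሰላም"))
```

Applying the same function to both the indexed documents and the generated IR query is what would
let variant spellings match; a fuller normalizer would likely also handle labialized forms,
punctuation, and abbreviations, which this sketch leaves out.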
Keywords
Amharic, Question