Amharic Question Answering for Definitional, Biographical and Description Questions

Date

2013-11

Publisher

Addis Ababa University

Abstract

There are enormous amounts of Amharic text data on the World Wide Web. Since Question Answering (QA) can go beyond the retrieval of relevant documents, it is an option for efficient access to such text data. The task of QA is to find an accurate and precise answer to a natural language question from a source text. Existing Amharic QA systems handle fact-based questions whose answers are usually named entities. In this thesis, we focused on a different type of Amharic QA, Amharic non-factoid QA (NFQA), to deal with more complex information needs. The goal of this study is to propose approaches that tackle important problems in Amharic non-factoid QA, specifically for biography, definition, and description questions. The proposed QA system comprises document preprocessing, question analysis, document analysis, and answer extraction components. Rule-based and machine learning techniques are used for question classification. The document analysis component retrieves relevant documents and then filters them: for definition and description questions it applies filtering patterns, and for biography questions it retains a retrieved document only if the document contains all terms of the target in the same order as in the question. The answer extraction component works type by type. The definition-description answer extractor extracts sentences using manually crafted answer extraction patterns; the extracted sentences are scored and ranked, an answer selection algorithm selects the top 5 non-redundant sentences from the candidate answer set, and finally the selected sentences are ordered to keep them coherent. The biography answer extractor, on the other hand, summarizes the filtered documents by merging them, and the summary is displayed as the answer after it is validated. We evaluated our QA system in a modular fashion.
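The biography filter and the answer selection step described above are specific enough to sketch. The minimal Python example below is not the thesis's implementation: the tokenizer, the decision to allow gaps between target terms, the Jaccard word-overlap redundancy test with a 0.5 threshold, and the use of source position as the coherence order are all assumptions, and every function name is hypothetical.

```python
import re


def tokenize(text):
    # Hypothetical tokenizer: splits on non-word characters. Python's \w is
    # Unicode-aware, so Ethiopic letters are kept and "።" is dropped. The
    # thesis instead relies on a stemmer or morphological analyzer.
    return re.findall(r"\w+", text.lower())


def contains_target_in_order(document, target):
    # Biography filter: keep a document only if every target term occurs in
    # it in the same relative order as in the question. Gaps between terms
    # are allowed here (an assumption). `term in doc_iter` advances the
    # iterator, so each term must be found after the previous match.
    doc_iter = iter(tokenize(document))
    return all(term in doc_iter for term in tokenize(target))


def jaccard(a, b):
    # Word-overlap similarity used as a stand-in redundancy measure.
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def select_answers(candidates, k=5, max_overlap=0.5):
    # candidates: (score, position, sentence) triples from the extractor.
    # Greedily take the highest-scoring sentences, skip any that overlap an
    # already chosen one by max_overlap or more, stop at k, then restore
    # source order (our proxy for "ordering to keep coherence").
    chosen = []
    for score, pos, sent in sorted(candidates, key=lambda c: -c[0]):
        if all(jaccard(sent, other) < max_overlap for _, _, other in chosen):
            chosen.append((score, pos, sent))
            if len(chosen) == k:
                break
    return [sent for _, _, sent in sorted(chosen, key=lambda c: c[1])]
```

Greedy selection against a similarity threshold resembles a simplified maximal marginal relevance step; the abstract does not say which scoring features or redundancy test the thesis actually uses.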
The n-fold cross-validation technique is used to evaluate the two techniques used for question classification. The SVM-based classifier correctly classifies about 83.3% of the test questions, and the rule-based classifier about 98.3%. The document retrieval component is tested on two data sets, one processed by a stemmer and the other by a morphological analyzer. The F-score is 0.729 on the stemmed documents and 0.764 on the morphologically analyzed data set. Moreover, the average F-score of the answer extraction component is 0.592.
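The abstract reports classification accuracy under n-fold cross-validation and F-scores for retrieval and answer extraction, but not the exact formulas. The minimal Python sketch below assumes set-based precision and recall, the balanced F-score (F1), macro-averaging over questions, and an interleaved fold assignment; the function names and the `train_and_predict` callback are hypothetical.

```python
def precision_recall_f1(retrieved, relevant):
    # Set-based precision, recall and balanced F-score (F1). The abstract
    # does not state which F-measure was used; F1 is an assumption.
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def average_f1(runs):
    # Macro-average: mean of per-question F1 over (retrieved, relevant) pairs.
    scores = [precision_recall_f1(ret, rel)[2] for ret, rel in runs]
    return sum(scores) / len(scores) if scores else 0.0


def n_fold_splits(items, n):
    # Plain n-fold cross-validation: fold i tests on every n-th item starting
    # at index i and trains on the rest, so each item is tested exactly once.
    for i in range(n):
        test = items[i::n]
        train = [x for j, x in enumerate(items) if j % n != i]
        yield train, test


def cross_validated_accuracy(items, n, train_and_predict):
    # items: (question, label) pairs. train_and_predict is a hypothetical
    # callback (e.g. a rule-based or SVM classifier) that is given the
    # training pairs and the test questions and returns predicted labels.
    correct = 0
    for train, test in n_fold_splits(items, n):
        predictions = train_and_predict(train, [q for q, _ in test])
        correct += sum(p == label for p, (_, label) in zip(predictions, test))
    return correct / len(items)
```

Pooling correct predictions across folds, rather than averaging per-fold accuracies, gives the same result when folds are equal in size; the abstract does not say which convention was used.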

Keywords

Amharic Definitional, Biographical and Description Question Answering, Rule Based Question Classification, SVM Based Question Classification, Document Analysis, Answer Extraction, Answer Selection
