Amharic Question Answering for Definitional, Biographical and Description Questions

Abedissa, Tilahun

dc.contributor.advisor	Libsie, Mulugeta (PhD)
dc.contributor.author	Abedissa, Tilahun
dc.date.accessioned	2018-06-26T05:42:22Z
dc.date.accessioned	2023-11-29T04:05:41Z
dc.date.available	2018-06-26T05:42:22Z
dc.date.available	2023-11-29T04:05:41Z
dc.date.issued	2013-11
dc.description.abstract	There are enormous amounts of Amharic text data on the World Wide Web. Since Question Answering (QA) can go beyond the retrieval of relevant documents, it is an option for efficient information access to such text data. The task of QA is to find the accurate and precise answer to a natural language question from a source text. The existing Amharic QA systems handle fact-based questions that usually take named entities as the answers. In this thesis, we focused on a different type of Amharic QA, Amharic non-factoid QA (NFQA), to deal with more complex information needs. The goal of this study is to propose approaches that tackle important problems in Amharic non-factoid QA, specifically in biography, definition, and description questions. The proposed QA system comprises document preprocessing, question analysis, document analysis, and answer extraction components. Rule-based and machine learning techniques are used for the question classification. The document analysis component retrieves relevant documents and then filters them. For definition and description questions, it applies filtering patterns; for biography questions, it retains a retrieved document only if the document contains all terms in the target in the same order as in the question. The answer extraction component works in a type-by-type manner. That is, the definition-description answer extractor extracts sentences using manually crafted answer extraction patterns. The extracted sentences are scored and ranked, and then the answer selection algorithm selects the top 5 non-redundant sentences from the candidate answer set. Finally, the sentences are ordered to keep their coherence. On the other hand, the biography answer extractor summarizes the filtered documents by merging them, and then the summary is displayed as an answer after it is validated. We evaluated our QA system in a modular fashion.
The n-fold cross-validation technique is used to evaluate the two techniques utilized in the question classification. The SVM-based classifier classifies about 83.3% of the test questions correctly, and the rule-based classifier about 98.3%. The document retrieval component is tested on two data sets that are analyzed by a stemmer and a morphological analyzer, respectively. The F-score on the stemmed documents is 0.729 and on the other data set it is 0.764. Moreover, the average F-score of the answer extraction component is 0.592. Keywords: Amharic definitional, biographical and description question answering, Rule based question classification, SVM based question classification, Document Analysis, Answer Extraction, Answer Selection.	en_US
dc.identifier.uri	http://etd.aau.edu.et/handle/123456789/3387
dc.language.iso	en	en_US
dc.publisher	Addis Ababa University	en_US
dc.subject	Amharic Definitional, Biographical and Description Question Answering	en_US
dc.subject	Rule Based Question Classification	en_US
dc.subject	SVM Based Question Classification	en_US
dc.subject	Document Analysis	en_US
dc.subject	Answer Extraction	en_US
dc.subject	Answer Selection	en_US
dc.title	Amharic Question Answering for Definitional, Biographical and Description Questions	en_US
dc.type	Thesis	en_US
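
Two steps named in the abstract, the biography document filter and the selection of the top 5 non-redundant sentences, can be illustrated with a minimal Python sketch. This is not the thesis implementation. The function names, the whitespace tokenization, the reading of "same order" as an ordered subsequence, and the Jaccard word-overlap redundancy test with a 0.5 threshold are all assumptions made for illustration; the thesis may define each of these differently.

```python
from typing import List, Tuple


def contains_in_order(target_terms: List[str], doc_tokens: List[str]) -> bool:
    """Return True if every target term occurs in doc_tokens in question order.

    Assumption: "same order" is read as an ordered subsequence; the thesis
    may instead require the terms to be adjacent.
    """
    remaining = iter(doc_tokens)
    # `term in remaining` advances the iterator, so each term must be found
    # after the position where the previous term was found.
    return all(term in remaining for term in target_terms)


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard overlap between two sentences (hypothetical measure)."""
    words_a, words_b = set(a.split()), set(b.split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def select_answers(
    scored: List[Tuple[str, float]], k: int = 5, max_overlap: float = 0.5
) -> List[str]:
    """Pick up to k highest-scoring sentences, skipping near-duplicates.

    `scored` holds (sentence, score) pairs; `max_overlap` is an assumed
    redundancy threshold, not a value taken from the thesis.
    """
    selected: List[str] = []
    for sentence, _score in sorted(scored, key=lambda p: p[1], reverse=True):
        if all(jaccard(sentence, kept) < max_overlap for kept in selected):
            selected.append(sentence)
        if len(selected) == k:
            break
    return selected
```

In this sketch a document passes the biography filter only when `contains_in_order` returns True for the question target, and `select_answers` is applied to the ranked candidate sentences before they are reordered for coherence.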

Files

Original bundle

Name: Tilahun Abedissa.pdf
Size: 1.88 MB
Format: Adobe Portable Document Format

Collections

Environmental Science