Amharic Question Classification System Using Deep Learning Approach

dc.contributor.advisor: Assabie, Yaregal (PhD)
dc.contributor.author: Habtamu, Saron
dc.date.accessioned: 2021-08-03T11:50:52Z
dc.date.accessioned: 2023-11-29T04:06:30Z
dc.date.available: 2021-08-03T11:50:52Z
dc.date.available: 2023-11-29T04:06:30Z
dc.date.issued: 2021-04-14
dc.description.abstract: Questions are used in different applications such as Question Answering (QA), Dialog Systems (DS), and Information Retrieval (IR). However, some questions might be too complex to be analyzed and processed. As a result, systems are expected to have a good feature extraction and analysis mechanism to understand these questions linguistically. The retrieval of wrong answers, inaccurate IR, and a search space crowded with irrelevant candidate answers are some of the challenges caused by the inability to process and analyze questions appropriately. Question Classification (QC) aims to solve this issue by extracting the relevant features from the questions and assigning them to the correct class category. Even though QC has been studied for various languages, it has hardly been studied for the Amharic language. This research studies Amharic QC, focusing on designing a hierarchical question taxonomy, preparing an Amharic question (AQ) dataset by labeling the sample questions with their respective classes, and implementing an Amharic QC (AQC) model using a Convolutional Neural Network (CNN), a Deep Learning (DL) approach. The AQC uses a multilabel question taxonomy that integrates coarse-grained and fine-grained categories. This multilabel class helps us to be more accurate in retrieving answers compared to a flat taxonomy. We constructed the taxonomy by analyzing our AQ dataset and by adopting standard taxonomies from previous studies. We prepared the AQs in three forms: surface, stemmed, and lemmatized. We trained and tested on these datasets using a word vectorizer trained on surface words, having noticed that most interrogative words appear similar even when they are stemmed and lemmatized. With this setup, we achieved 97% training accuracy and 90% validation accuracy for the surface AQs, and scored 40% for the stemmed AQs. However, the word2vec model could not represent the lemmatized AQs appropriately, so no results were obtained during training for that form. We also tried to extract features from AQs by using different filters separately. This gave us an accuracy of 86% while requiring an increasing number of training epochs. [en_US]
dc.identifier.uri: http://etd.aau.edu.et/handle/123456789/27559
dc.language.iso: en [en_US]
dc.publisher: Addis Ababa University [en_US]
dc.subject: Amharic Question Classification [en_US]
dc.subject: Deep Learning [en_US]
dc.subject: Cnn, Fine Grain [en_US]
dc.subject: Coarse Grain Hierarchical Taxonomies [en_US]
dc.subject: Word2vec [en_US]
dc.title: Amharic Question Classification System Using Deep Learning Approach [en_US]
dc.type: Thesis [en_US]
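
The abstract describes extracting features from questions with convolution filters applied to word vectors. As a rough illustration only, and not the thesis's implementation, the sketch below shows the general idea in pure Python: each filter slides over a window of consecutive word vectors, a ReLU is applied, and the maximum response over the sentence becomes one feature. The function names (`conv_max_pool`, `extract_features`), the toy 2-dimensional vectors standing in for word2vec embeddings, and the filter weights are all hypothetical.

```python
from typing import List

Vector = List[float]


def conv_max_pool(embeddings: List[Vector], kernel: List[Vector], bias: float = 0.0) -> float:
    """Slide one filter over the sentence, apply ReLU, and max-pool over time.

    `kernel` has one row per token in the window, so its length is the filter width.
    """
    width = len(kernel)
    if len(embeddings) < width:
        raise ValueError("sentence is shorter than the filter width; pad it first")
    best = 0.0  # ReLU output is never below zero
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        # Dot product between the filter and the current window of word vectors.
        activation = bias + sum(
            w * x
            for kernel_row, token_vec in zip(kernel, window)
            for w, x in zip(kernel_row, token_vec)
        )
        best = max(best, activation)
    return best


def extract_features(embeddings: List[Vector], kernels: List[List[Vector]]) -> List[float]:
    """Return one pooled feature per filter; filters may have different widths."""
    return [conv_max_pool(embeddings, kernel) for kernel in kernels]


# Toy "question" of four tokens with 2-dimensional vectors (hypothetical values).
question = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
width2_filter = [[1.0, 0.0], [0.0, 1.0]]
width3_filter = [[-1.0, -1.0], [-1.0, -1.0], [-1.0, -1.0]]

features = extract_features(question, [width2_filter, width3_filter])
print(features)  # [2.0, 0.0]
```

In a trained model the filter weights are learned, and the pooled features feed a classification layer that predicts the question class; the abstract does not specify the filter widths or layer sizes used in the thesis.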

Files

Original bundle
Name: Saron Habtamu 2021.pdf
Size: 1.19 MB
Format: Adobe Portable Document Format