Semantic-Aware Amharic Text Classification Using Deep Learning Approach

Date

10/10/2020

Publisher

Addis Ababa University

Abstract

We live in the information era, in which information is stored, extracted, and used in many formats. Text can be an extremely rich source of information, but extracting insights from it is hard and time-consuming because of its unstructured nature. Text classification is one of the methods used to organize the massive amount of available textual information into meaningful categories so that the information can be used effectively. Amharic text classification has so far been done with classical machine learning approaches, which are limited in semantic representation and rely on heavily engineered feature extraction. The more recent deep learning approach, combined with word embedding, improves text classification performance by extracting features automatically and representing words semantically as dense vectors. We therefore develop an LSTM model to train on our data and perform the classification. Classifying Amharic text documents with the LSTM passes through the following steps: preprocessing, word embedding, deep network building, output determination, model training, and classification. Document semantics are captured using word2vec, which uses a neural network architecture to map similar words to nearby vectors. These word vectors are then used as the input to the deep network building component. The model is evaluated using accuracy and loss on training, testing, and validation datasets, and achieves 92.13% testing accuracy and 86.71% validation accuracy.
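The pipeline described in the abstract (tokenized input, word vectors, an LSTM over the sequence, and a class output) can be illustrated with a minimal sketch. This is not the thesis implementation: the weights are random and untrained, the embedding table is a random stand-in for word2vec vectors, and the tokens `tok_a`, `tok_b`, `tok_c`, the dimensions, and all function names are hypothetical choices made only for illustration. It uses the Python standard library alone to show the forward pass of a single LSTM layer followed by a softmax output.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def lstm_step(x, h, c, params):
    # One LSTM time step. Gates: input (i), forget (f), output (o);
    # "g" is the candidate cell state, which uses tanh instead of sigmoid.
    z = h + x  # concatenation of previous hidden state and current input
    gates = {}
    for name in ("i", "f", "o", "g"):
        W, b = params[name]
        pre = [a + bias for a, bias in zip(matvec(W, z), b)]
        act = math.tanh if name == "g" else sigmoid
        gates[name] = [act(p) for p in pre]
    c_new = [f * cp + i * g
             for f, cp, i, g in zip(gates["f"], c, gates["i"], gates["g"])]
    h_new = [o * math.tanh(cv) for o, cv in zip(gates["o"], c_new)]
    return h_new, c_new


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(tokens, embeddings, params, W_out, hidden_size):
    # Run the LSTM over the token sequence and map the final hidden
    # state to class probabilities.
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for tok in tokens:
        x = embeddings.get(tok)
        if x is None:
            continue  # skip out-of-vocabulary tokens
        h, c = lstm_step(x, h, c, params)
    return softmax(matvec(W_out, h))


# Hypothetical sizes and vocabulary (assumptions, not from the thesis).
rng = random.Random(0)
EMBED_DIM, HIDDEN, NUM_CLASSES = 4, 3, 2
vocab = ["tok_a", "tok_b", "tok_c"]

# Random stand-in for word2vec vectors; a real system would load trained ones.
embeddings = {t: [rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for t in vocab}

# Untrained LSTM parameters: one (weight matrix, bias vector) pair per gate.
params = {name: (rand_matrix(HIDDEN, HIDDEN + EMBED_DIM, rng), [0.0] * HIDDEN)
          for name in ("i", "f", "o", "g")}
W_out = rand_matrix(NUM_CLASSES, HIDDEN, rng)

tokens = "tok_a tok_b tok_c".split()  # preprocessing reduced to whitespace split
print(classify(tokens, embeddings, params, W_out, HIDDEN))
```

In practice the embedding table would come from a trained word2vec model and the LSTM weights would be learned from labeled documents; the sketch only shows how the components named in the abstract connect.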

Keywords

Text Classification, Deep Learning, RNN, LSTM, Word Embedding
