Hybrid Model for Amharic Sentiment Classification

Date

2022-01

Publisher

Addis Ababa University

Abstract

Amharic is an under-resourced language: it lacks a standard dictionary, a stemmer, a language detector, tools for subjectivity detection and negation handling, an Amharic sentiment lexicon and annotated Amharic corpora for sentiment classification of social media texts. This research addresses sentiment analysis of Amharic texts in two parts. The first part generates most of these missing resources, and the second part improves the performance of sentiment classification using the proposed approaches. Four categories of corpora are prepared: general corpora (category I), annotated corpora (category II), lexical resources (category III) and pre-trained models (category IV). The annotated corpora, namely Facebook comments from GCAO (2,871), PMO (6,637) and EBC (2,444) and Zemen YouTube comments (1,440), are used to build and evaluate the Amharic sentiment classification approaches. To address the lack of sentiment resources for an under-resourced language such as Amharic, Amharic sentiment lexicons are generated using dictionary-based and corpus-based approaches. With the dictionary-based approach, the SO-CAL (5,681) and SWN (13,677) lexicons are generated from English sentiment lexicons (using category III). With the corpus-based approach, Amharic sentiment seed words are expanded over Amharic corpora (category I) to generate an Amharic sentiment lexicon; at a threshold of 500, the generated lexicon contains 8,132 entries. The generated lexicons are evaluated in terms of subjectivity detection, coverage, agreement and accuracy against a manually built lexicon (baseline), and are then used for subjectivity detection and negation handling. For sentiment classification (SC) of text on a topic, supervised methods, ensemble methods and BERT transfer learning are proposed, built, tuned and evaluated with a small number of labeled observations (category II).
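Lexicon-based subjectivity detection with negation handling can be illustrated with a minimal sketch. It assumes whitespace-tokenized comments, a toy two-word lexicon standing in for the generated lexicons, a single negative copula (አይደለም, "is not") as the only negation marker, and a hypothetical rule that flips a sentiment word's score when a negator follows it within a small window. The names, scores and window rule are illustrative assumptions, not the thesis's method. A comment with no lexicon hit is treated as objective.

```python
from typing import Dict, List, Set, Tuple

# Toy lexicon for illustration only (hypothetical scores, not the thesis's lexicons).
TOY_LEXICON: Dict[str, float] = {
    "ጥሩ": 1.0,    # "good"
    "መጥፎ": -1.0,  # "bad"
}
# Hypothetical negation markers: only the separate negative copula is handled here.
NEGATORS: Set[str] = {"አይደለም"}  # "is not"


def score_comment(tokens: List[str], lexicon: Dict[str, float],
                  negators: Set[str] = NEGATORS,
                  window: int = 2) -> Tuple[bool, float]:
    """Return (is_subjective, polarity_score) for a tokenized comment."""
    total = 0.0
    hits = 0
    for i, tok in enumerate(tokens):
        if tok not in lexicon:
            continue
        hits += 1
        score = lexicon[tok]
        # Amharic is verb-final, so this sketch looks for a negator
        # in the `window` tokens that follow the sentiment word.
        following = tokens[i + 1 : i + 1 + window]
        if any(t in negators for t in following):
            score = -score
        total += score
    # No lexicon hit means the comment is treated as objective.
    return hits > 0, total


print(score_comment("ፊልሙ ጥሩ አይደለም".split(), TOY_LEXICON))  # "the film is not good"
```

Because Amharic also negates verbs morphologically (with affixes rather than separate words), a realistic negation handler would need a morphological analyzer or stemmer, one of the resources listed above as missing.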
Finally, to improve the performance of Amharic sentiment classification, a hybrid model (voting, averaging and blender) is developed that combines the top-performing classifiers of the earlier approaches. Experiments on the proposed hybrid models were carried out on the category II annotated sentiment data sets. The results show that the proposed hybrid model (blender) achieved a performance gain over the SVM model (baseline) on these data sets. The complete sentiment classification system is demonstrated in real-time and offline applications for language detection, topic detection, subjectivity detection and sentiment prediction of comments.
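The three combination rules named above can be sketched in plain Python. Hard voting takes the majority label, averaging thresholds the mean positive-class probability at 0.5, and the blender is assumed here to be a logistic-regression meta-learner fitted on base-model probabilities for a held-out set, a common form of blending. The thesis's actual meta-learner, base models and features are not specified here; binary labels (1 = positive, 0 = negative) and all function names are assumptions for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def hard_vote(labels_per_model: Sequence[Sequence[int]]) -> List[int]:
    """Majority vote per sample; ties go to the label of the earliest model that cast it."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*labels_per_model)]


def soft_average(probs_per_model: Sequence[Sequence[float]]) -> List[int]:
    """Average each sample's P(positive) across models and threshold at 0.5."""
    n_models = len(probs_per_model)
    return [int(sum(ps) / n_models >= 0.5) for ps in zip(*probs_per_model)]


def fit_blender(features: Sequence[Sequence[float]], targets: Sequence[int],
                lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit a logistic-regression meta-learner with batch gradient descent.

    features: one row per held-out sample, one column per base model's P(positive).
    """
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, targets):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y  # d(log-loss)/dz
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def blend_predict(w: Sequence[float], b: float,
                  features: Sequence[Sequence[float]]) -> List[int]:
    """Predict 1 when the meta-learner's logit is non-negative, i.e. P(positive) >= 0.5."""
    return [int(b + sum(wi * xi for wi, xi in zip(w, x)) >= 0.0) for x in features]


# Made-up held-out probabilities from two hypothetical base models.
holdout = [(0.9, 0.5), (0.8, 0.4), (0.2, 0.6), (0.1, 0.5)]
w, b = fit_blender(holdout, [1, 1, 0, 0])
print(blend_predict(w, b, holdout))
```

The blender's features should come from base models predicting on data they were not trained on; otherwise the meta-learner tends to over-trust whichever base model overfits the training set.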

Keywords

Amharic Sentiment Analysis · Subjectivity Detection · Amharic Sentiment Lexicon · Machine Learning · Negation Handling · Ensemble Learning · Transfer Learning · Hybrid Model

Citation