Hybrid Model for Amharic Sentiment Classification
Date
2022-01
Publisher
Addis Ababa University
Abstract
Amharic is a low-resource language: it lacks a standard dictionary, a stemmer, a language
detector, tools for subjectivity detection and negation handling, an Amharic sentiment lexicon
and the annotated Amharic corpora needed to carry out sentiment classification of social media
texts. This research focuses on sentiment analysis of Amharic texts. The first part of the
research generates most of these required resources, and the second part improves the
performance of sentiment classification using the proposed approaches.
In this research, four categories of corpora are prepared: general corpora (category I), annotated
corpora (category II), lexical resources (category III) and pre-trained models (category
IV). The annotated corpora, namely Facebook comments from GCAO (2,871), PMO (6,637) and
EBC (2,444) and Zemen YouTube comments (1,440), are used for building and evaluating
Amharic sentiment classification approaches. To address the problems of sentiment analysis
in an under-resourced language (Amharic in this case), Amharic sentiment lexicons are generated
using dictionary-based and corpus-based approaches. Using the dictionary-based approach,
the SO-CAL (5,681) and SWN (13,677) lexicons are generated from English sentiment lexicons
(using category III). Using the corpus-based approach, Amharic sentiment seeds are expanded
to generate an Amharic sentiment lexicon from Amharic corpora (using category I). At a
threshold of 500, the generated lexicon has a size of 8,132. The generated lexicons are
evaluated in terms of subjectivity detection, coverage, agreement and accuracy by comparison
with the manual lexicon (baseline). The generated lexicons are then used for subjectivity
detection and negation handling.
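
A minimal sketch of how a sentiment lexicon can drive subjectivity detection and negation handling is given here for illustration. The abstract does not describe the actual rules used, so everything in it is an assumption: the two-word lexicon, the single negation marker, the rule that a marker flips the polarity of the sentiment word directly before it, and the function names `score_tokens`, `is_subjective` and `polarity`.

```python
# Toy lexicon: word -> polarity score. These entries are illustrative
# assumptions, not entries from the generated lexicons.
LEXICON = {"ጥሩ": 1.0, "መጥፎ": -1.0}

# Hypothetical negation markers. In this sketch a marker flips the polarity
# of the sentiment word that comes directly before it.
NEGATORS = {"አይደለም"}


def score_tokens(tokens):
    """Return one polarity score per sentiment-bearing token."""
    scores = []
    prev_was_sentiment = False
    for tok in tokens:
        if tok in LEXICON:
            scores.append(LEXICON[tok])
            prev_was_sentiment = True
        elif tok in NEGATORS and prev_was_sentiment:
            # Negation handling: invert the score of the preceding word.
            scores[-1] = -scores[-1]
            prev_was_sentiment = False
        else:
            prev_was_sentiment = False
    return scores


def is_subjective(tokens):
    """A comment counts as subjective if it contains any lexicon word."""
    return len(score_tokens(tokens)) > 0


def polarity(tokens):
    """Sum the token scores and map the total to a sentiment label."""
    total = sum(score_tokens(tokens))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


plain = "ፊልሙ ጥሩ ነው".split()
negated = "ፊልሙ ጥሩ አይደለም".split()
print(polarity(plain), polarity(negated))
```

Whitespace tokenization is used only to keep the sketch short; a real pipeline would need the stemming and normalization resources the abstract lists as missing for Amharic.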
For sentiment classification (SC) of text on a topic, supervised learning, ensemble methods and
BERT transfer learning are proposed, built, tuned and evaluated with a small number of labeled
observations (using category II). Finally, to enhance the performance of Amharic sentiment
classification, hybrid models (voting, averaging and blender) are developed that combine the
top-performing classifiers of the earlier approaches. Experiments on the proposed hybrid models
were carried out using the category II annotated sentiment data sets. The results show that the
proposed hybrid model (blender) achieves a performance gain over the SVM model (baseline) on
these data sets. The complete sentiment classification system is demonstrated in real-time and
offline applications that detect the language, topics and subjectivity of comments and predict
their sentiment.
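
To make the difference between the voting and averaging hybrids concrete, the sketch below combines the outputs of three base classifiers for a single comment. It is an illustration under stated assumptions, not the thesis implementation: the probability values are invented, the base classifier names (`svm`, `nb`, `bert`) are placeholders, and the helpers `hard_vote`, `average_probs` and `soft_vote` are hypothetical. The blender is not shown; in the usual formulation it would train a meta-classifier on base-classifier predictions for held-out data.

```python
from collections import Counter


def hard_vote(labels):
    """Majority vote over predicted labels; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def average_probs(prob_dicts):
    """Average the per-class probabilities across classifiers."""
    classes = prob_dicts[0].keys()
    n = len(prob_dicts)
    return {c: sum(p[c] for p in prob_dicts) / n for c in classes}


def soft_vote(prob_dicts):
    """Pick the class with the highest averaged probability."""
    avg = average_probs(prob_dicts)
    return max(avg, key=avg.get)


# Invented class probabilities from three placeholder base classifiers.
svm = {"positive": 0.90, "negative": 0.10}
nb = {"positive": 0.40, "negative": 0.60}
bert = {"positive": 0.45, "negative": 0.55}
outputs = [svm, nb, bert]

# Each classifier's own top label, as used by hard voting.
top_labels = [max(p, key=p.get) for p in outputs]

print("voting:", hard_vote(top_labels))
print("averaging:", soft_vote(outputs))
```

With these invented numbers the two schemes disagree: two of three classifiers lean negative, so voting returns negative, while the one confident positive prediction pulls the averaged probability to positive. This is why the choice of combination rule matters when base classifiers differ in confidence.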
Keywords
Amharic Sentiment Analysis; Subjectivity Detection; Amharic Sentiment Lexicon; Machine Learning; Negation Handling; Ensemble Learning; Transfer Learning; Hybrid Model