Effect of Preprocessing on Long Short-Term Memory Based Sentiment Analysis for Amharic Language
Date
2020-07-04
Publisher
Addis Ababa University
Abstract
This paper presents the effect of preprocessing on Long Short-Term Memory (LSTM) based
sentiment analysis for the Amharic language. Sentiment analysis, or opinion mining, is an
approach used to analyze user-generated textual content in a way that supports
decision making. User-generated textual content is found everywhere, for example in
social media posts, product reviews, blogs, and forums. Developing a sentiment analysis
system is a challenging task because of differing writing styles and variation in word
meanings. To analyze the sentiment of such content, several approaches use labeled lexicons.
In the preprocessing step of these approaches, emojis are removed and words are stemmed.
However, emojis are often used to express opinions.
In this research, we propose using emojis to automatically label texts for sentiment
analysis. In addition, we investigate the impact of using unstemmed words on sentiment
analysis. To evaluate the proposed labeling scheme, we conducted an experiment using
9,138 Amharic textual comments. The results show that integrating emojis with lexicons
for labeling gives 0.55% higher accuracy than using lexicons alone.
To investigate the effect of stemming as part of the preprocessing strategy, LSTM-based
Amharic sentiment analysis was conducted with and without stemming using 1,077
comments. The results show that applying stemming lowers the accuracy of sentiment
analysis by 6.43% with the LSTM-based model and by 0.43% with a bi-gram multinomial
naive Bayes model.
Keywords: Amharic sentiment analysis, Emoji, Natural Language Processing (NLP),
Sentiment analysis, Stemming.
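
The idea of combining emojis with a lexicon to label comments automatically can be illustrated with a minimal sketch. The abstract does not specify the emoji set, the lexicon entries, or how the two signals are combined, so everything below is an assumption made for illustration: the tables `EMOJI_POLARITY` and `LEXICON_POLARITY`, the function name `label_comment`, and the simple additive scoring rule are all hypothetical and are not the authors' method.

```python
from typing import Optional

# Hypothetical emoji polarity table (assumption: not taken from the thesis).
EMOJI_POLARITY = {"😀": 1, "😍": 1, "👍": 1, "😡": -1, "😢": -1, "👎": -1}

# Hypothetical two-entry lexicon (assumption): "ጥሩ" (good), "መጥፎ" (bad).
LEXICON_POLARITY = {"ጥሩ": 1, "መጥፎ": -1}


def label_comment(text: str) -> Optional[str]:
    """Return 'positive', 'negative', or None when the evidence is absent or tied.

    Assumed rule: add the polarity of every lexicon word and every emoji,
    then label the comment by the sign of the total.
    """
    score = 0
    # Lexicon evidence: whitespace-separated tokens, unstemmed.
    for token in text.split():
        score += LEXICON_POLARITY.get(token, 0)
    # Emoji evidence: scan characters so emojis glued to a word still count.
    for ch in text:
        score += EMOJI_POLARITY.get(ch, 0)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None  # leave unlabeled rather than guess


if __name__ == "__main__":
    for comment in ["ጥሩ 👍", "መጥፎ", "ጥሩ 😡", "hello"]:
        print(repr(comment), "->", label_comment(comment))
```

In this sketch a comment with no lexicon word can still receive a label from its emojis, which is the kind of extra signal that removing emojis during preprocessing would discard. Tokens are matched unstemmed, so inflected forms of a lexicon word would be missed unless they are listed explicitly.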
Keywords
Long Short-Term Memory, Amharic Language, Preprocessing, Sentiment Analysis