Effect of Preprocessing on Long Short-Term Memory Based Sentiment Analysis for Amharic Language

Date

2020-07-04

Publisher

Addis Ababa University

Abstract

This paper examines the effect of preprocessing on Long Short-Term Memory (LSTM) based sentiment analysis for the Amharic language. Sentiment analysis, or opinion mining, analyzes user-generated text in a way that supports decision making. User-generated text is found everywhere: social media posts, product reviews, blogs, and forums. Developing a sentiment analysis system is challenging because of differing writing styles and variation in word meanings. To analyze the sentiment of such text, several approaches rely on labeled lexicons. In their preprocessing step, emojis are removed and words are stemmed. However, emojis are commonly used to express opinions. In this research, we propose using emojis to label texts automatically for sentiment analysis. In addition, we investigate the impact of using unstemmed words on sentiment analysis. To evaluate the proposed labeling scheme, we conducted an experiment using 9,138 Amharic textual comments. The results show that integrating emojis with lexicons for labeling gives 0.55% higher accuracy than using lexicons alone. To investigate the effect of stemming as part of the preprocessing strategy, we ran LSTM-based Amharic sentiment analysis with and without stemming on 1,077 comments. The results show that applying stemming lowers accuracy by 6.43% for the LSTM-based model and by 0.43% for bi-gram multinomial naive Bayes.

Keywords: Amharic sentiment analysis, Emoji, Natural Language Processing (NLP), Sentiment analysis, Stemming.
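The labeling idea described in the abstract, combining emojis with a sentiment lexicon, might be sketched roughly as follows. This is only an illustration under stated assumptions: the tiny emoji and word sets, the function names `sentiment_score` and `label_comment`, and the rule of summing lexicon and emoji votes are all hypothetical and are not taken from the paper, which does not specify its resources or combination rule here.

```python
# Hypothetical toy resources; the paper's actual lexicon and emoji lists
# are not given in the abstract.
POSITIVE_EMOJIS = {"😀", "😍", "👍"}
NEGATIVE_EMOJIS = {"😡", "😢", "👎"}
POSITIVE_WORDS = {"ጥሩ"}    # Amharic for "good"
NEGATIVE_WORDS = {"መጥፎ"}   # Amharic for "bad"


def sentiment_score(text: str, use_emojis: bool = True) -> int:
    """Sum +1/-1 votes from lexicon words and, optionally, from emojis."""
    score = 0
    # Lexicon votes: whitespace tokenization, no stemming applied.
    for token in text.split():
        if token in POSITIVE_WORDS:
            score += 1
        elif token in NEGATIVE_WORDS:
            score -= 1
    # Emoji votes: scan characters so emojis attached to words still count.
    if use_emojis:
        for char in text:
            if char in POSITIVE_EMOJIS:
                score += 1
            elif char in NEGATIVE_EMOJIS:
                score -= 1
    return score


def label_comment(text: str, use_emojis: bool = True) -> str:
    """Map the combined score to a sentiment label (assumed three classes)."""
    score = sentiment_score(text, use_emojis)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

With these toy sets, a comment that contains no lexicon word but ends in an angry emoji is labeled `neutral` when `use_emojis=False` and `negative` when emojis are counted, which is the kind of difference the emoji-assisted labeling is meant to capture.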

Keywords

Long Short-Term Memory, Amharic Language, Preprocessing, Sentiment Analysis
