Impact of Normalization and Informal Opinionated Features on Amharic Sentiment Analysis
Date
2024-01
Publisher
Addis Ababa University
Abstract
Sentiment analysis is the computational study of people’s ideas, attitudes, and
feelings toward an object, as expressed on social media networks. To analyze the
sentiment of such text, previous studies relied on formal lexicons and emoji, with
semantic and syntactic information as features. However, opinions are now
expressed in informal language most of the time. Creating embedding features from
unlabeled Amharic text files is challenging because of the morphological
complexity and the unstructured nature of informal Amharic text. Although
normalization algorithms have been developed to convert informal language into
its standard form, their impact on tasks such as sentiment analysis remains
unknown.
To address the challenge of Amharic sentiment analysis, we apply state-of-the-art
techniques: we normalize informal Amharic text and embed its opinionated content,
using lowered word-frequency parameters, as automatically learned features for
CNN-Bi-LSTM approaches. Using a combination of word and character n-gram
embeddings, word vectors are generated from unlabeled informal Amharic text
files. In the experiments, the maximum recall was 91.67 percent. Compared with
state-of-the-art approaches that use formal lexicons and emoji as features on a
Bi-LSTM, an average recall improvement of 2.8 was attained. The results also show
that labeling with a mix of informal lexicons, formal lexicons, and emoji
achieves an accuracy 1.9 higher than labeling with formal lexicons and emoji
alone.
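
The combination of word and character n-gram units mentioned above can be illustrated with a minimal sketch. It assumes a fastText-style scheme in which each word is wrapped in boundary markers and split into overlapping character n-grams. The function names and the n-gram range (3 to 6) are hypothetical choices for illustration, not the settings reported in the thesis.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of `word`, including boundary markers.

    Assumption: fastText-style '<' and '>' markers; the range 3..6 is an
    illustrative default, not a value taken from the thesis.
    """
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of width n across the marked word.
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams


def subword_units(word, n_min=3, n_max=6):
    """Return the whole word followed by its character n-grams."""
    return [word] + char_ngrams(word, n_min, n_max)


if __name__ == "__main__":
    # Spelling variants of an informal word share most of their n-grams,
    # so their vectors can stay close even when one variant is rare.
    print(subword_units("ሰላም", n_min=3, n_max=3))
```

In a subword embedding model, a word's vector is typically built from the vectors of these units, which is one reason such embeddings can cope with rare or non-standard spellings.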
Keywords
Amharic, Sentiment analysis, Informal, CNN-Bi-LSTM.