Impact of Normalization and Informal Opinionated Features on Amharic Sentiment Analysis

Date

2024-01

Publisher

Addis Ababa University

Abstract

Sentiment analysis is the computational study of people’s ideas, attitudes, and feelings toward an object, as expressed on social media networks. To analyze the sentiment of such textual content, previous studies relied on formal lexicons and emoji, combined with semantic and syntactic information, as features. However, most opinions are now expressed in informal language. Creating embedding features from unlabeled Amharic text is challenging because of its morphological complexity and the informal, unstructured nature of informal Amharic text. Although normalization algorithms have been developed to convert informal language into its standard form, their impact on downstream tasks such as sentiment analysis remains unknown. To address this challenge, we apply state-of-the-art techniques, namely normalization and embeddings of opinionated informal Amharic text trained with lowered word-frequency parameters, as automatic features for a CNN-Bi-LSTM model. A combination of word and character n-gram embeddings is used to generate word vectors from unlabeled informal Amharic text. In the experiments, the maximum recall was 91.67%. Compared with state-of-the-art approaches that use formal lexicons and emoji as features with a Bi-LSTM, an average recall improvement of 2.8 was attained. The results also show that labeling with a mix of informal lexicons, formal lexicons, and emoji achieves 1.9 higher accuracy than labeling with formal lexicons and emoji alone.
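The abstract does not name the embedding tool used. As a minimal sketch, assuming fastText-style subword embeddings (a word is represented by the average of its own vector and the vectors of its character n-grams, with n from 3 to 6, which are fastText's defaults), the snippet below shows how word and character n-gram vectors could be combined. The function names, the n-gram range, and the hash-seeded random vectors that stand in for trained embeddings are all hypothetical and are not taken from the thesis.

```python
import hashlib
import random

DIM = 8  # hypothetical embedding size, kept small for illustration


def char_ngrams(word, n_min=3, n_max=6):
    """Return character n-grams of `word`, using '<' and '>' as word-boundary markers.

    Python strings index by code point, so each Ge'ez syllable (fidel) counts as one character.
    """
    padded = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


def _vector_for(token, dim=DIM):
    """Deterministic pseudo-random vector per token.

    Assumption: a stand-in for vectors that would normally be learned from unlabeled text.
    """
    seed = int(hashlib.sha256(token.encode("utf-8")).hexdigest(), 16)
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def word_vector(word, n_min=3, n_max=6, dim=DIM):
    """Average the word's own vector with the vectors of its character n-grams."""
    tokens = [word] + char_ngrams(word, n_min, n_max)
    vecs = [_vector_for(t, dim) for t in tokens]
    return [sum(component) / len(vecs) for component in zip(*vecs)]


if __name__ == "__main__":
    print(char_ngrams("ጥሩ"))   # n-grams of the Amharic word for "good"
    print(char_ngrams("ጥሩው"))  # a longer form sharing the prefix n-gram "<ጥሩ"
    print(word_vector("ጥሩ"))
```

Because related or variant spellings share n-grams (here `ጥሩ` and `ጥሩው` share `<ጥሩ`), their composite vectors share components even when one form is rare in the corpus. This is one reason subword features can help with informal, inconsistently spelled text.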

Keywords

Amharic, Sentiment analysis, Informal, CNN-Bi-LSTM.