Hate Speech Detection and Classification System in Amharic Text with Deep Learning

dc.contributor.advisor: Assabie, Yaregal (PhD)
dc.contributor.author: Minale, Samuel
dc.date.accessioned: 2022-03-22T11:46:22Z
dc.date.accessioned: 2023-11-18T12:48:23Z
dc.date.available: 2022-03-22T11:46:22Z
dc.date.available: 2023-11-18T12:48:23Z
dc.date.issued: 2021-08-19
dc.description.abstract: Social media is becoming the main source of information, allowing users to share their views freely and widely. However, the unregulated nature of this access is making social media platforms a breeding ground for hate speech and fake news. Online hate speech can translate into offline harm that goes beyond its psychological effects on victims. In multinational, multireligious and less democratic countries such as Ethiopia in particular, hate speech has drastic consequences because it triggers or ignites conflicts. Hate speech detection for well-resourced languages such as English is improving thanks to the availability of trained models and sufficient human moderators. In Ethiopia, apart from the recently enacted hate speech proclamation, there are no automated hate speech detection mechanisms for the local languages, including Amharic, the official working language of Ethiopia. One past mitigation measure has been to shut down Internet access, which has happened several times. A hate speech detection system for Amharic would help in at least two ways. First, it would help policymakers and peacemakers detect hate speech comments automatically and act while they are circulating on the Internet. Second, it would help social media platform owners such as Facebook and Twitter automatically flag hate speech comments before they reach larger audiences. Although hate speech is a global issue, systems developed for English or other languages cannot be applied directly to Amharic, so a new home-grown solution is needed. With this in mind, we developed a system that can detect hate speech and classify text into four categories.
The system is built with Stacked Bidirectional Long Short-Term Memory networks (SBi-LSTM), a deep learning method. It is compared against two baseline detection systems of our own, one built with dummy classifiers and one with classical machine learning approaches. The deep learning system outperformed both baselines, achieving an F1-score of 94.8% with fastText word embeddings as the vector representation. To develop the system, we collected a 5,000-item Amharic corpus and had 100 annotators label it as racial, religion, gender or normal speech using our own custom annotation tool. The system supports multi-label categorical classification of hate speech, which gives responsible organizations statistical information for focusing on the vulnerable group, society or religion. Developing a hate speech classification system for Amharic is challenging because of the lack of an annotated dataset, the morphological richness of Amharic, and the absence of any prior Amharic hate speech classification study that could serve as a baseline. The system can be improved with a larger dataset and by adding further layers alongside the SBi-LSTM layer. [en_US]
dc.identifier.uri: http://etd.aau.edu.et/handle/12345678/30770
dc.language.iso: en [en_US]
dc.publisher: Addis Ababa University [en_US]
dc.subject: Amharic Hate Speech Detection and Classification [en_US]
dc.subject: Amharic Post and Comment Dataset [en_US]
dc.subject: Deep Learning [en_US]
dc.subject: Machine Learning [en_US]
dc.subject: RNN [en_US]
dc.subject: Stacked Bidirectional LSTM [en_US]
dc.title: Hate Speech Detection and Classification System in Amharic Text with Deep Learning [en_US]
dc.type: Thesis [en_US]
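
The abstract compares the deep learning system against a dummy-classifier baseline and reports an F1-score. As a rough illustration of what those two terms mean, the sketch below implements a majority-class dummy classifier and a macro-averaged F1 over the four categories using only the Python standard library. It is not the thesis code. Several things here are assumptions: the abstract does not say how the F1-score was averaged, the sketch assigns a single label per text although the thesis describes multi-label classification, and the function names and toy labels are hypothetical.

```python
from collections import Counter

# The four categories named in the abstract.
CATEGORIES = ["racial", "religion", "gender", "normal"]


def majority_baseline(train_labels, n):
    """Dummy classifier: always predict the most frequent training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return [most_common_label] * n


def per_class_f1(y_true, y_pred, label):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No true positives: F1 is 0 by convention (avoids dividing by zero).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels=CATEGORIES):
    """Unweighted mean of per-class F1 (assumed averaging; see lead-in)."""
    return sum(per_class_f1(y_true, y_pred, lab) for lab in labels) / len(labels)


# Toy labels invented for illustration only; not data from the thesis.
train = ["normal", "normal", "normal", "racial", "religion", "gender"]
y_true = ["normal", "racial", "religion", "gender",
          "normal", "normal", "racial", "normal"]
baseline_pred = majority_baseline(train, len(y_true))
print(round(macro_f1(y_true, baseline_pred), 3))
```

A baseline of this kind scores poorly on macro F1 because it never predicts the minority categories, which is why it serves as a floor that a trained model is expected to beat.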

Files

Original bundle
Name: Samuel Minale 2021.pdf
Size: 1.18 MB
Format: Adobe Portable Document Format