Content Based Multi-Class Amharic Short Message Service Classification using Machine Learning

dc.contributor.advisor: Rosa Tsegaye (PhD)
dc.contributor.author: Alem Tedla
dc.date.accessioned: 2023-12-14T14:49:01Z
dc.date.available: 2023-12-14T14:49:01Z
dc.date.issued: 2023-08-31
dc.description.abstract: Short message service (SMS) is one of the most common communication methods. As SMS usage has grown, so has the volume of Amharic-language spam and smishing messages. Spam messages are unwanted messages received on a mobile phone, such as advertisements, promotions, and information from organizations. Smishing messages are more serious: they critically harm users and service providers, and include offers of free fees, rewards, fake lottery tickets, and malicious links. Both types cause poor customer experiences and reduced revenue for operators. Many studies have therefore classified SMS using foreign-language datasets, with the aim of retaining customers and avoiding revenue loss. However, the features those studies used are not relevant for Amharic SMS classification because SMS characteristics differ across languages. This thesis studies a machine learning model that classifies Amharic SMS texts into three classes: ham, spam, and smishing. The model was trained on 1844 labeled messages. The features were prepared as follows: two relevant features were selected from English spam detection approaches, three new features were created, and 162 keywords were extracted from the dataset and vectorized using TF-IDF. A Random Forest (RF) classifier was then trained on the prepared features using 10-fold cross-validation. The prepared features with RF outperformed the existing approaches developed for foreign languages by 6% and achieved a 0.99 F1-score in classifying Amharic SMS.
dc.identifier.uri: http://etd.aau.edu.et/handle/123456789/959
dc.language.iso: en_US
dc.publisher: Addis Ababa University
dc.title: Content Based Multi-Class Amharic Short Message Service Classification using Machine Learning
dc.type: Thesis
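
The feature preparation described in the abstract (TF-IDF over a fixed keyword list, combined with a few handcrafted features) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The three keywords, the whitespace tokenizer, the smoothed IDF variant, and the two handcrafted features (`has_url`, `digit_count`) are all assumptions chosen for illustration; the abstract does not name the actual features or the TF-IDF variant used.

```python
import math
import re
from collections import Counter

# Hypothetical keyword list for illustration only; the thesis extracts
# 162 keywords from its own dataset and does not list them in the abstract.
KEYWORDS = ["ሽልማት", "ነፃ", "ይደውሉ"]

# Assumed pattern for spotting links; not taken from the thesis.
URL_RE = re.compile(r"https?://\S+|www\.\S+")


def tokenize(text):
    """Whitespace tokenizer (an assumption; real Amharic text needs more care)."""
    return text.split()


def fit_idf(messages, keywords):
    """Compute a smoothed inverse document frequency for each keyword."""
    n = len(messages)
    df = Counter()
    for msg in messages:
        present = set(tokenize(msg))
        for kw in keywords:
            if kw in present:
                df[kw] += 1
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1, one common variant.
    return {kw: math.log((1 + n) / (1 + df[kw])) + 1.0 for kw in keywords}


def featurize(msg, keywords, idf):
    """Return keyword TF-IDF values followed by two hypothetical handcrafted features."""
    counts = Counter(tokenize(msg))
    tfidf = [counts[kw] * idf[kw] for kw in keywords]
    has_url = 1.0 if URL_RE.search(msg) else 0.0               # hypothetical feature
    digit_count = float(sum(ch.isdigit() for ch in msg))       # hypothetical feature
    return tfidf + [has_url, digit_count]
```

Feature vectors built this way, paired with ham, spam, and smishing labels, could then be passed to a Random Forest classifier and evaluated with 10-fold cross-validation, as the abstract describes.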

Files

Original bundle
Name:
Alem Tedla.pdf
Size:
1.5 MB
Format:
Adobe Portable Document Format