Automatic Categorization of Amharic News Text: a Machine Learning Approach

DC Field | Value | Language
dc.contributor.advisor | Bekele Rahel (W/ro) |
dc.contributor.advisor | Abdela Woinshet (W/ro) |
dc.contributor.advisor | Lamnew Workshet (Ato) |
dc.contributor.author | Teklu Surafel |
dc.date.accessioned | 2018-11-30T11:35:37Z |
dc.date.accessioned | 2023-11-18T12:44:07Z |
dc.date.available | 2018-11-30T11:35:37Z |
dc.date.available | 2023-11-18T12:44:07Z |
dc.date.issued | 2003-07 |
dc.description.abstract | Currently, newspaper companies and news agencies in Ethiopia categorize Amharic news articles manually in their day-to-day work, although they use computer systems to store and dispatch information. The objective of this research was to investigate the application of machine learning techniques to the automatic categorization of Amharic news items. A total of 11,024 news articles were used. To obtain good results, the texts were prepared and preprocessed: stop words and words that occur in three or fewer documents were removed from the collection. Thirty-three percent of the data was held out for testing. Two machine learning techniques, naïve Bayes and k-nearest neighbor (kNN) classifiers, were used to categorize the Amharic news items. The results indicate that such classifiers can be applied to classify Amharic news items automatically. However, the classifiers work well when the news items are distributed almost evenly across the categories. Both classifiers performed best on the three-category data set (95.80% for naïve Bayes vs. 89.61% for kNN) and worst on the 16-category data set (78.48% vs. 64.50%, respectively). Documents are distributed less evenly across the 16 categories than across the three, and this uneven distribution lowers the performance of both classifiers, with kNN degrading much more sharply than naïve Bayes. This research therefore indicates that naïve Bayes is better suited to the automatic categorization of Amharic news items. The results are promising; nevertheless, further work is recommended to improve them. Keywords: text categorization, machine learning, naïve Bayes, k-nearest neighbor | en_US
dc.identifier.uri | http://etd.aau.edu.et/handle/12345678/14758 |
dc.language.iso | en | en_US
dc.subject | Text categorization | en_US
dc.subject | Machine learning | en_US
dc.subject | Naïve Bayes | en_US
dc.subject | K-nearest neighbor | en_US
dc.title | Automatic Categorization of Amharic News Text: a Machine Learning Approach | en_US
dc.type | Thesis | en_US
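The abstract outlines a pipeline: remove stop words and words occurring in three or fewer documents, hold out 33% of the data for testing, and compare naïve Bayes with kNN. The sketch below illustrates that pipeline in plain Python. It is not the thesis's implementation, and it rests on assumptions the abstract does not state: whitespace-tokenized bag-of-words features, multinomial naïve Bayes with add-one smoothing, cosine similarity on raw term counts with k = 3 for kNN, and a tiny hypothetical corpus of placeholder tokens standing in for Amharic news text.

```python
import math
import random
from collections import Counter, defaultdict


def filter_vocabulary(docs, stop_words, min_df=4):
    """Keep terms that are not stop words and appear in at least `min_df` documents.

    Dropping words that occur in 3 or fewer documents corresponds to min_df=4.
    """
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    return {t for t, n in df.items() if n >= min_df and t not in stop_words}


def split_train_test(docs, labels, test_fraction=0.33, seed=0):
    """Shuffle once and hold out `test_fraction` of the documents for testing."""
    idx = list(range(len(docs)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(docs) * test_fraction)
    train_idx, test_idx = idx[n_test:], idx[:n_test]
    return ([docs[i] for i in train_idx], [labels[i] for i in train_idx],
            [docs[i] for i in test_idx], [labels[i] for i in test_idx])


def train_naive_bayes(docs, labels, vocab):
    """Multinomial naive Bayes with add-one (Laplace) smoothing (an assumed variant)."""
    n_docs = Counter(labels)
    counts = defaultdict(Counter)
    for tokens, y in zip(docs, labels):
        counts[y].update(t for t in tokens if t in vocab)
    model = {}
    for y, n_y in n_docs.items():
        denom = sum(counts[y].values()) + len(vocab)
        log_lik = {t: math.log((counts[y][t] + 1) / denom) for t in vocab}
        model[y] = (math.log(n_y / len(labels)), log_lik)
    return model


def predict_naive_bayes(model, tokens):
    """Return the class with the highest log posterior; out-of-vocabulary tokens are ignored."""
    def log_posterior(y):
        log_prior, log_lik = model[y]
        return log_prior + sum(log_lik[t] for t in tokens if t in log_lik)
    return max(model, key=log_posterior)


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def predict_knn(train_vecs, train_labels, tokens, vocab, k=3):
    """Majority vote among the k training documents most cosine-similar to the query."""
    query = Counter(t for t in tokens if t in vocab)
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(query, pair[0]), reverse=True)
    return Counter(y for _, y in ranked[:k]).most_common(1)[0][0]


def accuracy(predicted, gold):
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Hypothetical toy corpus: placeholder tokens, not real Amharic news.
sport = [["the", "ball", "goal", "team", f"s{i}"] for i in range(6)]
politics = [["the", "vote", "party", "election", f"p{i}"] for i in range(6)]
docs = sport + politics
labels = ["sport"] * 6 + ["politics"] * 6

# Drops the stop word "the" and the one-off s*/p* tokens.
vocab = filter_vocabulary(docs, stop_words={"the"})
tr_x, tr_y, te_x, te_y = split_train_test(docs, labels)
nb = train_naive_bayes(tr_x, tr_y, vocab)
train_vecs = [Counter(t for t in d if t in vocab) for d in tr_x]

nb_acc = accuracy([predict_naive_bayes(nb, d) for d in te_x], te_y)
knn_acc = accuracy([predict_knn(train_vecs, tr_y, d, vocab) for d in te_x], te_y)
print(sorted(vocab), nb_acc, knn_acc)
```

Running it prints the filtered vocabulary followed by the naïve Bayes and kNN accuracies on the held-out toy documents. As in the abstract, the vocabulary here is filtered on the whole collection before the split; a stricter setup would compute document frequencies on the training portion only.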

Files

Original bundle
Name: Surafel Teklu.pdf
Size: 649.31 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Plain Text