Unsupervised Corpus Based Approach for Word Sense Disambiguation to Afaan Oromo Words

dc.contributor.advisor: Abebe Ermias (Ato)
dc.contributor.author: Gemechu Feyisa
dc.date.accessioned: 2018-11-10T16:52:57Z
dc.date.accessioned: 2023-11-18T12:43:53Z
dc.date.available: 2018-11-10T16:52:57Z
dc.date.available: 2023-11-18T12:43:53Z
dc.date.issued: 2015-06
dc.description.abstract: This thesis presents research on Word Sense Disambiguation (WSD) for the Afaan Oromo language. A corpus-based approach to disambiguation is employed, in which unsupervised machine learning techniques are applied to a corpus of Afaan Oromo text to acquire disambiguation information automatically. We tested five clustering algorithms (simple k-means; hierarchical agglomerative clustering with single, average, and complete link; and Expectation Maximization) using their existing implementations in the Weka 3.6.11 package. The “Cluster via classification” evaluation mode was used to train the selected algorithms on the preprocessed dataset. Because sense-annotated text for this kind of study is lacking, a total of 1500 Afaan Oromo sense examples were collected for seven selected ambiguous words, namely sanyii, karaa, horii, sirna, qoqhii, ulfina, and ifa. Preprocessing steps such as tokenization, stop word removal, and stemming were applied to the sense example sentences to prepare them for experimentation, and these sense examples were then used as the corpus for disambiguation. A standard approach to WSD is to consider the context of the ambiguous word and use information from its neighboring or collocated words. The contextual features used in this thesis were co-occurrence features, which indicate whether a word occurs within some number of words to the left or right of the ambiguous word. To evaluate the system, a training dataset was applied using standard performance evaluation metrics. The achieved result was encouraging, because the clustering algorithms achieved better accuracy than supervised machine learning approaches on a similar dataset. However, further experiments on other ambiguous words and with different approaches will be needed for better natural language understanding of Afaan Oromo.
dc.identifier.uri: http://etd.aau.edu.et/handle/12345678/14138
dc.language.iso: en
dc.publisher: Addis Ababa University
dc.subject: Natural Language Processing, Word Sense Disambiguation, Unsupervised Learning
dc.title: Unsupervised Corpus Based Approach for Word Sense Disambiguation to Afaan Oromo Words
dc.type: Thesis
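
The co-occurrence features mentioned in the abstract (words occurring within some number of positions to the left or right of the ambiguous word) can be illustrated with a minimal sketch. This is not the thesis's implementation, which used Weka. The function name `cooccurrence_features`, the window size of 2, and the placeholder tokens `w1`..`w6` are assumptions made for illustration only; `karaa` is one of the ambiguous words named in the abstract.

```python
from collections import Counter


def cooccurrence_features(tokens, target, window=2):
    """Count the words found within `window` positions of each occurrence of `target`.

    Hypothetical helper for illustration; the window size is an assumption,
    since the abstract does not state the value used in the thesis.
    """
    features = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        start = max(0, i - window)                # clip at the sentence start
        end = min(len(tokens), i + window + 1)    # clip at the sentence end
        for j in range(start, end):
            if j != i:                            # skip the ambiguous word itself
                features[tokens[j]] += 1
    return features


# Placeholder context tokens (not real Afaan Oromo words), assumed to be
# already tokenized, stop-word filtered, and stemmed.
tokens = ["w1", "w2", "w3", "karaa", "w4", "w5", "w6"]
features = cooccurrence_features(tokens, "karaa", window=2)
print(sorted(features))
```

Each sense example would yield one such feature vector, and the resulting vectors could then be passed to a clustering algorithm such as k-means.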

Files

Original bundle
Name: Feyisa Gemechu.pdf
Size: 1.6 MB
Format: Adobe Portable Document Format