Unsupervised Corpus Based Approach for Word Sense Disambiguation to Afaan Oromo Words

Gemechu Feyisa

dc.contributor.advisor	Abebe Ermias (Ato)
dc.contributor.author	Gemechu Feyisa
dc.date.accessioned	2018-11-10T16:52:57Z
dc.date.accessioned	2023-11-18T12:43:53Z
dc.date.available	2018-11-10T16:52:57Z
dc.date.available	2023-11-18T12:43:53Z
dc.date.issued	2015-06
dc.description.abstract	This thesis presents research on Word Sense Disambiguation (WSD) for the Afaan Oromo language. A corpus-based approach to disambiguation is employed, in which unsupervised machine learning techniques are applied to a corpus of Afaan Oromo text to acquire disambiguation information automatically. We tested five clustering algorithms (simple k-means; hierarchical agglomerative clustering with single, average, and complete link; and Expectation Maximization) using the existing implementations in the Weka 3.6.11 package. The “Cluster via classification” evaluation mode was used to train the selected algorithms on the preprocessed dataset. Because sense-annotated text for this kind of study is lacking, a total of 1500 Afaan Oromo sense examples were collected for seven selected ambiguous words, namely sanyii, karaa, horii, sirna, qoqhii, ulfina, and ifa. Preprocessing steps such as tokenization, stop word removal, and stemming were applied to the sense example sentences to make them ready for experimentation. These sense examples were then used as the corpus for disambiguation. A standard approach to WSD is to consider the context of the ambiguous word and use the information from its neighboring or collocated words. The contextual features used in this thesis were co-occurrence features, which indicate whether a word occurs within some number of words to the left or right of the ambiguous word. The system was evaluated on a training dataset using standard performance evaluation metrics. The results were encouraging: the clustering algorithms achieved better accuracy than supervised machine learning approaches on a similar dataset. However, further experiments on other ambiguous words and with different approaches are needed for better natural language understanding of the Afaan Oromo language.	en_US
dc.identifier.uri	http://etd.aau.edu.et/handle/12345678/14138
dc.language.iso	en	en_US
dc.publisher	Addis Ababa University	en_US
dc.subject	Natural Language Processing, Word Sense Disambiguation, Unsupervised Learning	en_US
dc.title	Unsupervised Corpus Based Approach for Word Sense Disambiguation to Afaan Oromo Words	en_US
dc.type	Thesis	en_US
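
The co-occurrence features described in the abstract (words occurring within some number of positions to the left or right of the ambiguous word) can be sketched roughly as follows. This is only an illustrative sketch, not the thesis's implementation: the function name `window_features`, the window size, and the placeholder tokens `w1` to `w5` are all assumptions, and the input is assumed to be already tokenized, stop-word filtered, and stemmed.

```python
from collections import Counter


def window_features(tokens, target, window=3):
    """Count words found within `window` positions of each occurrence of `target`.

    Hypothetical helper for illustration; the window size is an assumption.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        # Clamp the context window to the sentence boundaries.
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the ambiguous word itself
                counts[tokens[j]] += 1
    return counts


# Placeholder tokens stand in for preprocessed context words (not real corpus data).
tokens = ["w1", "w2", "karaa", "w3", "w4", "w5"]
features = window_features(tokens, "karaa", window=2)
```

Each sense example would yield one such count vector, and a clustering algorithm could then group the vectors by similarity of context.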

Files

Original bundle

Name: Feyisa Gemechu.pdf
Size: 1.6 MB
Format: Adobe Portable Document Format

Collections

Information Sciences