Use of Part of Speech Tagging for Afaan Oromo Word Sense Modeling
dc.contributor.advisor | Teferra, Solomon (PhD) | |
dc.contributor.author | Daniel, Lalise | |
dc.date.accessioned | 2019-05-07T09:08:43Z | |
dc.date.accessioned | 2023-11-18T12:47:35Z | |
dc.date.available | 2019-05-07T09:08:43Z | |
dc.date.available | 2023-11-18T12:47:35Z | |
dc.date.issued | 2019-02-02 | |
dc.description.abstract | Word sense induction (WSI) is the task of automatically discovering all senses of an ambiguous word in a corpus. Induced senses can help researchers in machine translation and information retrieval improve system performance. In this thesis we investigate whether POS tagging can improve Word Sense Disambiguation for Afaan Oromo through word sense modeling (WSM). The untagged corpus for the study was taken from Yehuwalashet [1]. We prepared an annotated corpus by applying POS tagging to this data. A corpus of 424,397 words for WSM and 29,845 words for POS tagging, with 20 ambiguous words, was used to test the system. NLTK and Python were used for POS tagging, and the WSM system was run in Java with NetBeans. Preprocessing tasks such as tokenization, stop word removal, and normalization were applied to both the unannotated corpus and the POS-tagged annotated corpus to prepare them for the experiments. The experiments used two clustering algorithms, EM and k-means, with context window sizes from one to three. The results show that using the annotated corpus improved the performance of the system for both approaches. The ML approach with the EM algorithm achieved 74.85% on the annotated corpus and 70.35% on the unannotated one. The hybrid approach with the k-means algorithm scored 79.1% on the annotated corpus and 74.85% on the unannotated corpus. The EM algorithm produced erroneous results for the hybrid approach. Overall, the POS-annotated corpus improves the WSM system for Afaan Oromo words, and the hybrid approach performed well on it. | en_US |
dc.identifier.uri | http://etd.aau.edu.et/handle/12345678/18198 | |
dc.language.iso | en | en_US |
dc.publisher | Addis Ababa University | en_US |
dc.subject | Speech Tagging | en_US |
dc.subject | Afaan Oromo Word | en_US |
dc.subject | Sense Modeling | en_US |
dc.subject | Methodology | en_US |
dc.subject | Data/Corpus Preparation | en_US |
dc.title | Use of Part of Speech Tagging for Afaan Oromo Word Sense Modeling | en_US |
dc.type | Thesis | en_US |
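
The abstract describes collecting context around each ambiguous word with window sizes from one to three over a POS-tagged corpus. The following is a minimal sketch of that step only, not the thesis's implementation: the function name `context_windows`, the placeholder tokens, and the tag labels are all hypothetical, and the input is assumed to be a list of `(word, tag)` pairs such as a POS tagger might produce.

```python
from typing import List, Tuple

# A tagged token is a (word, POS tag) pair. The tag labels used below are
# placeholders, not the tagset used in the thesis.
TaggedToken = Tuple[str, str]


def context_windows(
    tagged: List[TaggedToken], target: str, window: int
) -> List[List[TaggedToken]]:
    """Collect the tagged tokens within `window` positions on each side of
    every occurrence of `target` (hypothetical helper for illustration)."""
    contexts = []
    for i, (word, _tag) in enumerate(tagged):
        if word.lower() != target.lower():
            continue
        left = tagged[max(0, i - window):i]    # up to `window` tokens before
        right = tagged[i + 1:i + 1 + window]   # up to `window` tokens after
        contexts.append(left + right)
    return contexts


# Placeholder sentence: "w1".."w4" stand in for real corpus tokens.
sentence = [("w1", "NN"), ("w2", "VB"), ("target", "NN"),
            ("w3", "JJ"), ("w4", "NN")]

for size in (1, 2, 3):
    print(size, context_windows(sentence, "target", size))
```

Each returned context could then be turned into a feature vector and passed to a clustering algorithm such as k-means or EM. Keeping the tag alongside the word is what lets the annotated and unannotated settings be compared on the same windows.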