AAU Institutional Repository

Use of Part of Speech Tagging for Afaan Oromo Word Sense Modeling

dc.contributor.advisor Teferra, Solomon (PhD)
dc.contributor.author Daniel, Lalise
dc.date.accessioned 2019-05-07T09:08:43Z
dc.date.available 2019-05-07T09:08:43Z
dc.date.issued 2019-02-02
dc.identifier.uri http://localhost:80/xmlui/handle/123456789/18198
dc.description.abstract Word sense induction (WSI) is the task of automatically discovering all senses of an ambiguous word in a corpus. Induced senses can help researchers in machine translation and information retrieval improve performance. In this thesis we investigate the application of part-of-speech (POS) tagging to improve the performance of word sense disambiguation for Afaan Oromo through word sense modeling (WSM). The untagged corpus was taken from yehuwalashet [1]. We prepared an annotated corpus by applying POS tagging to the data. A corpus of 424397 words for WSM and 29845 words for POS tagging, with 20 ambiguous words, was used to test the system. NLTK and Python were used for POS tagging, and Java with NetBeans was used to run the WSM system. Preprocessing tasks such as tokenization, stop word removal, and normalization were applied to both the unannotated corpus and the POS-tagged annotated corpus to prepare them for the experiments. The experiments used two clustering algorithms, EM and k-means, with context window sizes of one to three. The results show that using the annotated corpus improved the performance of the system for both approaches. The ML approach with the EM algorithm achieved 74.85% on the annotated corpus and 70.35% on the unannotated one. The hybrid approach with the k-means algorithm scored 79.1% on the annotated corpus and 74.85% on the unannotated corpus. The EM algorithm generated erroneous results for the hybrid approach. Overall, the results show that using a POS-annotated corpus improves the WSM system for Afaan Oromo words, and that the hybrid approach performed well with the POS-annotated corpus. en_US
dc.language.iso en en_US
dc.publisher Addis Ababa University en_US
dc.subject Speech Tagging en_US
dc.subject Afaan Oromo Word en_US
dc.subject Sense Modeling en_US
dc.subject Methodology en_US
dc.subject Data/Corpus Preparation en_US
dc.title Use of Part of Speech Tagging for Afaan Oromo Word Sense Modeling en_US
dc.type Thesis en_US
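
The preprocessing and context-window steps named in the abstract can be sketched roughly as below. This is a minimal illustration in Python, not the thesis's implementation: the function names, the stop-word list, and the English sample sentence are all hypothetical stand-ins, and the POS tags in the second example are hand-assigned placeholders rather than the output of a trained Afaan Oromo tagger.

```python
from typing import List, Tuple

# Hypothetical stop-word list; the list used in the thesis is not given in the abstract.
STOP_WORDS = {"the", "a", "of", "on"}


def preprocess(text: str) -> List[str]:
    """Lower-case, split on whitespace, strip punctuation, and drop stop words."""
    tokens = [tok.strip(".,;:!?\"'()").lower() for tok in text.split()]
    return [tok for tok in tokens if tok and tok not in STOP_WORDS]


def context_windows(tokens: List[str], target: str, size: int) -> List[List[str]]:
    """For each occurrence of `target`, collect up to `size` tokens on each side."""
    windows = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = tokens[max(0, i - size):i]
            right = tokens[i + 1:i + 1 + size]
            windows.append(left + right)
    return windows


def tagged_context_windows(
    tagged: List[Tuple[str, str]], target: str, size: int
) -> List[List[str]]:
    """Same as context_windows, but each context feature is 'word/TAG'."""
    windows = []
    for i, (word, _tag) in enumerate(tagged):
        if word == target:
            neighbours = tagged[max(0, i - size):i] + tagged[i + 1:i + 1 + size]
            windows.append([f"{w}/{t}" for w, t in neighbours])
    return windows


# Unannotated example (English stand-in for an ambiguous word).
tokens = preprocess("He sat on the bank of the river. The bank approved a loan.")
plain = context_windows(tokens, "bank", 1)

# Annotated example with hand-assigned, illustrative tags.
tagged = [("sat", "VB"), ("bank", "NN"), ("river", "NN")]
annotated = tagged_context_windows(tagged, "bank", 1)
```

Each window is the bag of context features for one occurrence of the ambiguous word. In a setup like the one the abstract describes, these windows would then be vectorized and grouped by a clustering algorithm such as k-means or EM, with each cluster treated as one induced sense.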