Use of Part of Speech Tagging for Afaan Oromo Word Sense Modeling

dc.contributor.advisor: Teferra, Solomon (PhD)
dc.contributor.author: Daniel, Lalise
dc.date.accessioned: 2019-05-07T09:08:43Z
dc.date.accessioned: 2023-11-18T12:47:35Z
dc.date.available: 2019-05-07T09:08:43Z
dc.date.available: 2023-11-18T12:47:35Z
dc.date.issued: 2019-02-02
dc.description.abstract: Word sense induction (WSI) is the task of automatically discovering all senses of an ambiguous word in a corpus. Induced senses can help researchers in machine translation and information retrieval improve system performance. In this thesis we investigated whether POS tagging improves word sense disambiguation for Afaan Oromo through word sense modeling (WSM). The untagged corpus was taken from Yehuwalashet [1], and we prepared an annotated corpus by applying POS tagging to the data. A corpus of 424,397 words for WSM and 29,845 words for POS tagging, with 20 ambiguous words, was used to test the system. POS tagging was done with NLTK and Python, and the WSM system was run in Java using NetBeans. Preprocessing tasks such as tokenization, stop word removal, and normalization were applied to both the unannotated corpus and the POS-tagged corpus to prepare them for the experiments. The experiments used two clustering algorithms, EM and k-means, with context window sizes of one to three. The results show that using the annotated corpus improved performance for both approaches. The ML approach with the EM algorithm achieved 74.85% on the annotated corpus and 70.35% on the unannotated one. The hybrid approach with the k-means algorithm scored 79.1% on the annotated corpus and 74.85% on the unannotated corpus. The EM algorithm produced erroneous results for the hybrid approach. Overall, the results show that a POS-annotated corpus improves the WSM system for Afaan Oromo words, and that the hybrid approach performs well on the POS-annotated corpus. (en_US)
dc.identifier.uri: http://etd.aau.edu.et/handle/12345678/18198
dc.language.iso: en (en_US)
dc.publisher: Addis Ababa University (en_US)
dc.subject: Speech Tagging (en_US)
dc.subject: Afaan Oromo Word (en_US)
dc.subject: Sense Modeling (en_US)
dc.subject: Methodology (en_US)
dc.subject: Data/Corpus Preparation (en_US)
dc.title: Use of Part of Speech Tagging for Afaan Oromo Word Sense Modeling (en_US)
dc.type: Thesis (en_US)
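
The abstract mentions preprocessing (normalization, stop word removal) followed by clustering over context windows of one to three words. As a rough illustration only, the Python sketch below shows one way such context features could be collected from a POS-tagged token list before being passed to a clusterer such as k-means or EM. The function name `context_features`, the `word/TAG` feature format, and the placeholder tokens and tags are assumptions made for this example; they are not taken from the thesis, whose WSM system was written in Java.

```python
from typing import FrozenSet, Iterable, List, Tuple

TaggedToken = Tuple[str, str]  # (word, POS tag)


def context_features(
    tagged: Iterable[TaggedToken],
    target: str,
    window: int = 2,
    stop_words: FrozenSet[str] = frozenset(),
) -> List[List[str]]:
    """Return one list of "word/TAG" features per occurrence of `target`.

    Hypothetical helper: the stop-word list and tag set are supplied by
    the caller, since the thesis's actual resources are not shown here.
    """
    # Normalise case and drop stop words first, so the window spans
    # only the remaining tokens.
    tokens = [(w.lower(), t) for w, t in tagged if w.lower() not in stop_words]
    contexts = []
    for i, (word, _tag) in enumerate(tokens):
        if word != target:
            continue
        left = tokens[max(0, i - window):i]      # up to `window` tokens before
        right = tokens[i + 1:i + 1 + window]     # up to `window` tokens after
        contexts.append([f"{w}/{t}" for w, t in left + right])
    return contexts


# Placeholder tokens and tags, not real Afaan Oromo data.
sample = [("a", "X"), ("b", "Y"), ("T", "N"), ("c", "Z"), ("d", "W")]
print(context_features(sample, "t", window=1))
```

Each returned list corresponds to one occurrence of the ambiguous word; dropping the `/TAG` suffix from the features would give the unannotated variant for comparison.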

Files

Original bundle
Name: Lalise Daniel 2019.pdf
Size: 1.2 MB
Format: Adobe Portable Document Format