Word Sense Disambiguation for Afaan Oromo Language

Date

2013-11

Publisher

Addis Ababa University

Abstract

This thesis presents research on Word Sense Disambiguation (WSD) for the Afaan Oromo language. A corpus-based approach is employed, in which supervised machine learning techniques are applied to an Afaan Oromo corpus to acquire disambiguation information automatically. The Naïve Bayes theorem is applied to find the prior probability and likelihood ratio of each sense in a given context. Because no sense-annotated text was available for this type of study, a total of 1240 Afaan Oromo sense examples were collected for five selected ambiguous words: sanyii, karaa, horii, sirna and qoqhii. The sense examples were manually tagged with their correct senses, preprocessed for experimentation, and then used as the corpus for disambiguation. A standard approach to WSD is to consider the context of the ambiguous word and use information from its neighboring or collocated words. The contextual features used in this thesis were co-occurrence features, which indicate the occurrence of a word within some number of words to the left or right of the ambiguous word. To evaluate the system, k-fold cross-validation was applied using standard performance evaluation metrics. The achieved result was encouraging, but further experiments on other ambiguous words and with different approaches will be needed for better natural language understanding of Afaan Oromo.
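
The approach described in the abstract (co-occurrence features from a window around the ambiguous word, a Naïve Bayes classifier, and k-fold cross-validation) can be illustrated with a minimal sketch. This is not the thesis implementation: the window size, the add-one smoothing, the fold assignment, the names `window_features`, `NaiveBayesWSD` and `k_fold_accuracy`, and the placeholder context tokens (`ctx_a`, `sense_1`, and so on) are all assumptions made for illustration. Only the target word karaa comes from the abstract.

```python
import math
from collections import Counter, defaultdict


def window_features(tokens, target, size=3):
    """Words within `size` positions left or right of the first `target`."""
    i = tokens.index(target)
    return tokens[max(0, i - size):i] + tokens[i + 1:i + 1 + size]


class NaiveBayesWSD:
    """Naive Bayes over bag-of-words context features (add-alpha smoothing assumed)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, examples):
        # examples: list of (feature_list, sense_label) pairs
        self.sense_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, sense in examples:
            self.sense_counts[sense] += 1
            self.word_counts[sense].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, feats):
        total = sum(self.sense_counts.values())
        best, best_score = None, -math.inf
        for sense, n in self.sense_counts.items():
            # log prior P(sense)
            score = math.log(n / total)
            denom = sum(self.word_counts[sense].values()) + self.alpha * len(self.vocab)
            for w in feats:
                if w in self.vocab:  # words never seen in training are skipped
                    # smoothed log likelihood P(word | sense)
                    score += math.log((self.word_counts[sense][w] + self.alpha) / denom)
            if score > best_score:
                best, best_score = sense, score
        return best


def k_fold_accuracy(examples, k=3):
    """Average accuracy over k folds; folds are formed by striding the list."""
    folds = [examples[i::k] for i in range(k)]
    correct = 0
    for i, test in enumerate(folds):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        model = NaiveBayesWSD().fit(train)
        correct += sum(model.predict(feats) == sense for feats, sense in test)
    return correct / len(examples)


# Hypothetical tagged examples: placeholder tokens, not real Afaan Oromo sentences.
tagged = [
    (["ctx_a", "ctx_b", "karaa", "ctx_c"], "sense_1"),
    (["ctx_b", "karaa", "ctx_a"], "sense_1"),
    (["ctx_c", "ctx_a", "karaa", "ctx_b"], "sense_1"),
    (["ctx_x", "karaa", "ctx_y", "ctx_z"], "sense_2"),
    (["ctx_y", "ctx_z", "karaa", "ctx_x"], "sense_2"),
    (["ctx_z", "karaa", "ctx_x", "ctx_y"], "sense_2"),
]
examples = [(window_features(toks, "karaa"), sense) for toks, sense in tagged]

model = NaiveBayesWSD().fit(examples)
print(model.predict(["ctx_a", "ctx_b"]))
print(k_fold_accuracy(examples, k=3))
```

With real data, each of the five ambiguous words would presumably get its own set of tagged examples and its own classifier, and the folds would normally be shuffled or stratified by sense rather than strided.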

Keywords

Natural Language Processing, Word Sense Disambiguation, Supervised Learning Method, Naïve Bayes Theorem
