A Generic Approach towards All Words Amharic Word Sense Disambiguation

Date

2017-02-05

Publisher

Addis Ababa University

Abstract

Word sense disambiguation (WSD) is an “intermediate task” that supports other NLP tasks such as machine translation, information retrieval and hypertext navigation, content and thematic analysis, grammatical analysis, speech processing, and text processing. This study explores a more general approach to developing a WSD system for the Amharic language. To this end, we developed a WSD system that identifies the sense of an ambiguous Amharic word using information from tagged example sentences and a Word-Net. The system identifies the sense by measuring the similarity between the input sentence and the tagged example sentences. Two similarity measures are explored: cosine similarity and the Jaccard coefficient. We collected 100 example sentences for each sense of the selected ambiguous Amharic words. The Word-Net is composed of words with their synonyms and gloss definitions. The performance of the system was tested using 9 nouns, 3 verbs, 3 adjectives, and 2 adverbs, a total of 17 randomly selected words. The experiments were conducted on disambiguating one target word in a given text. The experiments were designed as follows: first, cosine similarity and the Jaccard coefficient were each evaluated individually for WSD; next, the Lesk algorithm was tested in a third experiment; finally, each of the two similarity measures was evaluated in combination with the Lesk algorithm. The results showed that the Jaccard coefficient combined with the Lesk algorithm achieved the highest accuracy, 89.83%. The major challenge in the disambiguation process is that the system achieves its lowest accuracy on words whose different senses frequently collocate with similar words.
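The similarity-based step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. Whitespace tokenization, raw term counts for the cosine measure, taking the best-matching example per sense, the function names, and the English toy data are all assumptions made for illustration; the abstract does not specify these details, and the Lesk combination is not shown.

```python
import math
from collections import Counter


def tokenize(sentence):
    # Assumption: simple lowercase whitespace tokenization.
    return sentence.lower().split()


def cosine_similarity(a_tokens, b_tokens):
    # Cosine of the angle between two raw term-count vectors.
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard_coefficient(a_tokens, b_tokens):
    # Size of the shared vocabulary divided by the size of the combined vocabulary.
    a, b = set(a_tokens), set(b_tokens)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def disambiguate(sentence, tagged_examples, similarity):
    # Assumption: a sense is scored by its single most similar tagged example.
    tokens = tokenize(sentence)
    best_sense, best_score = None, -1.0
    for sense, examples in tagged_examples.items():
        score = max(
            (similarity(tokens, tokenize(e)) for e in examples), default=0.0
        )
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Hypothetical toy data in English; the study used Amharic sentences.
TAGGED_EXAMPLES = {
    "bank/finance": [
        "she deposited money at the bank",
        "the bank approved the loan",
    ],
    "bank/river": [
        "they sat on the bank of the river",
        "the river bank was muddy",
    ],
}

if __name__ == "__main__":
    query = "he took a loan from the bank"
    print(disambiguate(query, TAGGED_EXAMPLES, jaccard_coefficient))
    print(disambiguate(query, TAGGED_EXAMPLES, cosine_similarity))
```

Passing the similarity measure as an argument keeps the two measures interchangeable, mirroring the way the experiments evaluate each one individually.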

Keywords

All Words Amharic Word Sense
