Afaan Oromo Word Sense Disambiguation Using Wordnet
No Thumbnail Available
Date
11/2/2017
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Addis Ababa University
Abstract
All human languages have words that can mean different things in different contexts. In the natural language processing community, Word Sense Disambiguation (WSD) has been described as the task which selects the appropriate meaning (sense) to a given word in a text or discourse where this meaning is distinguishable from other senses potentially attributable to that word.
One of the several approaches proposed in the past is Michael Lesk’s 1986 algorithm.
This algorithm is based on two assumptions. First, when two words are used in close proximity in a sentence, they must be talking of a related topic and second, if one sense each of the two words can be used to talk of the same topic, then their dictionary definitions must use some common words. For example, when the words ”pine cone” occur together, they are talking of ”evergreen trees”, and indeed one meaning each of these two words has the words ”evergreen” and ”tree” in their definitions. Thus we can disambiguate neighboring words in a sentence by comparing their definitions and picking those senses whose definitions have the most number of common words. The main drawback of this algorithm is that dictionary definitions are often very short and just do not have enough words for this algorithm to work well. To overcome this problem Satanjeev Banerjee 2002 deal with this problem by adapting Lesk algorithm to the semantically organized lexical database called WordNet. Besides storing words and their meaning like a normal dictionary, WordNet also ”connects” related words together.
To this end, we have developed a WSD system that identifies a sense of an Afaan Oromo ambiguous word by using information from Afaan Oromo WordNet. The system identifies the sense by checking different types of sense relationships between words that will help to identify the sense of a word, The conventional WordNet organizes nouns, verbs, adjectives and adverbs together into sets of synonyms called synsets each expressing a different concept. In contrast to the structure of conventional WordNet, we used a clue word based model of WordNet. The related words for each sense of a polysemy word are referred to as the clue words. These clue words are used to disambiguate the correct meaning of the polysemy word in the given context using knowledge based Word Sense Disambiguation (WSD) algorithms. The clue word can be a
noun, verb, adjective or adverb which can solve limitation of English WordNet which has limited number of cross pos relation(relation not between single part of speech ).
The performance of the system is tested using 50 polysemy Afaan Oromo ambiguous words which are selected randomly. The performance of the WSD based on clue word based WordNet achieved 92%.
Description
Keywords
Word Sense Disambiguation, Wordnet, Clue Word, Sense Relationships