Afaan Oromo Word Sense Disambiguation Using Wordnet

Tesfaye, Birhane

Files

Birhane Tesfaye 2017.pdf (1.69 MB)

Date

11/2/2017

Authors

Tesfaye, Birhane

Publisher

Addis Ababa University

Abstract

All human languages have words that can mean different things in different contexts. In the natural language processing community, Word Sense Disambiguation (WSD) is described as the task of selecting the appropriate meaning (sense) of a given word in a text or discourse, where this meaning is distinguishable from the other senses potentially attributable to that word. One of the several approaches proposed in the past is Michael Lesk’s 1986 algorithm. This algorithm rests on two assumptions. First, when two words are used in close proximity in a sentence, they must be talking about a related topic. Second, if one sense of each of the two words can be used to talk about the same topic, then their dictionary definitions must share some words. For example, when the words "pine cone" occur together, they are talking about "evergreen trees", and indeed one meaning of each of these two words has the words "evergreen" and "tree" in its definition. Thus we can disambiguate neighboring words in a sentence by comparing their definitions and picking those senses whose definitions have the largest number of words in common. The main drawback of this algorithm is that dictionary definitions are often very short and do not contain enough words for the algorithm to work well. To overcome this problem, Satanjeev Banerjee (2002) adapted the Lesk algorithm to WordNet, a semantically organized lexical database. Besides storing words and their meanings like a normal dictionary, WordNet also "connects" related words together. To this end, we have developed a WSD system that identifies the sense of an ambiguous Afaan Oromo word by using information from an Afaan Oromo WordNet.
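The definition-overlap idea attributed to Lesk above can be illustrated with a minimal sketch. It is not code from the thesis: the glosses for "pine" and "cone" are toy definitions written for this example, and the stopword list, the sense labels and the function names (content_words, lesk_pair) are assumptions made for illustration only.

```python
from itertools import product

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "of", "or", "the", "to", "with", "which", "through"}

# Toy glosses written for this example, not taken from any real dictionary.
PINE = {
    "pine#1": "kind of evergreen tree with needle shaped leaves",
    "pine#2": "waste away through sorrow or illness",
}
CONE = {
    "cone#1": "solid body which narrows to a point",
    "cone#2": "fruit of certain evergreen tree",
}


def content_words(gloss):
    """Lowercase a gloss, split on whitespace and drop stopwords."""
    return {w for w in gloss.lower().split() if w not in STOPWORDS}


def lesk_pair(senses_a, senses_b):
    """Return the pair of sense labels whose glosses share the most words,
    together with the size of that overlap."""
    best_pair, best_overlap = None, -1
    for (id_a, gloss_a), (id_b, gloss_b) in product(senses_a.items(), senses_b.items()):
        overlap = len(content_words(gloss_a) & content_words(gloss_b))
        if overlap > best_overlap:
            best_pair, best_overlap = (id_a, id_b), overlap
    return best_pair, best_overlap


print(lesk_pair(PINE, CONE))
```

With these toy glosses, the tree sense of "pine" and the fruit sense of "cone" are selected because they share "evergreen" and "tree". If the glosses were shorter, no pair would overlap at all, which is the drawback the abstract mentions.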
The system identifies the sense by checking different types of sense relationships between words. The conventional WordNet organizes nouns, verbs, adjectives and adverbs into sets of synonyms called synsets, each expressing a different concept. In contrast to the structure of the conventional WordNet, we used a clue-word-based model of WordNet. The related words for each sense of a polysemous word are referred to as its clue words. These clue words are used to identify the correct meaning of the polysemous word in the given context using knowledge-based WSD algorithms. A clue word can be a noun, verb, adjective or adverb, which addresses a limitation of the English WordNet: it has a limited number of cross-POS relations (relations between words of different parts of speech). The performance of the system was tested using 50 randomly selected polysemous Afaan Oromo words. The WSD system based on the clue-word-based WordNet achieved a performance of 92%.
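One way to picture clue-word-based disambiguation is the sketch below. It is an assumption-laden illustration, not the thesis's implementation: the abstract does not give the data format or the scoring rule, so the dictionary layout, the count-of-matches score and the name disambiguate are hypothetical, and an English word stands in for an Afaan Oromo entry.

```python
# Hypothetical clue-word store: each sense of a polysemous word maps to a set
# of related words, which may belong to any part of speech.
CLUE_WORDNET = {
    "bank": {
        "bank/finance": {"money", "deposit", "loan", "account"},
        "bank/river": {"river", "water", "shore", "fish"},
    },
}


def disambiguate(target, context, clue_wordnet):
    """Pick the sense of `target` whose clue words occur most often in `context`.

    Returns None when no clue word of any sense appears in the context.
    Scoring by a plain count of matches is an assumption of this sketch.
    """
    context_words = {w.lower() for w in context}
    scores = {
        sense: len(clues & context_words)
        for sense, clues in clue_wordnet[target].items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


sentence = "she opened an account at the bank to deposit money".split()
print(disambiguate("bank", sentence, CLUE_WORDNET))
```

Because clue words are not restricted to one part of speech, a verb such as "deposit" can help select the sense of a noun, which is the cross-POS benefit the abstract describes.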

Keywords

Word Sense Disambiguation, Wordnet, Clue Word, Sense Relationships

URI

http://etd.aau.edu.et/handle/123456789/17852

Collections

Computer Science
