Word Sense Disambiguation using Semantic Similarity for Query Expansion in Amharic Information Retrieval
Date
2014-10
Publisher
Addis Ababa University
Abstract
Query expansion is an effective technique for controlling the effects of polysemy and synonymy in words, thereby improving the performance of an information retrieval system. The source of the expansion terms is an important issue in query expansion, and determining the sense of each query term is essential for effective retrieval. This study extends the application of query expansion by using a semantic similarity measure to design an effective word sense disambiguation method. Word sense disambiguation is one of the problems involved in context-based query expansion; how to use sense information to expand the query is another.
This study presents approaches for determining the senses of query words using an Amharic lexical resource. The system has three components. The first is a lexical resource similar to WordNet, which serves as the knowledge base. The second is word sense disambiguation (WSD), which identifies the sense of each query term by applying a semantic similarity measure over the knowledge base. Following the idea of the Lesk algorithm, disambiguation is performed with two methods, gloss to gloss and synset to gloss, which compare the synonyms and gloss definitions associated with a term with reference to the Amharic WordNet. The third is query reformulation, which expands the query with terms from the knowledge base for the sense identified by WSD. The outputs of the two disambiguation methods are combined to form the modified query, which is used to expand the original query. Finally, the query expansion module is integrated with an information retrieval system to demonstrate the improvement in Amharic IR performance.
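The disambiguation and reformulation steps described above can be illustrated with a minimal sketch. It is not the implementation used in the study: the toy English lexicon stands in for the Amharic WordNet, the stop-word list, the function names, and the scoring by simple word overlap are all assumptions, and summing the two scores is only one plausible way to combine the gloss-to-gloss and synset-to-gloss methods.

```python
# Hypothetical toy lexicon standing in for the Amharic WordNet:
# each word maps to a list of senses, each with a synset and a gloss.
LEXICON = {
    "bank": [
        {"synset": ["bank", "depository"],
         "gloss": "institution that accepts money deposits and gives loans"},
        {"synset": ["bank", "shore"],
         "gloss": "sloping land beside a river or lake"},
    ],
    "loan": [
        {"synset": ["loan", "credit"],
         "gloss": "money lent by a bank or institution to be repaid"},
    ],
    "river": [
        {"synset": ["river", "stream"],
         "gloss": "large natural stream of water flowing across land to a lake or sea"},
    ],
}

# Assumed stop-word list so that function words do not count as overlap.
STOPWORDS = {"a", "an", "the", "of", "or", "and", "to", "by", "that", "be"}


def content_words(text):
    """Lower-case, split on whitespace, and drop stop words."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def context_gloss_words(context):
    """Collect the gloss words of every sense of every context word."""
    words = set()
    for word in context:
        for sense in LEXICON.get(word, []):
            words |= content_words(sense["gloss"])
    return words


def gloss_to_gloss(sense, context):
    """Overlap between the candidate sense's gloss and the context glosses."""
    return len(content_words(sense["gloss"]) & context_gloss_words(context))


def synset_to_gloss(sense, context):
    """Overlap between the candidate sense's synset and the context glosses."""
    return len(set(sense["synset"]) & context_gloss_words(context))


def disambiguate(word, query):
    """Pick the sense with the highest combined score; ties keep the first sense."""
    senses = LEXICON.get(word, [])
    if not senses:
        return None
    context = [w for w in query if w != word]
    return max(senses,
               key=lambda s: gloss_to_gloss(s, context) + synset_to_gloss(s, context))


def expand_query(query):
    """Append the synonyms of each term's selected sense to the original query."""
    expanded = list(query)
    for word in query:
        sense = disambiguate(word, query)
        if sense is None:
            continue
        for synonym in sense["synset"]:
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded


financial = expand_query(["bank", "loan"])
riverside = expand_query(["bank", "river"])
print(financial)
print(riverside)
```

In this sketch the same query term receives different expansion terms depending on its neighbours, which is the behaviour the sense identification step is meant to provide.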
This study shows that WSD based on semantic similarity can be used effectively to identify word senses and to form the new query. As the experimental results show, the method that uses synsets for query expansion achieves an F-measure of 59%, an improvement of 6% over the original query. The amount of information associated with each term is limited because of the lack of resources. Consequently, both the similarity measure and the choice of expansion terms are constrained by the information available in the lexical resource.
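For readers unfamiliar with the metric, the F-measure combines precision and recall into a single score. The sketch below shows the standard computation on made-up document identifiers. The balanced setting (beta = 1) is an assumption, since the abstract does not state which weighting was used, and the numbers are illustrative only and are not results from the study.

```python
def f_measure(retrieved, relevant, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta is 1)."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved)
    recall = hits / len(relevant)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical document identifiers, for illustration only.
score = f_measure(retrieved=[1, 2, 3, 4], relevant=[2, 3, 5])
print(round(score, 4))
```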
Keywords: Word Sense Disambiguation, Semantic similarity, Information Retrieval