Text Retrieval Using Self-Organised Document Map: The Case of ILRI Digital Library
Date
2002-06
Publisher
Addis Ababa University
Abstract
emphasises the need for intelligent information retrieval techniques. With rapidly
growing digital libraries and distributed access in particular, it is important to have automatic methods for
exploring document collections. In this study, the WEBSOM method is applied to this task, using a quarter of
a century of research publications maintained by the International Livestock Research
Institute. The Self-Organizing Map (SOM), also known as Kohonen's feature
map (a means for automatically arranging high-dimensional statistical data), is used to
position encoded documents onto a map that provides a general view into the text collection.
The general view visualises similarity relations between the documents on a two-dimensional
map display, which can be utilised in exploring the material rather than having to rely on
traditional search expressions. Similar documents become mapped close to each other
providing an intuitive mechanism and ease of access for maximising the institute's digital
information and knowledge resources particularly for users with limited domain knowledge.
This study also sheds some light on the power of the SOM in solving problems of high-dimensional
data. The trained SOM and the user interface can now be used both to browse the
collection and to map new documents automatically. The map successfully distinguishes
between the various types of documents and efficiently clusters similar publications at
nearby locations. It is quite evident that the WEBSOM can effectively visualize the results and is
thus especially suitable for exploration tasks without the need to come up with search
expressions, which may be difficult even with a rather clear idea of the desired information.
The method is a major breakthrough with respect to a much harder problem, one for which
search methods are usually not even expected to offer much support: the problem encountered when there
exists only a vague idea of the object of interest. The same holds true when the area of
interest lies at the outer edges of one's current knowledge.
This full-fledged report presents most of the situations that may be encountered in a
project that explores the practical application of the WEBSOM method to solve the basic
problem of devising a suitable search expression, one that neither leaves out relevant
documents nor produces long listings of irrelevant hits. The report also provides the general
context of text retrieval and, in its various sections, a detailed discussion of the actual method used in this research.
It includes the step-by-step procedures and functions used in encoding the
document collection (preprocessing), computing the Kohonen feature map and
developing the web-based map interface, as well as a discussion of the essential results
together with the code used.
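
The pipeline outlined above (encoding the documents, computing the feature map, then placing each document on the map) can be illustrated with a minimal sketch. It is not the code from the report. It assumes a plain bag-of-words encoding in place of the WEBSOM document encoding, and the four sample documents, the 3 x 3 grid and all function names and parameters (`encode`, `train_som`, `best_matching_unit`, `lr0`, `radius0`) are hypothetical choices made for illustration only.

```python
import math
import random

# Hypothetical mini-collection; not taken from the ILRI library.
docs = [
    "cattle grazing pasture management",
    "pasture grazing systems for cattle",
    "tick borne disease vaccine trials",
    "vaccine development for tick borne disease",
]
vocab = sorted({w for d in docs for w in d.split()})


def encode(doc, vocab):
    """Encode a document as an L2-normalised bag-of-words vector."""
    words = doc.lower().split()
    vec = [float(words.count(w)) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def best_matching_unit(weights, vec):
    """Return the grid position whose weight vector is closest to vec."""
    return min(
        weights,
        key=lambda pos: sum((w - x) ** 2 for w, x in zip(weights[pos], vec)),
    )


def train_som(vectors, rows, cols, epochs=200, lr0=0.5, radius0=None, seed=0):
    """Train a small rectangular SOM with a Gaussian neighbourhood."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    radius0 = radius0 or max(rows, cols) / 2
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.5)  # neighbourhood shrinks over time
        for vec in rng.sample(vectors, len(vectors)):
            br, bc = best_matching_unit(weights, vec)
            for (r, c), w in weights.items():
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-d2 / (2 * radius ** 2))
                # Pull each unit towards the input, weighted by grid distance
                # from the best-matching unit.
                for i in range(dim):
                    w[i] += lr * h * (vec[i] - w[i])
    return weights


vectors = [encode(d, vocab) for d in docs]
weights = train_som(vectors, rows=3, cols=3)
positions = [best_matching_unit(weights, v) for v in vectors]
for doc, pos in zip(docs, positions):
    print(pos, doc)
```

Once trained, a new document is placed on the map by encoding it with the same vocabulary and calling `best_matching_unit`. No retraining is needed for that step, which corresponds to the automatic mapping of new documents described in the abstract.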
Keywords
Information Science