Text Retrieval using Self-organised Document Map: The Case of ILRI Digital Library
Date
2002-06
Abstract
The current availability of large collections of full-text documents in electronic form
emphasises the need for intelligent information retrieval techniques. Automatic
methods for exploring document collections are especially important in rapidly
growing digital libraries with distributed access. In this study, the WEBSOM
method is applied to a quarter of a century of research publications maintained
by the International Livestock Research Institute (ILRI). The Self-Organising
Map (SOM), also known as Kohonen’s feature map (a means for
automatically arranging high-dimensional statistical data), is used to position
encoded documents onto a map that provides a general view into the text
collection. The general view visualises similarity relations between the documents
on a two-dimensional map display, which can be utilised in exploring the material
rather than having to rely on traditional search expressions. Similar documents
are mapped close to each other, which gives users, particularly those with
limited domain knowledge, an intuitive and easy way to access the institute’s
digital information and knowledge resources.
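
The mapping described above can be illustrated with a minimal sketch of a self-organising map, assuming documents have already been encoded as numeric vectors. This is not the code from the report: the function names `train_som` and `best_matching_unit`, the toy vectors, the 3 x 3 grid and the linear decay schedules are all assumptions chosen for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid coordinate whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda u: sum((wi - xi) ** 2 for wi, xi in zip(weights[u], x)),
    )


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    sigma0 = sigma0 if sigma0 is not None else max(rows, cols) / 2.0
    # Initialise every map unit with random weights in [0, 1).
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    total_steps = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        order = list(vectors)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                # Units near the winner on the grid are pulled more strongly.
                grid_dist2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_dist2 / (2.0 * sigma ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


# Hypothetical toy "documents": two about one topic, two about another.
docs = [
    [1.0, 0.9, 0.0, 0.1],
    [0.9, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.9],
    [0.1, 0.0, 0.9, 1.0],
]
som = train_som(docs, rows=3, cols=3, seed=1)
positions = [best_matching_unit(som, d) for d in docs]
print(positions)  # grid coordinates assigned to each document
```

Because neighbouring units are updated together, vectors that resemble each other tend to end up at the same or adjacent grid positions, which is the property the map display relies on.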
This study also sheds some light on the power of the SOM in solving problems of
high-dimensional data. The trained SOM and the user interface can now be used
both to browse the collection and to map new documents automatically. The map
successfully distinguishes between the various types of documents and
efficiently clusters similar publications at nearby locations. The WEBSOM
visualises the results effectively and is thus especially suitable for
exploration tasks, since it removes the need to come up with search expressions,
which may be difficult even with a rather clear idea of the desired information.
The method is a major advance on the much harder problem encountered when
only a vague idea of the object of interest exists, a problem for which search
methods are usually not even expected to offer much support. The same holds true
when the area of interest lies at the outer edges of one’s current knowledge.
This report presents most of the situations that may be encountered in
a project that explores the practical application of the WEBSOM method to address
the basic problem of devising a suitable search expression, one that neither leaves
out relevant documents nor produces long listings of irrelevant hits. The report also
provides the general context of text retrieval and a detailed discussion of the
method used in this research. It includes the step-by-step procedures
and functions used in encoding the document collection (preprocessing),
computing the Kohonen feature map and developing the web-based
map interface, together with a discussion of the essential results and the code
used.
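
As a rough illustration of the encoding (preprocessing) step, the sketch below turns short texts into normalised term-weight vectors that a SOM could take as input. It uses plain TF-IDF weighting as a stand-in and should not be read as the encoding used in the report; the stop-word list, the sample titles and the helper names `tokenize`, `encode` and `cosine` are hypothetical.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list only; a real collection would need a fuller one.
STOP_WORDS = {"the", "of", "and", "in", "a", "to", "for", "on"}


def tokenize(text):
    """Lower-case the text, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def encode(documents):
    """Return (vocabulary, unit-length TF-IDF vectors) for the documents."""
    tokenized = [tokenize(d) for d in documents]
    vocab = sorted({t for doc in tokenized for t in doc})
    n = len(documents)
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for doc in tokenized for t in set(doc))
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        vec = [tf[t] * math.log(n / df[t]) for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec))
        vectors.append([v / norm for v in vec] if norm > 0 else vec)
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical document titles, not taken from the ILRI collection.
titles = [
    "Cattle grazing in the highlands",
    "Grazing management for cattle",
    "Tick-borne disease in sheep",
]
vocab, vectors = encode(titles)
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

The first two titles share terms and therefore have a positive similarity, while the third shares none with the first, so their similarity is zero.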
Keywords
Text Retrieval