|Title: ||THE APPLICATION OF WEBSOM FOR AMHARIC TEXT RETRIEVAL|
|Authors: ||MAMUYE, BIZUNEH|
|Advisors: ||W/rt Saba Amsalu; Ato Tesfaye Biru; W/ro Wonishet Abdela|
|Copyright: ||2003 |
|Date Added: ||9-May-2008 |
|Publisher: ||Addis Ababa University|
|Abstract: ||This research explored the applicability of WEBSOM (Web-Based Self-Organizing Map) for retrieving texts
written in the Amharic language. The method applies a neural network's self-organizing algorithm to
generate a map display. The map display detects complex relationships among the given documents and
reveals them through the arrangement of terms extracted from the documents.
To conduct the experiment, 330 Amharic news articles of three classes were collected from the Ethiopian
News Agency. Of these, 248 were taken as the training set and the remainder as the test set. For the
purpose of document representation, the Vector Space Model was used. Non-content bearing terms were
removed from the lists of terms identified from the headline and slug parts of the news articles and
a suffix/prefix-stripping technique was applied to the remaining list. After terms with variant
written forms were converted to one common form, terms with a total frequency above 70 or below 3 were
discarded from the list. Then a matrix was constructed for each of the training and test sets over the
remaining 142 terms. A normalized weight was assigned to each term in a given news article using the
TF-IDF (Term Frequency-Inverse Document Frequency) weighting technique, and the vector matrices were
prepared in the format required by the tool to be used.
Using Nenet (Neural Network Tool), the SOM was trained with the 248 articles in the training set and
tested with three test sets selected from the three classes of news articles. From the distribution of these
articles on the map, it was observed that the map placed similar articles near each other. The results
of the three tests indicated that the clustering capability of the SOM for Amharic
documents is promising.
Lastly, a map was constructed for all 330 news articles, and an HTML-based prototype browsing
interface for the map was developed and labeled with descriptive terms that convey the properties of each
area. The interface was also linked to the underlying database through Active Server Pages so that users
can browse the map for relevant articles.|
|Description: ||A thesis submitted to the School of Graduate Studies of Addis Ababa University in partial fulfillment for the degree of Master of Science in Information Science|
|Appears in:||Thesis - Information Science|
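
Illustrative note: the document-representation steps described in the abstract (discarding terms by total frequency, then assigning normalized TF-IDF weights) can be sketched roughly as below. This is a minimal sketch, not the thesis's actual procedure. The abstract does not state the exact TF-IDF formula or normalization, so tf * log(N / df) with unit-length (L2) normalization is an assumption. The function names, the English placeholder tokens, and the lowered thresholds in the toy example are all hypothetical.

```python
import math
from collections import Counter


def build_vocabulary(docs, min_total=3, max_total=70):
    """Keep terms whose total frequency over all documents is within
    [min_total, max_total]; the defaults mirror the cutoffs in the abstract."""
    totals = Counter(term for doc in docs for term in doc)
    return sorted(t for t, n in totals.items() if min_total <= n <= max_total)


def tfidf_matrix(docs, vocabulary):
    """Return one unit-length TF-IDF row per document.

    Assumed weighting (not confirmed by the source): tf * log(N / df),
    followed by L2 normalization of each document vector.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    rows = []
    for doc in docs:
        tf = Counter(doc)
        row = [
            tf[t] * math.log(n_docs / doc_freq[t]) if doc_freq[t] else 0.0
            for t in vocabulary
        ]
        norm = math.sqrt(sum(w * w for w in row))
        rows.append([w / norm for w in row] if norm > 0 else row)
    return rows


# Toy corpus: English placeholders stand in for stemmed Amharic terms.
docs = [
    ["football", "match", "win"],
    ["football", "team", "match"],
    ["market", "price", "rise"],
    ["market", "price", "news"],
]

# Thresholds are lowered here only because the toy corpus is tiny.
vocabulary = build_vocabulary(docs, min_total=2, max_total=3)
rows = tfidf_matrix(docs, vocabulary)

for doc_row in rows:
    print([round(w, 3) for w in doc_row])
```

Each row is a document vector over the retained terms. A matrix of this shape is the kind of input a SOM tool would be trained on, after conversion to whatever file format that tool expects.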
Items in the AAUL Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.