Addis Ababa University Libraries Electronic Thesis and Dissertations: AAU-ETD! > School of Information Science and Computer Science > Thesis - Information Science

Please use this identifier to cite or link to this item: http://hdl.handle.net/123456789/1009

Advisors: W/rt Saba Amsalu
Ato Tesfaye Biru
W/ro Wonishet Abdela
Copyright: 2003
Date Added: 9-May-2008
Publisher: Addis Ababa University
Abstract: This research explored the applicability of WEBSOM (Web-based Self-Organizing Map) to retrieving texts written in Amharic. The method applies the self-organizing algorithm of a neural network to generate a map display. The map display captures complex relationships among the given documents and reveals them through the arrangement of terms extracted from those documents. To conduct the experiment, 330 Amharic news articles belonging to three classes were collected from the Ethiopian News Agency; 248 of them were used as the training set and the remaining 82 as the test set. Documents were represented using the Vector Space Model. Non-content-bearing terms (stop words) were removed from the lists of terms identified in the headline and slug parts of the news articles, and a suffix/prefix-stripping technique was applied to the remaining terms. After terms with different written forms were normalized to one common form, terms with a total frequency above 70 or below 3 were discarded. A matrix was then constructed for each of the training and test sets over the remaining 142 terms. Each term in a news article was assigned a normalized weight using the TF-IDF (Term Frequency-Inverse Document Frequency) weighting technique, and the resulting matrices were converted into the input format required by the tool. Using Nenet (Neural Network Tool), the SOM was trained on the 248 training articles and tested with three test sets selected from the three classes of news articles. The distribution of these articles on the map showed that similar articles were placed near each other. The results of the three tests indicate that the clustering capability of the SOM for Amharic documents is promising.
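The term-selection and weighting steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: documents are assumed to be already tokenized, stop-word filtered, affix-stripped and normalized (the Amharic-specific steps are not shown), the toy tokens are English placeholders, and the TF-IDF variant used here (raw term frequency times log(N/df), scaled to unit length) is an assumption, since the abstract only says that a normalized TF-IDF weight was used. All function names are hypothetical.

```python
import math
from collections import Counter

# Frequency cut-offs from the abstract: terms whose total frequency across the
# collection is above 70 or below 3 are discarded.
MIN_TOTAL_FREQ = 3
MAX_TOTAL_FREQ = 70


def select_vocabulary(docs, min_freq=MIN_TOTAL_FREQ, max_freq=MAX_TOTAL_FREQ):
    """Keep terms whose total collection frequency lies in [min_freq, max_freq]."""
    totals = Counter(term for doc in docs for term in doc)
    return sorted(t for t, f in totals.items() if min_freq <= f <= max_freq)


def inverse_document_frequency(docs, vocab):
    """idf(t) = log(N / df(t)), computed on the (training) documents given."""
    n_docs = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    # Terms that never occur in `docs` get weight 0 instead of a division error.
    return {t: math.log(n_docs / df[t]) if df[t] else 0.0 for t in vocab}


def tfidf_vector(doc, vocab, idf):
    """Raw term frequency times idf, scaled to unit Euclidean length (assumed variant)."""
    tf = Counter(doc)
    weights = [tf[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights] if norm > 0 else weights


# Toy usage with lowered thresholds (placeholder tokens, not the thesis data).
docs = [
    ["coffee", "export", "price", "coffee"],
    ["coffee", "export", "farmer"],
    ["football", "match", "goal"],
    ["football", "goal", "goal", "match"],
]
vocab = select_vocabulary(docs, min_freq=2, max_freq=3)  # drops "price" and "farmer"
idf = inverse_document_frequency(docs, vocab)
matrix = [tfidf_vector(doc, vocab, idf) for doc in docs]
```

In this sketch, test articles would be weighted with the idf values computed on the training set, so that the training and test matrices share one weighting scheme over the same vocabulary.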
Lastly, a map was constructed for the entire collection of 330 news articles, and an HTML-based prototype browsing interface was developed in which map areas are labeled with descriptive terms that convey the properties of each area. The map was also linked to the actual article database through Active Server Pages (ASP), so that users can browse the map for relevant articles.
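The SOM training, document placement and map-labeling steps can likewise be sketched without Nenet. The following is a from-scratch online Kohonen SOM in plain Python, offered as an illustration of the technique rather than the tool's or the thesis's procedure. The grid size, number of epochs, linearly decaying learning rate, Gaussian neighborhood, and the labeling heuristic (naming each node after the terms with the largest weights in its model vector) are all assumptions; the abstract does not say how the descriptive terms were chosen. All function names are hypothetical.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid position of the node whose model vector is closest (Euclidean) to x."""
    return min(weights, key=lambda n: sum((a - b) ** 2 for a, b in zip(weights[n], x)))


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Online Kohonen training on a rows x cols grid (parameters are assumptions)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    radius0 = max(rows, cols) / 2
    # Model vectors start as random values in [0, 1).
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        order = list(range(len(vectors)))
        rng.shuffle(order)  # present documents in a new random order each epoch
        for i in order:
            x = vectors[i]
            frac = step / total_steps
            lr = lr0 * (1 - frac)                    # linearly decaying learning rate
            radius = max(radius0 * (1 - frac), 0.5)  # shrinking neighborhood radius
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * radius * radius))  # Gaussian neighborhood
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])  # pull the node toward the document
            step += 1
    return weights


def map_documents(weights, vectors):
    """Place each document on the node that best matches it."""
    return [best_matching_unit(weights, x) for x in vectors]


def label_nodes(weights, vocab, top_n=2):
    """Hypothetical labeling heuristic: the top_n highest-weighted terms per node."""
    labels = {}
    for node, w in weights.items():
        ranked = sorted(range(len(vocab)), key=lambda k: w[k], reverse=True)
        labels[node] = [vocab[k] for k in ranked[:top_n]]
    return labels
```

Applied to the project's data, `vectors` would be the TF-IDF rows of the training articles (142 terms each). Test articles would then be placed with `map_documents`, and articles of the same class landing on the same or neighboring nodes corresponds to the clustering behavior the abstract reports.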
Description: A thesis submitted to the School of Graduate Studies of Addis Ababa University in partial fulfillment of the requirements for the Degree of Master of Science in Information Science
URI: http://hdl.handle.net/123456789/1009
Appears in: Thesis - Information Science

Files in This Item:

BIZUNEH MAMUYE.pdf (1.04 MB, Adobe PDF)

Items in the AAUL Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.


Last updated: May 2010. Copyright © Addis Ababa University Libraries