Browsing by Author "Amsalu, Saba"
Amharic Text Retrieval: An Experiment Using Latent Semantic Indexing (LSI) with Singular Value Decomposition (SVD)
(2003-07) Hailemeskel, Tewodros; Amsalu, Saba; Lisanu, Kibur
The increase in the amount of electronic information has caused an increasing need for efficient information retrieval techniques. Most techniques for retrieving textual materials from databases depend on an exact match between the terms in a user’s query and the terms by which documents are indexed. However, since there are usually many ways to express the same concept, the terms in the user’s query may not appear in a relevant document. Conversely, many words have more than one meaning. For these reasons, term-matching methods are likely to miss relevant documents and also to retrieve irrelevant ones (Dumais, 1992; Berry, Dumais & Letsche, 1995). The Latent Semantic Indexing (LSI) technique of information retrieval can partially handle these problems by organizing terms and documents into a “semantic” structure more appropriate for information retrieval. This is done by modeling the inherent higher-order pattern in the association of terms with documents. In this thesis, the potential of the LSI approach in Amharic text retrieval is investigated. A collection of 206 Amharic documents and 25 queries was used to test the approach. Automatic indexing of the documents resulted in 9256 unique terms that are not in the stop-word list used for the research. A 110-factor SVD of the term-by-document matrix is used for indexing and retrieval. Finally, the performance of the LSI approach is compared with that of the standard vector space model.
Except at one standard recall level (0.80), the precision of the LSI approach was above that of the standard vector space model.

Application of Case-based Reasoning for Amharic Legal Precedent Retrieval: A Case Study with the Ethiopian Labor Law
(Addis Ababa University, 2002-07) Tadesse, Ethiopia; Biru, Tesfaye; Amsalu, Saba
This research is concerned with the development of a Case-Based Reasoning (CBR) precedent retrieval system in the domain of Ethiopian Labor Law. The requirement for the system was to build a knowledge base in which complete decided cases could be entered and then recalled when similar cases arose again. A standard case representation of the original knowledge source (legal cases) has been used to store legal cases. Legal cases have a predefined case structure with a number of features. The features are extracted to reflect the important aspects of a legal case. Given a new case, the feature values are used to search the case base for a similar case. A content-based matching mechanism is used in the retrieval process. Content-based matching matches the equivalent parts of the target and the source cases and calculates the degree of similarity according to the number of features matched and the feature weights. To increase retrieval effectiveness, a mechanism for assigning feature importance values (weights) was required. The approach adopted takes into account domain experts' opinions to assign weights to the features. A Case-Based Reasoning prototype has been implemented using the CBR-Works toolkit. To facilitate the insertion of additional cases and searching, an online interface has also been included.

The Application of Information Retrieval Techniques to Amharic Documents on the Web
(Addis Ababa University, 2001-07) Amsalu, Saba; Teferi, Dereje (PhD); Meshesha, Million (PhD)
The World Wide Web is an escalating mass of interconnected data that stretches from computer to computer across the world.
Information retrieval systems on the Web provide users with relevant information without human intervention, saving time, labor and money. The Web contains documents of diverse content in different languages. Making those documents accessible to users has become a difficult task with the fast growth of the Web. Hence, developing information retrieval systems to cope with the inherent features of Web data has been a research area of the time in information science. In this study, an attempt is made to explore the possibilities of applying some information retrieval techniques to Amharic documents on the Web. To back the research, a literature review of related work has been made. Different information retrieval techniques and algorithms used on other languages have been reviewed to determine the possibilities of applying them to Amharic documents on the Web. A database that stores Amharic Web page data, a suffix list and index files has been designed. A Web page submission form was developed to allow the submission of Web page data into the database. Designing an Amharic query input interface was also part of the research. Automatic indexing and searching techniques have been applied to a collection of 313 Web pages of Amharic documents taken from Walta Information Center news publications. Word and stem inverted index options were explored. An Amharic search interface was then created to handle Amharic data on the Web using ColdFusion Studio and ColdFusion Server 4.0 on the Windows NT 4.0 Operating System and Internet Information Server (IIS). The searching algorithm that was implemented is the Extended Boolean model, which is a Boolean model with vector functionality that allows retrieved documents to be ranked. To measure the performance of the prototype system, retrieval experiments have been conducted for twenty-two queries and an average recall-precision graph is drawn.
Using terms with suffixes and prefixes removed resulted in better performance than using words. Finally, conclusions are drawn based on the test results obtained, and recommendations are made as to what further research could be done for the development of Amharic information retrieval systems on the Web.

Text Retrieval using Self-organised Document Map: The Case of ILRI Digital Library
(2002-06) Bayeh, Mulugeta; Teferi Saba, Dereje; Amsalu, Saba
The current availability of large collections of full-text documents in electronic form emphasises the need for intelligent information retrieval techniques. Especially with rapidly growing digital libraries and distributed access, it is important to have automatic methods for exploring document collections. In this study, the WEBSOM method is applied for this task to a quarter of a century of research publications maintained by the International Livestock Research Institute. The Self-Organising Map (SOM), also known as Kohonen’s feature map (a means for automatically arranging high-dimensional statistical data), is used to position encoded documents onto a map that provides a general view into the text collection. The general view visualises similarity relations between the documents on a two-dimensional map display, which can be utilised in exploring the material rather than having to rely on traditional search expressions. Similar documents become mapped close to each other, providing an intuitive mechanism and ease of access for maximising the institute’s digital information and knowledge resources, particularly for users with limited domain knowledge. This study also sheds some light on the power of the SOM in solving problems of high-dimensional data. The trained SOM and the user interface are now usable both to browse the collection and to automatically map new documents.
It can successfully make a distinction between the various types of documents and efficiently clusters similar publications to nearby locations. It is quite evident that the WEBSOM can effectively visualize the results and is thus especially suitable for exploration tasks without the need to come up with search expressions, which may be difficult even with a rather clear idea of the desired information. The method is a major breakthrough with respect to the much harder problem, for which search methods are usually not even expected to offer much support, encountered when there exists only a vague idea of the object of interest. The same holds true if and when the area of interest resides at the outer edges of one’s current knowledge. This full-fledged report presents most of the situations that may be encountered in a project that explores the practical application of the WEBSOM method to solve the basic problem of devising a suitable search expression, one that neither leaves out relevant documents nor produces long listings of irrelevant hits. The report also provides the general context of text retrieval and, in its various sections, a detailed discussion of the actual method used in this research. The report includes the step-by-step procedures and functions used in encoding the document collection (preprocessing), computing the Kohonen feature map and developing the web-based map interface, as well as a discussion of the essential results together with the code used.
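
As a rough illustration of the mapping step described in this abstract, the sketch below trains a small Kohonen map on toy term-count vectors and looks up the map unit closest to each document. It is a generic SOM under simplifying assumptions, not the WEBSOM pipeline or the code from the report: the function names (`train_som`, `best_matching_unit`), the linear decay of the learning rate and neighbourhood radius, the grid size and the toy documents are all hypothetical choices made for this example.

```python
import math
import random


def best_matching_unit(weights, vector):
    """Return the grid position whose weight vector is closest to `vector`."""
    return min(
        weights,
        key=lambda node: sum((w - x) ** 2 for w, x in zip(weights[node], vector)),
    )


def train_som(vectors, rows, cols, epochs=100, lr0=0.5, sigma0=None, seed=0):
    """Train a rectangular SOM; returns {(row, col): weight vector}.

    Assumed schedule (not taken from the report): learning rate and
    neighbourhood radius both shrink linearly over the epochs.
    """
    rng = random.Random(seed)
    dim = len(vectors[0])
    sigma0 = sigma0 or max(rows, cols) / 2.0
    # One weight vector per map unit, initialised at random in [0, 1).
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)
        sigma = sigma0 * (1.0 - frac) + 0.01  # small floor avoids division by zero
        # Present the documents in a fresh random order each epoch.
        for vector in rng.sample(vectors, len(vectors)):
            bmu = best_matching_unit(weights, vector)
            for node, w in weights.items():
                # Gaussian neighbourhood measured on the 2-D grid, not in input space.
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    w[i] += lr * h * (vector[i] - w[i])
    return weights


if __name__ == "__main__":
    # Hypothetical documents encoded as normalised counts of four terms.
    docs = {
        "cattle-breeding": [1.0, 0.8, 0.0, 0.1],
        "dairy-herds": [0.9, 1.0, 0.1, 0.0],
        "forage-crops": [0.0, 0.1, 1.0, 0.9],
        "pasture-seed": [0.1, 0.0, 0.8, 1.0],
    }
    som = train_som(list(docs.values()), rows=3, cols=3)
    for name, vec in docs.items():
        print(name, "->", best_matching_unit(som, vec))
```

A new document can be placed on the trained map by encoding it the same way and calling `best_matching_unit`, which corresponds to the "automatically map new documents" use mentioned in the abstract.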