Browsing by Author "Biru, Tesfaye"
Now showing 1 - 4 of 4
Item
Application of Case-based Reasoning for Amharic Legal Precedent Retrieval: A Case Study with the Ethiopian Labor Law (Addis Ababa University, 2002-07) Tadesse, Ethiopia; Biru, Tesfaye; Amsalu, Saba

This research is concerned with the development of a Case-Based Reasoning (CBR) precedent retrieval system in the domain of Ethiopian Labor Law. The requirement for the system was to build a knowledge base in which complete decided cases could be entered and then recalled when similar cases arose again. A standard case representation faithful to the original knowledge source (legal cases) was used to store the cases. Legal cases have a predefined case structure with a number of features, which are extracted to reflect the important aspects of a case. Given a new case, the feature values are used to search for similar cases in the case base. A content-based matching mechanism is used in the retrieval process: it matches the equivalent parts of the target and source cases and calculates the degree of similarity from the number of matched features and the feature weights. To increase retrieval effectiveness, a mechanism for assigning feature importance values (weights) was required; the approach adopted draws on domain experts' opinions to assign weights to the features. A Case-Based Reasoning prototype has been implemented using the CBR-Works toolkit. To facilitate the insertion of additional cases and searching, an online interface has also been included.

Item
Automatic Morphological Analyzer for Amharic: An Experiment Employing Unsupervised Learning and Autosegmental Analysis Approaches (Addis Ababa University, 2002-06) Bayu, Tesfaye; Biru, Tesfaye; Leyew, Zelealem (PhD); Getachew, Mesfin

Automatic understanding of natural languages requires a set of language processing tools. A morphological analyzer, which parses words into their morphemic components, is one of these tools.
This thesis reports an attempt to develop such a tool for Amharic. Word formation in Amharic involves three levels of morphological operations: stem formation, affixation and cliticization. Since affixation and cliticization are similar to those in Indo-European languages, a language-independent system tested on these languages is used. The system, called Linguistica2001, creates a morphological dictionary (called a signature) by extracting prefixes, stems and suffixes from a given corpus. The system uses a modified version of Harris's successor frequency algorithm to detect plausible word break points, and additional heuristics are applied to improve the word breaks produced. A Minimum Description Length (MDL) test serves as a benchmark for accepting a signature as part of the morphology of a given language. For the stem-internal operations, another approach based on the principles of Autosegmental Phonology is used. This principle represents the phonemic features of a word in different tiers and uses association lines to maintain their relationships. The approach is used to design the algorithms and data structures required for the extraction and representation of stem components. A prototype system, called the Amharic Stems Morphological Analyzer (ASMA), is developed to test the algorithms. Though the two systems are tested separately, ASMA is designed to work in an integrated manner by accepting as its input the stems identified by Linguistica2001. The experiment is conducted using corpora prepared in this study, and the results obtained are encouraging: Linguistica2001 successfully parses 87% of the words in the test data (433 of 500 words), corresponding to a precision of 95% and a recall of 90%.
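The successor-frequency idea can be illustrated with a short sketch (a toy illustration only, not code from Linguistica2001; the English vocabulary and function names are assumptions): for each prefix of a word, count how many distinct letters follow that prefix anywhere in the corpus, and treat peaks in that count as plausible morpheme boundaries.

```python
def successor_frequencies(word, vocabulary):
    """For each prefix of `word`, count the distinct letters that can
    follow that prefix across the vocabulary (successor frequency)."""
    freqs = []
    for i in range(1, len(word)):
        prefix = word[:i]
        successors = {w[i] for w in vocabulary
                      if w.startswith(prefix) and len(w) > i}
        freqs.append(len(successors))
    return freqs

def candidate_breaks(word, vocabulary):
    """Propose break points at local peaks of the successor frequency."""
    counts = successor_frequencies(word, vocabulary)
    breaks = []
    for i in range(1, len(counts) - 1):
        if counts[i] > counts[i - 1] and counts[i] >= counts[i + 1]:
            breaks.append(i + 1)  # split after a prefix of this length
    return breaks

# With a toy English corpus, "walked" is split after "walk",
# because three different letters (e, i, s) can follow that prefix.
vocab = ["walked", "walking", "walks", "talked", "talking", "talks"]
print(candidate_breaks("walked", vocab))  # [4] -> walk|ed
```

In the system described above, such candidate splits are further filtered by the additional heuristics and the MDL criterion.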
The second system analyses 241 (94%) of the 255 sample stems correctly.

Item
Hidden Markov Model Based Large Vocabulary, Speaker Independent, Continuous Amharic Speech Recognition (Addis Ababa University, 2003-06) Seifu, Zegaye; Biru, Tesfaye; Birhanu, Solomon; Taddesse, Knfe

This study investigated the possibility of developing a large-vocabulary, speaker-independent, continuous speech recognizer for the Amharic language. The recognizer was developed using Hidden Markov Models and implemented with the Hidden Markov Model Toolkit (HTK). A corpus developed by Solomon Tefera supplied the data for training and testing the models: a database of 8,000 utterances used for training and more than 500 sentences for development and evaluation. The data was preprocessed in line with the requirements of the HTK toolkit. To support the acoustic models, a bigram language model was constructed, and a pronunciation dictionary was prepared and used as an input. Since the recognizer was meant to handle a large vocabulary and continuous speech, phonemes were chosen as the basic unit of recognition. Phonemes modeled in isolation, however, are context-independent units, even though the environment in which a sound occurs affects the way it is pronounced. Thus, after the monophone-based speech recognizer was built, it was promoted to a triphone-based system in which the left and right contexts were considered and modeled. In addition, the mixture components of the states of the models were incremented with a view to optimizing the performance of the recognizers.

Item
Incorporation of Relevance Data in the Term Discrimination Value (Addis Ababa University, 1987-09) Biru, Tesfaye

Indexing in information retrieval is used to obtain a suitable vocabulary of index terms and an optimum assignment of these terms to documents, so as to increase the effectiveness and efficiency of the retrieval system.
A great many automatic indexing models have been developed over the years in an effort to produce indexing methods that are both effective and usable in practice. One of the most elegant approaches to the automatic selection and weighting of index terms is the term discrimination value, developed by Salton and his co-workers. This model ranks index terms according to how well they discriminate the documents of a collection from each other; that is, the value of an index term depends on how much the average separation between individual documents changes when the given term is assigned for content identification. It is suggested that the most useful index terms, those which achieve the greatest separation, are the medium-frequency terms. Since the basic requirement in effective retrieval is the separation between documents which are relevant to a given query and documents which are not, a more complete picture of a term's behavior may be obtained by considering its ability to effect greater separation between relevant and non-relevant documents while at the same time moving relevant documents closer to each other. This study was aimed at testing the extent to which the discrimination value model considers the relevance characteristics of documents in ranking index terms. An overview of the more important ideas current in automatic indexing is provided, and the term discrimination value model is discussed in greater detail. An efficient technique for computing exact term discrimination values for the relevant/non-relevant document distinction is introduced.
The study is conducted using the KEEN, CRANFIELD, EVANS, HARDING and LISA document collections and their associated queries and relevance judgments. While some of the results are consistent with those derived by previous workers, in some cases, especially in the case of relevant-relevant discrimination, the results obtained appear to be in complete disagreement with Salton's theory: the medium-frequency terms are not the most useful terms.
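The basic discrimination-value computation that this abstract builds on can be sketched in miniature (an illustrative toy implementation under assumed conventions: documents as sparse term-weight dictionaries, and cosine similarity to the collection centroid as the density measure; none of this is drawn verbatim from the thesis):

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def density(docs):
    """Average similarity of each document to the collection centroid:
    high density means the documents sit close together in term space."""
    centroid = {}
    for d in docs:
        for t, w in d.items():
            centroid[t] = centroid.get(t, 0.0) + w / len(docs)
    return sum(cosine(d, centroid) for d in docs) / len(docs)

def discrimination_value(term, docs):
    """DV(term) = density without the term minus density with it.
    A positive DV means assigning the term spreads the documents
    apart, i.e. it is a good discriminator; a term occurring in
    every document pulls them together and gets a negative DV."""
    without = [{t: w for t, w in d.items() if t != term} for d in docs]
    return density(without) - density(docs)

docs = [{"cat": 1, "pet": 1}, {"dog": 1, "pet": 1}, {"fish": 1, "pet": 1}]
print(discrimination_value("pet", docs) < 0)  # True: "pet" is everywhere
print(discrimination_value("cat", docs) > 0)  # True: "cat" discriminates
```

The relevance-based extension studied in the thesis replaces this whole-collection separation with separations computed between relevant and non-relevant documents for each query.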