
Please use this identifier to cite or link to this item: http://hdl.handle.net/123456789/1010

Advisors: W/rt Atelach Alemu
Ato Mesfin Getachew
Dr. Abebe G/Tsadik
Copyright: 2003
Date Added: 9-May-2008
Publisher: Addis Ababa University
Abstract: Natural language processing (NLP) is a research area that is becoming increasingly popular for both academic and commercial reasons. Higher-level NLP systems (e.g., machine translation) can be realized only when the lower-level ones (e.g., part-of-speech taggers, syntactic parsers) have been built successfully. This functional dependency exists even among the lower-level NLP systems. A morphological analyzer can be an important component of a part-of-speech (POS) tagger, particularly in dealing with unknown words. A POS tagger, which is a system that uses various sources of information to assign possibly unique POSs to words, can in turn supply the input to a syntactic parser. Writers in the area of NLP argue that if the POS tagger is accurate, this method is an excellent one. This thesis can be taken as an attempt to integrate the ideas and outputs of previously attempted Amharic NLP prototypes towards solving a further problem in the NLP of the language, namely automatic parsing of Amharic complex sentences. Syntactic parsing underlies most applications in natural language processing. Parsers are already used extensively in a number of disciplines, such as computer science (for compiler construction, database interfaces, artificial intelligence, etc.) and linguistics (for text analysis, corpus analysis, machine translation, etc.). Although there have been some comprehensive studies of Amharic syntax from a linguistic perspective, attempts to investigate it from a computational point of view are very recent. This thesis presents Amharic word and phrase classes, sentence formalisms, the morphological properties peculiar to complex sentence formation in the language, and attempts to extract the features that enable the implementation of an automatic Amharic complex sentence parser.
The sample data used in this study was taken from references that are widely used in the teaching and learning of the language. This data was manually analyzed, tagged, and parsed, and then used as a corpus from which to extract the grammar rules and assign probabilities to them. Algorithms that can use the morphological, lexical, and syntactic properties of the language were customized and modified. Experiments were conducted using a training set and a test set. The first experiment was conducted on the part-of-speech tagger to assess its performance when morphological analysis is embedded in it. The tagger attained accuracies of 98.7% on the training set and 94% on the test set. The experiments on complex sentence parsing showed 89.6% accuracy on the training set and 81.6% accuracy on the test set prepared for this purpose.
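The abstract describes extracting grammar rules from a manually parsed corpus and assigning probabilities to them, without stating the estimation method. The following is a minimal sketch of one common way to do this, relative-frequency estimation of a probabilistic context-free grammar. It is an assumption, not the thesis's documented procedure. The tree encoding, the tag names, the placeholder words (w1, w2, w3), and the function names productions and estimate_pcfg are all hypothetical.

```python
from collections import Counter

# Hypothetical tree encoding (an assumption for illustration):
#   internal node: (label, [child, child, ...])
#   preterminal:   (tag, "word")

def productions(tree):
    """Yield one (lhs, rhs) pair for every node of a parse tree."""
    label, children = tree
    if isinstance(children, str):
        # Preterminal node: the rule rewrites a tag to a single word.
        yield (label, (children,))
        return
    # Internal node: the right-hand side is the sequence of child labels.
    yield (label, tuple(child[0] for child in children))
    for child in children:
        yield from productions(child)

def estimate_pcfg(treebank):
    """Relative-frequency estimate: P(lhs -> rhs) = count(rule) / count(lhs)."""
    rule_counts = Counter()
    lhs_counts = Counter()
    for tree in treebank:
        for lhs, rhs in productions(tree):
            rule_counts[(lhs, rhs)] += 1
            lhs_counts[lhs] += 1
    return {
        rule: count / lhs_counts[rule[0]]
        for rule, count in rule_counts.items()
    }

# Toy treebank with placeholder tags and words, not data from the thesis.
treebank = [
    ("S", [("NP", [("N", "w1")]), ("VP", [("V", "w2")])]),
    ("S", [("NP", [("ADJ", "w3"), ("N", "w1")]), ("VP", [("V", "w2")])]),
]

probs = estimate_pcfg(treebank)
for (lhs, rhs), p in sorted(probs.items()):
    print(f"{lhs} -> {' '.join(rhs)}  [{p:.2f}]")
```

In this toy corpus NP is expanded in two different ways, once each, so each NP rule receives probability 0.5, while rules that are the only expansion of their left-hand side receive probability 1.0.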
Description: A thesis submitted to the School of Graduate Studies of Addis Ababa University in partial fulfillment of the requirements for the Degree of Master of Science in Information Science.
URI: http://hdl.handle.net/123456789/1010
Appears in: Thesis - Information Science


Items in the AAUL Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.

