AAU-ETD :: Browsing by Author "Getachew, Mesfin"

Now showing 1 - 5 of 5

Automatic Morphological Analyzer for Amharic: An Experiment Employing Unsupervised Learning and Autosegmental Analysis Approaches
(Addis Ababa University, 2002-06) Bayu, Tesfaye; Biru, Tesfaye; Leyew, Zelealem (PhD); Getachew, Mesfin
Automatic understanding of natural languages requires a set of language processing tools. A morphological analyzer, which parses words into their morphemic components, is one of these tools. This thesis reports an attempt to develop such a tool for Amharic. Word formation in Amharic involves three levels of morphological operations: stem formation, affixation and cliticization. Since affixation and cliticization are similar to those in Indo-European languages, a language-independent system tested on these languages is used. The system, called Linguistica2001, creates a morphological dictionary (called a signature) by extracting prefixes, stems and suffixes from a given corpus. The system uses a modified version of Harris’s successor frequency algorithm to detect plausible word break points. Additional heuristics are used to improve the word breaks produced. A Minimum Description Length (MDL) test serves as a benchmark for accepting a signature as part of the morphology of a given language. For the stem-internal operations, another approach based on the principle of autosegmental phonology is used. This principle represents the phonemic features of a word in different tiers and uses association lines to maintain their relationships. This approach is used to design the algorithms and data structures required for extraction and representation of stem components. A prototype system, called Amharic Stems Morphological Analyzer (ASMA), is developed to test the algorithms. Though the two systems are tested separately, ASMA is designed to work in an integrated manner by accepting as its input the stems identified by Linguistica2001. The experiment is conducted using corpora prepared in this study. The experimental results are encouraging. Linguistica2001 successfully parses 87% of the words in the test data (433 of 500 words). This result corresponds to a precision of 95% and a recall of 90%. The second system analyses 241 (or 94%) of the 255 sample stems correctly.
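The successor frequency idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not the modified algorithm used in Linguistica2001, whose details the abstract does not give; it is the basic textbook form of Harris's method, under the assumption that a break is proposed wherever the successor count forms a local peak. The toy English vocabulary and the function names `successor_frequencies` and `candidate_breaks` are hypothetical and chosen only for illustration.

```python
# Basic successor-frequency segmentation (illustrative sketch only).
# Assumption: a break point is a local peak in the successor count.

# Hypothetical toy vocabulary; the thesis used Amharic corpora instead.
VOCAB = ["walk", "walks", "walked", "walking",
         "talk", "talks", "talked", "talking", "wall"]


def successor_frequencies(word, vocabulary):
    """For each proper prefix of `word`, count the distinct letters
    that follow that prefix anywhere in the vocabulary."""
    freqs = []
    for i in range(1, len(word)):
        prefix = word[:i]
        successors = {w[i] for w in vocabulary
                      if len(w) > i and w.startswith(prefix)}
        freqs.append(len(successors))
    return freqs


def candidate_breaks(word, vocabulary):
    """Return prefix lengths where the successor count is greater than 1
    and strictly greater than both neighbours (a local peak)."""
    freqs = successor_frequencies(word, vocabulary)
    breaks = []
    for j, f in enumerate(freqs):
        left = freqs[j - 1] if j > 0 else 0
        right = freqs[j + 1] if j + 1 < len(freqs) else 0
        if f > 1 and f > left and f > right:
            breaks.append(j + 1)  # prefix length = index + 1
    return breaks


for cut in candidate_breaks("walking", VOCAB):
    print("walking"[:cut], "+", "walking"[cut:])
```

In this toy vocabulary the prefix "walk" is followed by three different letters, more than any neighbouring prefix, so the sketch proposes a single break after it.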
Automatic Part of Speech Tagging for Amharic Language: An Experiment Using Stochastic Hidden Markov (HMM) Approach
(Addis Ababa University, 2001-06) Getachew, Mesfin; Amare, Getahum (Prof.)
Natural language processing, as a field of scientific inquiry, plays an important role in increasing computers' capability to understand natural languages. Part of speech (POS) tagging is one effort in the task of understanding natural language, the language in which most human knowledge is recorded. The task of POS tagging is to assign unique part of speech tags to the words of sentences that are presented as linear strings of words. POS tagging systems, which annotate corpora written in various languages (e.g. English), are used as components in many applications, including phrase recognition, word sense disambiguation, grammatical function assignment and many others. Today, taggers of different kinds have been developed for languages that have relatively wide use nationally and/or internationally. The same is not true for Amharic, the working language of the Federal Government of Ethiopia and one of the major languages of Ethiopia (Bender, 1976), for there are no systems (taggers of any sort) that annotate corpora written in this language.
An Automatic Sentence Parser for Oromo Language using Supervised Learning Technique
(Addis Ababa University, 2002-06) Megersa, Diriba; Getachew, Mesfin; Meshesha, Million; Engdashet, Haile Eyesus
The goal of Information Retrieval has been to reduce human language complexities and, as a result, serve users in the most efficient way. The decisive tool in achieving this end is Natural Language Processing (NLP), which has many components serving this purpose. Parsing is one such component; it helps improve precision and recall, which is the goal of Information Retrieval systems. Moreover, parsing is also used in the effort towards machine translation, which is at the heart of Natural Language Processing. Since the 1960s, different kinds of parsers have been developed for languages that have relatively wide use nationally and/or internationally. Unfortunately, Oromo has not benefited from such systems, despite being the working language of the State Government of Oromiya and one of the major languages in Ethiopia and Africa (Abebe 2002), for there are no systems (parsers of any sort) that parse written texts in this language. This study is, therefore, an attempt to develop a simple automatic sentence parser for the Oromo language. In the study, the chart algorithm was used with some modification. A morphological analyzer module, which splits words into their root forms and corresponding morphemes, was also developed in order to facilitate the preparation of texts in a file to be parsed with appropriate lexical categories. In addition, the unsupervised learning algorithm was designed to guide the parser in predicting unknown and ambiguous words in a sentence. Grammar rules, a lexicon, morphological rules and contextual information were also designed on the basis of a review of the linguistic properties of Oromo grammatical categories. This system is, in fact, the first of its kind for this language. The study adopts an intelligent (rule-based + learning module) approach to develop a prototype, which is a simple parser for the Oromo language. The thesis, in short, describes the processes of automated sentence parsing of free texts.
That is, it is aimed at developing a prototype and conducting an experiment with it. The result obtained (95% on the training set and 88.5% on the test set) using a small set of manually parsed sentences encourages further research, especially with the aim of developing a full-fledged Oromo sentence parser.
Development of Stemming Algorithm for Wolaytta Text
(Addis Ababa University, 2003-06) Lessa, Lemma; Getachew, Mesfin; Alemu, Atelach; Engdashet, Haile Eyesus (PhD)
This study describes the design of a stemming algorithm for the Wolaytta language. To give a solid background for the thesis, the literature on conflation in general and stemming algorithms in particular was reviewed. Since it is the nature and characteristics of affixation that guide the development of a stemmer, the morphology of the Wolaytta language was studied and described in order to model the language and develop an automatic procedure for conflation. The inflectional and derivational morphologies of the language are discussed. It is indicated that suffixation is the main word formation process in Wolaytta. The study also attempts to show that the language is morphologically complex and uses extensive concatenation of suffixes. The result of the study is a prototype context-sensitive iterative stemmer for the Wolaytta language. An error counting technique was employed to evaluate the performance of this stemmer. The stemmer was trained on 3537 words (80% of the sample text), and the improved version achieves an accuracy of 90.6% on the training set. The proportions of overstemmed and understemmed words on the training set were 8.6% (304 words) and 0.8% (28 words) respectively. When the stemmer was run on the unseen sample of 884 words (20% of the sample text), it performed with an accuracy of 86.9%. The percentages of errors recorded as understemmed and overstemmed on this unseen test set were 9% and 4.1%, respectively. Moreover, a dictionary reduction of 38.92% was attained on the test set. The major sources of errors are also reported, with recommendations for further improving the performance of the stemmer and for further research.
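The training-set figures reported in the abstract above can be reproduced with a short calculation. The sketch below assumes that every word is counted as correct, overstemmed or understemmed, which is consistent with the three reported percentages summing to 100%. The function name `error_counting_report` is hypothetical, and the thesis's own evaluation procedure may differ in detail.

```python
# Error-counting evaluation of a stemmer (illustrative sketch only).
# Assumption: each word is either correct, overstemmed or understemmed.


def error_counting_report(total, overstemmed, understemmed):
    """Return accuracy and error rates as percentages, to one decimal."""
    correct = total - overstemmed - understemmed
    return {
        "accuracy": round(100 * correct / total, 1),
        "overstemmed": round(100 * overstemmed / total, 1),
        "understemmed": round(100 * understemmed / total, 1),
    }


# Training-set counts taken from the abstract: 3537 words,
# 304 overstemmed, 28 understemmed.
report = error_counting_report(total=3537, overstemmed=304, understemmed=28)
print(report)
```

With the abstract's counts, 3205 of the 3537 training words are stemmed correctly, which gives the reported 90.6% accuracy alongside error rates of 8.6% and 0.8%.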
Examination of Abyei Arbitration Award
(Addis Ababa University, 2017-06) Getachew, Mesfin
South Sudan attained independence from the Sudan on 9 July 2011 through the Referendum. Unfortunately, the Referendum left some issues unresolved. The status of the Abyei Area was one of these unresolved issues and, consequently, it remains a bone of contention between the newly born Republic of South Sudan and the Republic of Sudan. Historically, northern and southern Sudan were administered by one colonial authority as distinct geographical and cultural entities, despite the fact that the Sudan existed as a single state. Thus, the Abyei Area had been one of the areas of contention during the two bloody intra-state conflicts that took place after the independence of the Sudan in 1956. After the formal separation and attainment of independent statehood by South Sudan in July 2011, the long-standing dispute over the Abyei Area became an inter-state dispute. It is worth mentioning that the Comprehensive Peace Agreement (CPA), signed between the Government of Sudan (GoS) and the Sudan People's Liberation Movement/Army (SPLM/A) in 2005, consists of six protocols, including one intended to serve as a basis for subsequent settlement of the outstanding dispute over the Abyei Area. Thus, the CPA provided the mechanisms for ensuring conclusive delimitation and demarcation of the Abyei Area between South Sudan and the Sudan. Accordingly, the Abyei Protocol established the Abyei Boundary Commission (ABC) to define and demarcate the contested Abyei Area. The ABC delivered its final and binding report in 2005. However, the GoS rejected the Report of the ABC on the ground that the ABC had exceeded its mandate, whereas Southern Sudan, as represented by the SPLM/A, accepted the Report as final and binding. The dispute over this Report lasted more than three years and resulted in the eruption of war in the Abyei Area, which caused massive displacement and loss of innocent lives.
In an effort to avoid further conflict, the Parties agreed to take their dispute to the Permanent Court of Arbitration (PCA) at The Hague for a final and binding decision. On July 22, 2009, the PCA issued its final and binding decision on the intra-state boundary dispute. However, the Abyei Arbitration Award has not yet been enforced and the final status of the Abyei Area has not yet been determined. Therefore, this thesis examines why the Abyei Arbitration Award has failed to be enforced and proposes an alternative solution to enforce the Award and determine the final status of the Abyei Area.
