Automatic Morphological Analyzer for Amharic an Experiment Employing Unsupervised Learning and Autosegmental Analysis Approaches

Bayu, Tesfaye

dc.contributor.advisor	Biru, Tesfaye
dc.contributor.advisor	Leyew, Zelealem(PhD)
dc.contributor.advisor	Getachew, Mesfin
dc.contributor.author	Bayu, Tesfaye
dc.date.accessioned	2018-11-30T12:10:31Z
dc.date.accessioned	2023-11-18T12:44:08Z
dc.date.available	2018-11-30T12:10:31Z
dc.date.available	2023-11-18T12:44:08Z
dc.date.issued	2002-06
dc.description.abstract	Automatic understanding of natural languages requires a set of language processing tools. A morphological analyzer, which parses words into their component morphemes, is one of these tools. This thesis reports an attempt to develop such a tool for Amharic. Word formation in Amharic involves three levels of morphological operations: stem formation, affixation, and cliticization. Since affixation and cliticization in Amharic are similar to those in Indo-European languages, a language-independent system tested on those languages is used. The system, called Linguistica2001, creates a morphological dictionary (called a signature) by extracting prefixes, stems, and suffixes from a given corpus. The system uses a modified version of Harris’s successor frequency algorithm to detect plausible word break points. Additional heuristics are used to improve the breaks produced. A Minimum Description Length (MDL) test serves as the criterion for accepting a signature as part of the morphology of a given language. For the stem-internal operations, another approach based on the principles of autosegmental phonology is used. This approach represents the phonemic features of a word on different tiers and uses association lines to maintain the relationships among them. It is used to design the algorithms and data structures required for extracting and representing stem components. A prototype system, called Amharic Stems Morphological Analyzer (ASMA), is developed to test the algorithms. Though the two systems are tested separately, ASMA is designed to work in an integrated manner by accepting as its input the stems identified by Linguistica2001. The experiment is conducted using corpora prepared in this study, and the results are encouraging. Linguistica2001 successfully parses 87% of the words in the test data (433 of 500 words). This result corresponds to a precision of 95% and a recall of 90%. The second system, ASMA, correctly analyzes 241 (94%) of the 255 sample stems.	en_US
dc.identifier.uri	http://etd.aau.edu.et/handle/12345678/14762
dc.language.iso	en	en_US
dc.publisher	Addis Ababa University	en_US
dc.subject	Natural language processing	en_US
dc.title	Automatic Morphological Analyzer for Amharic an Experiment Employing Unsupervised Learning and Autosegmental Analysis Approaches	en_US
dc.type	Thesis	en_US
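
The abstract mentions Harris's successor frequency as the basis for detecting word break points. The sketch below illustrates only the basic, unmodified idea: for each prefix of a word, count how many distinct characters follow that prefix anywhere in the word list, and treat a local peak in that count as a candidate morpheme boundary. It is not the modified version used by Linguistica2001, and it involves no MDL test. The function names, the peak rule, and the toy English word list are all assumptions made for illustration.

```python
from collections import defaultdict


def build_followers(words):
    """Map each prefix to the set of distinct characters seen right after it.

    '#' is an assumed end-of-word marker, so a prefix that is itself a
    complete word counts the word boundary as one possible successor.
    """
    followers = defaultdict(set)
    for word in words:
        for i in range(1, len(word) + 1):
            next_char = word[i] if i < len(word) else "#"
            followers[word[:i]].add(next_char)
    return followers


def successor_frequencies(word, followers):
    """Successor frequency after each proper prefix word[:1] .. word[:-1]."""
    return [len(followers.get(word[:i], ())) for i in range(1, len(word))]


def candidate_breaks(word, followers):
    """Return split positions where the successor frequency is a local peak.

    Illustrative rule (an assumption): the count must exceed 1 and be
    strictly greater than the counts at both neighbouring positions.
    """
    sf = successor_frequencies(word, followers)
    breaks = []
    for k, count in enumerate(sf):
        left = sf[k - 1] if k > 0 else 0
        right = sf[k + 1] if k + 1 < len(sf) else 0
        if count > 1 and count > left and count > right:
            breaks.append(k + 1)  # split after the first k + 1 characters
    return breaks


# Toy word list, invented for illustration only.
toy_words = ["walk", "walks", "walked", "walking",
             "talk", "talks", "talked", "talking"]
toy_followers = build_followers(toy_words)

for w in ["walking", "talked", "walk"]:
    cuts = candidate_breaks(w, toy_followers)
    parts = [w[a:b] for a, b in zip([0] + cuts, cuts + [len(w)])]
    print(w, successor_frequencies(w, toy_followers), parts)
```

On this toy list the prefix "walk" is followed by four different symbols (end of word, "s", "e", "i"), while every other prefix of "walking" has exactly one successor, so the only candidate break falls between "walk" and "ing". A real corpus produces noisier counts, which is presumably why the system described in the abstract adds further heuristics and an MDL test on top of the raw break points.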

Files

Original bundle

Name: Tesfaye Bayu.pdf
Size: 599.22 KB
Format: Adobe Portable Document Format

Collections

Information Sciences