Automatic Morphological Analyzer for Amharic an Experiment Employing Unsupervised Learning and Autosegmental Analysis Approaches

dc.contributor.advisor: Biru, Tesfaye
dc.contributor.advisor: Leyew, Zelealem (PhD)
dc.contributor.advisor: Getachew, Mesfin
dc.contributor.author: Bayu, Tesfaye
dc.date.accessioned: 2018-11-30T12:10:31Z
dc.date.accessioned: 2023-11-18T12:44:08Z
dc.date.available: 2018-11-30T12:10:31Z
dc.date.available: 2023-11-18T12:44:08Z
dc.date.issued: 2002-06
dc.description.abstract: Automatic understanding of natural languages requires a set of language processing tools. A morphological analyzer, which parses words into their morphemic components, is one of these tools. This thesis reports an attempt to develop such a tool for Amharic. Word formation in Amharic involves three levels of morphological operations: stem formation, affixation and cliticization. Since affixation and cliticization are similar to those in Indo-European languages, a language-independent system tested on these languages is used. The system, called Linguistica2001, creates a morphological dictionary (called a signature) by extracting prefixes, stems and suffixes from a given corpus. The system uses a modified version of Harris's Successor Frequency algorithm to detect plausible word break points. Additional heuristics are used to improve the word breaks produced. A Minimum Description Length (MDL) test serves as the benchmark for accepting a signature as part of the morphology of a given language. For the stem-internal operations, another approach based on the principle of autosegmental phonology is used. This principle represents the phonemic features of a word in different tiers and uses association lines to maintain their relationships. This approach is used to design the algorithms and data structures required for the extraction and representation of stem components. A prototype system, called Amharic Stems Morphological Analyzer (ASMA), is developed to test the algorithms. Though the two systems are tested separately, ASMA is designed to work in an integrated manner by accepting as its input the stems identified by Linguistica2001. The experiment is conducted using corpora prepared in this study. The experimental results are encouraging. Linguistica2001 successfully parses 87% of the words in the test data (433 of 500 words). This result corresponds to a precision of 95% and a recall of 90%. The second system analyses 241 (or 94%) of the 255 sample stems correctly. (en_US)
dc.identifier.uri: http://etd.aau.edu.et/handle/12345678/14762
dc.language.iso: en (en_US)
dc.publisher: Addis Ababa University (en_US)
dc.subject: Natural language processing (en_US)
dc.title: Automatic Morphological Analyzer for Amharic an Experiment Employing Unsupervised Learning and Autosegmental Analysis Approaches (en_US)
dc.type: Thesis (en_US)
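
Illustrative note: the abstract mentions that word break points are detected with a modified version of Harris's successor frequency algorithm. The sketch below shows only the plain textbook idea, not the Linguistica2001 implementation or the thesis's modifications. The function names, the `threshold` parameter, the end-of-word marker and the toy English word list are all assumptions made for illustration; no Amharic data from the thesis is used.

```python
from collections import defaultdict

END = "#"  # hypothetical end-of-word marker, counted as a possible successor


def successor_frequencies(word, corpus):
    """Return, for each proper prefix of `word`, the number of distinct
    symbols that follow that prefix anywhere in `corpus`."""
    successors = defaultdict(set)
    for w in set(corpus):
        for i in range(1, len(w) + 1):
            nxt = w[i] if i < len(w) else END
            successors[w[:i]].add(nxt)
    return [len(successors[word[:i]]) for i in range(1, len(word))]


def split_word(word, corpus, threshold=3):
    """Cut `word` after every prefix whose successor count is a local peak
    and at least `threshold` (an assumed cutoff, not taken from the thesis)."""
    sf = successor_frequencies(word, corpus)
    cuts = []
    for idx, count in enumerate(sf):
        left = sf[idx - 1] if idx > 0 else 0
        right = sf[idx + 1] if idx + 1 < len(sf) else 0
        if count >= threshold and count > left and count >= right:
            cuts.append(idx + 1)  # prefix length at which to cut
    pieces, start = [], 0
    for cut in cuts:
        pieces.append(word[start:cut])
        start = cut
    pieces.append(word[start:])
    return pieces


# Toy word list, assumed for illustration only.
CORPUS = ["walk", "walks", "walked", "walking",
          "talk", "talks", "talked", "talking",
          "want", "wanted"]

print(successor_frequencies("walked", CORPUS))
print(split_word("walked", CORPUS))
```

In this toy list the prefix "walk" is followed by four different continuations (end of word, "s", "e", "i"), so the count peaks there and the word is cut into "walk" and "ed". A real system would need further heuristics and a model-selection step such as the MDL test named in the abstract.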

Files

Original bundle
Name: Tesfaye Bayu.pdf
Size: 599.22 KB
Format: Adobe Portable Document Format