A Deterministic Approach to Tri-Radical Amharic Verb Derivatives Generation
dc.contributor.advisor | Yalemzewd Negash (PhD) | |
dc.contributor.author | Samrawit Kassaye | |
dc.date.accessioned | 2025-03-05T09:31:27Z | |
dc.date.available | 2025-03-05T09:31:27Z | |
dc.date.issued | 2022-02 | |
dc.description.abstract | Morphological synthesis, or generation, is the process of producing one or more surface forms from a sequence of underlying (lexical) forms. Synthesizers of various kinds have been developed for languages that are widely used internationally. Amharic is the second most widely spoken Semitic language after Arabic, yet it remains under-resourced in the digital world. This research elaborates a rule-based approach that morphologically derives Amharic words from tri-radical verbs in order to generate a rich Amharic lexicon. The work uses two data sources: an Amharic word list containing more than 450,000 unique words, and a list of more than 350 unique tri-radical verbs. The proposed method identifies rules by analyzing existing Amharic words against the tri-radical verbs. The new feature introduced is index changing applied to the letters of the tri-radical verbs. Index changing (adding a vowel to a consonant letter) is one of the approaches used in the morphological derivation of Amharic words from root (stem) words. After index changing, each index-changed form is searched for in the Amharic word list. If the form is found either as a whole word or as part of a word with a prefix and/or suffix, the pattern of that word with respect to the root verb and the index-changed form is captured. From the captured patterns, morphemes are extracted and rules are identified. In total, 85,115 unique rules are identified. During rule identification, the frequency of every rule is recorded in order to evaluate the efficiency of each rule. A memory-based machine learning approach is applied to evaluate the frequency of the rules. Of the 85,115 rules, 29,776 have a wrong prefix, 32,401 have a wrong suffix, and 11,390 are discarded because of a wrong index changing process.
The identified rules achieved an accuracy of 0.99, an average precision of 0.88, and an average recall of 0.85. Based on these rules, a comprehensive set of derivatives for tri-radical Amharic verbs was generated, resulting in a rich Amharic lexicon. | |
dc.identifier.uri | https://etd.aau.edu.et/handle/123456789/4454 | |
dc.language.iso | en_US | |
dc.publisher | Addis Ababa University | |
dc.subject | Morphological synthesis | |
dc.subject | Natural Language Processing | |
dc.subject | Memory-based approach | |
dc.subject | Tri-radical verbs | |
dc.subject | Amharic Lexicon | |
dc.subject | Rule-based approach | |
dc.subject | Morpheme | |
dc.title | A Deterministic Approach to Tri-Radical Amharic Verb Derivatives Generation | |
dc.type | Thesis |
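
The abstract describes index changing, word-list lookup, and prefix/suffix pattern capture only in prose. The following Python sketch illustrates one way those steps could fit together; it is not the thesis's implementation. All names (`change_order`, `index_changed_forms`, `extract_rules`) are hypothetical, and the sketch assumes that each Ethiopic consonant occupies a row of eight consecutive Unicode code points starting at U+1200, with the seven basic vowel orders at offsets 0 to 6. That layout holds for most but not all syllables, and the memory-based evaluation of rule frequencies is not reproduced here.

```python
from collections import Counter
from itertools import product

# Assumed range of Ethiopic syllables handled by this sketch.
ETHIOPIC_START, ETHIOPIC_END = 0x1200, 0x1357

def change_order(ch: str, order: int) -> str:
    """Return ch with its vowel order replaced (0 = first order, 6 = seventh)."""
    cp = ord(ch)
    if not (ETHIOPIC_START <= cp <= ETHIOPIC_END):
        return ch  # leave non-Ethiopic characters untouched
    row_start = cp - (cp - ETHIOPIC_START) % 8
    return chr(row_start + order)

def index_changed_forms(verb: str):
    """Yield (orders, form) for every combination of the seven basic orders."""
    for orders in product(range(7), repeat=len(verb)):
        yield orders, "".join(change_order(c, o) for c, o in zip(verb, orders))

def extract_rules(verb: str, word_list):
    """Count (prefix, orders, suffix) patterns for words containing a form."""
    rules = Counter()
    forms = list(index_changed_forms(verb))
    for word in word_list:
        for orders, form in forms:
            pos = word.find(form)
            if pos != -1:
                prefix, suffix = word[:pos], word[pos + len(form):]
                rules[(prefix, orders, suffix)] += 1
    return rules

# Example: the verb written U+1230 U+1260 U+1228 and three attested-looking words.
verb = "\u1230\u1260\u1228"
words = [
    "\u1230\u1260\u1228",        # the verb itself
    "\u1218\u1235\u1260\u122d",  # one prefix letter, radicals in orders (5, 0, 5)
    "\u1230\u1260\u1228\u127d",  # the verb plus a one-letter suffix
]
for rule, freq in extract_rules(verb, words).items():
    print(rule, freq)
```

Each key of the returned counter plays the role of one candidate rule (prefix, vowel orders of the three radicals, suffix), and its count is the recorded frequency. A tri-radical verb yields 7^3 = 343 index-changed forms under this assumption, so scanning a list of several hundred thousand words per verb would likely need an index over the word list rather than the linear search shown.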