Amharic Spelling Error Detection and Correction System: Morphology-Based Approach

dc.contributor.advisorMelese, Michael (PhD)
dc.contributor.authorShimelis, Mariawit
dc.date.accessioned2021-02-12T06:44:15Z
dc.date.accessioned2023-11-18T12:47:11Z
dc.date.available2021-02-12T06:44:15Z
dc.date.available2023-11-18T12:47:11Z
dc.date.issued2020-09-10
dc.description.abstractSpelling errors are common in typed Amharic documents. As Amharic is the official working language of Ethiopia and the second most spoken Semitic language in the world, the need for an Amharic spelling error detection and correction system is evident. Previous attempts have been made to develop a spell checker for Amharic. These works used existing approaches such as Metaphone and edit distance algorithms. However, these approaches were designed for languages with simple morphology, such as English. Moreover, unlike in Amharic, in other languages the distance between words does not depend on the family and order of characters. As a result, applying these approaches to a language with complex morphology like Amharic does not give the anticipated result. Other works have attempted to use existing morphological analyzers. Though the analyzers were reported to work with reasonable accuracy for valid words, their output for misspelled words is not clear. Accordingly, this study investigates the possibility of using a morphology-based approach to design and develop an Amharic typing error detection and correction system for non-word errors. Design science methodology is employed in this study. It involves six activities, namely problem identification and motivation, defining objectives, design and development, demonstration, evaluation, and communication. To carry out the experiment, 717 morphological rules were defined, 2398 stem words selected from each category of root words were stored, and the system was tested with 1724 words selected from different derivational and inflectional categories. A prototype was developed in the Python programming language; it uses three knowledge bases stored in CSV format. The system is evaluated using precision, recall, and predictive accuracy.
The experimental results show 96% Lexical Recall, 89% Error Recall, 99% Lexical Precision, 70% Error Precision, and 95% Predictive Accuracy, and corrections are generated for 70% of the correctly identified invalid words. This shows that the system has high accuracy in flagging words as valid or invalid and needs some improvement in suggestion generation. The system gives good accuracy for selected words (with complex morphology) that are representative of words in the language. Accordingly, it is concluded that the system is capable of detecting and correcting errors as long as the correct rule is defined and the corresponding stem is found in the dictionary. It is also concluded that, by applying the algorithms proposed for morphological analysis and distance calculation, the morphology-based approach is more suitable for Amharic spell checking than other approaches. For future work, improving rule definitions by including word classes, handling exceptions, including additional spell checker functionalities, expanding the work to include real-word errors, and applying the proposed architecture to Amharic-English translation systems are recommended.en_US
dc.identifier.urihttp://etd.aau.edu.et/handle/12345678/25083
dc.language.isoenen_US
dc.publisherAddis Ababa Universityen_US
dc.subjectSpell Checkeren_US
dc.subjectMorphology-Based Approachen_US
dc.subjectN-Gramen_US
dc.subjectMorphologyen_US
dc.titleAmharic Spelling Error Detection and Correction System: Morphology-Based Approachen_US
dc.typeThesisen_US
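
The abstract reports Lexical Recall, Error Recall, Lexical Precision, Error Precision and Predictive Accuracy without stating their formulas. The sketch below shows how such figures are conventionally computed for a spell checker from four counts. The definitions, the `Counts` class, the `spell_checker_metrics` function and the sample numbers are assumptions made for illustration; they are not taken from the thesis and may differ from the definitions it uses.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical tallies from running a spell checker on a labelled word list."""
    valid_accepted: int    # valid words the checker accepts
    valid_flagged: int     # valid words the checker wrongly flags
    invalid_flagged: int   # misspelled words the checker flags
    invalid_accepted: int  # misspelled words the checker misses


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def spell_checker_metrics(c: Counts) -> dict:
    """Compute the five metrics under the conventional (assumed) definitions."""
    total = c.valid_accepted + c.valid_flagged + c.invalid_flagged + c.invalid_accepted
    return {
        # share of valid words that are accepted
        "lexical_recall": _ratio(c.valid_accepted, c.valid_accepted + c.valid_flagged),
        # share of misspelled words that are flagged
        "error_recall": _ratio(c.invalid_flagged, c.invalid_flagged + c.invalid_accepted),
        # share of accepted words that are really valid
        "lexical_precision": _ratio(c.valid_accepted, c.valid_accepted + c.invalid_accepted),
        # share of flagged words that are really misspelled
        "error_precision": _ratio(c.invalid_flagged, c.invalid_flagged + c.valid_flagged),
        # share of all words classified correctly
        "predictive_accuracy": _ratio(c.valid_accepted + c.invalid_flagged, total),
    }


if __name__ == "__main__":
    # Made-up counts, unrelated to the thesis results.
    sample = Counts(valid_accepted=90, valid_flagged=10, invalid_flagged=40, invalid_accepted=10)
    for name, value in spell_checker_metrics(sample).items():
        print(f"{name}: {value:.3f}")
```

Under these definitions, a checker can score high on Lexical Precision while scoring lower on Error Precision when valid words are wrongly flagged, because every such word counts against the flagged set.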

Files

Original bundle
Name:
Mariawit Shimelis 2020.pdf
Size:
3.39 MB
Format:
Adobe Portable Document Format