Grammar Checker for Amharic Language Using Probabilistic and Rule-Based Approach

dc.contributor.advisor: Martha Yifiru
dc.contributor.author: Tsedeniya Kinfe
dc.date.accessioned: 2025-09-05T21:23:26Z
dc.date.available: 2025-09-05T21:23:26Z
dc.date.issued: 2019-07-01
dc.description.abstract: A grammar checker is a natural language processing (NLP) application that validates whether a sentence is grammatical based on predefined rules. Grammar checkers have been developed using different techniques for different languages. The development of an Amharic grammar checker was investigated by [1] using two approaches independently: a rule-based approach for simple Amharic sentences and a statistical approach for complex ones. The rule-based approach performed better than the statistical method in detecting grammatical errors in simple sentences. The purpose of this research is, therefore, to investigate the application of a probabilistic method combined with a rule-based approach in the development of an Amharic grammar checker. In addition, the previous study used bi-gram and tri-gram language models (LMs) to handle complex sentences. The tri-gram LM achieved better performance, but it failed to detect long-distance disagreement within a sentence. Therefore, this study investigated long-distance agreement by building higher-order n-gram LMs, i.e. 4-gram and 5-gram LMs. Moreover, the use of dependency parsing (DP) in Amharic grammar checking was also investigated. To conduct the experiments, POS-tagged data from [2] was used; this corpus served for investigating both the probabilistic method with the rule-based approach and the higher-order n-gram LMs. We developed an automatic POS tagger and chunker to easily identify the subject, object, and verb of a sentence. The SRILM toolkit was used to develop the tri-gram, 4-gram, and 5-gram LMs. The treebank corpus from [3] was used to investigate the dependency parser. MaltOptimizer was used to validate and optimize the treebank, and the MaltParser toolkit was used to train the parser and parse new sentences. Each parsed sentence was grammatically analyzed based on hand-crafted rules. The parsed sentences were evaluated with the labeled attachment score (LAS), label accuracy (LA), and unlabeled attachment score (UAS) metrics provided by the MaltEval toolkit.
The probabilistic method with the rule-based approach achieved 72% accuracy, while the higher-order n-gram method achieved 65%, 67%, and 65.6% for the tri-gram, 4-gram, and 5-gram language models, respectively. The dependency parser scored 81.5%, 94.2%, and 84.4% in LAS, UAS, and LA, respectively. The overall accuracy achieved by DP in detecting grammatical errors was 84.7%.
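The three parser metrics reported in the abstract can be illustrated with a minimal Python sketch. It assumes the usual definitions: UAS is the share of tokens assigned the correct head, LA the share assigned the correct dependency label, and LAS the share for which both are correct. The function name, the toy (head, label) pairs, and the label names are hypothetical; they are not taken from the thesis or from the evaluation toolkit it used.

```python
from typing import List, Tuple

# Each token is described by (head_index, dependency_label).
# Head index 0 stands for the artificial root (an assumption of this sketch).
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float, float]:
    """Return (LAS, UAS, LA) as fractions over all tokens."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")
    n = len(gold)
    # UAS: head matches, label ignored.
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n
    # LA: label matches, head ignored.
    la = sum(g[1] == p[1] for g, p in zip(gold, pred)) / n
    # LAS: both head and label match.
    las = sum(g == p for g, p in zip(gold, pred)) / n
    return las, uas, la


# Hypothetical four-token sentence: token 3 has a wrong label, token 4 a wrong head.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "det")]
pred = [(2, "nsubj"), (0, "root"), (2, "nsubj"), (2, "det")]
las, uas, la = attachment_scores(gold, pred)
print(f"LAS={las:.2f} UAS={uas:.2f} LA={la:.2f}")
```

Because LAS requires both the head and the label to be correct, it can never exceed UAS or LA, which is consistent with the ordering of the reported scores (81.5% LAS against 94.2% UAS and 84.4% LA).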
dc.identifier.uri: https://etd.aau.edu.et/handle/123456789/7341
dc.language.iso: en
dc.publisher: Addis Ababa University
dc.subject: Grammar checker
dc.subject: Rule based grammar checker
dc.subject: Natural Language Processing
dc.title: Grammar Checker for Amharic Language Using Probabilistic and Rule-Based Approach
dc.type: Thesis

Files

Original bundle
Name: Tsedeniya Kinfe.pdf
Size: 55.74 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed to upon submission