Grammar Checker for Amharic Language Using Probabilistic and Rule-Based Approach
Date
2019-07-01
Publisher
Addis Ababa University
Abstract
A grammar checker is a natural language processing (NLP) application that validates the grammaticality of a sentence against predefined rules. Grammar checkers have been developed for different languages using different techniques. An earlier study [1] investigated Amharic grammar checking using two approaches independently: a rule-based approach for simple sentences and a statistical approach for complex sentences. The rule-based approach outperformed the statistical method in detecting grammatical errors in simple sentences. The purpose of this research is, therefore, to investigate the application of a probabilistic method together with a rule-based approach in the development of an Amharic grammar checker. In addition, the previous study used bigram and trigram language models (LMs) to handle complex sentences. The trigram LM performed better of the two, but it fails to detect long-distance disagreement within a sentence. This work therefore investigates long-distance agreement by building higher-order n-gram LMs, namely 4-gram and 5-gram LMs. The use of dependency parsing (DP) in Amharic grammar checking is also investigated.
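The limitation of low-order n-grams can be illustrated with a small sketch. It is not the SRILM-based model used in this work: it applies no smoothing and simply flags a POS-tag sequence when it contains an n-gram never seen in training. The tag names (`N_sg`, `V_pl`, and so on) and the toy corpus are hypothetical and chosen only to show how a trigram window can miss a subject-verb disagreement that a 4-gram window catches.

```python
from typing import Iterable, List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(tags: List[str], n: int) -> List[NGram]:
    """Return all contiguous n-grams of a tag sequence (no padding)."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def train(corpus: Iterable[List[str]], n: int) -> Set[NGram]:
    """Collect every n-gram observed in the training tag sequences."""
    seen: Set[NGram] = set()
    for tags in corpus:
        seen.update(ngrams(tags, n))
    return seen


def has_unseen_ngram(tags: List[str], seen: Set[NGram], n: int) -> bool:
    """Flag a sequence if any of its n-grams was never seen in training."""
    return any(gram not in seen for gram in ngrams(tags, n))


# Hypothetical tag set: number is marked on the noun and on the verb.
TRAIN = [
    ["N_sg", "ADJ", "ADV", "V_sg"],
    ["N_pl", "ADJ", "ADV", "V_pl"],
]

# Singular subject with a plural verb, separated by two intervening tags.
TEST = ["N_sg", "ADJ", "ADV", "V_pl"]

# Map each n-gram order to whether it flags the test sequence.
flag_by_order = {
    n: has_unseen_ngram(TEST, train(TRAIN, n), n) for n in (3, 4)
}
print(flag_by_order)
```

In this toy setting every trigram of the test sequence also occurs in training, so only the 4-gram window spans both the subject and the verb. A real LM would assign smoothed probabilities rather than a seen/unseen decision.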
For the experiments, the POS-tagged corpus from [2] was used to investigate both the probabilistic method with the rule-based approach and the higher-order n-gram LMs. We developed an automatic POS tagger and a chunker to identify the subject, object, and verb of a sentence. The SRILM toolkit was used to build the trigram, 4-gram, and 5-gram LMs. The Treebank corpus from [3] was used to investigate dependency parsing: MaltOptimizer was used to validate and optimize the Treebank, and MaltParser was used to train the parser and parse new sentences. Each parsed sentence was then analyzed grammatically against the crafted rules. Parsing quality was evaluated with the labeled attachment score (LAS), unlabeled attachment score (UAS), and label accuracy (LA) metrics provided by the MaltEval toolkit.
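As a rough illustration of the three parsing metrics, the sketch below computes them from gold and predicted arcs using their standard definitions: UAS counts tokens with the correct head, LA counts tokens with the correct dependency label, and LAS counts tokens with both correct. It does not reproduce MaltEval itself; the function name `attachment_scores` and the dependency labels in the example are assumptions made for illustration and are not taken from the Treebank in [3].

```python
from typing import Dict, List, Tuple

# One arc per token: (index of the head token, dependency label).
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Dict[str, float]:
    """Compute UAS, LA, and LAS as fractions over all tokens."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and equally long")
    total = len(gold)
    pairs = list(zip(gold, pred))
    correct_head = sum(g[0] == p[0] for g, p in pairs)
    correct_label = sum(g[1] == p[1] for g, p in pairs)
    correct_both = sum(g == p for g, p in pairs)
    return {
        "UAS": correct_head / total,
        "LA": correct_label / total,
        "LAS": correct_both / total,
    }


# Hypothetical four-token sentence; head index 0 denotes the root.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "amod")]
pred = [(2, "nsubj"), (0, "root"), (2, "nsubj"), (2, "amod")]

scores = attachment_scores(gold, pred)
print(scores)
```

Here the third token has the right head but the wrong label, and the fourth has the right label but the wrong head, so UAS and LA are both 0.75 while LAS is 0.5. Because LAS requires both to be correct, it can never exceed either of the other two scores.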
The probabilistic method with the rule-based approach achieved an accuracy of 72%, while the higher-order n-gram method achieved 65%, 67%, and 65.6% for the trigram, 4-gram, and 5-gram language models, respectively. The dependency parser scored 81.5%, 94.2%, and 84.4% in LAS, UAS, and LA, respectively. The overall accuracy achieved by DP in detecting grammatical errors is 84.7%.
Keywords
Grammar checker, Rule-based grammar checker, Natural Language Processing