Design and Development of Amharic Grammar Checker
Date
2013-03
Publisher
Addis Ababa University
Abstract
Most human knowledge is recorded in natural language. The records are kept in computers or on
paper so that they can be manipulated and preserved for future use. Natural language processing
plays an important role in increasing computers' capability to understand natural languages;
designing and implementing computer programs that can understand natural language is the aim of
work in this area. Grammatical correctness is crucial for communicating through natural language.
Therefore, natural language processing applications should be able to recognize grammatical
errors in natural language texts. This process is known as grammar checking. This work presents
the design and development of an Amharic grammar checker.
Two grammar checking approaches are used in this research. The first is a rule-based approach,
which is tested on simple sentences. The rules are constructed manually and matched against the
patterns of the sentence to be checked. The second is a statistical approach, which is tested on
both simple and complex sentences. In the statistical Amharic grammar checker, n-gram and
probabilistic methods are used to detect grammatical errors in Amharic sentences. The patterns
and their corresponding probabilities of occurrence are automatically extracted from the
training corpus and stored in a repository. The probability of a sentence is calculated from
these patterns and probabilities, and is then used together with a specified threshold to
determine whether the sentence is correct. The corpus, for both the training and test sets, is
prepared from manually part-of-speech-tagged text of the language.
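The statistical procedure described above can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes that the "patterns" are trigrams of part-of-speech tags, that probabilities are plain relative frequencies, and that unseen trigrams receive a small fixed floor probability. The tag names (`N`, `ADJ`, `V`), the toy corpus, the `floor` value, and the threshold are all hypothetical.

```python
import math
from collections import Counter

START, END = "<s>", "</s>"  # hypothetical sentence-boundary markers


def pad(tags):
    """Add boundary markers so every tag has a two-tag context."""
    return [START, START] + list(tags) + [END]


def train_trigram_model(tagged_sentences):
    """Extract POS-tag trigram counts and their context counts from a corpus."""
    trigram_counts = Counter()
    context_counts = Counter()
    for tags in tagged_sentences:
        padded = pad(tags)
        for i in range(2, len(padded)):
            context = (padded[i - 2], padded[i - 1])
            trigram_counts[context + (padded[i],)] += 1
            context_counts[context] += 1
    return trigram_counts, context_counts


def sentence_log_prob(tags, model, floor=1e-6):
    """Sum the log probabilities of all trigrams in the tag sequence.

    Unseen trigrams get the assumed floor probability instead of zero.
    """
    trigram_counts, context_counts = model
    padded = pad(tags)
    log_prob = 0.0
    for i in range(2, len(padded)):
        context = (padded[i - 2], padded[i - 1])
        count = trigram_counts.get(context + (padded[i],), 0)
        prob = count / context_counts[context] if count else floor
        log_prob += math.log(prob)
    return log_prob


def is_grammatical(tags, model, log_threshold):
    """Accept the sentence if its log probability reaches the threshold."""
    return sentence_log_prob(tags, model) >= log_threshold


# Toy training corpus of POS-tag sequences (hypothetical, verb-final).
corpus = [
    ["N", "N", "V"],
    ["ADJ", "N", "V"],
    ["N", "ADJ", "N", "V"],
]
model = train_trigram_model(corpus)
log_threshold = math.log(1e-3)  # assumed threshold, chosen for illustration

seen_order = is_grammatical(["N", "N", "V"], model, log_threshold)
unseen_order = is_grammatical(["V", "N", "N"], model, log_threshold)
```

In this toy setting the tag sequence that follows patterns seen in training is accepted and the verb-initial sequence is rejected. A real system would need a principled smoothing method and a threshold tuned on held-out data, neither of which is specified in the abstract.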
The evaluation is carried out in two test cases. The first test case uses simple sentences. In
this test case, the rule-based Amharic grammar checker achieves 92.45% precision and 94.03%
recall. On the same test case, the statistical Amharic grammar checker (trigram) achieves
67.14% precision and 90.38% recall. In the second test case, the statistical Amharic grammar
checker is tested on complex sentences, and 63.76% of the errors are detected. The evaluation
results show that each approach is capable of detecting multiple errors in a sentence. The false
alarms are due to incomplete grammatical rules and the quality of the statistical data. The
accuracy of the morphological analyzer also affects the grammar checking results in both
approaches.
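For readers unfamiliar with the two measures, the sketch below shows how precision and recall are commonly computed for an error detector. It assumes the standard definitions (precision is the share of flagged items that are real errors; recall is the share of real errors that are flagged), which the abstract does not spell out. The counts are hypothetical and are not taken from the thesis.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) for an error detector.

    true_positives:  real errors that the checker flagged
    false_positives: correct items flagged as errors (false alarms)
    false_negatives: real errors that the checker missed
    """
    flagged = true_positives + false_positives
    actual_errors = true_positives + false_negatives
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / actual_errors if actual_errors else 0.0
    return precision, recall


# Hypothetical counts, for illustration only.
precision, recall = precision_recall(
    true_positives=9, false_positives=1, false_negatives=3
)
```

Under these definitions, false alarms lower precision while missed errors lower recall, which is why a checker can combine high recall with comparatively low precision.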
Keywords: Statistical grammar checker, rule-based grammar checker, n-gram, POS tag sequence
Keywords
Statistical Grammar Checker; Rule-Based Grammar Checker; N-Gram; POS Tag Sequence