Development of Tigrigna Grammar Checker Using Hybrid Approach
Date
6/3/2018
Publisher
Addis Ababa University
Abstract
A grammar checker is a natural language processing application that determines whether a user's input text is grammatically correct. People now produce large volumes of text to communicate through mobile and internet applications, and that communication meets its objective only when the text is grammatically correct. Checking text for grammatical errors by hand is slow and costly. To address this, grammar checkers have been developed for several languages, including Amharic, Afaan-Oromo, English, and Swedish. For Tigrigna, however, no study has yet attempted to develop a grammar checker. The main objective of this thesis is therefore to present the development of a Tigrigna grammar checker.
The proposed system uses a hybrid of statistical and rule based approaches to identify grammar errors in Tigrigna sentences. It classifies the errors within a sentence into five categories: Subject-Verb Agreement, Object-Verb Agreement, Adverb-Verb Agreement, Noun-Modifier Agreement, and Word Sequence Agreement (WSA). The system has five main modules: Preprocessing, Tag sequence gathering, Rule based grammar checker, Statistical grammar checker, and Grammar error filtering. The preprocessing module tokenizes the input sentence and tags its tokens. The tag sequence gathering module performs tag splitting, tag sequence extraction, and tag sequence probability calculation to fill the language model with statistical data from the Tigrigna corpus. The rule based grammar checker module identifies Subject-Verb, Object-Verb, Adverb-Verb, and Noun-Modifier Agreement errors in the input sentence by matching patterns against 114 handwritten agreement rules. The statistical grammar checker module detects WSA errors in the input sentence using the unique tag sequences and the probability language model.
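The tag sequence probability step and the statistical WSA check can be illustrated with a minimal sketch. It is written in Python for brevity, not in the C# used by the thesis, and it is not the thesis implementation. The bigram order, the `<s>` and `</s>` boundary markers, the `min_prob` threshold, the function names, and the tag labels (`N`, `ADJ`, `ADV`, `V`) are all assumptions made for illustration; the abstract does not specify them.

```python
from collections import Counter

def tag_bigrams(tags):
    """Extract adjacent tag pairs, padding with sentence boundary markers."""
    padded = ["<s>"] + list(tags) + ["</s>"]
    return list(zip(padded, padded[1:]))

def train(tagged_corpus):
    """Estimate P(next_tag | tag) by relative frequency over a tagged corpus.

    tagged_corpus is an iterable of tag sequences, one per sentence.
    """
    pair_counts = Counter()
    context_counts = Counter()
    for tags in tagged_corpus:
        for pair in tag_bigrams(tags):
            pair_counts[pair] += 1
            context_counts[pair[0]] += 1
    return {pair: n / context_counts[pair[0]] for pair, n in pair_counts.items()}

def check_sequence(tags, model, min_prob=0.01):
    """Return the tag pairs whose probability falls below min_prob.

    Pairs never seen in training get probability 0.0 and are flagged.
    An empty result means no word sequence error was detected.
    """
    return [pair for pair in tag_bigrams(tags) if model.get(pair, 0.0) < min_prob]

# Hypothetical toy corpus of tag sequences (verb-final order, as an assumption).
corpus = [
    ["N", "N", "V"],
    ["ADJ", "N", "V"],
    ["N", "ADV", "V"],
]
model = train(corpus)

print(check_sequence(["ADJ", "N", "V"], model))  # sequence seen in training
print(check_sequence(["N", "V", "N"], model))    # verb in the middle
```

In this toy setup, a sentence whose tag order matches the corpus passes, while a sentence with the verb in the middle is flagged at the unseen tag pairs. A real model would need smoothing or a tuned threshold so that rare but valid sequences are not reported as errors.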
We implemented the system in the C# programming language, using SharpNLP as a tool and 4645 part-of-speech annotated sentences as the corpus. The system was evaluated in four experiments using 300 manually prepared grammatically correct and incorrect sentences. Across the experiments, the system achieved average results of 87.9% precision, 87.5% recall, and 87.6% F-measure.
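For readers unfamiliar with the evaluation metrics, the following Python sketch shows how precision, recall, and F-measure are conventionally computed from error detection counts. The function name and the counts in the usage line are hypothetical and are not taken from the thesis experiments.

```python
def precision_recall_f(true_pos, false_pos, false_neg):
    """Compute precision, recall, and balanced F-measure from raw counts.

    true_pos:  errors the checker flagged that are real errors
    false_pos: flags raised on text that is actually correct
    false_neg: real errors the checker missed
    """
    flagged = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / actual if actual else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

# Hypothetical counts for a single experiment.
p, r, f = precision_recall_f(true_pos=9, false_pos=1, false_neg=3)
print(f"precision={p:.3f} recall={r:.3f} f-measure={f:.3f}")
```

Because the reported figures are averages over four experiments, the averaged F-measure need not equal the F-measure computed directly from the averaged precision and recall.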
Keywords
Grammar Checker, Natural Language Processing, Rule Based Grammar Checker, Statistical Based Grammar Checker, Hybrid Grammar Checker, Adverb-Verb, Noun-Modifier, Word Sequence, Object-Verb, Subject-Verb