AAU-ETD :: Browsing by Author "Mideksa Desalegn"

Statistical Afaan Oromo Grammar Checker
(Addis Ababa University, 2015-02-05) Mideksa Desalegn; Abebe Ermias (Ato)
Natural Language Processing (NLP) is a research area that focuses on developing systems that allow computers to communicate with people using everyday language. Communication through natural language depends heavily on grammatical correctness, so NLP applications that recognize grammatical errors in natural language text are important. Such an application is called a grammar checker. A grammar checker can be developed using a rule-based, statistical, or hybrid approach. In this study, a statistical Afaan Oromo grammar checker is developed and tested on a prepared dataset. Statistical grammar checking can use two techniques to judge the grammatical correctness of a given sentence: token n-grams, in which sequences of tokens are extracted, and tag n-grams, in which sequences of tags are extracted. This study uses both techniques and tests their performance on 85 Afaan Oromo sentences. In identifying incorrect sentences, the token n-gram technique achieves a recall of 100%, a precision of 78.1%, and an F-measure of 89.0%, while the tag n-gram technique achieves a recall of 86%, a precision of 82.6%, and an F-measure of 84.3%. In identifying correct sentences, the token n-gram technique achieves a recall of 60%, a precision of 100%, and an F-measure of 80%, while the tag n-gram technique achieves a recall of 74.2%, a precision of 78.2%, and an F-measure of 76.4%.
Two factors lower the performance of both techniques. The first is the performance of the sentence boundary detector, word splitter, POS tagger, and morphological analyzer modules. The second is the quality of the corpus, which contains spelling and spacing errors. To improve the grammar checker, this study therefore makes two recommendations: use a spelling checker to improve the performance of the POS tagger and morphological analyzer, and use a good-quality corpus together with a well-performing POS tagger and morphological analyzer.
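
The n-gram idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not state the n-gram order, the scoring rule, or the tag set, so the bigram order, the "flag a sentence if any n-gram was never seen in training" rule, the function names, and the toy tags `NN`, `JJ`, and `VB` are all assumptions made for illustration. The same functions work on token sequences or tag sequences, since both are lists of strings.

```python
from typing import Iterable, List, Sequence, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(seq: Sequence[str], n: int) -> List[NGram]:
    """Return the n-grams of seq, padded with sentence-boundary markers."""
    padded = ["<s>"] * (n - 1) + list(seq) + ["</s>"]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]


def train(corpus: Iterable[Sequence[str]], n: int) -> Set[NGram]:
    """Collect every n-gram observed in a corpus of sentences assumed correct."""
    seen: Set[NGram] = set()
    for sentence in corpus:
        seen.update(ngrams(sentence, n))
    return seen


def unseen_ngrams(sentence: Sequence[str], model: Set[NGram], n: int) -> List[NGram]:
    """List the n-grams of a sentence that never occurred in training."""
    return [g for g in ngrams(sentence, n) if g not in model]


def is_flagged(sentence: Sequence[str], model: Set[NGram], n: int) -> bool:
    """Assumed decision rule: any unseen n-gram marks the sentence as suspect."""
    return bool(unseen_ngrams(sentence, model, n))


# Toy tag sequences (hypothetical tags, not the tag set used in the thesis).
training_tags = [
    ["NN", "NN", "VB"],
    ["NN", "JJ", "NN", "VB"],
]
tag_model = train(training_tags, n=2)

print(is_flagged(["NN", "NN", "JJ", "NN", "VB"], tag_model, n=2))  # False
print(unseen_ngrams(["NN", "VB", "NN"], tag_model, n=2))
```

In this sketch the first test sequence is accepted even though it never appeared in training as a whole, because every one of its tag bigrams did. The second is flagged because the bigrams `("VB", "NN")` and `("NN", "</s>")` were never observed. A model built on tokens instead of tags would use the same code with word lists in place of tag lists.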
