Statistical Afaan Oromo Grammar Checker

Mideksa Desalegn

dc.contributor.advisor	Abebe Ermias (Ato)
dc.contributor.author	Mideksa Desalegn
dc.date.accessioned	2018-11-08T07:42:53Z
dc.date.accessioned	2023-11-18T12:49:32Z
dc.date.available	2018-11-08T07:42:53Z
dc.date.available	2023-11-18T12:49:32Z
dc.date.issued	2015-02-05
dc.description.abstract	Natural Language Processing (NLP) is a research area that focuses on developing systems that allow computers to communicate with people in everyday language. Grammatical correctness is essential to such communication, so NLP applications that recognize grammatical errors in natural language text are important. An application of this kind is called a grammar checker. A grammar checker can be developed using a rule-based, statistical, or hybrid approach. In this study, a statistical Afaan Oromo grammar checker is developed and tested on a prepared dataset. Statistical grammar checking can use two techniques to judge the grammatical correctness of a sentence: token n-grams, in which sequences of tokens are extracted, and tag n-grams, in which sequences of tags are extracted. Both techniques are used in this study, and their performance is tested on 85 Afaan Oromo sentences. In identifying incorrect sentences, the token n-gram technique achieves a recall of 100%, a precision of 78.1%, and an F-measure of 89.0%, while the tag n-gram technique achieves a recall of 86%, a precision of 82.6%, and an F-measure of 84.3%. In identifying correct sentences, the token n-gram technique achieves a recall of 60%, a precision of 100%, and an F-measure of 80%, while the tag n-gram technique achieves a recall of 74.2%, a precision of 78.2%, and an F-measure of 76.4%. Several factors lower the performance of the two techniques.
The first is the performance of the sentence boundary detector, word splitter, POS tagger, and morphological analyzer modules. The second is the quality of the corpus, which contains spelling and spacing errors. To improve the performance of the grammar checker, this study therefore recommends using a spelling checker to improve the performance of the POS tagger and morphological analyzer, and using a good-quality corpus together with a well-performing POS tagger and morphological analyzer.	en_US
dc.identifier.uri	http://etd.aau.edu.et/handle/12345678/13956
dc.language.iso	en	en_US
dc.publisher	Addis Ababa University	en_US
dc.subject	Natural Language Processing	en_US
dc.title	Statistical Afaan Oromo Grammar Checker	en_US
dc.type	Thesis	en_US
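
The abstract describes two n-gram techniques without showing how an n-gram check operates. The following is a minimal sketch of one common way such a check can work, not the thesis's actual implementation: it counts the n-grams seen in a training corpus and flags a sentence as incorrect when it contains an n-gram that was never observed. The class name NGramChecker, the min_count threshold, the boundary markers, and the toy tag set (N, ADJ, V) are all assumptions made for illustration. The same code serves for token n-grams or tag n-grams, depending on whether the input sequences are words or POS tags.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

# Hypothetical sentence-boundary markers (an assumption, not from the thesis).
BOS, EOS = "<s>", "</s>"


def ngrams(seq: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return all n-grams of seq, padded with boundary markers."""
    padded = [BOS] * (n - 1) + list(seq) + [EOS]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]


class NGramChecker:
    """Flags a sequence when it contains an n-gram rarer than min_count."""

    def __init__(self, n: int = 2, min_count: int = 1) -> None:
        self.n = n
        self.min_count = min_count
        self.counts: Counter = Counter()

    def train(self, sentences: Iterable[Sequence[str]]) -> None:
        # Count every n-gram in the (assumed grammatical) training corpus.
        for sentence in sentences:
            self.counts.update(ngrams(sentence, self.n))

    def unseen(self, sentence: Sequence[str]) -> List[Tuple[str, ...]]:
        # n-grams of the input that occur fewer than min_count times.
        return [g for g in ngrams(sentence, self.n)
                if self.counts[g] < self.min_count]

    def is_correct(self, sentence: Sequence[str]) -> bool:
        return not self.unseen(sentence)


if __name__ == "__main__":
    # Toy tag sequences; a real system would take them from a POS tagger.
    checker = NGramChecker(n=2)
    checker.train([["N", "V"], ["N", "ADJ", "V"], ["N", "N", "V"]])
    print(checker.is_correct(["N", "ADJ", "V"]))  # every bigram was seen
    print(checker.unseen(["V", "N"]))             # bigrams never seen
```

A strict unseen-n-gram rule of this kind flags any valid construction that is missing from the training corpus, so its results depend heavily on corpus size and quality.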

Files

Original bundle

Name: Abebe Mideksa.pdf
Size: 600.47 KB
Format: Adobe Portable Document Format

Collections

Information Sciences