Automatic Syntactic Parser For Afaan Oromo Complex Sentence Using Context Free Grammar
Date
2016-10-07
Publisher
Addis Ababa University
Abstract
Computers were originally invented to perform complex mathematical and arithmetic operations such as addition, subtraction, division and multiplication. Beyond these, advances in computer technology now allow computers to deal with human language, which is the concern of the discipline of natural language processing, even though the two entities use different media: computers use binary digits (0s and 1s), whereas humans use natural language. A natural language cannot be computerized all at once, because language has complex and deep features. Instead, it is organized into knowledge levels (phonology, morphology, syntax, semantics, pragmatics and discourse), so processing can be done phase by phase. Now that processing human language has become promising and possible, it is the role of researchers to process and automate their own language through at least one of these knowledge levels. Although Afaan Oromo is spoken by a large number of people, it is still computationally in its infancy: none of its features is practically available for public service, apart from attempts made by some academics (students) as part of their academic evaluations. This computational infancy has left the research community without basic tools such as a parser. Thus, the purpose of this study was to syntactically parse Afaan Oromo complex sentences using context-free grammar; such a parser is used as a component in grammar checking, information extraction, question answering, semantic analysis and machine translation. Parsing belongs to the syntax level of this hierarchy, and is used to generate a valid parse tree for a sentence, given grammar and lexical rules. The study used a rule-based approach and a bottom-up parsing strategy. The chart parsing algorithm was selected for its efficiency in managing the ambiguities encountered during parsing.
Moreover, NLTK and the Python programming language were selected for their suitability for dealing with linguistic data. The data consisted of 250 Afaan Oromo complex sentences, each three to seven words long and made up of one independent and one dependent clause with either a closed or a disclosed single subject. Seven different experiments were then conducted by defining the grammar and lexical rules of each sentence. The parser encountered parsing errors, which were corrected manually by modifying the grammar rules. Finally, the parser achieved a promising average accuracy of 94.31%, which should encourage further research on syntactic parsing of Afaan Oromo complex sentences.
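To make the idea of bottom-up chart parsing with a context-free grammar concrete, the following is a minimal sketch in plain Python. It is not the thesis's grammar or code: it uses the CYK algorithm (one simple bottom-up, chart-based method) as a stand-in for NLTK's chart parser, and the grammar, the category names (`SubCl`, `Cl`, `Conj`) and the English placeholder lexicon are hypothetical, chosen only to show a complex sentence with one dependent and one independent clause.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (illustration only,
# not the Afaan Oromo grammar defined in the study).
BINARY_RULES = {
    ("SubCl", "Cl"): "S",      # complex sentence = dependent + independent clause
    ("Conj", "Cl"): "SubCl",   # dependent clause = conjunction + clause
    ("NP", "V"): "Cl",         # clause = subject + verb
}

# Hypothetical lexical rules, with English placeholder words.
LEXICON = {
    "because": {"Conj"},
    "he": {"NP"},
    "she": {"NP"},
    "slept": {"V"},
    "left": {"V"},
}


def cyk_parse(tokens):
    """Return one parse tree rooted in S as nested tuples, or None."""
    n = len(tokens)
    # chart[(i, j)] maps a category to one tree spanning tokens[i:j].
    chart = defaultdict(dict)

    # Fill the width-1 spans from the lexical rules.
    for i, token in enumerate(tokens):
        for label in LEXICON.get(token, ()):
            chart[(i, i + 1)][label] = (label, token)

    # Combine adjacent smaller spans into larger ones, bottom-up.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for left_label, left_tree in chart[(i, k)].items():
                    for right_label, right_tree in chart[(k, j)].items():
                        parent = BINARY_RULES.get((left_label, right_label))
                        if parent and parent not in chart[(i, j)]:
                            chart[(i, j)][parent] = (parent, left_tree, right_tree)

    return chart[(0, n)].get("S")


tree = cyk_parse("because he slept she left".split())
print(tree)
```

The chart stores each category found for each span once, so sub-parses are reused rather than recomputed. This sketch keeps only the first tree per category and span; a parser meant to expose ambiguity would store all of them.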
Keywords
syntax, parsing, syntactic parsing, Natural language processing, Chart parser, context-free grammar parsing, Afaan Oromo, constituency, parsing complex-sentence.