Development of Automatic Parser for Tigrigna Sentences Using Bottom-Up Probabilistic Chart Parser






Addis Ababa University


Automatic parsing is the process of analyzing a given sentence into its grammatical structure. Parsing is useful for improving the performance of many NLP applications, and automatic parsers have been developed for many languages. The aim of this research is to design and develop an automatic parser for Tigrigna sentences using a bottom-up probabilistic chart parser. We propose a system architecture with two parts: learning and parsing. The learning part contains the components through which supervised learning is accomplished. A corpus collected from different sources is preprocessed by a simple preprocessing component. The preprocessed sample corpus is manually tagged by two language experts and then manually parsed by the linguists. A Probabilistic Context-Free Grammar (PCFG) is extracted from the parsed sentences, and a lexicon is generated from the tagged corpus by the lexicon generation component. The parsing part contains the components that parse an input sentence: sentence tokenization, morphological analysis, and PCFG chart parsing. The first two components make the input sentence suitable for the PCFG chart parsing component. We conducted experiments on both simple and complex Tigrigna sentences and, based on the experimental findings, suggest solutions to the identified problems. The experiments were conducted in three parts. The first test set was taken from the training set, and the second was taken from test sets drawn from the sample corpora. The third test set was not drawn from the sample corpora used in the study. For simple Tigrigna sentences, the accuracy on the first, second, and third test sets was 95%, 94%, and 85%, respectively. For complex Tigrigna sentences, the accuracy on the three test sets was 91%, 90%, and 80%, respectively.
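The two-part architecture described above (extracting a PCFG from manually parsed sentences, then parsing bottom-up with a chart) can be illustrated with a minimal sketch. It is not the thesis implementation: the toy treebank, the English placeholder words, the tag names, and the function names `estimate_pcfg` and `viterbi_parse` are all assumptions made for illustration, and the parser is a plain CKY-style Viterbi chart that assumes every grammar rule is either binary or lexical.

```python
from collections import defaultdict
import math

# Hypothetical toy treebank (placeholder English words, not Tigrigna data).
# A tree is (label, left, right); a leaf is (tag, word).
TREEBANK = [
    ("S", ("N", "dogs"), ("V", "run")),
    ("S", ("NP", ("ADJ", "big"), ("N", "dogs")), ("V", "run")),
    ("S", ("N", "cats"), ("VP", ("V", "see"), ("N", "dogs"))),
]

def count_rules(tree, counts):
    """Count every rule (lhs, rhs) used in one parsed tree."""
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        counts[(label, (children[0],))] += 1  # lexical rule: tag -> word
        return
    counts[(label, tuple(child[0] for child in children))] += 1
    for child in children:
        count_rules(child, counts)

def estimate_pcfg(treebank):
    """Relative-frequency estimate: P(lhs -> rhs) = count(rule) / count(lhs)."""
    counts = defaultdict(int)
    for tree in treebank:
        count_rules(tree, counts)
    lhs_totals = defaultdict(int)
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

def viterbi_parse(words, pcfg, start="S"):
    """Bottom-up chart parse; returns (log_prob, tree) of the best parse, or None."""
    n = len(words)
    # best[(i, j)][label] = (log_prob, tree) for the span words[i:j]
    best = defaultdict(dict)
    for i, word in enumerate(words):
        for (lhs, rhs), p in pcfg.items():
            if rhs == (word,):
                best[(i, i + 1)][lhs] = (math.log(p), (lhs, word))
    for width in range(2, n + 1):          # grow spans from short to long
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):      # split point between the two children
                for (lhs, rhs), p in pcfg.items():
                    if len(rhs) != 2:
                        continue
                    left = best[(i, k)].get(rhs[0])
                    right = best[(k, j)].get(rhs[1])
                    if left is None or right is None:
                        continue
                    score = math.log(p) + left[0] + right[0]
                    cell = best[(i, j)]
                    if lhs not in cell or score > cell[lhs][0]:
                        cell[lhs] = (score, (lhs, left[1], right[1]))
    return best[(0, n)].get(start)

pcfg = estimate_pcfg(TREEBANK)
result = viterbi_parse(["cats", "run"], pcfg)
if result is not None:
    log_prob, tree = result
    print(tree, math.exp(log_prob))
```

In this sketch the lexicon is folded into the grammar as lexical rules, whereas the abstract describes a separate lexicon generation component; keeping the maximum score per chart cell gives the Viterbi parse, and replacing the maximum with a sum would give the inside probability instead.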



Tigrigna, PCFG, Inside Algorithm, Viterbi Algorithm, PCFG Chart Parser, Bottom-Up Parsing