Development of Automatic Parser for Tigrigna Sentences Using Bottom-Up Probabilistic Chart Parser

Medhin, Yaynshet

Files

Yaynshet Medhin 2017.pdf (1.94 MB)

Date

2017-10-04

Authors

Medhin, Yaynshet

Publisher

Addis Ababa University

Abstract

Automatic parsing is the process of analyzing a given sentence into its grammatical structure. Parsing is useful for improving the performance of many NLP applications, and much research has been done on automatic parsing for different languages. The aim of this research is to design and develop an automatic parser for Tigrigna sentences using a bottom-up probabilistic chart parser. We propose a system architecture for the identified problem. The architecture has two parts: learning and parsing. The learning part contains the components through which supervised learning is accomplished. The corpus, collected from different sources, is preprocessed by a simple preprocessing component. The preprocessed sample corpus is manually tagged by two experts in the language, and the tagged corpus is then parsed manually by the linguists. From the parsed sentences, Probabilistic Context-Free Grammar (PCFG) rules are extracted. From the tagged corpus, a lexicon is generated using the lexicon generation component. The parsing part contains the components that parse a given input sentence: sentence tokenization, morphological analysis, and PCFG parsing. The first two components make the input sentence suitable for the PCFG chart parsing component. We conducted several experiments on both simple and complex Tigrigna sentences; the experimental findings are reported and solutions to the identified problems are suggested. The experiments were conducted in three parts. The first test used sentences from the training set, and the second used test sets from the sample corpora. The third test set was drawn from outside the sample corpora used in the study. For simple Tigrigna sentences, the accuracy on the first, second, and third test sets was 95%, 94%, and 85%, respectively. For complex Tigrigna sentences, the accuracy on the three test sets was 91%, 90%, and 80%, respectively.
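
The two stages named in the abstract, extracting a PCFG from hand-parsed sentences and then parsing bottom-up with a probabilistic chart, can be illustrated with a minimal sketch. It is not the thesis implementation. The toy treebank uses English placeholder words in place of Tigrigna data, and the function names (`estimate_pcfg`, `viterbi_cky`), the tuple tree encoding, and the restriction to binary and lexical rules are all assumptions made for illustration. Rule probabilities are relative-frequency estimates, and the chart is filled with a Viterbi-style CKY pass.

```python
from collections import defaultdict

# Hypothetical toy treebank (placeholder English words, not Tigrigna data).
# A tree is (label, child, child) for a binary node or (tag, word) for a leaf.
TREEBANK = [
    ("S", ("NP", ("Det", "the"), ("N", "dog")),
          ("VP", ("V", "saw"), ("NP", ("Det", "a"), ("N", "cat")))),
    ("S", ("NP", ("Det", "a"), ("N", "cat")),
          ("VP", ("V", "saw"), ("NP", ("Det", "the"), ("N", "dog")))),
]


def count_rules(tree, counts):
    """Count every rule (lhs -> rhs) used in one parsed sentence."""
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        counts[label][(children[0],)] += 1          # lexical rule: tag -> word
    else:
        counts[label][tuple(c[0] for c in children)] += 1
        for child in children:
            count_rules(child, counts)


def estimate_pcfg(treebank):
    """Relative-frequency estimate: P(lhs -> rhs) = count(rule) / count(lhs)."""
    counts = defaultdict(lambda: defaultdict(int))
    for tree in treebank:
        count_rules(tree, counts)
    probs = {}
    for lhs, rhs_counts in counts.items():
        total = sum(rhs_counts.values())
        for rhs, n in rhs_counts.items():
            probs[(lhs, rhs)] = n / total
    return probs


def viterbi_cky(words, probs, start="S"):
    """Fill the chart bottom-up; return (probability, tree) of the best parse,
    or None when no parse rooted in `start` covers the whole sentence."""
    n = len(words)
    best = defaultdict(dict)  # best[(i, j)][label] = (probability, tree)
    # Width-1 spans: apply lexical rules to each word.
    for i, word in enumerate(words):
        for (lhs, rhs), p in probs.items():
            if rhs == (word,):
                best[(i, i + 1)][lhs] = (p, (lhs, word))
    # Wider spans: combine two adjacent smaller spans with a binary rule.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (lhs, rhs), p in probs.items():
                    if len(rhs) != 2:
                        continue
                    left = best[(i, k)].get(rhs[0])
                    right = best[(k, j)].get(rhs[1])
                    if left and right:
                        cand = p * left[0] * right[0]
                        if cand > best[(i, j)].get(lhs, (0.0, None))[0]:
                            best[(i, j)][lhs] = (cand, (lhs, left[1], right[1]))
    return best[(0, n)].get(start)


if __name__ == "__main__":
    grammar = estimate_pcfg(TREEBANK)
    print(viterbi_cky("the cat saw a dog".split(), grammar))
```

In the pipeline the abstract describes, the input to the chart would first pass through sentence tokenization and morphological analysis. Here the whitespace split stands in for both steps.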

Keywords

Tigrigna, PCFG, Inside Algorithm, Viterbi Algorithm, PCFG Chart Parser, Bottom-Up Parsing

URI

http://etd.aau.edu.et/handle/123456789/18813

Collections

Environmental Science
