Development of Automatic Parser for Tigrigna Sentences Using Bottom-Up Probabilistic Chart Parser

dc.contributor.advisorAssabie, Yaregal (PhD)
dc.contributor.authorMedhin, Yaynshet
dc.date.accessioned2019-08-23T10:50:02Z
dc.date.accessioned2023-11-29T04:06:04Z
dc.date.available2019-08-23T10:50:02Z
dc.date.available2023-11-29T04:06:04Z
dc.date.issued2017-10-04
dc.description.abstractAutomatic parsing is the process of analyzing a given sentence into its grammatical structure. Parsing is useful for improving the performance of many NLP applications, and automatic parsers have been developed for many languages. The aim of this research is to design and develop an automatic parser for Tigrigna sentences using a bottom-up probabilistic chart parser. The proposed architecture has two parts: learning and parsing. The learning part contains the components that carry out supervised learning. A corpus collected from different sources is preprocessed by a simple preprocessing component. The preprocessed sample corpus is manually tagged by two experts in the language, and the tagged corpus is then parsed manually by the linguists. A probabilistic context-free grammar (PCFG) is extracted from the parsed sentences, and a lexicon is generated from the tagged corpus by the lexicon generation component. The parsing part contains the components that parse an input sentence: sentence tokenization, morphological analysis, and PCFG chart parsing. The first two components make the input sentence suitable for the PCFG chart parsing component. Experiments were conducted on both simple and complex Tigrigna sentences. Experimental results were obtained, and solutions to the identified problems are suggested. The experiments were conducted in three parts. The first test set was drawn from the training set, the second from test sets taken from the sample corpora, and the third from sentences outside the sample corpora used in the study. For simple Tigrigna sentences, the accuracy on the first, second, and third test sets was 95%, 94%, and 85%, respectively. For complex Tigrigna sentences, the accuracy on the three test sets was 91%, 90%, and 80%, respectively.en_US
dc.identifier.urihttp://etd.aau.edu.et/handle/123456789/18813
dc.language.isoenen_US
dc.publisherAddis Ababa Universityen_US
dc.subjectTigrignaen_US
dc.subjectPCFGen_US
dc.subjectInside Algorithmen_US
dc.subjectViterbi Algorithmen_US
dc.subjectPCFG Chart Parseren_US
dc.subjectBottom-Up Parsingen_US
dc.titleDevelopment of Automatic Parser for Tigrigna Sentences Using Bottom-Up Probabilistic Chart Parseren_US
dc.typeThesisen_US
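
The abstract describes two steps that a short example may make more concrete: extracting a PCFG from manually parsed sentences, and using it in a bottom-up probabilistic chart parser. The sketch below is only an illustration under stated assumptions, not the thesis implementation. It assumes a grammar with binary and lexical rules only (Chomsky normal form), estimates rule probabilities by relative frequency, and fills the chart with a CKY-style Viterbi search. The toy treebank uses placeholder English tokens, not data from the study, and the names `TREEBANK`, `estimate_pcfg`, and `viterbi_cky` are hypothetical. Tokenization and morphological analysis are omitted.

```python
import math
from collections import defaultdict

# Hypothetical hand-parsed trees (placeholder tokens, not thesis data).
# A tree is (label, word) for a lexical node, else (label, left, right).
TREEBANK = [
    ("S", ("NP", "girl"), ("VP", ("NP", "book"), ("V", "read"))),
    ("S", ("NP", ("Det", "the"), ("N", "boy")),
          ("VP", ("NP", "water"), ("V", "drank"))),
]

def count_rules(tree, counts):
    """Accumulate counts[lhs][rhs] for every rule used in one tree."""
    label, *children = tree
    if isinstance(children[0], str):          # lexical rule: label -> word
        counts[label][(children[0],)] += 1
    else:                                     # binary rule: label -> B C
        counts[label][tuple(c[0] for c in children)] += 1
        for child in children:
            count_rules(child, counts)

def estimate_pcfg(treebank):
    """Relative-frequency estimate: P(lhs -> rhs) = count(rule) / count(lhs)."""
    counts = defaultdict(lambda: defaultdict(int))
    for tree in treebank:
        count_rules(tree, counts)
    return {(lhs, rhs): n / sum(rules.values())
            for lhs, rules in counts.items() for rhs, n in rules.items()}

def viterbi_cky(words, pcfg, start="S"):
    """Return (log-probability, tree) of the best parse, or None if no parse."""
    n = len(words)
    # chart[(i, j)][label] = best (log-prob, tree) covering words[i:j]
    chart = defaultdict(dict)
    lexical = [(l, r[0], p) for (l, r), p in pcfg.items() if len(r) == 1]
    binary = [(l, r, p) for (l, r), p in pcfg.items() if len(r) == 2]
    for i, word in enumerate(words):          # width-1 spans from the lexicon
        for lhs, w, p in lexical:
            if w == word:
                chart[(i, i + 1)][lhs] = (math.log(p), (lhs, word))
    for width in range(2, n + 1):             # build wider spans bottom-up
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):         # split point
                for lhs, (b, c), p in binary:
                    if b in chart[(i, k)] and c in chart[(k, j)]:
                        lb, tb = chart[(i, k)][b]
                        lc, tc = chart[(k, j)][c]
                        score = math.log(p) + lb + lc
                        best = chart[(i, j)].get(lhs)
                        if best is None or score > best[0]:
                            chart[(i, j)][lhs] = (score, (lhs, tb, tc))
    return chart[(0, n)].get(start)

pcfg = estimate_pcfg(TREEBANK)
print(viterbi_cky("the boy book read".split(), pcfg))
```

Log-probabilities are used so that products of many small rule probabilities do not underflow on longer sentences. Keeping only the best entry per label and span is what makes this a Viterbi search; summing over entries instead would give the inside probability of the sentence.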

Files

Original bundle
Name: Yaynshet Medhin 2017.pdf
Size: 1.94 MB
Format: Adobe Portable Document Format