Automatic Part-Of-Speech Tagger For Tigrigna Language Using Hybrid Approach

dc.contributor.advisor: Tefera, Solomon Advisor (PHD)
dc.contributor.author: Atsbaha Sium, Mulugeta
dc.date.accessioned: 2018-11-09T15:05:00Z
dc.date.accessioned: 2023-11-18T12:44:24Z
dc.date.available: 2018-11-09T15:05:00Z
dc.date.available: 2023-11-18T12:44:24Z
dc.date.issued: 2016-10-01
dc.description.abstract: Tagging is the process of annotating the words of a corpus with word-class (part-of-speech) markers as additional information. It can serve as a pre-processing step for higher-level language technology applications, such as developing stemming algorithms or preparing annotated corpora. Tagging Tigrigna is challenging because of the nature and morphological complexity of the language, the scarcity of resources, and the difficulty of compiling Tigrigna texts. The study uses a balanced (not domain-specific) corpus of 3,100 sentences, 10,000 distinct words, and 56,151 tokens. A set of 22 coarse-grained morpho-syntactic tags was adapted to prepare the annotated corpus using a semi-supervised approach. Because the corpus is normalized, processed, and annotated, it can be reused for other language processing tasks. The work is an experimental study on improving Tigrigna tagger performance by combining the outputs of two sequence taggers. A rule-based tagger, an averaged perceptron tagger, and a hybrid of the two are investigated. The hybrid tagger chains the two: the averaged perceptron tagger runs first, followed by the rule-based tagger. The models are trained on 75% of the corpus and tested on the remaining 25% to assess their robustness and effectiveness, and several experiments were conducted for each model. The experimental results show that a reasonable tagger is obtained with a modified rule-based tagger together with three combined initial-state annotators. In this study, state-of-the-art tagging accuracy for morphologically rich languages, particularly Tigrigna, is achieved with the averaged perceptron tagger. The rule-based tagger reached 94.8% accuracy and the averaged perceptron tagger 95.5%, so the two perform comparably; the hybrid tagger improves accuracy to 96.3%.
In the hybrid tagger, the rule-based tagger follows the averaged perceptron as an error detection and correction stage. Between the trained averaged perceptron and the rule-based tagger, an output analyzer with a threshold value validates the output and acts as the decision maker. The hybrid of the rule-based and averaged perceptron taggers therefore yields a reasonable PoS tagger for Tigrigna. (en_US)
dc.identifier.uri: http://etd.aau.edu.et/handle/12345678/14106
dc.language.iso: en (en_US)
dc.publisher: Addis Ababa University (en_US)
dc.subject: Tigrigna Language Using Hybrid Approach (en_US)
dc.title: Automatic Part-Of-Speech Tagger For Tigrigna Language Using Hybrid Approach (en_US)
dc.type: Thesis (en_US)
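
The hybrid pipeline described in the abstract (a statistical tagger, then a threshold-based output analyzer, then rule-based correction) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the function and variable names, the 0.8 threshold, the reading that only low-confidence predictions are passed to the rules, the placeholder tags, and the English toy tokens standing in for Tigrigna text.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-in for a trained averaged perceptron:
# returns one (tag, confidence) pair per token.
StatTagger = Callable[[List[str]], List[Tuple[str, float]]]
# Hypothetical correction rule: given tokens, current tags, and a position,
# returns a replacement tag, or None if the rule does not apply.
Rule = Callable[[List[str], List[str], int], Optional[str]]


def hybrid_tag(tokens: List[str], stat_tagger: StatTagger,
               rules: List[Rule], threshold: float = 0.8) -> List[Tuple[str, str]]:
    """Tag with the statistical model, then let rules correct uncertain tags."""
    scored = stat_tagger(tokens)
    tags = [tag for tag, _ in scored]
    for i, (_, confidence) in enumerate(scored):
        # Output analyzer (assumed behaviour): accept confident predictions.
        if confidence >= threshold:
            continue
        # Otherwise hand the token to the rule-based stage; first match wins.
        for rule in rules:
            corrected = rule(tokens, tags, i)
            if corrected is not None:
                tags[i] = corrected
                break
    return list(zip(tokens, tags))


# Toy components, invented for this example only.
TOY_MODEL = {"the": ("DET", 0.99), "big": ("ADJ", 0.95), "run": ("V", 0.55)}


def toy_stat_tagger(tokens: List[str]) -> List[Tuple[str, float]]:
    # Unknown words get a default tag with low confidence.
    return [TOY_MODEL.get(tok, ("N", 0.30)) for tok in tokens]


def noun_after_adjective(tokens: List[str], tags: List[str], i: int) -> Optional[str]:
    # Toy rule: an uncertain token directly after an adjective is a noun.
    return "N" if i > 0 and tags[i - 1] == "ADJ" else None


tokens = ["the", "big", "run"]
result = hybrid_tag(tokens, toy_stat_tagger, [noun_after_adjective])
print(result)  # [('the', 'DET'), ('big', 'ADJ'), ('run', 'N')]
```

In this sketch the threshold controls how much of the statistical tagger's output the rules are allowed to override: raising it sends more tokens to the correction stage, lowering it trusts the statistical tagger more.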

Files

Original bundle
Name:
31. Mulugeta Atsebeha.pdf
Size:
1017.39 KB
Format:
Adobe Portable Document Format