AAU Institutional Repository

Development of Part of Speech Tagger for Ge’ez Language

dc.contributor.advisor Assabie, Yaregal (PhD)
dc.contributor.author Kebede, Mulata
dc.date.accessioned 2019-05-30T11:41:09Z
dc.date.available 2019-05-30T11:41:09Z
dc.date.issued 2017-10-05
dc.identifier.uri http://etd.aau.edu.et/handle/123456789/18347
dc.description.abstract Part-of-speech (POS) tagging is the process of assigning a part of speech or other lexical class marker to each word in a sentence or text. Many other natural language processing tasks and applications depend heavily on it. Much of the research in natural language processing has been dedicated to resource-rich languages such as English, French and other major European and Asian languages. Languages for which POS taggers have been developed include Tigrigna, Amharic, Kafi-Noonoo, Arabic and Afaan-Oromo. The objective of this research is to develop a POS tagger for Ge’ez using a hybrid approach that combines the Trigrams'n'Tags (TnT) tagger, human-written rules, regular expressions and unknown-word guessing. Among the available statistical taggers, we adopt TnT for the hybrid tagger because it enables the tagger to perform morphological analysis and maintains several internal frequency distribution and conditional frequency distribution instances based on the training data. Even though TnT is the preferred statistical tagger for Ge’ez, it still has a shortcoming: it does not handle the prefix patterns of unknown words. Regular expressions partly address this drawback. However, the combination of TnT and a regular expression tagger is still not sufficient to reach acceptable accuracy, because Ge’ez is a morphologically complex language with free word order: a sentence can follow subject-object-verb, object-subject-verb or subject-verb-object order without changing its meaning. Consequently, human-written rules and unknown-word guessing are added to the hybrid tagger. The hybrid tagger performs better than any of its component taggers taken alone. There was no ready-made standard corpus for Ge’ez. As a result, 26 broad tags were identified, and 15,154 words from around 1,305 sentences were collected from one genre, the Holy Bible. Those words were then manually tagged by Ge’ez language professionals for training and testing purposes. Experiments were conducted for three types of taggers: the TnT tagger, the TnT with Regex tagger and the Hybrid tagger. We obtained performances of 77.87%, 82.23% and 94.32% for the TnT tagger, the TnT with Regex tagger and the Hybrid tagger respectively. It is therefore possible to conclude that the hybrid tagger performs better than the TnT tagger and the TnT with Regex tagger. en_US
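
The layered design described in the abstract, a statistical tagger backed by regular expressions and an unknown-word guess, can be pictured as a backoff chain. The following is a minimal Python sketch under stated assumptions, not the thesis implementation: a most-frequent-tag lexicon stands in for the TnT tagger, the human-written rule stage is omitted, and every token, tag name and regex pattern is a hypothetical placeholder rather than Ge’ez data or one of the thesis's 26 tags.

```python
import re
from collections import Counter, defaultdict

# Hypothetical regex rules (pattern, tag); not taken from the thesis.
REGEX_RULES = [
    (re.compile(r"^\d+$"), "NUM"),   # all-digit tokens
    (re.compile(r"^pre"), "PREP"),   # placeholder prefix pattern
]


def train_lexicon(tagged_sentences):
    """Map each training word to its most frequent tag.

    This simple lexicon is a stand-in for the statistical (TnT) stage.
    """
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def regex_guess(word):
    """Return the tag of the first matching regex rule, or None."""
    for pattern, tag in REGEX_RULES:
        if pattern.search(word):
            return tag
    return None


def tag_sentence(words, lexicon, default_tag="UNK"):
    """Tag each word: lexicon first, then regex rules, then a default guess."""
    tagged = []
    for word in words:
        tag = lexicon.get(word) or regex_guess(word) or default_tag
        tagged.append((word, tag))
    return tagged


# Toy training data with placeholder tokens and tags.
train = [
    [("abc", "N"), ("def", "V")],
    [("abc", "N"), ("ghi", "ADJ")],
]
lexicon = train_lexicon(train)
result = tag_sentence(["abc", "def", "123", "prefix", "zzz"], lexicon)
```

Each stage is consulted only when the one before it has no answer, so known words keep the tag learned from training data while unseen words fall through to the pattern rules and finally to the default guess.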
dc.language.iso en en_US
dc.publisher Addis Ababa University en_US
dc.subject Ge’ez en_US
dc.subject POS Tagger For Ge’ez en_US
dc.subject NLP en_US
dc.subject TnT en_US
dc.subject Hybrid POS Tagger en_US
dc.title Development of Part of Speech Tagger for Ge’ez Language en_US
dc.type Thesis en_US

