A Morphosyntactic Tagset for the Annotation of Texts in Tigrinya

dc.contributor.advisor: Liyew, Zelalem (PhD)
dc.contributor.author: Woldemariam, Tsegay
dc.date.accessioned: 2019-02-11T15:58:25Z
dc.date.accessioned: 2023-11-08T04:33:42Z
dc.date.available: 2019-02-11T15:58:25Z
dc.date.available: 2023-11-08T04:33:42Z
dc.date.issued: 2013-06
dc.description.abstract: The main purpose of this thesis is to identify and develop a morphosyntactic tagset for the annotation of texts in Tigrinya, an Ethio-Semitic language with about seven to nine million speakers in Ethiopia and Eritrea (CSA, 2007; CIA 2012; http://en.wikipedia.org/wiki/Tigrinya_language#cite_ref-2). As far as existing research shows, there are almost no Natural Language Processing (NLP) resources for Tigrinya. The researcher considers it fortunate that work on Tigrinya can start with the development of a comprehensive morphosyntactic tagset, because a morphosyntactic tagset is the foundation for many NLP applications. We examined the morphosyntactic features of Tigrinya words and assigned tags applicable to these words in Tigrinya texts. The thesis focuses only on the development of a morphosyntactic tagset based on the morphological and morphosyntactic features of Tigrinya. The resulting tagset has 18 coarse-grained tags at the higher level and 105 fine-grained tags at the lower level; extending it to still finer-grained features yields 139 tags. We recommend that researchers use the 105 tags for their applications, unless they have a different purpose that requires the 18 coarse-grained major-category tags, the very fine-grained 139 tags, or an even finer set. A morphosyntactic tagset, applied to a text, adds an important level of linguistic information to a document. It is useful as a preprocessing step for parsing and, above all, for developing a POS tagger, which is the basis for many higher-level NLP applications. The beneficiaries of this research are students, researchers and professionals, such as computational linguists and computer scientists, who work on NLP applications such as speech recognition, text-to-speech, natural language parsing, information retrieval, lexicography and machine translation. (en_US)
dc.identifier.uri: http://etd.aau.edu.et/handle/123456789/16351
dc.language.iso: en (en_US)
dc.publisher: Addis Ababa University (en_US)
dc.subject: Ethio-Semitic language having about (en_US)
dc.subject: seven to nine million speakers in Ethiopia (en_US)
dc.title: A Morphosyntactic Tagset for the Annotation of Texts in Tigrinya (en_US)
dc.type: Thesis (en_US)

Files

Original bundle
Name: Tsegay Woldemariam.pdf
Size: 1.94 MB
Format: Adobe Portable Document Format