Amharic Parts-of-Speech Tagger using Neural Word Embeddings as Features

dc.contributor.advisor: Surafel, Lemma (PhD)
dc.contributor.author: Mequanent, Argaw
dc.date.accessioned: 2019-03-27T07:25:00Z
dc.date.accessioned: 2023-11-04T15:14:40Z
dc.date.available: 2019-03-27T07:25:00Z
dc.date.available: 2023-11-04T15:14:40Z
dc.date.issued: 2019-01
dc.description.abstract: Parts-of-speech (POS) tagging for the Amharic language is not yet mature enough to serve as a component in other natural language processing (NLP) applications. Previous studies on Amharic POS tagging used hand-crafted features to develop tagging models. In Amharic, prepositions and conjunctions are usually attached to other parts of speech. This forces tags to represent more than one piece of basic information and also decreases the total number of instances in the training corpus. In addition, designing features manually requires more time, more labor, and a linguistic background. In this study, automatically generated neural word embeddings are used as features for developing an Amharic POS tagger. Neural word embeddings are multi-dimensional vector representations of words that capture syntactic and semantic information. In addition, prepositions and conjunctions attached to other parts of speech are segmented using the HornMorpho morphological analyzer. Among the available state-of-the-art deep learning algorithms, Long Short-Term Memory (LSTM) recurrent neural networks and their bidirectional versions (Bi-LSTM RNNs) are used to develop the tagging models. The best evaluation result is an F-measure of 93.67%, obtained by the model built with a Bi-LSTM recurrent neural network. The results indicate that word embeddings generated by neural networks can replace manually designed features, which is an important advantage. Segmenting prepositions and conjunctions attached to other parts of speech also improved the accuracy of the POS tagger by more than 5%. This improvement comes from the increased total number of instances and the decreased number of tags that result from segmentation.
dc.identifier.uri: http://etd.aau.edu.et/handle/123456789/17226
dc.language.iso: en_US
dc.publisher: Addis Ababa University
dc.subject: natural language processing
dc.subject: POS tagger
dc.subject: neural word embeddings
dc.subject: segmentation
dc.subject: deep learning
dc.subject: recurrent neural networks
dc.subject: Amharic
dc.title: Amharic Parts-of-Speech Tagger using Neural Word Embeddings as Features
dc.type: Thesis
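
The abstract attributes the accuracy gain from segmentation to two effects: more training instances and fewer distinct tags. The sketch below illustrates that counting argument on a toy corpus. Everything in it is an assumption made for illustration: the tokens, the `+` clitic marker, the tag names (`N_PREP`, `N_CONJ`, and so on), and the `segment` helper are hypothetical, and none of it reflects the thesis data, its tagset, or the HornMorpho API. It also simplifies by treating every attached preposition or conjunction as a prefix.

```python
from collections import Counter

# Hypothetical toy corpus of (token, tag) pairs. A "+" marks an attached
# preposition or conjunction; compound tags such as N_PREP carry two pieces
# of information at once. These are placeholders, not thesis data.
unsegmented = [
    ("prep+house", "N_PREP"),
    ("house", "N"),
    ("conj+house", "N_CONJ"),
    ("prep+big", "ADJ_PREP"),
    ("big", "ADJ"),
]


def segment(token, tag):
    """Split an attached clitic into its own (token, tag) pair.

    Stand-in for a morphological analyzer; it only handles the toy
    "clitic+stem" format used above.
    """
    if "+" not in token:
        return [(token, tag)]
    clitic, stem = token.split("+", 1)
    base_tag = tag.split("_", 1)[0]  # e.g. "N_PREP" -> "N"
    return [(clitic, clitic.upper()), (stem, base_tag)]


# Apply segmentation to every token and flatten the result.
segmented = [pair for tok, tag in unsegmented for pair in segment(tok, tag)]


def summarize(corpus):
    """Return (number of instances, number of distinct tags)."""
    tags = Counter(tag for _, tag in corpus)
    return len(corpus), len(tags)


print("unsegmented:", summarize(unsegmented))
print("segmented:  ", summarize(segmented))
```

In this toy case the corpus grows from 5 to 8 instances while the tag inventory shrinks from 5 to 4, because the compound tags collapse into their base tags plus separate preposition and conjunction tags. The real effect in the thesis depends on its corpus and tagset.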

Files

Original bundle
Name: Mequanent Argaw.pdf
Size: 904.32 KB
Format: Adobe Portable Document Format