Amharic Parts-of-Speech Tagger using Neural Word Embeddings as Features

Mequanent, Argaw

dc.contributor.advisor	Surafel, Lemma (PhD)
dc.contributor.author	Mequanent, Argaw
dc.date.accessioned	2019-03-27T07:25:00Z
dc.date.accessioned	2023-11-04T15:14:40Z
dc.date.available	2019-03-27T07:25:00Z
dc.date.available	2023-11-04T15:14:40Z
dc.date.issued	2019-01
dc.description.abstract	Parts-of-speech (POS) tagging for the Amharic language is not yet mature enough to serve as a component in other natural language processing (NLP) applications. Previous studies on Amharic POS tagging used hand-crafted features to develop tagging models. In Amharic, prepositions and conjunctions are usually attached to words of other parts of speech. This forces a single tag to encode more than one basic piece of information and also decreases the total number of instances in the training corpus. In addition, manual feature design requires more time, more labor, and a linguistic background. In this study, automatically generated neural word embeddings are used as features for the development of an Amharic POS tagger. Neural word embeddings are multi-dimensional vector representations of words that capture syntactic and semantic information. In addition, prepositions and conjunctions attached to other parts of speech are segmented using the HornMorpho morphological analyzer. Among the possible deep learning algorithms, state-of-the-art Long Short-Term Memory (LSTM) recurrent neural networks and their bidirectional versions (Bi-LSTM RNNs) are used to develop the tagging models. The best evaluation result observed is an F-measure of 93.67%, obtained from the model developed using a Bi-LSTM recurrent neural network. The results show that word embeddings generated by neural networks can replace manually designed features, which is an important advantage. Segmenting prepositions and conjunctions attached to other parts of speech also improved the accuracy of the POS tagger by more than 5%. This improvement comes from the increased total number of instances and the decreased number of tags due to segmentation.	en_US
dc.identifier.uri	http://etd.aau.edu.et/handle/123456789/17226
dc.language.iso	en_US	en_US
dc.publisher	Addis Ababa University	en_US
dc.subject	natural language processing	en_US
dc.subject	POS tagger	en_US
dc.subject	neural word embeddings	en_US
dc.subject	segmentation	en_US
dc.subject	deep learning	en_US
dc.subject	recurrent neural networks	en_US
dc.subject	Amharic	en_US
dc.title	Amharic Parts-of-Speech Tagger using Neural Word Embeddings as Features	en_US
dc.type	Thesis	en_US
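
The abstract attributes the accuracy gain to segmentation producing more training instances and fewer distinct tags. The toy sketch below illustrates that counting effect only. Everything in it is an assumption made for illustration: the compound tag names (NP, NC, NPC), the rough transliterations, and the "+" boundary marker, which stands in for a morphological analyzer's output and is not HornMorpho's actual API or output format. It is not the thesis's implementation.

```python
from collections import Counter

# Hypothetical mapping from compound tags to the basic tags of their segments.
SPLIT_TAGS = {
    "NP": ["PREP", "N"],           # noun with an attached preposition
    "NC": ["N", "CONJ"],           # noun with an attached conjunction
    "NPC": ["PREP", "N", "CONJ"],  # noun with both attached
}

def segment(tagged_tokens):
    """Split tokens at '+' boundaries and give each segment a basic tag."""
    out = []
    for token, tag in tagged_tokens:
        parts = token.split("+")
        basic = SPLIT_TAGS.get(tag)
        if basic is not None and len(basic) == len(parts):
            out.extend(zip(parts, basic))
        else:
            # Unknown or unsegmented token: keep it whole with its original tag.
            out.append((token.replace("+", ""), tag))
    return out

# Toy corpus; '+' marks a boundary an analyzer is assumed to have found.
corpus = [
    ("be+bet", "NP"),
    ("bet", "N"),
    ("ke+bet", "NP"),
    ("bet+na", "NC"),
    ("be+bet+na", "NPC"),
    ("meTa", "V"),
]

segmented = segment(corpus)
tags_before = Counter(tag for _, tag in corpus)
tags_after = Counter(tag for _, tag in segmented)

print(len(corpus), "instances,", len(tags_before), "distinct tags before")
print(len(segmented), "instances,", len(tags_after), "distinct tags after")
```

In this toy corpus the noun tag N goes from one training instance to five after segmentation, while the three compound tags disappear from the tagset.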

Files

Original bundle

Name: Mequanent Argaw.pdf
Size: 904.32 KB
Format: Adobe Portable Document Format

Collections

Computer Engineering