Afaan Oromo Named Entity Recognition Using Neural Word Embeddings

Kasu, Mekonini

dc.contributor.advisor	Assabie, Yaregal (PhD)
dc.contributor.author	Kasu, Mekonini
dc.date.accessioned	2021-08-05T12:36:41Z
dc.date.accessioned	2023-11-04T12:23:14Z
dc.date.available	2021-08-05T12:36:41Z
dc.date.available	2023-11-04T12:23:14Z
dc.date.issued	10/26/2020
dc.description.abstract	Named Entity Recognition (NER) is one of the canonical sequence-tagging tasks: it assigns a named-entity label to each word in a sequence. The task is important for a wide range of downstream applications in natural language processing. Two previous attempts at Afaan Oromo NER automatically identified proper names in text and classified them into predefined semantic types such as person, location, organization, and miscellaneous. However, both relied heavily on hand-designed features. We proposed a deep neural network architecture for Afaan Oromo NER based on a context encoder and a tag decoder, using Bidirectional Long Short-Term Memory (BiLSTM) and Conditional Random Fields (CRF), respectively. In the proposed approach, we first generated neural word embeddings automatically using skip-gram with negative sampling on an unlabeled corpus of 50,284 KB. The resulting embeddings represent words as semantic vectors, which are then used as input features for the encoder and decoder model. Likewise, character-level representations are generated automatically with a BiLSTM from the 768 KB annotated corpus. Because it uses character-level representations, the proposed model is robust to out-of-vocabulary words. For this study, we manually prepared an annotated Afaan Oromo NER dataset of 768 KB and split it into 80% for training, 5% for testing, and 15% for validation. The dataset contains 12,963 named entities in total, of which approximately 10,370 (80%), 648 (5%), and 1,944 (15%) fall into the respective splits. Experimental results show that BiLSTM-CRF combined with pre-trained word embeddings, character-level representations, and dropout regularization performs better than the other models compared, such as a plain BiLSTM and BiLSTM-CRF with only character-level representations or only word embeddings.
The BiLSTM-CRF model with pre-trained word embeddings and character-level representations significantly improved Afaan Oromo Named Entity Recognition, achieving an average F-score of 93.26% and an accuracy of 98.87%.	en_US
dc.identifier.uri	http://etd.aau.edu.et/handle/123456789/27612
dc.language.iso	en	en_US
dc.publisher	Addis Ababa University	en_US
dc.subject	Afaan Oromo NER	en_US
dc.subject	Context Encoder and Tag Decoder	en_US
dc.subject	Distributed Representation	en_US
dc.subject	Deep Neural Networks	en_US
dc.title	Afaan Oromo Named Entity Recognition Using Neural Word Embeddings	en_US
dc.type	Thesis	en_US
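The abstract names a CRF as the tag decoder on top of the BiLSTM context encoder. At prediction time, a linear-chain CRF typically selects the tag sequence with the highest total of per-token emission scores plus tag-to-tag transition scores, found with the Viterbi algorithm. The sketch below illustrates only that decoding step, in plain Python and under stated assumptions: the emission scores stand in for BiLSTM outputs, the start/end scores follow a common CRF formulation, and the function name `viterbi_decode`, the three-tag set, and every number are hypothetical rather than taken from the thesis.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Sequence[float],
    end: Sequence[float],
) -> Tuple[List[int], float]:
    """Return the best tag-index path and its score under a linear-chain CRF.

    emissions[t][j]   score of tag j at token t (e.g. from a BiLSTM encoder)
    transitions[i][j] score of moving from tag i to tag j
    start[j], end[j]  scores for beginning / ending a sentence with tag j
    """
    n_tags = len(start)
    if not emissions:
        return [], 0.0
    # score[j]: best total score of any path that ends in tag j at the current token
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            # best previous tag i for reaching tag j at token t
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    final = [score[j] + end[j] for j in range(n_tags)]
    last = max(range(n_tags), key=lambda j: final[j])
    # follow back-pointers from the last token back to the first
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, final[last]


# Hypothetical toy setup: three tags and a three-token sentence.
TAGS = ["O", "B-PER", "I-PER"]
NEG = -1e4  # large penalty that effectively forbids a transition
transitions = [
    [0.0, 0.0, NEG],  # from O: O -> I-PER is not allowed in BIO
    [0.0, 0.0, 0.5],  # from B-PER
    [0.0, 0.0, 0.5],  # from I-PER
]
start = [0.0, 0.0, NEG]  # a sentence may not start with I-PER
end = [0.0, 0.0, 0.0]
emissions = [
    [1.0, 0.9, 0.0],
    [0.0, 0.1, 2.0],
    [2.0, 0.0, 0.0],
]

path, best = viterbi_decode(emissions, transitions, start, end)
print([TAGS[i] for i in path])  # ['B-PER', 'I-PER', 'O']
```

With these toy scores, a per-token argmax would produce the invalid sequence O, I-PER, O; the large negative O to I-PER transition lets the CRF decoder return B-PER, I-PER, O instead, which is the kind of label-dependency modeling a CRF layer adds over a plain BiLSTM.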
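The abstract reports both an F-score and an accuracy, but this record does not describe the evaluation protocol. The sketch below assumes CoNLL-style entity-level exact-match precision, recall, and F1 over BIO tags, alongside token-level accuracy; the helper names `bio_spans`, `entity_prf`, and `token_accuracy` and the example tag sequences are hypothetical.

```python
from typing import List, Optional, Set, Tuple

# (entity type, start index, end index exclusive)
Span = Tuple[str, int, int]


def bio_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sentence.

    An I- tag whose type differs from the open entity (or with no open
    entity) is treated as the start of a new span.
    """
    spans: Set[Span] = set()
    etype: Optional[str] = None
    begin = 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes the last span
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != etype)
        if etype is not None and (tag == "O" or starts_new):
            spans.add((etype, begin, i))
            etype = None
        if starts_new:
            etype, begin = tag[2:], i
    return spans


def entity_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, F1: a predicted span counts only on exact match."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gs, ps = bio_spans(g), bio_spans(p)
        tp += len(gs & ps)
        n_gold += len(gs)
        n_pred += len(ps)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def token_accuracy(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    pairs = [(g, p) for gs, ps in zip(gold, pred) for g, p in zip(gs, ps)]
    return sum(g == p for g, p in pairs) / len(pairs) if pairs else 0.0


gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "B-ORG"]]
print(entity_prf(gold, pred))      # (0.5, 0.5, 0.5)
print(token_accuracy(gold, pred))  # 0.75
```

In the toy example one of four tokens is wrong, so token accuracy is 0.75, while one of two entities is wrong, so entity-level F1 is 0.5. Because most tokens in NER data are tagged O and are comparatively easy to get right, token accuracy tends to sit above entity-level F1.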

Files

Original bundle

Name:: Mekonini Kasu 2020.pdf
Size:: 1.52 MB
Format:: Adobe Portable Document Format

Collections

Computer Science