Coreference Resolution for Amharic Text Using Bidirectional Encoder Representation from Transformer

dc.contributor.advisor: Assabie, Yaregal (PhD)
dc.contributor.author: Bantie, Lingerew
dc.date.accessioned: 2022-07-11T09:45:35Z
dc.date.accessioned: 2023-11-29T04:06:43Z
dc.date.available: 2022-07-11T09:45:35Z
dc.date.available: 2023-11-29T04:06:43Z
dc.date.issued: 2022-03-04
dc.description.abstract: Coreference resolution is the process of finding expressions in a text that refer to the same entity. Such expressions are called mentions, and the task is to cluster all coreferent mentions in a text based on word indices. Coreference resolution improves the effectiveness of several Natural Language Processing (NLP) applications, such as machine translation, information extraction, named entity recognition, and question answering. In this work, we propose coreference resolution for Amharic text using Bidirectional Encoder Representations from Transformers (BERT), a contextual language model that generates semantic vectors dynamically according to the context of each word. The proposed system has a training phase and a testing phase. The training phase includes preprocessing (cleaning, tokenization, and sentence segmentation), word embedding, feature extraction (Amharic vocabulary, entities, and mention pairs), and the coreference model. Likewise, the testing phase has its own steps: preprocessing (cleaning, tokenization, and sentence segmentation), coreference resolution, and prediction of Amharic mentions. Word embedding in the proposed model represents each word as a low-dimensional vector; it is a feature learning technique that obtains new features across domains for coreference resolution in Amharic text. The necessary information is extracted from the word embeddings, the processed data, and the Amharic characters. After extracting the important features from the training data, we build a coreference model. In this model, BERT obtains basic features from the embedding layer by extracting information from both the left and right context of a given word.
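The clustering of coreferent mentions described above can be sketched in a few lines. This is purely illustrative and is not the thesis's implementation: the function name, the mention strings, and the use of union-find to merge pairwise coreference decisions transitively into entity clusters are all our own assumptions.

```python
def cluster_mentions(mentions, coreferent_pairs):
    """Merge mentions linked by pairwise coreference decisions
    into entity clusters (union-find with path compression)."""
    parent = {m: m for m in mentions}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path compression
            m = parent[m]
        return m

    for a, b in coreferent_pairs:
        parent[find(a)] = find(b)

    clusters = {}
    for m in mentions:
        clusters.setdefault(find(m), []).append(m)
    return list(clusters.values())

# Hypothetical Amharic mentions and pairwise coreference decisions:
mentions = ["አበበ", "እሱ", "ከተማዋ", "እሷ"]
pairs = [("አበበ", "እሱ"), ("ከተማዋ", "እሷ")]
print(cluster_mentions(mentions, pairs))  # two clusters of two mentions each
```

In practice the pairwise decisions would come from a mention-pair classifier over BERT features; the union-find step only handles the final grouping.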
To evaluate the proposed model, we conducted experiments on an Amharic dataset prepared from various reliable sources for this study. The commonly used evaluation metrics for the coreference resolution task are MUC, B3, CEAF-m, CEAF-e, and BLANC. Experimental results demonstrate that the proposed model outperforms the state-of-the-art Amharic model, achieving F-measure values of 80%, 85.71%, 90.9%, 88.86%, and 81.7% on these metrics, respectively, on the Amharic dataset.
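As an illustration of one of these metrics, the link-based MUC score can be computed from gold and predicted clusterings as below. This is our own minimal sketch of the standard MUC definition, not code from the thesis; mentions are represented by arbitrary hashable identifiers.

```python
def muc(key, response):
    """Link-based MUC score: (recall, precision, F1) between a key
    (gold) and response (predicted) clustering, each a list of
    mention sets."""
    def recall(gold, pred):
        num = den = 0
        for chain in gold:
            # Partitions of this gold chain induced by the prediction;
            # mentions absent from every predicted chain count as singletons.
            parts = {frozenset(chain & c) for c in pred if chain & c}
            missing = chain - set().union(*pred) if pred else chain
            p = len(parts) + len(missing)
            num += len(chain) - p
            den += len(chain) - 1
        return num / den if den else 0.0

    r = recall(key, response)
    p = recall(response, key)  # precision is recall with roles swapped
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return r, p, f1

# Hypothetical gold and predicted clusters (mentions as indices):
key = [{1, 2, 3}, {4, 5}]
response = [{1, 2}, {3, 4, 5}]
print(muc(key, response))  # recall = precision = F1 = 2/3
```

B3, CEAF, and BLANC are defined differently (mention-, entity-, and link-pair-based, respectively), which is why coreference work conventionally reports all of them.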
dc.identifier.uri: http://etd.aau.edu.et/handle/123456789/32223
dc.language.iso: en
dc.publisher: Addis Ababa University
dc.subject: Amharic Coreference Resolution
dc.subject: Mention
dc.subject: Bidirectional Encoder Representation from Transformer
dc.subject: Transformer
dc.subject: NLP
dc.subject: Coreference
dc.subject: Word Embedding
dc.title: Coreference Resolution for Amharic Text Using Bidirectional Encoder Representation from Transformer
dc.type: Thesis

Files

Original bundle
Name: Lingerew Bantie 2022.pdf
Size: 644.47 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Plain Text