Coreference Resolution for Amharic Text Using Bidirectional Encoder Representation from Transformer

dc.contributor.advisor: Assabie, Yaregal (PhD)
dc.contributor.author: Bantie, Lingerew
dc.date.accessioned: 2022-07-11T09:45:35Z
dc.date.accessioned: 2023-11-29T04:06:43Z
dc.date.available: 2022-07-11T09:45:35Z
dc.date.available: 2023-11-29T04:06:43Z
dc.date.issued: 2022-03-04
dc.description.abstract: Coreference resolution is the process of finding expressions in a text that refer to the same entity. Such expressions are called mentions, and the task is to cluster all coreferent mentions in a text based on word indices. Coreference resolution improves the effectiveness of several Natural Language Processing (NLP) applications, such as machine translation, information extraction, named entity recognition, and question answering. In this work, we propose coreference resolution for Amharic text using Bidirectional Encoder Representations from Transformers (BERT), a contextual language model that generates semantic vectors dynamically according to the context of each word. The proposed system has a training phase and a testing phase. The training phase includes preprocessing (cleaning, tokenization, and sentence segmentation), word embedding, feature extraction (Amharic vocabulary, entities, and mention pairs), and the coreference model. Likewise, the testing phase has its own steps: preprocessing (cleaning, tokenization, and sentence segmentation), coreference resolution, and prediction of Amharic mentions. Word embedding in the proposed model represents each word as a low-dimensional vector; it is a feature learning technique that obtains new features across domains for coreference resolution in Amharic text. The necessary information is extracted from the word embeddings, the processed data, and the Amharic characters. After extracting the important features from the training data, we build a coreference model. In this model, BERT obtains basic features from the embedding layer by extracting information from both the left and right context of a given word.
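The clustering of coreferent mentions described above can be sketched in a few lines. This is purely illustrative and is not the thesis's implementation: the function name, the mention strings, and the use of union-find to merge pairwise coreference decisions transitively into entity clusters are all our own assumptions.

```python
def cluster_mentions(mentions, coreferent_pairs):
    """Merge mentions linked by pairwise coreference decisions
    into entity clusters (union-find with path compression)."""
    parent = {m: m for m in mentions}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path compression
            m = parent[m]
        return m

    for a, b in coreferent_pairs:
        parent[find(a)] = find(b)

    clusters = {}
    for m in mentions:
        clusters.setdefault(find(m), []).append(m)
    return list(clusters.values())

# Hypothetical Amharic mentions and pairwise coreference decisions:
mentions = ["አበበ", "እሱ", "ከተማዋ", "እሷ"]
pairs = [("አበበ", "እሱ"), ("ከተማዋ", "እሷ")]
print(cluster_mentions(mentions, pairs))  # two clusters of two mentions each
```

In practice the pairwise decisions would come from a mention-pair classifier over BERT features; the union-find step only handles the final grouping.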
To evaluate the proposed model, we conducted experiments on an Amharic dataset prepared from various reliable sources for this study. The commonly used evaluation metrics for the coreference resolution task are MUC, B3, CEAF-m, CEAF-e, and BLANC. Experimental results demonstrate that the proposed model outperforms the state-of-the-art Amharic model, achieving F-measure values of 80%, 85.71%, 90.9%, 88.86%, and 81.7% on these metrics, respectively, on the Amharic dataset.
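As an illustration of one of these metrics, the link-based MUC score can be computed from gold and predicted clusterings as below. This is our own minimal sketch of the standard MUC definition, not code from the thesis; mentions are represented by arbitrary hashable identifiers.

```python
def muc(key, response):
    """Link-based MUC score: (recall, precision, F1) between a key
    (gold) and response (predicted) clustering, each a list of
    mention sets."""
    def recall(gold, pred):
        num = den = 0
        for chain in gold:
            # Partitions of this gold chain induced by the prediction;
            # mentions absent from every predicted chain count as singletons.
            parts = {frozenset(chain & c) for c in pred if chain & c}
            missing = chain - set().union(*pred) if pred else chain
            p = len(parts) + len(missing)
            num += len(chain) - p
            den += len(chain) - 1
        return num / den if den else 0.0

    r = recall(key, response)
    p = recall(response, key)  # precision is recall with roles swapped
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return r, p, f1

# Hypothetical gold and predicted clusters (mentions as indices):
key = [{1, 2, 3}, {4, 5}]
response = [{1, 2}, {3, 4, 5}]
print(muc(key, response))  # recall = precision = F1 = 2/3
```

B3, CEAF, and BLANC are defined differently (mention-, entity-, and link-pair-based, respectively), which is why coreference work conventionally reports all of them.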
dc.identifier.uri: http://etd.aau.edu.et/handle/123456789/32223
dc.language.iso: en
dc.publisher: Addis Ababa University
dc.subject: Amharic Coreference Resolution
dc.subject: Mention
dc.subject: Bidirectional Encoder Representation from Transformer
dc.subject: Transformer
dc.subject: NLP
dc.subject: Coreference
dc.subject: Word Embedding
dc.title: Coreference Resolution for Amharic Text Using Bidirectional Encoder Representation from Transformer
dc.type: Thesis

Files

Original bundle
Name: Lingerew Bantie 2022.pdf
Size: 644.47 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Plain Text