Semantic Relation Extraction for Amharic Text Using Deep Learning Approach






Addis Ababa University


Relation extraction is an important semantic processing task in natural language processing. It can be defined as follows: given a sentence S with a pair of annotated entities e1 and e2, identify the semantic relation between e1 and e2 from a set of predefined relation types. Semantic relation extraction supports many applications, such as text mining, question answering, and information extraction. Some state-of-the-art systems for other languages still rely on lexical resources such as WordNet and on natural language processing tools such as dependency parsers and named entity recognizers to obtain high-level features. Another challenge is that important information can appear at any position in the sentence. To tackle these problems, we propose an Amharic semantic relation extraction system using a deep learning approach. Among the existing deep learning approaches, we use the bidirectional long short-term memory (BLSTM) network with an attention mechanism. It enables automatic, multi-level learning of feature representations from data and captures the most important semantic information in a sentence. The proposed model has three components. The first is a word embedding layer that maps each word into a low-dimensional vector. It is a feature learning technique that obtains new features across domains for relation extraction in Amharic text. The second is a BLSTM layer that derives high-level features from the embedding layer by exploiting information from both the past and the future directions, because a single direction may not reflect all of the information in the context. The third is an attention mechanism that produces a weight vector and merges the word-level features from each time step into a sentence-level feature vector by multiplying them by the weight vector. To evaluate our model, we conduct experiments on Amharic-RE-Dataset, which was prepared from Amharic text for this thesis.
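The attention step described above can be illustrated with a minimal sketch in plain Python. It is not the thesis implementation: the scoring function (a dot product between a parameter vector and the tanh of each hidden state), the function name attention_pool, the vector w, and the toy numbers are all assumptions made for illustration.

```python
import math

def attention_pool(hidden_states, w):
    """Merge per-word feature vectors into one sentence-level vector.

    hidden_states: list of T vectors (one per time step), each of length d,
                   standing in for BLSTM outputs.
    w: hypothetical learned parameter vector of length d (an assumption).
    """
    # Score each time step as w . tanh(h_t) (assumed scoring function).
    scores = [sum(wi * math.tanh(hi) for wi, hi in zip(w, h))
              for h in hidden_states]

    # Softmax over time steps; subtracting the max keeps exp() stable.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]  # the attention weight vector

    # Sentence vector: weighted sum of the per-word feature vectors.
    d = len(hidden_states[0])
    sentence = [sum(a * h[j] for a, h in zip(alpha, hidden_states))
                for j in range(d)]
    return alpha, sentence

# Toy input: 3 time steps with 2 features each (made-up numbers).
H = [[0.1, 0.3], [0.9, -0.2], [0.4, 0.4]]
w = [1.0, 0.5]
alpha, sentence = attention_pool(H, w)
```

The weights in alpha sum to 1, so the sentence vector is a convex combination of the word-level vectors; time steps with higher scores contribute more to it.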
Precision, recall, and F-score, the commonly used evaluation metrics, are used to measure the effectiveness of the proposed system. The proposed attention-based bidirectional long short-term memory model yields an F1-score of 87.06%. It achieves this result with only word embeddings as input features, without using lexical resources or NLP systems.
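As a reminder of how the three metrics relate, the following sketch computes precision, recall, and F1 for a single relation type. The relation names and label lists are hypothetical, and the abstract does not state how per-relation scores were averaged, so this shows only the per-class definitions.

```python
def precision_recall_f1(gold, predicted, label):
    """Per-class precision, recall, and F1 for one relation type."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)  # true positives
    fp = sum(1 for g, p in pairs if g != label and p == label)  # false positives
    fn = sum(1 for g, p in pairs if g == label and p != label)  # false negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1

# Hypothetical gold and predicted relation labels for four sentences.
gold = ["located_in", "part_of", "located_in", "other"]
pred = ["located_in", "located_in", "located_in", "other"]
p, r, f1 = precision_recall_f1(gold, pred, "located_in")
```

In this toy case the model finds both true located_in instances but also labels one part_of instance as located_in, so recall is perfect while precision is lower, and F1 falls between the two.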



Amharic Text Semantic Relation Extraction, Deep Learning, Word Embedding, Attention-Based Bidirectional Long Short-Term Memory