Integrating Hierarchical Attention and Context-Aware Embeddings for Improved Word Sense Disambiguation Performance Using a BiLSTM Model
Date
2024-06
Publisher
Addis Ababa University
Abstract
Word Sense Disambiguation (WSD) is a fundamental task in natural language processing
that aims to determine the correct sense of a word from its context. Word sense
ambiguity, particularly polysemy and other forms of semantic ambiguity, poses
significant challenges for WSD. Recent research has focused on deep contextual
models to address these challenges. However, despite this progress, semantic
ambiguity remains difficult to resolve, especially for polysemous words. This
research introduces a new approach that integrates hierarchical attention
mechanisms with BERT embeddings to improve WSD accuracy. Our model, which
incorporates both local and global attention, demonstrates significant gains in
accuracy, particularly on complex sentence structures.
To the best of our knowledge, our model is the first to combine hierarchical
attention mechanisms with contextual embeddings for WSD. This integration
improves the model's performance, especially when BERT contextual embeddings are
used as the word representations. Through extensive experimentation, we
demonstrate the effectiveness of the proposed model. Our research highlights
several key points. First, we show the effectiveness of hierarchical attention
and contextual embeddings for WSD. Second, we adapt the model to Amharic word
sense disambiguation, where it also performs strongly: despite the lack of a
standard benchmark dataset for Amharic WSD, our model achieves 92.4% accuracy
on a self-prepared dataset. Third, our findings emphasize the importance
of linguistic features in capturing relevant contextual information for WSD. We
also observe that Part-of-Speech (POS) tagging has a comparatively minor impact
on our English data, while the choice of word embeddings strongly affects model
performance. Furthermore, applying local and global attention together leads to
better results, with word-level local attention proving particularly effective.
Overall, our model achieves state-of-the-art results in WSD within the same
framework, improving the F1 score by 1.8% to 2.9% over baseline models. We also
achieve state-of-the-art performance on Italian, with F1-score gains of 0.5% to
0.7% over baseline systems. These findings underscore the importance of
contextual information in WSD and pave the way for more sophisticated,
context-aware natural language processing systems.
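
To make the architecture concrete, the sketch below shows one plausible reading
of the model described above: BERT contextual embeddings feeding a BiLSTM, with
word-level (local) attention over a window around the target word and
sentence-level (global) attention over the whole sentence, followed by a sense
classifier. This is a minimal illustrative PyTorch sketch, not the thesis
implementation; the layer sizes, the window-based definition of local attention,
and names such as HierAttnWSD are assumptions.

import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class HierAttnWSD(nn.Module):
    # Sketch of a BiLSTM WSD model with hierarchical (local + global)
    # attention over BERT contextual embeddings. Hyperparameters are
    # illustrative, not taken from the thesis.
    def __init__(self, bert_name="bert-base-uncased", hidden=256,
                 num_senses=10, window=3):
        super().__init__()
        self.bert = AutoModel.from_pretrained(bert_name)  # contextual embeddings
        self.window = window
        self.bilstm = nn.LSTM(self.bert.config.hidden_size, hidden,
                              batch_first=True, bidirectional=True)
        self.local_score = nn.Linear(2 * hidden, 1)   # word-level attention
        self.global_score = nn.Linear(2 * hidden, 1)  # sentence-level attention
        self.classifier = nn.Linear(4 * hidden, num_senses)

    def _attend(self, scorer, states, mask):
        # Masked attention: score each position, softmax over the unmasked
        # ones, and return the weighted sum of the BiLSTM states.
        scores = scorer(states).squeeze(-1)
        scores = scores.masked_fill(~mask, float("-inf"))
        weights = torch.softmax(scores, dim=-1).unsqueeze(-1)
        return (weights * states).sum(dim=1)

    def forward(self, input_ids, attention_mask, target_idx):
        emb = self.bert(input_ids=input_ids,
                        attention_mask=attention_mask).last_hidden_state
        states, _ = self.bilstm(emb)  # (batch, seq_len, 2*hidden)
        # Local attention: restrict to a small window around the target word.
        positions = torch.arange(states.size(1), device=states.device)
        near_target = (positions.unsqueeze(0)
                       - target_idx.unsqueeze(1)).abs() <= self.window
        local_mask = near_target & attention_mask.bool()
        local_ctx = self._attend(self.local_score, states, local_mask)
        # Global attention: attend over every real token in the sentence.
        global_ctx = self._attend(self.global_score, states,
                                  attention_mask.bool())
        # Classify the sense from the concatenated local + global context.
        return self.classifier(torch.cat([local_ctx, global_ctx], dim=-1))

# Toy usage: disambiguate "bank" in a single sentence.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tok(["He sat on the bank of the river ."], return_tensors="pt")
model = HierAttnWSD(num_senses=5)
target_idx = torch.tensor([5])  # token position of "bank" after [CLS]
logits = model(batch["input_ids"], batch["attention_mask"], target_idx)

In this sketch the local and global context vectors are simply concatenated
before classification; summing or gating them would be an equally plausible way
to combine the two attention levels.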
Keywords
Word Sense Disambiguation · Natural Language Processing · Hierarchical Attention · Contextual Embeddings.