Lexicon-Stance Based Amharic Fake News Detection
Date
2022-05
Publisher
Addis Ababa University
Abstract
Because social media content is noisy and false information propagates rapidly,
identifying and detecting fake news has become a challenging problem.
In recent years, several studies have proposed using text representation techniques from content-based
approaches to automatically detect fake news on social media. However, fake
news has a distinct writing pattern, and capturing its distinguishing features
may improve detection more than focusing solely on text representation. In this
study, we propose combining stance-based features (page score, headline-to-article
similarity, and headline-to-headline similarity) with lexicon-based features from text
representation techniques to enhance detection performance. To build the detection
model, we used three machine learning algorithms: logistic regression, passive-aggressive,
and decision tree. The proposed approach is evaluated on a newly collected Amharic
fake news dataset from Facebook. Our experimental results show that the hybrid
(lexicon-stance) features improve on the previous lexicon-based detection results by
4.1% in accuracy, 3% in precision, 4% in recall, and 4% in F1-score. In addition, the hybrid features
improve the area under the curve from 0.982 to 0.995, reducing the false positive rate by
4% and improving the true positive rate by 4.4%. Furthermore, we found that, among
the proposed stance features, page score contributed the most to the improvement
of lexicon-based fake news detection.
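
The hybrid feature idea in the abstract can be illustrated with a minimal sketch. The abstract does not define how the stance features are computed, so everything below is an assumption made for illustration: similarity is taken as cosine similarity over whitespace-tokenised term counts, headline-to-headline similarity as the maximum similarity to a set of other headlines, and `page_score` as a precomputed number supplied by the caller. The function names `stance_features` and `hybrid_features` are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Term counts from whitespace tokenisation (an assumed, simplistic tokenizer)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two term-count mappings; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def stance_features(headline, article, other_headlines, page_score):
    """Return [page score, headline-to-article sim, headline-to-headline sim].

    Assumption: page_score is computed elsewhere and passed in as a number;
    headline-to-headline similarity is the maximum over other_headlines.
    """
    h = bag_of_words(headline)
    head_to_article = cosine_similarity(h, bag_of_words(article))
    sims = [cosine_similarity(h, bag_of_words(o)) for o in other_headlines]
    head_to_head = max(sims) if sims else 0.0
    return [float(page_score), head_to_article, head_to_head]


def hybrid_features(lexicon_vector, stance_vector):
    """Concatenate lexicon-based and stance-based features into one vector."""
    return list(lexicon_vector) + list(stance_vector)


if __name__ == "__main__":
    stance = stance_features(
        headline="a b",
        article="a b c d",
        other_headlines=["a b", "x y"],
        page_score=0.5,
    )
    # The lexicon vector here is a placeholder for any text representation.
    print(hybrid_features([1, 0, 2], stance))
```

The concatenated vector could then be passed to any of the three classifiers named in the abstract; the sketch stops at feature construction because the training setup is not described here.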
Keywords
Content-based detection, Stance-based detection, Lexicon-based detection, Text representation techniques, Fake news