Lexicon-Stance Based Amharic Fake News Detection

Date

2022-05

Publisher

Addis Ababa University

Abstract

Because social media content is noisy and false information spreads rapidly, identifying and detecting fake news has become a challenging problem. In recent years, several studies have proposed using text representation techniques from content-based approaches to detect fake news on social media automatically. However, fake news has a distinct writing pattern, and capturing its distinguishing features may improve detection more than focusing solely on text representation. In this study, we propose combining stance-based features (page score, headline-to-article similarity, and headline-to-headline similarity) with lexicon-based features from text representation techniques to enhance detection performance. To build the detection model, we used three machine learning algorithms: logistic regression, passive-aggressive, and decision tree. The proposed approach is evaluated on a newly collected Amharic fake news dataset from Facebook. Our experimental results show that the hybrid (lexicon-stance) features improve on the previous lexicon-based detection results by 4.1% in accuracy, 3% in precision, 4% in recall, and 4% in F1-score. In addition, the hybrid features raise the area under the curve from 0.982 to 0.995, reducing the false positive rate by 4% and improving the true positive rate by 4.4%. Furthermore, we found that, of the proposed stance features, page score contributed the most to the improvement over lexicon-based fake news detection.
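
As an illustration only, the idea of appending stance features to a lexicon-based vector can be sketched as follows. This is a minimal sketch under stated assumptions, not the thesis implementation: the abstract does not say how the features are computed, so the whitespace tokenizer, the use of raw term counts as the lexicon representation, cosine similarity as the similarity measure, the maximum over other headlines as the headline-to-headline score, and a precomputed numeric `page_score` are all hypothetical choices, as are the function names.

```python
import math
from collections import Counter


def term_counts(text):
    """Term counts from whitespace tokens (assumed stand-in for an Amharic tokenizer)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


def hybrid_features(headline, article, other_headlines, page_score, vocabulary):
    """Concatenate lexicon features with three stance features.

    page_score is assumed to be supplied by the caller (hypothetical input);
    vocabulary is a fixed list of terms defining the lexicon vector order.
    """
    headline_counts = term_counts(headline)
    article_counts = term_counts(article)

    # Lexicon-based part: one raw count per vocabulary term.
    lexicon = [float(article_counts[t]) for t in vocabulary]

    # Stance-based part: headline-to-article and headline-to-headline similarity.
    headline_to_article = cosine_similarity(headline_counts, article_counts)
    headline_to_headline = max(
        (cosine_similarity(headline_counts, term_counts(h)) for h in other_headlines),
        default=0.0,
    )

    return lexicon + [page_score, headline_to_article, headline_to_headline]
```

The resulting vector could then be passed to any of the three classifiers named above; keeping the stance features as separate trailing columns makes it straightforward to ablate them one at a time, which is one way the relative contribution of page score could be measured.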

Keywords

Content-based detection, Stance-based detection, Lexicon-based detection, Text representation techniques, Fake news
