Extrinsic Hybrid Amharic Text Plagiarism Detection for News Articles Posts on Social Media
Date
2025-01
Publisher
Addis Ababa University
Abstract
As the publication of news content on social media grows in Ethiopia, there is rising concern about plagiarism in news items posted on these platforms. Plagiarism in news articles can have a significant negative impact on society. It fosters a climate of mistrust, particularly regarding the authority and ownership of news organizations, and it produces homogeneous reports that lack originality and integrity, leading to violations of professional norms. To identify information that is copied or rephrased without crediting the original author, this research proposes a two-layer text plagiarism detection technique tailored to Amharic news items posted on social media, combining different methods.
The study used purposive sampling to collect data from the social media accounts of government news agencies, privately owned news agencies, individuals/bloggers, and journalists, as well as international organizations such as BBC Amharic. The selection criteria were posting daily news on a range of topics (such as politics, economics, and international news), having a public Amharic-language channel, and having a significant number of followers. Two-layer plagiarism detection was found to be more effective at identifying semantically similar content, rephrasing, and copy-pasting, which traditional detection failed to catch.
The first layer of our approach uses candidate detection techniques, such as fingerprinting and n-gram checking, to identify potential cases of plagiarism. The second layer applies more advanced semantic techniques, such as LDA and fuzzy models. We then compared the performance of four combinations of candidate and semantic detection methods: layer-1 fingerprint with layer-2 LDA, layer-1 fingerprint with layer-2 fuzzy, layer-1 n-gram with layer-2 LDA, and layer-1 n-gram with layer-2 fuzzy.
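The two layers can be illustrated with a few plain-Python functions. This is a minimal sketch under stated assumptions, not the thesis implementation: it assumes whitespace tokenization, a simplified winnowing scheme with CRC32 hashes for fingerprinting, bi-gram containment for n-gram checking, a fuzzy word-correlation model (the `correlation` dictionary is hypothetical and would normally be derived from a corpus), and a Hellinger-based similarity over topic vectors that a trained LDA model would supply. All function names and thresholds (`t1`, `t2`) are hypothetical.

```python
import math
import zlib


def tokenize(text):
    # Assumption: plain whitespace tokenization. Real Amharic input would also
    # need punctuation handling (e.g. the Ethiopic full stop "።") and normalization.
    return text.split()


# ---- Layer 1: candidate retrieval ----

def word_ngrams(tokens, n=2):
    """Set of word n-grams (bi-grams by default)."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_containment(suspicious, source, n=2):
    """Fraction of the suspicious text's n-grams that also occur in the source."""
    s = word_ngrams(tokenize(suspicious), n)
    r = word_ngrams(tokenize(source), n)
    return len(s & r) / len(s) if s else 0.0


def fingerprint(text, k=5, window=4):
    """Simplified winnowing: hash every character k-gram (CRC32 here) and
    keep the minimum hash of each sliding window as the fingerprint set."""
    chars = "".join(tokenize(text))
    hashes = [zlib.crc32(chars[i:i + k].encode("utf-8"))
              for i in range(len(chars) - k + 1)]
    if len(hashes) < window:
        return set(hashes)
    return {min(hashes[i:i + window]) for i in range(len(hashes) - window + 1)}


def fingerprint_similarity(a, b):
    """Jaccard overlap of the two fingerprint sets."""
    fa, fb = fingerprint(a), fingerprint(b)
    union = fa | fb
    return len(fa & fb) / len(union) if union else 0.0


# ---- Layer 2: semantic verification ----

def fuzzy_similarity(suspicious, source, correlation):
    """Fuzzy-set similarity: a suspicious word's membership in the source is
    1 - prod(1 - c(w, v)) over source words v, where c is a word-to-word
    correlation in [0, 1] (1.0 for identical words); the score is the mean."""
    words = tokenize(suspicious)
    src = set(tokenize(source))
    if not words:
        return 0.0
    total = 0.0
    for w in words:
        prod = 1.0
        for v in src:
            c = 1.0 if w == v else correlation.get((w, v), correlation.get((v, w), 0.0))
            prod *= 1.0 - c
        total += 1.0 - prod
    return total / len(words)


def topic_similarity(p, q):
    """1 - Hellinger distance between two topic distributions, e.g. the
    per-document topic proportions inferred by a trained LDA model."""
    h = math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)) / 2)
    return 1.0 - h


# ---- Two-layer pipeline ----

def detect(suspicious, sources, layer1, layer2, t1=0.3, t2=0.6):
    """Layer 1 shortlists candidate sources; layer 2 keeps only those whose
    semantic score also clears its threshold."""
    candidates = [src for src in sources if layer1(suspicious, src) >= t1]
    return [src for src in candidates if layer2(suspicious, src) >= t2]
```

For example, `detect(post, archive, ngram_containment, lambda a, b: fuzzy_similarity(a, b, corr))` would correspond to the n-gram-then-fuzzy combination, where `post`, `archive`, and `corr` are hypothetical inputs. Passing `fingerprint_similarity` as `layer1`, or a layer-2 function such as `lambda a, b: topic_similarity(infer(a), infer(b))` with a hypothetical LDA inference function `infer`, gives the other combinations. Keeping layer 1 cheap and lexical means the costlier semantic comparison only runs on the shortlisted sources.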
In conclusion, the experimental results show that combining traditional techniques such as fingerprinting and n-grams with advanced semantic techniques such as LDA and fuzzy models improves the performance metrics. The findings indicate that merged features outperform individual features for almost all models. Bi-gram with LDA achieved an accuracy of 0.96, recall of 0.909, precision of 0.93, and F1 score of 0.92; fingerprint with LDA achieved an accuracy of 0.97, recall of 0.917, precision of 0.968, and F1 score of 0.94; fingerprint with fuzzy achieved an F1 score of 0.66, recall of 0.5, precision of 1.0, and accuracy of 1.0. Based on these results, the study concludes that fingerprint with LDA performed best on the performance metrics, including a short time to detect plagiarism.
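The reported F1 scores are the harmonic mean of precision and recall, F1 = 2PR / (P + R). The helper below (function names are hypothetical, not from the study) is a minimal sketch that computes the four metrics from confusion-matrix counts, treating "plagiarized" as the positive class, and reproduces two of the reported F1 values from their precision and recall.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts,
    where a positive prediction means 'plagiarized'."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


print(round(f1_from(0.968, 0.917), 2))  # fingerprint with LDA → 0.94
print(round(f1_from(0.93, 0.909), 2))   # bi-gram with LDA → 0.92
```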
Keywords
Plagiarism, Extrinsic Hybrid Amharic Text Plagiarism Detection, Fingerprint, N-Gram, LDA, Fuzzy Semantics