Application of Data Mining Technology to Support Fraud Protection: the Case of Ethiopian Revenue and Custom Authority

Addis Ababa University


Taxes are an important source of public revenue. The collective consumption of goods and services requires that part of our income be placed in government hands. Although tax collection is the main source of income for the government, it faces difficulties with fraud. Fraud involves one or more persons who intentionally and secretly act to deprive the government of income and use it for their own benefit. Fraud is as old as humanity itself and can take an unlimited variety of forms. Fraudulent claims account for a significant portion of all claims received by auditors and cost billions of dollars annually. This study was initiated to explore the potential applicability of data mining technology in developing models that can detect and predict suspected fraud in tax claims, with particular emphasis on the Ethiopian Revenue and Custom Authority (ERCA). The research first applies a clustering algorithm and then classification techniques to develop the predictive model. The K-Means clustering algorithm is employed to find the natural grouping of the tax claims as fraud and non-fraud. The resulting clusters are then used to develop the classification model. The classification task is carried out using the J48 decision tree and Naïve Bayes algorithms in order to create the model that best predicts fraud-suspicious tax claims. To collect the data, the researcher used interviews and observation for primary data and database analysis for secondary data. The experiments were conducted following the six-step KDD process model of Cios et al. (2000). For the experiments, the collected taxpayers' dataset was preprocessed to remove outliers, fill in missing values, select relevant attributes, integrate data, and derive attributes. The preprocessing phase took the largest share of the study time.
In this study, different characteristics of the ERCA customers' data were collected from the customs ASYCUDA database. A total of 11,080 taxpayer records were used for training the models, while a separate set of 2,200 records was used for testing the performance of the model. The model developed using the J48 decision tree algorithm showed the highest classification accuracy, 99.98%. This model was then tested on the 2,200-record test dataset and achieved a prediction accuracy of 97.19%. The results of this study show that data mining techniques are valuable for tax fraud detection. Future research directions are therefore pointed out toward an applicable system in this area.
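
The cluster-then-classify approach described in the abstract can be illustrated with a minimal Python sketch. This is not the study's implementation: the records are synthetic, the two numeric features are hypothetical, and a small hand-written Gaussian Naïve Bayes stands in for the classifiers the study used (J48 is not reproduced here). K-Means first assigns each training record to one of two groups, and those group labels then serve as the training target for the classifier.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on tuples of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: index of the nearest centroid (squared Euclidean).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


def nearest(centroids, p):
    """Cluster index of the centroid closest to point p."""
    return min(range(len(centroids)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))


def fit_gaussian_nb(X, y):
    """Per class: prior, per-feature mean, per-feature variance."""
    model = {}
    for cls in set(y):
        rows = [x for x, l in zip(X, y) if l == cls]
        means = [sum(col) / len(rows) for col in zip(*rows)]
        variances = [
            sum((v - m) ** 2 for v in col) / len(rows) + 1e-9  # avoid zero variance
            for col, m in zip(zip(*rows), means)
        ]
        model[cls] = (len(rows) / len(X), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Class with the highest log posterior under the independence assumption."""
    def log_posterior(cls):
        prior, means, variances = model[cls]
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


# Synthetic stand-in data (an assumption): two well-separated groups of records.
data_rng = random.Random(42)


def make_records(n, center):
    return [(data_rng.gauss(center[0], 1.0), data_rng.gauss(center[1], 1.0))
            for _ in range(n)]


train = make_records(100, (0.0, 0.0)) + make_records(100, (10.0, 10.0))
test = make_records(20, (0.0, 0.0)) + make_records(20, (10.0, 10.0))

# Step 1: cluster the unlabeled training records into two groups.
centroids, train_labels = kmeans(train, k=2)

# Step 2: train a classifier on the cluster labels.
model = fit_gaussian_nb(train, train_labels)

# Step 3: score on held-out records, using nearest-centroid labels as reference.
test_labels = [nearest(centroids, p) for p in test]
predictions = [predict_gaussian_nb(model, p) for p in test]
accuracy = sum(p == t for p, t in zip(predictions, test_labels)) / len(test)
print(f"held-out agreement with cluster labels: {accuracy:.2%}")
```

In this sketch the cluster indices 0 and 1 carry no meaning on their own. Deciding which cluster corresponds to fraud-suspicious claims would require domain judgment, which the sketch does not attempt.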



Data Mining Technology