Automatic Fraud Detection Model from Customs Data in Ethiopian Revenues and Customs Authority

Hailemariam, Sebsibe (PhD)
Muhammed, Meriem
2022-02-28
2023-11-29
2013-03
http://etd.aau.edu.et/handle/123456789/30375

Customs, which is one of the three wings of the Ethiopian Revenues and Customs Authority (ERCA), was established to secure national revenues by controlling imports and exports as well as collecting governmental taxes and duties. This research focuses on the identification, modeling and analysis of various conflicting issues that Ethiopian customs faces. One of the major problems identified during problem understanding is the control and management of fraudulent behavior by foreign traders. The declarants' intent to commit various types of fraudulent activities results in the need for serious inspection of declarations; at the same time, the huge number of declarations per day demands significant human resources and time. Recognizing this critical problem for the government, ERCA adopted the Automated System for Customs Data (ASYCUDA). ASYCUDA attempts to minimize the problem by recommending a risk level for each declaration using a selectivity method that relies on five parameters from the declarants' information. The fundamental problem with ASYCUDA risk leveling is that it restricts the variables used to assign the risk level, which may direct a declaration into the incorrect channel. This research proposed a machine learning approach to model the fraudulent behavior of importers by identifying appropriate parameters from the observed data, in order to improve the quality of service at Customs, ERCA. In this research, the researcher proposed automated fraud detection models that predict fraudulent behavior in imported cargo, so that the problem associated with ASYCUDA risk leveling is minimized. The models were built with machine learning techniques using past data collected from the customs data of ERCA. The analysis was done on inspected cargo records comprising 74,033 instances and 24 attributes.
Four different prediction models were proposed. The first is a fraud prediction model, which predicts whether incoming cargo is fraudulent or not. The second is a fraud category prediction model, which identifies the specific fraud category among the ten identified categories. The third is a fraud level prediction model, which classifies the fraud level as high or low. The last is a fraud risk level prediction model, which classifies the risk level of imported cargo as high, medium or low. Moreover, following the recommendation of IEEE, four of the best machine learning approaches were tested for each of the identified prediction models. These are C4.5, CART, KNN and Naive Bayes. Based on the results obtained through various experimental analyses, C4.5 was found to be the best algorithm for building all types of the prediction models. The accuracies obtained in the first, second, third and fourth scenarios using the C4.5 machine learning algorithm are 93.4%, 84.4%, 89.4% and 86.8% respectively. The next best algorithm, Classification and Regression Tree (CART), achieved accuracies of 92.9%, 80.1%, 89.4% and 85.3% for the first, second, third and fourth scenarios respectively. The researchers observed that both C4.5 and CART perform better for fraud prediction and fraud level classification than for fraud category and risk level prediction. Moreover, the Naive Bayes statistical approach was found to perform very poorly.
Key words: Fraud prediction, fraud category prediction, fraud level prediction, fraud risk level prediction, classification, machine learning algorithm, ASYCUDA.
Thesis (en)
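
As an illustration of why a decision tree learner such as C4.5 can make use of many declaration attributes rather than a fixed handful, the following minimal sketch computes the gain ratio, the split criterion C4.5 uses to rank attributes. It is not the thesis's implementation: the attribute names (`origin`, `payment`), the target name (`fraud`) and the four toy records are hypothetical, and the sketch handles categorical attributes only (no continuous attributes, missing values or pruning).

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, attribute, target):
    """Gain ratio of splitting `rows` on a categorical `attribute`."""
    labels = [row[target] for row in rows]
    # Group the target labels by the value of the candidate attribute.
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    total = len(rows)
    # Weighted entropy of the target that remains after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = entropy([row[attribute] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy records; these are NOT taken from the ERCA data set.
records = [
    {"origin": "A", "payment": "cash", "fraud": "yes"},
    {"origin": "A", "payment": "credit", "fraud": "yes"},
    {"origin": "B", "payment": "cash", "fraud": "no"},
    {"origin": "B", "payment": "credit", "fraud": "no"},
]

ratios = {name: gain_ratio(records, name, "fraud") for name in ("origin", "payment")}
best = max(ratios, key=ratios.get)
print(ratios, best)
```

In this toy data the hypothetical `origin` attribute separates the two classes completely while `payment` carries no information, so a tree learner would split on `origin` first.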