Automatic Fraud Detection Model from Customs Data in Ethiopian Revenues and Customs Authority
Date
2013-03
Publisher
Addis Ababa, Ethiopia
Abstract
Customs, which is one of the three wings of the Ethiopian Revenues and Customs Authority
(ERCA), was established to secure national revenue by controlling imports and exports and by
collecting governmental taxes and duties. This research focuses on the identification, modeling and
analysis of various conflicting issues that Ethiopian customs faces. One of the major problems
identified during problem understanding is the control and management of fraudulent behavior
by foreign traders. Declarants' inclination toward various types of fraudulent activity creates
the need for serious inspection of declarations; at the same time, the huge number of
declarations per day demands significant human resources and time.
Recognizing this critical problem, ERCA adopted the Automated System for
Customs Data (ASYCUDA). ASYCUDA attempts to minimize these problems by recommending a risk level
for each declaration using a selectivity method that uses five parameters from the
declarant's information. The fundamental problem with ASYCUDA risk leveling is that it restricts the
variables used to assign the risk level, which may direct a declaration into the
incorrect channel.
This research proposed a machine learning approach to model the fraudulent behavior of importers
by identifying appropriate parameters from the observed data, in order to improve the quality
of service at Customs, ERCA. The researcher proposed automated fraud
detection models that predict the fraudulent behavior of imported cargo, thereby minimizing the problems
associated with ASYCUDA risk leveling. The models were built with
machine learning techniques using historical data collected from ERCA customs
records. The analysis was performed on inspected cargo records comprising 74,033 instances
and 24 attributes.
Four different prediction models were proposed. The first, the fraud prediction model,
predicts whether incoming cargo is fraudulent or not. The second, the fraud category
prediction model, identifies the specific type of fraud among the ten
identified categories. The third, the fraud level prediction model, classifies the fraud
level as high or low. The last, the fraud risk level prediction model, classifies
the risk level of imported cargo as high, medium or low.
Moreover, following a recommendation from IEEE, four leading machine learning algorithms were
tested for each of the identified prediction models: C4.5, CART, KNN and Naive
Bayes. Based on the results of various experimental analyses, C4.5 was
found to be the best algorithm for building all types of prediction models. The accuracies obtained
with C4.5 in the first, second, third and fourth scenarios are
93.4%, 84.4%, 89.4% and 86.8%, respectively.
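To make the C4.5 splitting criterion concrete, the following is a minimal sketch (not the implementation used in this research) of the gain ratio that C4.5 uses to choose a split attribute: information gain divided by split information. The attribute names `origin` and `goods` and the four toy records are hypothetical and do not come from the ERCA data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """C4.5-style gain ratio for splitting `rows` (list of dicts) on categorical `attr`."""
    n = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    # Information gain: parent entropy minus size-weighted entropy of the children.
    remainder = sum(len(part) / n * entropy(part) for part in partitions.values())
    gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = -sum(len(part) / n * log2(len(part) / n) for part in partitions.values())
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy records, not ERCA data.
rows = [
    {"origin": "A", "goods": "electronics"},
    {"origin": "A", "goods": "textiles"},
    {"origin": "B", "goods": "electronics"},
    {"origin": "B", "goods": "textiles"},
]
labels = ["fraud", "fraud", "clean", "clean"]

print(gain_ratio(rows, labels, "origin"))  # → 1.0 (perfectly separates the classes)
print(gain_ratio(rows, labels, "goods"))   # → 0.0 (carries no information here)
```

A full C4.5 learner applies this choice recursively, handles numeric attributes and missing values, and prunes the resulting tree; the sketch covers only the attribute-selection step.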
The next best algorithm, Classification and Regression Tree (CART), achieved accuracies of
92.9%, 80.1%, 89.4% and 85.3% for the first, second, third and fourth scenarios, respectively.
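CART differs from C4.5 mainly in building strictly binary splits scored by Gini impurity. The sketch below, under that standard definition and not taken from this research, finds the best threshold on one numeric attribute. The attribute `declared_value` and its values are hypothetical illustrations, not fields confirmed in the ERCA dataset.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_numeric_split(values, labels):
    """Return (threshold, weighted_gini) for the best binary split `value <= threshold`.

    Returns (None, parent_gini) if no split lowers the impurity.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # baseline: impurity of the unsplit parent node
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal values cannot be separated by a threshold
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        # Size-weighted impurity of the two child nodes.
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical declared values for four cargo records, not ERCA data.
declared_value = [100, 200, 3000, 5000]
labels = ["clean", "clean", "fraud", "fraud"]

print(best_numeric_split(declared_value, labels))  # → (1600.0, 0.0)
```

A complete CART learner also forms binary groupings of categorical values and applies cost-complexity pruning; only the split search is shown here.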
The researcher observed that both C4.5 and CART perform better for fraud prediction and
fraud level classification than for fraud category and risk level prediction. Moreover, the Naive
Bayes statistical approach was found to perform very poorly.
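The abstract does not state which Naive Bayes variant was used. Purely as an illustration of how the method scores a declaration, the following sketches a categorical Naive Bayes with add-one (Laplace) smoothing, which treats every attribute as conditionally independent given the class. It is not meant to explain the poor results reported above, and the attribute `origin` and the toy records are hypothetical.

```python
from collections import Counter, defaultdict
from math import log


def train_nb(rows, labels):
    """Count class priors and per-class attribute-value frequencies."""
    priors = Counter(labels)
    counts = defaultdict(Counter)  # (label, attr) -> Counter of observed values
    values = defaultdict(set)      # attr -> set of distinct observed values
    for row, label in zip(rows, labels):
        for attr, val in row.items():
            counts[(label, attr)][val] += 1
            values[attr].add(val)
    return priors, counts, values


def predict_nb(model, row):
    """Return the class with the highest log-posterior under add-one smoothing."""
    priors, counts, values = model
    total = sum(priors.values())

    def log_posterior(label):
        score = log(priors[label] / total)
        for attr, val in row.items():
            # Naive assumption: attributes are independent given the class.
            k = len(values[attr])
            score += log((counts[(label, attr)][val] + 1) / (priors[label] + k))
        return score

    return max(priors, key=log_posterior)


# Hypothetical toy records, not ERCA data.
rows = [{"origin": "A"}, {"origin": "A"}, {"origin": "B"}]
labels = ["fraud", "fraud", "clean"]
model = train_nb(rows, labels)

print(predict_nb(model, {"origin": "A"}))  # → fraud
print(predict_nb(model, {"origin": "B"}))  # → clean
```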
Key words: Fraud prediction, fraud category prediction, fraud level prediction, fraud risk level
prediction, classification, machine learning algorithm, ASYCUDA.