Mining Insurance Data for Fraud Detection: the Case of Africa Insurance Share Company
Date
2011-06
Publisher
Addis Ababa University
Abstract
The insurance industry has historically been a growing industry and plays an important
role in ensuring the economic well-being of a country. But ever since its beginning as
a commercial enterprise, the industry has faced difficulties with insurance fraud. Insurance
fraud is very costly and has become a worldwide concern in recent years. Fraudulent claims
account for a significant portion of all claims received by insurers and cost billions of
dollars annually. Considerable effort has recently gone into developing data mining models
that identify potentially fraudulent claims for special investigation.
This study explores the applicability of data mining technology in developing models
that can detect and predict suspected fraud in insurance claims, with particular
emphasis on Africa Insurance Company. The research applies a clustering algorithm
first, followed by classification techniques, to develop the predictive model. The
K-Means clustering algorithm is employed to find the natural grouping of the insurance
claims into fraud and non-fraud. The resulting clusters are then used to develop the
classification model. The classification task of this study is carried out using the
J48 decision tree and Naïve Bayes algorithms in order to create the model that best
classifies fraud-suspicious insurance claims.
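The cluster-then-classify idea described above can be illustrated with a minimal sketch. It is not the study's implementation: the two-feature synthetic `claims` data, the function names, and all parameters are assumptions made for illustration, and only a Gaussian Naïve Bayes classifier is shown (J48, the C4.5 decision tree learner in Weka, is omitted for brevity). K-Means first assigns each claim to one of two clusters, and the cluster indices then serve as class labels for training the classifier.

```python
import math
import random


def kmeans(points, k=2, iters=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def fit_gaussian_nb(points, labels):
    """Per class: prior, per-feature mean, per-feature variance."""
    model = {}
    for c in set(labels):
        rows = [p for p, l in zip(points, labels) if l == c]
        means = [sum(col) / len(rows) for col in zip(*rows)]
        variances = [
            sum((x - m) ** 2 for x in col) / len(rows) + 1e-9
            for col, m in zip(zip(*rows), means)
        ]
        model[c] = (len(rows) / len(points), means, variances)
    return model


def predict(model, p):
    """Return the class with the highest log-posterior for point p."""
    def log_posterior(c):
        prior, means, variances = model[c]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(p, means, variances)
        )
    return max(model, key=log_posterior)


# Hypothetical, already-normalized claim features: two well-separated groups.
data_rng = random.Random(1)
claims = [(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(40)]
claims += [(data_rng.gauss(5, 0.5), data_rng.gauss(5, 0.5)) for _ in range(40)]

labels = kmeans(claims, k=2)             # step 1: clusters become class labels
model = fit_gaussian_nb(claims, labels)  # step 2: train a classifier on them
new_claim_class = predict(model, (4.8, 5.1))
```

Note that K-Means only returns arbitrary cluster indices; deciding which cluster corresponds to "fraud" still requires inspecting the clusters, for example with the help of domain experts.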
The experiments were conducted following the six-step process model of Cios et al.
(2000). For the experiments, the collected insurance dataset is preprocessed to remove
outliers, fill in missing values, select attributes, integrate data and derive attributes.
This preprocessing phase took the largest share of the study time.
A total of 17810 insurance claim records are used for training the models, while a
separate 2210 records are used for testing their performance. The model developed using
the J48 decision tree algorithm shows the highest classification accuracy, 99.96%.
This model is then tested on the 2210-record testing dataset and scores a prediction
accuracy of 97.19%. The results of this study show that data mining techniques are
valuable for insurance fraud detection. Future research directions are therefore pointed
out toward an applicable system in the area.
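To make the reported test figure concrete, the short calculation below converts the 97.19% accuracy on 2210 test records into an approximate count of correctly classified claims. It assumes accuracy is defined as correct predictions divided by total records; the abstract does not report the counts themselves, so the derived numbers are implied values, not reported results. The function and variable names are illustrative.

```python
def implied_correct(total, accuracy_pct):
    """Nearest whole number of correct predictions implied by an accuracy."""
    return round(total * accuracy_pct / 100)


train_total, test_total = 17810, 2210

# Implied counts on the held-out test set (assumes accuracy = correct / total).
test_correct = implied_correct(test_total, 97.19)
test_errors = test_total - test_correct

# Share of all records held out for testing.
test_share = test_total / (train_total + test_total)

print(f"implied correct: {test_correct}, implied misclassified: {test_errors}")
print(f"test share of all records: {test_share:.1%}")
```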
Keywords
Data Mining