Mining Insurance Data for Fraud Detection: the Case of Africa Insurance Share Company

Date

2011-06

Publisher

Addis Ababa University

Abstract

The insurance industry has historically been a growing industry, and it plays an important role in ensuring the economic well-being of a country. Ever since its beginnings as a commercial enterprise, however, the industry has faced the problem of insurance fraud. Insurance fraud is very costly and has become a worldwide concern in recent years: fraudulent claims account for a significant portion of all claims received by insurers and cost billions of dollars annually. Considerable effort has therefore gone into using data mining technology to build models that flag potentially fraudulent claims for special investigation.
This study explores whether data mining technology can be applied to develop models that detect and predict suspicious (potentially fraudulent) insurance claims, with particular emphasis on Africa Insurance Share Company. The research first applies a clustering algorithm and then classification techniques to develop the predictive model. The K-Means clustering algorithm is used to find the natural grouping of the insurance claims into fraud and non-fraud, and the resulting clusters are then used to build the classification model. The classification task is carried out with the J48 decision tree and Naïve Bayes algorithms in order to find the model that best classifies suspicious claims. The experiments follow the six-step process model of Cios et al. (2000). The collected insurance dataset is preprocessed to remove outliers, fill in missing values, select attributes, integrate data and derive attributes; preprocessing took the largest share of the study time. A total of 17810 insurance claim records are used to train the models, while a separate set of 2210 records is used to test their performance.
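The abstract describes a two-stage workflow: K-Means first groups unlabelled claims, the resulting clusters become fraud/non-fraud labels, and a classifier is then trained on those labels. The sketch below illustrates that workflow in plain Python under loudly stated assumptions: the two features (`claim_amount_thousands`, `days_to_report`) and the ten claims are hypothetical toy data, missing values are filled with column means, the cluster with the larger mean claim amount is assumed to be the fraud cluster, and Gaussian Naive Bayes stands in for the classifier. It is not the thesis's implementation, and J48 (Weka's implementation of the C4.5 decision tree) is not reproduced here.

```python
import math
from statistics import mean, pvariance

# Hypothetical toy claims: (claim_amount_thousands, days_to_report); None = missing.
CLAIMS = [
    (12.0, 2.0), (15.0, 3.0), (11.0, None), (14.0, 1.0), (13.0, 2.0),
    (95.0, 40.0), (88.0, 35.0), (None, 45.0), (92.0, 38.0), (90.0, 42.0),
]


def impute_means(rows):
    """Fill each missing value with the mean of its column (simple preprocessing)."""
    col_means = [mean(v for v in col if v is not None) for col in zip(*rows)]
    return [tuple(m if v is None else v for v, m in zip(row, col_means)) for row in rows]


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(max_iter):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append(tuple(mean(col) for col in zip(*members)) if members else centroids[c])
        if updated == centroids:  # converged: assignments no longer move centroids
            break
        centroids = updated
    return [nearest(p, centroids) for p in points], centroids


def fit_gaussian_nb(X, y):
    """Per-class prior plus per-feature (mean, variance) for Gaussian Naive Bayes."""
    model = {}
    for cls in set(y):
        rows = [x for x, lab in zip(X, y) if lab == cls]
        stats = [(mean(col), pvariance(col) + 1e-9) for col in zip(*rows)]
        model[cls] = (len(rows) / len(X), stats)
    return model


def predict(model, x):
    """Class with the highest log prior plus summed log Gaussian likelihoods."""
    def log_posterior(cls):
        prior, stats = model[cls]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
            for v, (mu, var) in zip(x, stats))
    return max(model, key=log_posterior)


points = impute_means(CLAIMS)
clusters, centroids = kmeans(points, k=2)
# Assumption: the cluster with the larger mean claim amount is treated as "fraud".
fraud_cluster = max(range(2), key=lambda c: centroids[c][0])
labels = ["fraud" if c == fraud_cluster else "non-fraud" for c in clusters]
model = fit_gaussian_nb(points, labels)
print(labels)
print(predict(model, (85.0, 30.0)), predict(model, (14.0, 2.0)))  # fraud non-fraud
```

The deterministic farthest-first seeding is a design choice for the sketch so that the toy run is reproducible; random or k-means++ seeding would be equally valid. In a real study, which cluster counts as "fraud" would need domain review rather than a single-feature rule.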
The model developed with the J48 decision tree algorithm showed the highest classification accuracy, 99.96%. When tested on the 2210-record test set, this model scored a prediction accuracy of 97.19%. These results show that data mining techniques are valuable for insurance fraud detection. Directions for future research toward an applicable system in this area are also pointed out.
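The results above are reported as accuracy on a held-out test set. As a small sketch, assuming accuracy means the plain share of correctly classified records (the abstract does not define it), the hypothetical helpers below compute accuracy and per-class confusion counts; the label lists are toy values, not data from the study. The last line checks the reported test figure: 2148 correct predictions out of 2210 is the only whole count that yields 97.19%, so the test result corresponds to roughly 62 misclassified claims.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Share of records whose predicted label equals the actual label."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def confusion_counts(actual, predicted, positive="fraud"):
    """TP/FP/FN/TN counts, treating `positive` as the fraud-suspicious class."""
    counts = Counter(TP=0, FP=0, FN=0, TN=0)
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return dict(counts)


# Hypothetical toy labels, not data from the study.
actual = ["fraud", "non-fraud", "non-fraud", "fraud", "non-fraud"]
predicted = ["fraud", "non-fraud", "fraud", "non-fraud", "non-fraud"]
print(accuracy(actual, predicted))          # 0.6
print(confusion_counts(actual, predicted))  # {'TP': 1, 'FP': 1, 'FN': 1, 'TN': 2}

# Arithmetic check on the reported test result (percent, two decimals).
print(round(100 * 2148 / 2210, 2))  # 97.19
```

Confusion counts complement a single accuracy figure by showing whether the errors are missed fraud (FN) or false alarms (FP), which carry different costs for an insurer.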

Keywords

Data Mining