 

Please use this identifier to cite or link to this item: http://hdl.handle.net/123456789/8698
Title: Application of Data Mining Techniques to Predict Customers' Churn at Commercial Bank of Ethiopia
Contributors: Dr. Dereje Teferi; Gebremeskel, Kassahun
Keywords: Ethiopia; Commercial Bank; Data Mining Techniques
Issue Date: Sep-2013
Publisher: AAU
Abstract: Data mining tools and techniques are used to solve many types of problems across industries, and predicting customer churn is one area where they can be applied. Customer churn, a common measure of lost customers, is a major problem in highly competitive industries such as banking. By reducing the number of churning customers, companies can improve their profitability and sustainability; customer retention is therefore critical to good marketing and customer relationship management strategies. This paper presents the prediction of which customers of the Commercial Bank of Ethiopia are prone to move to a competitor. Data on 13,172 customers (9 attributes) and their 628,634 corresponding transactions (10 attributes) are collected from the bank. The CRISP-DM methodology is followed to conduct the data mining process. After the business is thoroughly analyzed and the goals are clearly identified, successive data preparation steps are undertaken, producing a dataset of 6,045 instances and 18 attributes. The WEKA (Waikato Environment for Knowledge Analysis) tool is used for modeling, and the dataset is partitioned into several training and testing sets. Because the churn class is very small compared with the active (non-churn) class, SMOTE (Synthetic Minority Oversampling Technique) is applied to reduce the class imbalance problem. Three modeling techniques are used to predict churn: J48, Logistic Regression, and Bagging. The models are trained using cross-validation and tested for reliability on separate test sets. They are evaluated by their F-measure (the harmonic mean of precision and recall). The results show that J48 is the best model, with an F-measure of 94.8%, followed by Bagging (93.9%) and Logistic Regression (76.6%).
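To illustrate two techniques named in the abstract, the following is a minimal pure-Python sketch of SMOTE-style oversampling and of the F-measure. It is not the WEKA implementation used in the thesis. The function names (`smote`, `f_measure`), the use of Euclidean distance on purely numeric features, the random choice of base samples, and the toy data are all assumptions made for illustration.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two numeric feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_synthetic, k=5, seed=None):
    """Generate n_synthetic new minority-class samples (basic SMOTE sketch).

    Assumes all features are numeric. Each synthetic point lies on the
    line segment between a randomly chosen minority sample and one of its
    k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base`, excluding base itself
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: euclidean(base, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        # Interpolate: base + gap * (neighbour - base)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def f_measure(tp, fp, fn):
    """F-measure (F1): harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical toy "churner" records with two numeric features
    churners = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2]]
    new_points = smote(churners, n_synthetic=4, k=2, seed=0)
    print(len(new_points))  # 4
    # Hypothetical confusion counts for the churn class
    print(round(f_measure(tp=80, fp=10, fn=20), 3))  # 0.842
```

Interpolating between existing churners, rather than duplicating them, gives a classifier new minority-class points without repeating the same records. Oversampling is normally applied only to training data, so that test sets still reflect the real class ratio.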
URI: http://hdl.handle.net/123456789/8698
Appears in Collections: Thesis - Information Science

Files in This Item:
Kassahun_Gebremeskel_2013_Final_Thesis.pdf (1.91 MB, Adobe PDF)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.