|Title: ||KNOWLEDGE DISCOVERY FOR EFFECTIVE CUSTOMER SEGMENTATION: THE CASE OF ETHIOPIAN REVENUE AND CUSTOMS AUTHO|
|Authors: ||BELETE, Beyazen|
|Advisors: ||Ato Getachew Jemaneh|
|Keywords: ||Information science|
|Copyright: ||Jun-2011 |
|Date Added: ||30-Jul-2012 |
|Abstract: ||Customer relationship management (CRM) is a process by which an organization
maximizes customer satisfaction in an effort to increase loyalty and retain customers'
business over their lifetimes. Customer segmentation, the grouping of customers by their
common attributes, is a main part of CRM. Analyzing CRM data means exploring it from
different angles and examining its different aspects, which requires applying different
types of data mining techniques. Data mining finds and extracts knowledge hidden in
corporate data warehouses.
The aim of this study is to test the applicability of clustering and classification data
mining techniques to support CRM activities for ERCA using the Cios et al. (2000) KDD
process model. Different characteristics of ERCA customers' data were collected from the
customs ASYCUDA database. After the necessary data preparation steps, the resulting
dataset consisted of 46,748 records.
To segment customers, the K-means clustering algorithm was used. Cluster modeling
experiments were conducted with different numbers of clusters (K = 3, 4, 5, 6) and
different seed values. The cluster model at K = 5 performed best, and its output was
used for the subsequent classification modeling.
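
The experiment grid described above (several values of K, several seeds) can be sketched as follows. This is a minimal illustration, not the thesis's setup: the two-dimensional toy data, the seed values 10, 20 and 30, and the use of within-cluster sum of squared errors (SSE) as the comparison measure are all assumptions, since the abstract does not state how performance was judged.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, seed, max_iter=100):
    """Plain Lloyd's k-means. Returns (centroids, labels, sse)."""
    rng = random.Random(seed)
    # The seed decides which k points serve as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse

def make_toy_data(n_per_group=40, seed=0):
    """Hypothetical stand-in for customer records: five noisy 2-D groups."""
    rng = random.Random(seed)
    centers = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 5)]
    return [
        (cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
        for cx, cy in centers
        for _ in range(n_per_group)
    ]

points = make_toy_data()
best_sse = {}
for k in (3, 4, 5, 6):
    # Keep the lowest SSE found across the seeds tried for this K.
    best_sse[k] = min(kmeans(points, k, seed)[2] for seed in (10, 20, 30))
    print(f"K={k}: best SSE over seeds = {best_sse[k]:.1f}")
```

SSE tends to fall as K grows, so it is mainly useful for comparing seeds at a fixed K. Choosing among different K values usually also calls for judgment about how interpretable the resulting segments are.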
The classification models were built using the J48 decision tree and multilayer perceptron
ANN algorithms, each evaluated with 10-fold cross-validation and with a percentage split
(70% training, 30% testing). Among these models, the J48 decision tree evaluated with the
default 10-fold cross-validation performed best, with an overall accuracy of 99.95%;
hence this model was selected.
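
The two evaluation schemes named above, 10-fold cross-validation and a 70/30 percentage split, can be sketched as below. The classifier here is a simple nearest-centroid rule used as a stand-in. It is neither J48 nor a multilayer perceptron, and the toy labelled data is hypothetical. The folds are also not stratified, which may differ from the tool used in the study.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_nearest_centroid(rows):
    """rows is a list of (features, label); returns {label: centroid}."""
    groups = {}
    for x, y in rows:
        groups.setdefault(y, []).append(x)
    return {
        y: [sum(col) / len(xs) for col in zip(*xs)]
        for y, xs in groups.items()
    }

def predict(model, x):
    """Predict the label whose centroid lies closest to x."""
    return min(model, key=lambda y: sq_dist(x, model[y]))

def holdout_accuracy(rows, train_frac=0.7, seed=1):
    """Shuffle once, train on the first 70%, test on the remaining 30%."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    model = train_nearest_centroid(shuffled[:cut])
    test = shuffled[cut:]
    return sum(predict(model, x) == y for x, y in test) / len(test)

def cross_val_accuracy(rows, folds=10, seed=1):
    """Each record is tested exactly once, by a model that never saw it."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    correct = 0
    for i in range(folds):
        test = shuffled[i::folds]  # every folds-th record, starting at i
        train = [r for j, r in enumerate(shuffled) if j % folds != i]
        model = train_nearest_centroid(train)
        correct += sum(predict(model, x) == y for x, y in test)
    return correct / len(shuffled)

def make_labelled_data(n_per_segment=40, seed=0):
    """Hypothetical records: 2-D features labelled with a segment id 0..4."""
    rng = random.Random(seed)
    centers = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 5)]
    return [
        ((cx + rng.gauss(0, 1), cy + rng.gauss(0, 1)), segment)
        for segment, (cx, cy) in enumerate(centers)
        for _ in range(n_per_segment)
    ]

rows = make_labelled_data()
cv_acc = cross_val_accuracy(rows)
split_acc = holdout_accuracy(rows)
print(f"10-fold cross-validation accuracy: {cv_acc:.4f}")
print(f"70/30 split accuracy: {split_acc:.4f}")
```

Cross-validation uses every record for testing once, so its estimate depends less on one particular shuffle than a single percentage split does.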
The results of this research were encouraging as very high classification accuracy has
|Description: ||A Thesis Submitted to the School of Graduate Studies of Addis
Ababa University in Partial Fulfillment of the Requirements for the
Degree of Master of Science in Information Science|
|Appears in:||Thesis - Information Science|
Items in the AAUL Digital Library are protected by copyright, with all rights reserved, unless otherwise indicated.