Exploring the Prevalence of Diarrheal Disease Using Data Mining Technology (A Case of Tikur Anbessa Hospital)
Date
2011-06
Publisher
Addis Ababa University
Abstract
The amount of health-related data available to healthcare organizations for various diseases is massive and continues to grow over time. As a result, huge amounts of data are stored in healthcare organizations and facilities. Diarrheal disease is one of the causes of morbidity and mortality for many children, especially those under the age of five, and a large amount of data on it is collected in both rural and urban health facilities of Ethiopia. This data represents a useful resource for making a wide variety of real-time decisions and determinations, from the quality of care delivered to trends in treatment modalities and staffing issues.
The problem is how to handle this huge amount of data so that healthcare organizations can identify what is important and extract it from the accumulated records. The data is too complex and voluminous to be processed and analyzed by traditional methods. Nowadays, data mining technology provides techniques to transform these mounds of data into useful information, which in turn makes it possible to derive knowledge for decision making. A number of data mining techniques and tools are available to perform this task. The researcher selected a subset of these techniques and tools to explore the prevalence of diarrheal disease and to develop classification and prediction models.
Thus, the purpose of this study is to investigate the potential applicability of data mining techniques in exploring the prevalence of diarrheal disease, using data collected from the diarrheal disease control and training center of African sub Region II in Tikur Anbessa Hospital. Records of patients aged five years (60 months) and under are included in the study. Two machine learning algorithms from the WEKA software, the J48 Decision Tree (DT) and Naïve Bayes (NB) classifiers, were adopted to classify diarrheal disease records on the basis of the values of the attributes ‘Treatment’ and ‘Type of Diarrhea’. Initially, a dataset of 5,572 records with 9 attributes was collected for the study. However, the class labels of the selected target classes were not balanced, so the records were resampled using SMOTE (Synthetic Minority Oversampling TEchnique) from the WEKA preprocessing package. After this process, the number of records used for model building increased to 13,710 and 16,460 for the ‘Treatment’ and ‘Type of Diarrhea’ target classes, respectively. This was done to reduce the bias of the classifiers toward the majority classes during model building.
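The resampling step can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic record is placed at a random point on the line segment between a minority-class record and one of its nearest minority-class neighbours. This is not the WEKA filter used in the study. It assumes purely numeric features, whereas the study's attributes may be nominal, and the function name `smote`, its parameters, and the toy data are hypothetical.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new points interpolated between minority samples.

    Illustrative sketch only: assumes numeric feature vectors of equal length.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples

    def sq_dist(a, b):
        # Squared Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest neighbours among the other minority samples.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Interpolate at a random position between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy minority class with two numeric features.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(minority, n_synthetic=6, k=2)
```

Because every synthetic point is an interpolation between two existing minority records, the new records stay inside the region already occupied by the minority class rather than being exact duplicates, which is the main difference from simple random oversampling.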
The results of the experiments show that the J48 DT classifier achieves better classification accuracy than the NB classifier. The two models selected when evaluating these classifiers showed that J48 DT and NB classified ‘treatment modalities’ and ‘diarrheal types’ with accuracies of 88.3%, 79.54%, 85.64% and 73.94%, respectively. Overall, this study indicates that data mining techniques are valuable for supporting and scaling up the efficacy of health care service provision.
Keywords
Using Data Mining Technology