Constructing a Predictive Model for Occurrence of Tuberculosis: The Case of Menelik II Hospital and St. Peters TB Specialized Hospital
Date
2013-05
Publisher
Addis Ababa University
Abstract
Background: Tuberculosis (TB) is a disease of poverty that mostly affects young adults in their most
productive years. In Ethiopia, TB is a major public health problem. Early
identification and isolation of TB cases are critical to preventing further transmission, morbidity, and
mortality. Data mining has the potential to identify hidden knowledge in large
datasets, and data mining algorithms can be used to analyze and predict the TB status of
patients.
Objective: The goal of this research was to apply data mining techniques to predict the TB
status of patients. Specifically, the study aimed to identify the attributes that determine patients' TB status, build
the best-performing prediction model, and develop a prototype graphical user interface.
Methodology: A hybrid data mining process model with six steps was followed. The
study used a total of 10,031 patient records from Menelik II and St. Peters TB specialized hospitals,
with 15 attributes for predicting TB status. Descriptive data analysis,
visualization, and statistical summaries were used to gain an understanding of the data.
Missing values were handled and the data were transformed to prepare the dataset for
experimentation. The mining algorithms used were decision tree, naïve Bayes, support vector
machine, and artificial neural network. Model performance was evaluated using 10-fold
cross-validation and confusion matrices.
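The evaluation procedure named above can be illustrated with a minimal sketch. It is not the study's implementation: the function names (`k_fold_indices`, `confusion_matrix`, `cross_validate`, `train_majority`), the synthetic records, and the majority-class learner are all assumptions made for illustration. The learner is only a stand-in where a decision tree, naïve Bayes, support vector machine, or neural network would be trained.

```python
import random
from collections import Counter


def k_fold_indices(n_records, k=10, seed=0):
    """Shuffle record indices and split them into k nearly equal folds."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def confusion_matrix(actual, predicted, labels):
    """Count (actual, predicted) pairs; rows are actual, columns are predicted."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Fraction of records that fall on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


def train_majority(train_records, train_labels):
    """Hypothetical placeholder learner: always predicts the most common label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda record: majority


def cross_validate(records, labels, train_fn, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and pool the predictions."""
    actual, predicted = [], []
    for held_out in k_fold_indices(len(records), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(records)) if i not in held]
        model = train_fn([records[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        for i in held_out:
            actual.append(labels[i])
            predicted.append(model(records[i]))
    return actual, predicted


# Synthetic toy data (not from the study): 30 positive and 70 negative records.
toy_labels = ["positive"] * 30 + ["negative"] * 70
toy_records = [{"record_id": i} for i in range(len(toy_labels))]

toy_actual, toy_predicted = cross_validate(toy_records, toy_labels, train_majority)
toy_matrix = confusion_matrix(toy_actual, toy_predicted, ["positive", "negative"])
toy_accuracy = accuracy(toy_matrix)
```

Pooling the held-out predictions from all ten folds into one confusion matrix means every record is tested exactly once, so the reported accuracy reflects performance on data the model did not see during training.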
Results: Experiments with all attributes and with selected attributes showed that
J48, sequential minimal optimization, and multilayer perceptron performed better with all attributes
than with the best selected attributes, whereas the naïve Bayes classifier performed better with selected
attributes than with all attributes. The experiments also showed that the performance of the mining
algorithms decreased as the amount of training increased.
The best model for predicting the TB status of patients in this study was generated by the J48
decision tree with all attributes, with an accuracy of 95.24%. A graphical user interface
prototype was designed using the ten rules from the J48 decision tree.
Conclusion: The results of this research indicate that data mining is useful for
extracting relevant information from large and complex patient datasets, and that this
information can be used for predicting TB status and supporting decision making. The most important attributes that
determine the TB status of patients are shortness of breath, chest pain, cough, weight loss,
loss of appetite, night sweats, and HIV test results.
Keywords
Constructing a Predictive Model