Constructing a Predictive Model for Occurrence of Tuberculosis: The Case of Menelik II Hospital and St. Peters TB Specialized Hospital

Addis Ababa University


Background: Tuberculosis (TB) is a disease of poverty that mostly affects young adults in their most productive years. In Ethiopia, TB is a major public health problem. Early identification and isolation of TB cases is critical to prevent further transmission, morbidity and mortality caused by TB. Data mining has the potential to identify hidden knowledge in huge datasets, and data mining algorithms can be used to analyze and predict the TB status of patients.

Objective: The goal of this research was to apply data mining techniques to predict the TB status of patients. Specifically, the study aimed to identify the determinant attributes of patients' TB status, build the best prediction model, and develop a prototype graphical user interface.

Methodology: A six-step hybrid data mining process model was followed. The study considered a total of 10,031 patient records from Menelik II and St. Peters TB specialized hospitals and 15 attributes for predicting TB status. Descriptive data analysis, visualization and statistical summaries were used to gain an understanding of the data. Missing values were handled and the data were transformed to prepare the dataset for experimentation. The mining algorithms used were decision tree, naïve Bayes, support vector machine and artificial neural network. The models' performance was evaluated using 10-fold cross validation and confusion matrices.

Results: Experiments with all attributes and with selected attributes showed that J48, sequential minimal optimization and the multilayer perceptron performed better with all attributes than with the best selected attributes, whereas the naïve Bayes classifier performed better with selected attributes than with all attributes. The experiments also showed that the performance of the mining algorithms decreased as the amount of training increased. The best model for predicting the TB status of patients in this study was generated by the J48 decision tree with all attributes. The accuracy of this model was 95.24%. A graphical user interface prototype was designed using the ten rules from the J48 decision tree.

Conclusion: The results of this research indicate that data mining is useful for extracting relevant information from large and complex patient datasets, and that this information can be used for predicting TB status and for decision making. The most important attributes that determine the TB status of patients are shortness of breath, chest pain, cough, weight loss, loss of appetite, night sweats and HIV test results.
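The evaluation procedure named in the abstract (10-fold cross validation with a confusion matrix) can be illustrated with a minimal Python sketch. It is not the study's actual tooling or code: the function names `k_fold_indices`, `confusion_matrix` and `accuracy` are hypothetical, and the toy labels are invented for illustration and are not taken from the hospital data. Only the record count of 10,031 comes from the abstract.

```python
from collections import Counter


def k_fold_indices(n_records, k=10):
    """Split indices 0..n_records-1 into k disjoint, nearly equal test folds."""
    # The first (n_records % k) folds receive one extra record.
    sizes = [n_records // k + (1 if i < n_records % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Fraction of records on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Fold sizes for the 10,031 records reported in the abstract.
folds = k_fold_indices(10031, k=10)
fold_sizes = [len(fold) for fold in folds]

# Invented toy labels (assumption, not study data) to show the matrix layout.
labels = ["TB", "no TB"]
actual = ["TB", "TB", "TB", "no TB", "no TB", "no TB", "no TB", "TB"]
predicted = ["TB", "TB", "no TB", "no TB", "no TB", "TB", "no TB", "TB"]
matrix = confusion_matrix(actual, predicted, labels)
toy_accuracy = accuracy(matrix)
```

In a full cross-validation run, each fold would serve once as the test set while a classifier is trained on the remaining nine, and the per-fold confusion matrices would be summed before computing accuracy.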


