Mining ART Data Set to Predict CD4 Cells Count: The Case of Jimma, Bonga and Aman Hospitals

Date

2013-06

Publisher

Addis Ababa University

Abstract

Background: Recent reports from WHO and UNAIDS indicate that the number of people using ART is increasing steadily. The increase is dramatic in sub-Saharan African countries, including Ethiopia. According to WHO and UNAIDS, by the end of 2011 over 8 million people had access to ART in low- and middle-income countries.

Objective: The purpose of this study is to apply data mining techniques to the ART records of patients maintained in the ART databases of Jimma, Bonga and Aman Hospitals, in order to build a model capable of predicting patients' CD4 cell counts after six, twelve and eighteen months of treatment.

Methodology: The work in this thesis was guided by the Hybrid-DM model, a six-step knowledge discovery process model. The study used 7,252 instances, ten predictor variables and three outcome variables to run the experiments. Given the nature of the problem and the attributes contained in the dataset, classification was selected as the mining task. The algorithms J48, PART, SMO and MLP were used in all experiments because of their popularity in recent related work. Because the classes in each of the three outcome variables are imbalanced, a boosting algorithm (AdaBoostM1) was used in addition to the base classifiers to improve their predictive performance. Ten-fold cross-validation was used to train and test the classifier models. Model performance was compared using accuracy, true positive rate (TPR), false positive rate (FPR), mean absolute error, F-measure, and the area under the ROC curve.

Results: Boosting gave the base classifiers better predictive accuracy, with the unpruned PART decision tree yielding a better model for the sixth- and twelfth-month CD4 counts, and the pruned PART decision tree performing better for the eighteenth-month CD4 count. The combined rules of the three models indicate that baseline CD4 count, drug regimen, age, family planning usage status, WHO clinical stage, and functional status of the patient are the most determinant attributes for predicting CD4 counts.

Conclusion: Applying data mining techniques to build a CD4 count predictive model from socio-demographic, clinical and biological features gave promising results. Future work could validate the results through clinical trials and repeat the study with different source data or different knowledge discovery techniques.
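
The evaluation procedure described in the methodology (ten-fold cross-validation scored with accuracy, TPR, FPR and F-measure) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the thesis's implementation: the function names (`k_fold_indices`, `binary_metrics`, `cross_val_majority`), the two class labels "high" and "low", the 70/30 synthetic label split, and the majority-class baseline that stands in for J48, PART, SMO and MLP. The sketch shows why accuracy alone can mislead on imbalanced classes, which is the situation the abstract gives as the reason for using boosting.

```python
import random
from collections import Counter


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def binary_metrics(y_true, y_pred, positive):
    """Accuracy, TPR, FPR and F-measure for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = (2 * precision * tpr / (precision + tpr)
                 if precision + tpr else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "tpr": tpr,
        "fpr": fpr,
        "f_measure": f_measure,
    }


def cross_val_majority(labels, k=10, seed=0):
    """Pooled k-fold predictions from a majority-class baseline.

    The baseline is a hypothetical stand-in for a real classifier: for each
    fold it predicts the most frequent label among the other k-1 folds.
    """
    preds = [None] * len(labels)
    for test_idx in k_fold_indices(len(labels), k, seed):
        held_out = set(test_idx)
        train_labels = [y for i, y in enumerate(labels) if i not in held_out]
        majority = Counter(train_labels).most_common(1)[0][0]
        for i in test_idx:
            preds[i] = majority
    return preds


# Synthetic, imbalanced labels (made up for illustration, not thesis data).
labels = ["high"] * 70 + ["low"] * 30
preds = cross_val_majority(labels)
scores = binary_metrics(labels, preds, positive="low")
print(scores)
```

On this synthetic data the baseline reaches 70% accuracy while never detecting the minority class (TPR and F-measure of zero), which is why the comparison in the methodology relies on several measures rather than accuracy alone.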
