Analyzing the Outbreak Surveillance and Response System in Ethiopia using Data Mining Techniques

dc.contributor.advisor Abebe, Ermias (PhD)
dc.contributor.advisor Addisie, Adamu (PhD)
dc.contributor.author Mohammed, Yimer
dc.date.issued 2012-11
dc.description.abstract The aim of this research was to show the applicability of data mining techniques for developing descriptive and predictive models from disease outbreak surveillance datasets in Ethiopia. To that end, three data mining tasks (classification, clustering, and association rule mining) were applied to the datasets of the PHEM sector from different perspectives. A total of 18600 records covering the years 2004-2012 G.C. were collected from the data store of the surveillance system and assessed. After the preprocessing phase of the knowledge discovery in databases process, a total of 8796 records were prepared for the data mining algorithms. Of these, 4703 came from the IDSR system dataset (2004-2008 G.C.) and the remaining 4093 from the PHEM dataset (2009-2012 G.C.). The researcher compared two classification algorithms, the J48 decision tree classifier and the Naïve Bayes classifier, for predicting Epidemic Typhus disease cases. The better-performing algorithm was then selected for model development. In the experiments, the decision tree algorithm classified the disease cases by place and time more accurately. The J48 decision tree algorithm correctly classified 87.44% of the Epidemic Typhus disease cases, whereas the Naïve Bayes classifier correctly classified 83.70%. Sensitivity and specificity were also evaluated for the two classifiers. The researcher also applied association rule mining to look for correlations or patterns among disease cases in the surveillance data. The attributes were selected only from the disease cases, recording occurrence or non-occurrence, which were collected by time and place. 
Here, the Apriori association rule mining algorithm was run to find interesting patterns in the occurrence and co-occurrence of correlated disease cases. The minimum support threshold was set to 20% and the minimum confidence threshold to 90% before the mining algorithm was run. For cluster analysis, the researcher used the combined (integrated) dataset of 8796 records with 9 attributes. The Simple K-Means clustering algorithm was used on the combined dataset because it showed the grouping of disease cases with respect to time and place. In general, data mining techniques proved important and applicable for developing classification, clustering, and association rule models for emerging and re-emerging disease cases. However, the data must be of good quality and include the important attributes for better predictive and descriptive models to be developed. The results of the research, apart from their educational purpose, were also of use to domain experts for planning, preparedness, decision making, and disease control and prevention activities. en_US
dc.title Analyzing the Outbreak Surveillance and Response System in Ethiopia using Data Mining Techniques en_US
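
The abstract reports a minimum support of 20% and a minimum confidence of 90% for association rule mining. As a rough illustration of what those two thresholds mean, the sketch below computes support and confidence for rules between pairs of items by brute force. It is not the Apriori algorithm itself (there is no level-wise candidate pruning) and it is not the tool used in the thesis. The records and the labels `disease_A`, `disease_B`, `disease_C` are hypothetical placeholders, not data from the study, and the function names are likewise assumptions made for this example.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent) / support(antecedent)."""
    s_ante = support(antecedent, transactions)
    if s_ante == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / s_ante


def pair_rules(transactions, min_support=0.20, min_confidence=0.90):
    """Brute-force search for rules X -> Y between single items.

    Returns (X, Y, support, confidence) tuples that meet both thresholds.
    """
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        s_pair = support({a, b}, transactions)
        if s_pair < min_support:
            continue  # the pair is not frequent enough
        for x, y in ((a, b), (b, a)):
            c = confidence({x}, {y}, transactions)
            if c >= min_confidence:
                rules.append((x, y, s_pair, c))
    return rules


# Hypothetical records: each set lists the diseases reported together
# for one place and reporting period (placeholder labels only).
records = [
    {"disease_A", "disease_B"},
    {"disease_A", "disease_B"},
    {"disease_A", "disease_B"},
    {"disease_A", "disease_B", "disease_C"},
    {"disease_B"},
    {"disease_B", "disease_C"},
    {"disease_C"},
    {"disease_C"},
    set(),
    {"disease_A", "disease_B"},
]

rules = pair_rules(records)
for x, y, s, c in rules:
    print(f"{x} -> {y}: support={s:.2f}, confidence={c:.2f}")
```

On this toy input only the rule `disease_A -> disease_B` passes both thresholds: the pair appears in half of the records, and every record containing `disease_A` also contains `disease_B`. The reverse rule fails the 90% confidence threshold because `disease_B` also occurs without `disease_A`.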
