Predicting the Occurrence of Measles Outbreak in Ethiopia Using Data Mining Technology


Addis Ababa University


Measles is a contagious disease caused by the measles virus, a paramyxovirus of a single serological type. In 1989, the World Health Organization (WHO) Expanded Program on Immunization (EPI) estimated that 1.6 million people die from measles each year in developing countries, making it the biggest killer among the six EPI target diseases. WHO further estimates that during 2000–2007 measles deaths in the WHO African regions declined by 89%, from approximately 395,000 in 2000 to 45,000 in 2007. Although global deaths from measles have decreased markedly in past decades, largely as a result of intensive vaccination efforts, measles outbreaks continue to occur throughout the regions. The main objective of this study is to design a predictive model, using data mining technology, that can help predict the occurrence of measles outbreaks in Ethiopia. Such a model can support efforts to control measles outbreaks, make more efficient use of existing data, and help Ethiopia use its already scarce resources effectively. Data mining provides automated pattern recognition and attempts to uncover patterns in data that are difficult to detect with traditional statistical methods. Its application in the health care industry has a long and successful history, and it can help raise the quality and efficiency of health-related products and services. The methodology used to build the predictive model was the hybrid six-step Cios knowledge discovery process (KDP) model. Its six steps are: problem domain understanding, data understanding, data preparation, data mining, evaluation of the discovered knowledge, and use of the discovered knowledge. The required data was collected from the WHO measles surveillance database, covering the period 2006-2011.
Then, data preparation tasks (such as data transformation, derivation of new fields, and handling of missing values) were undertaken. Naïve Bayes and decision tree data mining techniques were employed to build and test the models on a dataset of 15631 records. To determine which model produced sounder predictions and higher accuracy, 12 experiments were run with the J48 algorithm and the naïve Bayes classifier under two test modes: 10-fold cross-validation on all the records, and a percentage split in which 70% of the records were used to train a model and the unseen 30% were used to test its performance. The researcher then tested whether a better model could be obtained by excluding one or more of the input variables and training different models. The J48 algorithm showed better prediction accuracy. The results of this study are very promising: they show that it is possible to apply data mining techniques to measles surveillance data to build a model that predicts the occurrence of measles outbreaks in different Ethiopian regions.
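
The two test modes described above (10-fold cross-validation and a 70%/30% split) can be illustrated with a minimal sketch. It is not the study's implementation: it uses only the Python standard library, a small hand-written categorical naïve Bayes classifier, and synthetic toy records. The field names (vaccination coverage, season), the toy labelling rule, and all function names are assumptions made for illustration and do not come from the WHO surveillance data.

```python
import math
import random
from collections import Counter, defaultdict

def make_toy_records(n=200, seed=1):
    """Synthetic records (hypothetical fields, NOT the study's data)."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        coverage = rng.choice(["low", "medium", "high"])
        season = rng.choice(["dry", "wet"])
        outbreak = coverage == "low"      # toy rule, assumed for illustration
        if rng.random() < 0.1:            # flip about 10% of labels as noise
            outbreak = not outbreak
        rows.append((coverage, season))
        labels.append("yes" if outbreak else "no")
    return rows, labels

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature index, class) -> value counts
    values_seen = defaultdict(set)        # feature index -> values seen in training
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen

def predict(model, row):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in class_counts.items():
        score = math.log(count / total)
        for i, value in enumerate(row):
            numerator = value_counts[(i, label)][value] + 1
            score += math.log(numerator / (count + len(values_seen[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(train_idx, test_idx, rows, labels):
    """Train on train_idx and return the fraction of test_idx predicted correctly."""
    model = train_naive_bayes([rows[i] for i in train_idx],
                              [labels[i] for i in train_idx])
    hits = sum(predict(model, rows[i]) == labels[i] for i in test_idx)
    return hits / len(test_idx)

def holdout_split(n, train_fraction=0.7, seed=0):
    """Shuffle indices and split them into train and test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(n * train_fraction)
    return idx[:cut], idx[cut:]

def k_fold_splits(n, k=10, seed=0):
    """Yield (train, test) index lists; each record is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, folds[i]

rows, labels = make_toy_records()
train_idx, test_idx = holdout_split(len(rows))
holdout_acc = accuracy(train_idx, test_idx, rows, labels)
fold_accs = [accuracy(tr, te, rows, labels) for tr, te in k_fold_splits(len(rows))]
cv_acc = sum(fold_accs) / len(fold_accs)
print(f"70/30 split accuracy: {holdout_acc:.3f}")
print(f"10-fold CV accuracy:  {cv_acc:.3f}")
```

Cross-validation averages accuracy over ten disjoint test folds, so it tends to give a more stable estimate than a single 70%/30% split. The same comparison could be repeated with one or more input fields removed from each row to mimic the variable-exclusion experiments.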



Measles Outbreak in Ethiopia, Using Data Mining Technology