Predicting the Occurrence of Measles Outbreak in Ethiopia Using Data Mining Technology
Date
2011-07
Publisher
Addis Ababa University
Abstract
Measles is a contagious disease caused by the measles virus, a paramyxovirus with a single
serological type. In 1989, the World Health Organization (WHO) Expanded Programme on
Immunization (EPI) estimated that 1.6 million people died from measles each year in developing
countries, making it the biggest killer among the six EPI target diseases. WHO further estimates
that during 2000–2007 measles deaths in the WHO African Region declined by 89%, from
approximately 395,000 in 2000 to 45,000 in 2007. Although global measles deaths have decreased
markedly in past decades, largely as a result of intensive vaccination efforts, measles outbreaks
continue to occur throughout the region.
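As a quick arithmetic check, the 89% decline quoted above follows directly from the two death counts:

```latex
\[
\frac{395\,000 - 45\,000}{395\,000} = \frac{350\,000}{395\,000} \approx 0.886 \approx 89\%
\]
```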
The main objective of this study is to design a predictive model, using data mining technology,
that can help predict the occurrence of measles outbreaks in Ethiopia. Such a model can support
efforts to control measles outbreaks, make efficient use of surveillance data, and help ensure
effective use of Ethiopia's already scarce resources.
Data mining provides automated pattern recognition and attempts to uncover patterns in data that
are difficult to detect with traditional statistical methods. It has a long and successful
history of application in the health care industry, where it can help raise the quality and
efficiency of health-related products and services.
The methodology used to build the predictive model was the hybrid six-step Cios KDP (knowledge
discovery process) model. Its six steps are: understanding the problem domain, understanding the
data, preparing the data, data mining, evaluating the discovered knowledge, and using the
discovered knowledge. The required data were collected from the WHO measles surveillance
database for the period 2006–2011. Data preparation tasks (such as data transformation,
derivation of new fields, and handling of missing values) were then undertaken. The models were
built and tested with naïve Bayes and decision tree data mining techniques on a dataset of
15,631 records.
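Naïve Bayes, one of the two techniques named above, scores each class by its prior probability multiplied by the per-feature conditional probabilities, treating the input variables as independent given the class. The sketch below illustrates that idea together with one common way of handling missing values (replacing them with the field's most frequent value). It is a minimal, stdlib-only illustration under stated assumptions, not the implementation used in the thesis: records are assumed to be dicts of categorical values, `"?"` is assumed to mark a missing value, add-one (Laplace) smoothing is an illustrative choice, and the names `fill_missing`, `train_nb` and `predict_nb` are hypothetical.

```python
"""Minimal categorical naive Bayes sketch; all field names are hypothetical."""
from collections import Counter, defaultdict
import math

MISSING = "?"  # assumed marker for a missing value


def fill_missing(records, fields, missing=MISSING):
    """Return copies of the records with each missing value replaced by that
    field's most frequent observed value (assumes every field has at least
    one observed value)."""
    filled = [dict(r) for r in records]
    for f in fields:
        observed = [r[f] for r in records if r[f] != missing]
        mode = Counter(observed).most_common(1)[0][0]
        for r in filled:
            if r[f] == missing:
                r[f] = mode
    return filled


def train_nb(records, features, label):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(r[label] for r in records)
    value_counts = defaultdict(Counter)  # (feature, class) -> counts of values
    domains = defaultdict(set)           # feature -> set of observed values
    for r in records:
        for f in features:
            value_counts[(f, r[label])][r[f]] += 1
            domains[f].add(r[f])
    return class_counts, value_counts, domains


def predict_nb(model, record, features):
    """Return the class with the highest log posterior, using add-one
    (Laplace) smoothing so an unseen value never yields zero probability."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)  # log prior P(c)
        for f in features:
            count = value_counts[(f, c)][record[f]]
            score += math.log((count + 1) / (n_c + len(domains[f])))
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```

For example, given a few made-up records with hypothetical fields such as `region`, `season` and `coverage` and a label field `outbreak`, `predict_nb(train_nb(fill_missing(records, fields), fields, "outbreak"), new_record, fields)` returns the predicted label. Working in log space avoids numerical underflow when many small probabilities are multiplied.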
To determine which model produced sound predictions with higher accuracy, 12 experiments were
conducted with the J48 decision tree algorithm and the naïve Bayes classifier, using two test
modes: 10-fold cross-validation on all the records, and a percentage split in which 70% of the
records were used to train a model and the remaining, unseen 30% were used to test its
performance. To further improve the models, the researcher also tested whether a better model
could be obtained by excluding one or more of the input variables and training different models.
The J48 algorithm showed better prediction accuracy than the naïve Bayes classifier.
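The two test modes and the variable-exclusion experiments described above can be sketched as follows. This is a minimal, stdlib-only illustration, not the evaluation code used in the study. J48 is Weka's implementation of the C4.5 decision tree, and Weka's own cross-validation stratifies folds by class, which this sketch does not. Assumptions: each record is a `(features, label)` pair with the features as a tuple; the classifier is supplied as a pair of hypothetical `train_fn`/`predict_fn` functions; a majority-class placeholder stands in for J48 or naïve Bayes; the seed is arbitrary; and `exclusion_scores` covers only the one-variable-at-a-time case of excluding "one or more" input variables.

```python
"""Sketch of the two test modes and of variable exclusion (stdlib only)."""
from collections import Counter
import random


def percentage_split(records, train_fraction=0.7, seed=1):
    """Shuffle once, train on the first train_fraction of the records and
    hold out the rest for testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def k_fold_splits(records, k=10, seed=1):
    """Yield k (train, test) pairs; each record is tested exactly once.
    Folds are NOT stratified by class. Assumes len(records) >= k."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, folds[i]


def accuracy(train_fn, predict_fn, train, test):
    """Fit on train; return the fraction of test records predicted correctly."""
    model = train_fn(train)
    correct = sum(predict_fn(model, x) == y for x, y in test)
    return correct / len(test)


def cross_validated_accuracy(train_fn, predict_fn, records, k=10):
    """Mean accuracy over the k held-out folds."""
    scores = [accuracy(train_fn, predict_fn, tr, te)
              for tr, te in k_fold_splits(records, k)]
    return sum(scores) / len(scores)


def drop_feature(records, index):
    """Remove one input variable (by position) from every (features, label) pair."""
    return [(x[:index] + x[index + 1:], y) for x, y in records]


def exclusion_scores(train_fn, predict_fn, records, k=10):
    """Cross-validated accuracy after excluding each input variable in turn."""
    n_features = len(records[0][0])
    return {i: cross_validated_accuracy(train_fn, predict_fn,
                                        drop_feature(records, i), k)
            for i in range(n_features)}


# Hypothetical placeholder classifier standing in for J48 or naive Bayes:
# it always predicts the most common label seen in training.
def train_majority(train):
    return Counter(y for _, y in train).most_common(1)[0][0]


def predict_majority(model, x):
    return model
```

Passing the classifier in as two functions keeps the evaluation protocol separate from the model, so the same 10-fold and 70/30 procedures can be applied unchanged to each candidate model and to each reduced set of input variables.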
The results of this study are promising. They show that it is possible to build a model that
predicts the occurrence of measles outbreaks in different Ethiopian regions by applying data
mining techniques to measles surveillance data.
Keywords
Measles Outbreak in Ethiopia, Using Data Mining Technology