Application of Data Mining Technology to Predict Child Mortality Patterns: the Case of Butajira Rural Health Project (BRHP)
Date
2002-06
Abstract
Traditionally, very simple statistical techniques are used in the analysis of
epidemiological studies. The predominant technique is logistic regression, in which
the effects of the predictors are linear. However, because of their simplicity, it is difficult to
use these models to discover unanticipated complex relationships, i.e., non-linearities
in the effect of a predictor or interactions between predictors. Moreover, as the
volume of data increases, the traditional methods become inefficient and
impractical. This in turn calls for the application of new methods and tools that can help
to search large quantities of epidemiological data and to discover new patterns and
relationships that are hidden in the data. Recently, to address the problem of
identifying useful information and knowledge to support primary health care
prevention and control activities, health care institutions have begun employing the data
mining approach, which uses more flexible models, such as neural networks and
decision trees, to discover unanticipated features in large volumes of data stored
in epidemiological databases.
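The limitation described above can be illustrated with a minimal sketch. It assumes a toy dataset (not BRHP data) in which the outcome depends only on an interaction between two binary predictors. A decision rule that is linear in the predictors, which is the form of decision boundary logistic regression produces, cannot classify all four cases, whereas a two-level tree-style rule can. The grid search over weights is a stand-in for model fitting and is an assumption of this example.

```python
from itertools import product

# Toy data: the outcome is 1 only when exactly one of two binary
# predictors is present (a pure interaction with no main effects).
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 0]

def accuracy(predict):
    """Fraction of the four cases that a prediction function gets right."""
    return sum(predict(a, b) == t for (a, b), t in zip(X, y)) / len(y)

# Search a grid of linear decision rules: predict 1 when w1*a + w2*b + c > 0.
grid = [i / 2 for i in range(-4, 5)]  # -2.0, -1.5, ..., 2.0
best_linear = max(
    accuracy(lambda a, b, w1=w1, w2=w2, c=c: int(w1 * a + w2 * b + c > 0))
    for w1, w2, c in product(grid, repeat=3)
)

# A two-level tree-style rule: split on the first predictor, then the second.
def tree_rule(a, b):
    if a == 0:
        return 1 if b == 1 else 0
    return 0 if b == 1 else 1

tree_accuracy = accuracy(tree_rule)

print(f"best linear rule accuracy: {best_linear}")
print(f"tree-style rule accuracy:  {tree_accuracy}")
```

No linear rule in the grid classifies more than three of the four cases correctly, because the two classes are not linearly separable, while the nested splits handle the interaction directly.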
Particularly, in the developed world, data mining technology has enabled health
care institutions to search for and identify previously unknown, actionable information
in large health care databases and to apply it to improve the quality and efficiency
of primary health care prevention and control activities. However, to the knowledge
of the researcher, no health care institution in Ethiopia has used this state-of-the-art
technology to support health care decision-making.
Thus, this research work investigated the potential applicability of data mining
technology to predicting the risk of child mortality based on community-based
epidemiological datasets gathered by the BRHP epidemiological study.
The methodology used for this research had three basic steps: data collection,
data preparation, and model building and testing. The required data were
selected and extracted from the ten-year surveillance dataset of the BRHP
epidemiological study. Then, data preparation tasks (such as data transformation,
derivation of new fields, and handling of missing values) were undertaken. Neural
network and decision tree data mining techniques were employed to build and test
the models. Models were built and tested using a sample dataset of 1100 records
covering both surviving and deceased children.
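The data preparation tasks named above (handling missing values, deriving new fields, transforming data) can be sketched as follows. This is only an illustration under stated assumptions: the records, the field names (`birth_year`, `exit_year`, `water_source`, `status`), the mode-based imputation, and the one-third hold-out split are all hypothetical and are not taken from the BRHP dataset or from the procedure actually used in the study.

```python
import random
from collections import Counter

# Hypothetical raw records; every field name and value is invented.
raw = [
    {"birth_year": 1990, "exit_year": 1993, "water_source": "river", "status": "died"},
    {"birth_year": 1991, "exit_year": 1996, "water_source": "pipe", "status": "alive"},
    {"birth_year": 1992, "exit_year": 1993, "water_source": None, "status": "died"},
    {"birth_year": 1989, "exit_year": 1995, "water_source": "river", "status": "alive"},
    {"birth_year": 1993, "exit_year": 1997, "water_source": "pipe", "status": "alive"},
    {"birth_year": 1994, "exit_year": 1995, "water_source": "river", "status": "died"},
]

def prepare(records):
    # Handle missing values: fill with the most common observed category.
    observed = [r["water_source"] for r in records if r["water_source"] is not None]
    mode = Counter(observed).most_common(1)[0][0]
    prepared = []
    for r in records:
        prepared.append({
            # Derived field: years under observation.
            "years_observed": r["exit_year"] - r["birth_year"],
            # Transformation: categorical value to a 0/1 indicator.
            "piped_water": int((r["water_source"] or mode) == "pipe"),
            # Target: 1 = died, 0 = alive.
            "died": int(r["status"] == "died"),
        })
    return prepared

def train_test_split(rows, test_fraction=0.33, seed=0):
    # Shuffle a copy reproducibly, then hold out a fraction for testing.
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_test = max(1, round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]

data = prepare(raw)
train, test = train_test_split(data)
print(len(train), "training rows;", len(test), "test rows")
```

Holding out part of the sample is one common way to test a model on records it was not built from; the abstract does not state which testing scheme was used.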
Several neural network and decision tree models were built and tested for their
classification accuracy, and many models with encouraging results were obtained.
The two data mining methods used in this research work yielded comparable
results that are sufficient for practical use as far as misclassification rates are
concerned. However, unlike the neural network models, the results obtained
by using the decision tree approach provided simple rules that can be used by nontechnical
health care professionals to identify the cases to which each rule applies.
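To show what "simple rules" from a decision tree can look like, the sketch below fits a one-level tree (a decision stump) to a handful of records and prints it as IF-THEN rules together with its misclassification rate. The stump is a deliberate simplification of a full decision tree, and the records and feature names (`birth_interval_short`, `piped_water`) are hypothetical; neither reflects the software, the variables, or the results of the study.

```python
# Hypothetical prepared records: (features, outcome) with 1 = died, 0 = alive.
rows = [
    ({"birth_interval_short": 1, "piped_water": 0}, 1),
    ({"birth_interval_short": 1, "piped_water": 1}, 1),
    ({"birth_interval_short": 0, "piped_water": 0}, 0),
    ({"birth_interval_short": 0, "piped_water": 1}, 0),
    ({"birth_interval_short": 1, "piped_water": 0}, 1),
    ({"birth_interval_short": 0, "piped_water": 0}, 1),
]

def majority(labels):
    # Most frequent label; ties go to 0 (alive).
    return int(sum(labels) * 2 > len(labels))

def fit_stump(rows):
    """Pick the single binary feature whose split misclassifies the fewest rows."""
    best = None
    for feature in rows[0][0]:
        leaf = {}
        for value in (0, 1):
            labels = [y for x, y in rows if x[feature] == value]
            leaf[value] = majority(labels) if labels else 0
        errors = sum(leaf[x[feature]] != y for x, y in rows)
        if best is None or errors < best[0]:
            best = (errors, feature, leaf)
    return best

errors, feature, leaf = fit_stump(rows)
misclassification_rate = errors / len(rows)

# Express the fitted stump as human-readable rules.
rules = [
    f"IF {feature} = {v} THEN predict {'died' if leaf[v] else 'alive'}"
    for v in (0, 1)
]
for rule in rules:
    print(rule)
print(f"misclassification rate: {misclassification_rate:.2f}")
```

A neural network trained on the same records would give predictions but no comparable rule text, which is the contrast the paragraph above draws.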
In this research work, the researcher has shown that an epidemiological database
could be successfully mined to identify public health and socio-demographic
determinants (risk factors) that are associated with infant and child mortality in rural
communities.
Keywords
Data Mining