Browsing by Author "Anagaw, Shegaw"
Application of Data Mining Technology to Identify Significant Patterns in Census or Survey Data: the Case of 2001 Child Labor Survey in Ethiopia (Addis Ababa University, 2003-06)
Tefera, Helen; Kebede, Gashaw (PhD); Anagaw, Shegaw

Knowledge and understanding of a problem is always the first step toward identifying effective solutions. Child labor is both a sign and a cause of poverty, and it should be eliminated as soon as possible. In Ethiopia, little statistical data exists on child labor practices. To fill this data gap, the FDRE CSA carried out a countrywide child labor survey in 2001. The CSA uses very simple statistical tools to present summary figures for the variables in the 2001 child labor survey database. However, traditional statistical methods are not powerful enough to discover complex relationships in large databases. The limitations of these tools have driven the development of more powerful methods and techniques for studying relationships and patterns in the large volumes of data collected for purposes such as censuses and surveys. In the developed world, government and non-government organizations with access to censuses and surveys are using data mining, a relatively new technology, to identify important patterns and relationships in data accumulated in large databases. Applying data mining techniques to official data such as the 2001 child labor survey has great potential for supporting sound public policy. This research focused on identifying relationships between attributes in the 2001 child labor survey database that can be used to understand the nature of the child labor problem in Ethiopia. Thus, the goal of the data mining process in this research was to identify interesting patterns and relationships in the 2001 child labor database.
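The methodology of this study ends data preparation by grouping the records into five classes with the expectation maximization (EM) clustering algorithm in Knowledge Studio 3.0. As a rough illustration of how EM clustering works, here is a minimal sketch that fits a one-dimensional Gaussian mixture. It is a simplification under stated assumptions: the thesis clustered 2,398 records with 63 attributes using Knowledge Studio's own implementation, whose settings are not described here, whereas this sketch handles a single numeric attribute. The function name `em_gaussian_mixture_1d` and the toy values are hypothetical.

```python
import math

def em_gaussian_mixture_1d(xs, k, iters=50):
    """Fit a k-component 1-D Gaussian mixture by expectation maximization.

    Returns (weights, means, variances, labels); labels[i] is the component
    with the highest responsibility for xs[i] in the last E-step.
    """
    n = len(xs)
    s = sorted(xs)
    # Deterministic start: means at evenly spaced order statistics,
    # every variance set to the overall sample variance.
    means = [s[int((j + 0.5) * n / k)] for j in range(k)]
    mu = sum(xs) / n
    variances = [max(sum((x - mu) ** 2 for x in xs) / n, 1e-6)] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: responsibilities, computed in log space to avoid underflow.
        resp = []
        for x in xs:
            logs = [math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            dens = [math.exp(l - top) for l in logs]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12  # tiny constant avoids division by zero
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, 1e-6)
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, means, variances, labels

# Hypothetical toy values forming two well-separated groups.
xs = [1.0, 10.0, 0.9, 10.1, 1.1, 9.9, 1.2, 10.2, 0.8, 9.8]
weights, means, variances, labels = em_gaussian_mixture_1d(xs, k=2)
print([round(m, 2) for m in means], labels)
```

Each record receives a soft membership (responsibility) in every cluster during the E-step; the final hard class is simply the cluster with the largest responsibility, which is how a clustered dataset can then be handed to a rule-mining step.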
After identifying and understanding the problem domain and the research objectives, the remaining stages of the project focused on three major phases of the data mining process. In the first phase, the major tasks were selecting an appropriate data mining tool for the defined data mining goal and selecting the target dataset for model building. The second phase, data cleaning and preparation, involved identifying and correcting mis-transmitted information, consolidating and combining records, transforming data into a form suitable for the selected data mining tool, handling missing attribute values, and selecting the attributes relevant for generating meaningful association rules. As the final step of data preparation, the selected dataset was grouped into five classes using the expectation maximization (EM) clustering algorithm implemented in Knowledge Studio version 3.0. A dataset of 2,398 records with 63 attributes was used for clustering. In the third phase, model building and evaluation, the Apriori association rule algorithm, as implemented in the Weka software, was used to generate association rules from both the clustered and the non-clustered versions of the selected dataset. Different sets of attributes were given to Apriori in an effort to generate meaningful rules. The results were encouraging and support the hypothesis that interesting patterns can be extracted from census and survey databases by applying association rule mining, one of the data mining techniques.
Keywords: data mining, knowledge discovery, association rule, Apriori algorithm.

Application of Data Mining Technology to Predict Child Mortality Patterns: the Case of Butajira Rural Health Project (BRHP) (2002-06)
Anagaw, Shegaw; Birru, Tesfaye; Worku, Alemayehu (PhD); Teferri, Dereje

Traditionally, very simple statistical techniques have been used to analyze epidemiological studies.
The predominant technique is logistic regression, in which the effects of the predictors are linear (on the log-odds scale). Because of this simplicity, however, it is difficult to use such models to discover unanticipated complex relationships, that is, non-linearities in the effect of a predictor or interactions between predictors. Moreover, as the volume of data increases, traditional methods become inefficient and impractical. This in turn calls for new methods and tools that can search large quantities of epidemiological data and discover new patterns and relationships hidden in the data. Recently, to identify useful information and knowledge that supports primary health care prevention and control activities, health care institutions have been adopting a data mining approach, which uses more flexible models, such as neural networks and decision trees, to discover unanticipated features in large volumes of data stored in epidemiological databases. In the developed world in particular, data mining technology has enabled health care institutions to search for and identify previously unknown, actionable information in large health care databases and to apply it to improve the quality and efficiency of primary health care prevention and control activities. However, to the researcher's knowledge, no health care institution in Ethiopia has used this state-of-the-art technology to support health care decision-making. This research therefore investigated the potential applicability of data mining technology for predicting the risk of child mortality from the community-based epidemiological datasets gathered by the BRHP epidemiological study. The research methodology had three basic steps: data collection, data preparation, and model building and testing. The required data was selected and extracted from the ten-year surveillance dataset of the BRHP VIII epidemiological study.
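To make the linearity point concrete: in a standard logistic regression with two predictors x1 and x2 (hypothetical names), each predictor shifts the log-odds of the outcome probability p by a fixed amount, whatever the value of the other predictor. An interaction appears in the model only if the analyst explicitly adds a product term, here written with a hypothetical coefficient beta_12, which is why unanticipated interactions are easy to miss.

```latex
\operatorname{logit}(p) = \ln\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \beta_2 x_2
\qquad\text{versus}\qquad
\operatorname{logit}(p) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12}\, x_1 x_2
```

In the first model the effect of x1 on the log-odds is always beta_1; in the second it is beta_1 + beta_12 x2, so it depends on x2. Flexible learners such as decision trees can represent dependencies of this kind without the term being specified in advance.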
Data preparation tasks, such as data transformation, deriving new fields, and handling missing values, were then undertaken. Neural network and decision tree techniques were employed to build and test the models, using a sample dataset of 1,100 records of both surviving and deceased children. Several neural network and decision tree models were built and tested for classification accuracy, and many of them gave encouraging results. In terms of misclassification rates, the two methods yielded comparable results that are sufficient for practical use. However, unlike the neural network models, the decision tree approach produced simple rules that nontechnical health care professionals can use to identify the cases to which a rule applies. This research work has shown that an epidemiological database can be successfully mined to identify the public health and socio-demographic determinants (risk factors) associated with infant and child mortality in rural communities.

Application of Data Mining Technology to Predict Child Mortality Patterns: The Case of Butajira Rural Health Project (BRHP) (Addis Ababa University, 2002-06)
Anagaw, Shegaw; Biru, Tesfaye (PhD); Teferri, Dereje (PhD)

Traditionally, very simple statistical techniques have been used to analyze epidemiological studies. The predominant technique is logistic regression, in which the effects of the predictors are linear (on the log-odds scale). Because of this simplicity, however, it is difficult to use such models to discover unanticipated complex relationships, that is, non-linearities in the effect of a predictor or interactions between predictors. Moreover, as the volume of data increases, traditional methods become inefficient and impractical.
This in turn calls for new methods and tools that can search large quantities of epidemiological data and discover new patterns and relationships hidden in the data. Recently, to identify useful information and knowledge that supports primary health care prevention and control activities, health care institutions have been adopting a data mining approach, which uses more flexible models, such as neural networks and decision trees, to discover unanticipated features in large volumes of data stored in epidemiological databases. In the developed world in particular, data mining technology has enabled health care institutions to search for and identify previously unknown, actionable information in large health care databases and to apply it to improve the quality and efficiency of primary health care prevention and control activities. However, to the researcher's knowledge, no health care institution in Ethiopia has used this state-of-the-art technology to support health care decision-making. This research therefore investigated the potential applicability of data mining technology for predicting the risk of child mortality from the community-based epidemiological datasets gathered by the BRHP epidemiological study. The research methodology had three basic steps: data collection, data preparation, and model building and testing. The required data was selected and extracted from the ten-year surveillance dataset of the BRHP epidemiological study. Data preparation tasks, such as data transformation, deriving new fields, and handling missing values, were then undertaken. Neural network and decision tree data mining techniques were employed to build and test the models.
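The decision tree technique named in this abstract, and the misclassification rate used to compare models, can be illustrated with a minimal sketch: a one-level decision tree (a "decision stump") that picks the single attribute test with the fewest training misclassifications and prints it as an IF-THEN rule. This is a simplification, not the thesis's method: full tree learners such as C4.5 or CART grow deeper trees using entropy- or Gini-based criteria, and the abstract does not name the tool used. The function names, attribute names, and records are hypothetical toy data, not BRHP data.

```python
from collections import Counter

def fit_stump(rows, labels, features):
    """One-level decision tree: choose the single test `feature == value`
    that gives the fewest training misclassifications.

    Returns (feature, value, class_if_equal, class_otherwise).
    """
    best = None
    for f in features:
        for v in sorted({r[f] for r in rows}):
            match = [y for r, y in zip(rows, labels) if r[f] == v]
            rest = [y for r, y in zip(rows, labels) if r[f] != v]
            if not match or not rest:
                continue  # a useful split must send records down both branches
            cls_match = Counter(match).most_common(1)[0][0]  # majority class per branch
            cls_rest = Counter(rest).most_common(1)[0][0]
            errors = sum(y != cls_match for y in match) + sum(y != cls_rest for y in rest)
            if best is None or errors < best[0]:
                best = (errors, f, v, cls_match, cls_rest)
    if best is None:
        raise ValueError("no attribute splits the records into two non-empty groups")
    return best[1:]

def predict(stump, row):
    f, v, cls_match, cls_rest = stump
    return cls_match if row[f] == v else cls_rest

def misclassification_rate(y_true, y_pred):
    """Fraction of records whose predicted class differs from the true class."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical toy records; attribute names and values are illustrative only.
rows = [
    {"factor_a": "yes", "factor_b": "north"},
    {"factor_a": "yes", "factor_b": "south"},
    {"factor_a": "yes", "factor_b": "north"},
    {"factor_a": "no", "factor_b": "north"},
    {"factor_a": "no", "factor_b": "south"},
    {"factor_a": "no", "factor_b": "south"},
]
labels = ["died", "died", "alive", "alive", "alive", "alive"]

stump = fit_stump(rows, labels, ["factor_a", "factor_b"])
f, v, if_equal, otherwise = stump
print(f"IF {f} = {v} THEN {if_equal} ELSE {otherwise}")  # IF factor_a = no THEN alive ELSE died
preds = [predict(stump, r) for r in rows]
print(f"training misclassification rate: {misclassification_rate(labels, preds):.3f}")  # 0.167
```

The printed rule is the kind of output that makes trees usable by nontechnical staff; a neural network trained on the same records would give comparable predictions but no such readable rule.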
Models were built and tested using a sample dataset of 1,100 records of both surviving and deceased children. Several neural network and decision tree models were built and tested for classification accuracy, and many of them gave encouraging results. In terms of misclassification rates, the two methods yielded comparable results that are sufficient for practical use. However, unlike the neural network models, the decision tree approach produced simple rules that nontechnical health care professionals can use to identify the cases to which a rule applies. This research work has shown that an epidemiological database can be successfully mined to identify the public health and socio-demographic determinants (risk factors) associated with infant and child mortality in rural communities.

Application of KDD on Crime Data to Support the Advocacy and Awareness Raising Program of Forum on Street Children Ethiopia (Addis Ababa University, 2003-07)
Kifle, Woldekidan; Kebede, Gashaw (PhD); Anagaw, Shegaw

This thesis describes the process followed to assess the application of knowledge discovery in databases (KDD) in support of the advocacy and awareness raising program of FSCE and the Addis Ababa Police Commission, and the potential of a data mining learning scheme to discover the regularities underlying the crime dataset. The KDD process described by Fayyad, Piatetsky-Shapiro and Smyth (1996), which consists of five major phases, namely understanding of the problem domain, data selection, data preprocessing, data mining, and discussion and interpretation, was adopted. The discovery task was run on a crime database of 10,878 records (tuples) in 17 tables, describing a total of 25 attributes. Association rule mining, an exploratory data mining technique, was applied to accomplish the goal of the research.
To this end, the Apriori algorithm, the association rule implementation in the Weka software, was used. The KDD process can be applied to the crime database to good effect, since it can yield rules that serve as input to the advocacy and awareness raising program. On the basis of subjective (domain experts' opinions) and objective (support and confidence) measures of interestingness, a number of rules were identified that have practical relevance or can add to current knowledge of the problem domain.
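Support and confidence, the objective interestingness measures named above (and the thresholds behind the Apriori runs in the child labor study), can be made concrete with a minimal Apriori sketch in plain Python. It is not Weka's Apriori implementation, whose options and rule ranking differ; the function names, thresholds, and single-letter items (standing in for attribute=value pairs) are hypothetical. The support of an itemset is the fraction of records containing it, and the confidence of a rule X -> Y is support(X ∪ Y) / support(X).

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset whose support >= min_support."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)

    def support(itemset):
        # Fraction of records that contain every item of the itemset.
        return sum(itemset <= t for t in sets) / n

    items = {i for t in sets for i in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        for c in current:
            frequent[c] = support(c)
        k += 1
        # Join step: unite pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

def association_rules(frequent, min_confidence):
    """Return a list of (lhs, rhs, support, confidence) with confidence >= min_confidence."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), r)):
                conf = sup / frequent[lhs]  # support(lhs ∪ rhs) / support(lhs)
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, sup, conf))
    return rules

# Hypothetical toy records; each letter stands in for an attribute=value item.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
freq = apriori(transactions, min_support=0.5)
for lhs, rhs, sup, conf in association_rules(freq, min_confidence=0.7):
    print(sorted(lhs), "->", sorted(rhs), f"support={sup:.2f} confidence={conf:.2f}")
```

With min_support=0.5 the three-item set {a, b, c} (support 0.4) is not frequent, so only rules between pairs survive, each with support 0.6 and confidence 0.75. The prune step relies on the fact that every subset of a frequent itemset is itself frequent, so a candidate with an infrequent subset can be discarded without counting it.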