Browsing by Author "Meshesha, Million (PhD)"
Item Afaan Oromo Text Retrieval System (Addis Ababa University, 2012-06) Gutema, Gezehagn; Meshesha, Million (PhD)
This study is mainly intended to make retrieval of Afaan Oromo text documents possible by applying the techniques of modern information retrieval. Information retrieval is a mechanism for finding, within a large collection, relevant material of an unstructured nature that satisfies a user's information need. The Afaan Oromo text retrieval system developed in this study has indexing and searching parts. The Vector Space Model was used to guide the search for relevant documents in the Oromiffa text corpus; it was selected because it is the most widely used classic model of information retrieval. An inverted index is used as the index file structure. For this study a text document corpus encompassing different news articles was prepared by the researcher, and the experiment was conducted using nine different user queries. Various text pre-processing techniques, including tokenization, normalization, stop-word removal and stemming, are applied to both the indexed documents and the query text. The experiment shows an average precision of 0.575 (57.5%) and an average recall of 0.6264 (62.64%). The challenging tasks in the study are handling synonymy and polysemy, the inability of the stemming algorithm to conflate all word variants, and the ambiguity of words in the language. The performance of the system can be increased if the stemming algorithm is improved, a standard test corpus is used, and a thesaurus is used to handle polysemous and synonymous words in the language.
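The indexing-and-search pipeline this abstract describes can be illustrated with a short sketch. The snippet below is a minimal illustration using scikit-learn's TF-IDF vectorizer as a stand-in; the corpus and query are invented placeholders, and the Afaan Oromo-specific tokenizer, stemmer and stop-word list of the thesis are assumed to have been applied beforehand rather than reproduced here.

```python
# Minimal Vector Space Model retrieval sketch (hypothetical corpus and query).
# Afaan Oromo tokenization, normalization, stop-word removal and stemming
# are assumed to have been applied to these strings already.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "placeholder news article one",
    "placeholder news article two",
    "placeholder news article three",
]

vectorizer = TfidfVectorizer()            # TF-IDF term weighting
doc_matrix = vectorizer.fit_transform(corpus)   # inverted-index-backed matrix

query_vector = vectorizer.transform(["placeholder user query"])
scores = cosine_similarity(query_vector, doc_matrix).ravel()

# Rank documents by descending cosine similarity to the query vector,
# as the Vector Space Model prescribes.
for doc_id in scores.argsort()[::-1]:
    print(f"doc {doc_id}: score {scores[doc_id]:.3f}")
```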
Item Application of Data Mining Technology in Predicting the Seroprevalence of HBV, HCV and HIV: The Case of the National Blood Bank of Addis Ababa, Ethiopia (Addis Ababa University, 2011-07) Gebregziabher, Haftom; Meshesha, Million (PhD)
Recent advancements in communication technologies, on the one hand, and computer hardware and database technologies, on the other, have made it easy for organizations to collect, store and manipulate massive amounts of data. As stated by Deogan, these large databases contain a potential gold mine of valuable information, but it is beyond human ability to analyze such substantial amounts of data and extract meaningful patterns. As the volume of data increases, the proportion of it that people can understand decreases substantially. The application of learning algorithms to knowledge discovery is a promising and relevant area of research, offering new possibilities and benefits in real-world applications such as a blood bank data warehouse. The availability of optimal blood stocks is a critical aspect of a blood transfusion service. Blood banks typically rely on healthy people voluntarily donating blood for transfusions. The ability to identify regular blood donors enables blood banks and voluntary organizations to plan blood donation camps systematically and efficiently. The objective of this study is to explore the applicability of data mining technology in the Ethiopian National Blood Bank Service by developing a predictive model that could support donor recruitment strategies by identifying donors at risk of transfusion-transmissible infections (TTIs), which helps in collecting safe blood and, in turn, in maintaining optimal blood stocks. The analysis was carried out on a dataset of 14,575 blood donors with at least one pathogen, using the J48 decision tree and Naïve Bayes algorithms implemented in Weka. The J48 decision tree algorithm, with an overall model accuracy of 89%, offered interesting rules.
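As a rough illustration of the classification step, the sketch below trains scikit-learn stand-ins for the two Weka algorithms the study used: DecisionTreeClassifier for the C4.5-style J48 tree and GaussianNB for Naïve Bayes. The features, labels and data sizes are synthetic placeholders, not the thesis's 14,575-record donor dataset.

```python
# Hypothetical stand-ins for Weka's J48 (C4.5-style tree) and Naive Bayes,
# trained and scored on synthetic placeholder data.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.random((1000, 5))                  # placeholder donor attributes
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)  # placeholder TTI-risk label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, model in [("decision tree", DecisionTreeClassifier()),
                    ("naive Bayes", GaussianNB())]:
    model.fit(X_tr, y_tr)
    acc = accuracy_score(y_te, model.predict(X_te))
    print(f"{name}: accuracy {acc:.2%}")
```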
Item Evaluation of Knowledge Sharing Practice in Commercial Bank of Ethiopia (Addis Ababa University, 2011-06) Mohammed, Habtamu; Meshesha, Million (PhD)
These days the banking sector operates in a highly dynamic and competitive environment. Driven by these forces, banks are starting to understand the relevance and importance of financial knowledge sharing. They are beginning to appreciate knowledge as the most significant and valued asset leading to organizational excellence and competitive advantage. Therefore, the main aim of this study is to evaluate the practice of knowledge sharing at Commercial Bank of Ethiopia using Nonaka's SECI model of knowledge creation and sharing. The SECI model consists of Socialization, Externalization, Combination and Internalization, which together assess individual, group and organizational knowledge flow from tacit to tacit, tacit to explicit, explicit to explicit and explicit to tacit, respectively. To this end, the necessary data was collected by questionnaire from the Commercial Bank of Ethiopia southern Addis Ababa district and from the Human Resource, Information Technology, and Procurement and Outsourcing departments, to get an overall picture of knowledge flow in the banking sector. The study revealed that the Combination phase has the lowest standard deviation (1.01), followed by Socialization (1.17), while Externalization and Internalization share a standard deviation of 1.22. From this we can understand that the bank is in a relatively good position in synthesizing new explicit knowledge from existing explicit knowledge to build organizational knowledge. However, the culture of tacit-to-explicit and explicit-to-tacit knowledge sharing is minimal. The major barrier to knowledge sharing among employees of the Bank is lack of time for externalizing existing knowledge and internalizing new knowledge. As a result, the Bank needs to set aside appropriate time to enable knowledge sharing among employees. Further research directions are recommended to enhance knowledge sharing in Commercial Bank of Ethiopia.
Item Feature Extraction and Classification Schemes for Enhancing Amharic Braille Recognition System (Addis Ababa University, 2011-06) Tadesse, Shumet; Meshesha, Million (PhD)
Information in written form plays an undeniably important role in our daily lives, and recording and using information encoded in symbolic form is essential. Visually impaired people face a distinct disadvantage in this respect. To address their information needs, the most widely adopted writing convention among visually impaired people is Braille. Since its inception in 1829, significant developments have taken place in the production of Braille and Braille media, as well as in the transcription of printed material into Braille. Braille is readable by visually impaired people; sighted people, however, are generally unable to read these codes. The need for sighted society to understand Braille documents, together with the production of huge amounts of Braille material, motivated the development of optical Braille recognition (OBR) for different languages (such as English, Arabic, etc.) across the world. The development of OBR for Amharic Braille has started in recent years. However, OBR for Amharic Braille is still an area that requires the contribution of much further research. In this study an attempt has been made to explore feature extraction and classification techniques for an Amharic Braille recognizer. To extract valid Braille dots from a Braille image and group them into Braille cells, three feature extraction algorithms are tested, based on fixed cell measures, horizontal and vertical projections, and grid construction. The experimental results show that feature extraction based on fixed cell measures performs well. To build classification models for predicting Amharic characters from Braille cell representations, J48 decision tree and support vector machine (SVM) classifiers are investigated. Based on the experimental results, SVM outperforms the decision tree classifier in predicting unseen extracted Braille features. The explored feature extraction and classification techniques are integrated into the Amharic OBR system and tested on real-life Braille documents, on which an average accuracy of 90.67% is registered. This is a promising result towards designing an applicable system. Handling noisy real-life Braille documents is the future research direction, which needs the integration of generic segmentation and noise removal techniques.
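Of the three feature extraction schemes the Braille abstract lists, the projection-based one is the easiest to sketch: summing the pixels of a binarized Braille image along rows and columns yields peaks at dot positions, which can then be grouped into cells. The snippet below is a simplified illustration on a tiny synthetic binary image, not the thesis's actual algorithm.

```python
# Sketch of horizontal/vertical projection profiles for locating Braille dots
# in a binarized image (1 = dot pixel, 0 = background). Synthetic example.
import numpy as np

image = np.zeros((12, 16), dtype=int)
image[2:4, 2:4] = 1    # a dot
image[2:4, 6:8] = 1    # another dot in the same row band
image[8:10, 2:4] = 1   # a dot in a lower row band

h_profile = image.sum(axis=1)   # one value per row: peaks mark dot rows
v_profile = image.sum(axis=0)   # one value per column: peaks mark dot columns

# Rows/columns whose projection exceeds a threshold bound the dot positions;
# grouping nearby peaks gives the row and column bands of a Braille cell.
dot_rows = np.flatnonzero(h_profile > 0)
dot_cols = np.flatnonzero(v_profile > 0)
print("dot row bands:", dot_rows)
print("dot column bands:", dot_cols)
```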
Item Feature Extraction and Matching in Amharic Document Image Collections (Addis Ababa University, 2011-06) Letta, Adane; Meshesha, Million (PhD)
The ubiquity of digital computers and the boom of the Internet and World Wide Web have resulted in a massive information explosion across the entire world. Different types of information are uploaded to the Internet, such as text documents, document images and other multimedia files. Document images facilitate office automation by preserving scanned documents in a document image database. However, retrieving information from a document image database is a difficult task for organizations due to the lack of efficient retrieval schemes. To overcome this challenge, researchers have attempted recognition-based and recognition-free retrieval approaches. Recognition-based retrieval first applies optical character recognition (OCR) to convert document images into text and then performs text retrieval using search engines. Recognition-free approaches, on the other hand, attempt to search and retrieve directly from document images, relying on image features. Due to the limitations of OCR systems, recognition-based retrieval is not effective; hence different researchers have attempted to develop document image retrieval systems without explicit recognition, including attempts at an effective Amharic document image retrieval system. As a continuation, the current study explores and designs feature extraction and matching schemes that are insensitive to word variants, to differences in font type, size and style, and to degradation. In doing so, eight feature extraction methods and four matching techniques are tested. Of the four matching schemes, dynamic time warping is insensitive to differences in font type, size and style. The eight feature extraction techniques are tested for performance, and the features are then combined systematically following a best-stepwise feature selection method. The results show that combined features score better performance than individual ones. Using the best-performing matching algorithm, stemming is performed in the image domain to handle word variants, and promising experimental results are registered for word variants. The explored matching, feature extraction and stemming techniques are integrated with the previous Amharic document image retrieval system and tested on noisy document images. The experiments show that the current system outperforms the previous attempts. Relevant conclusions are drawn and valid recommendations are forwarded for future investigation.
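Dynamic time warping, the best-performing matcher here, aligns two feature sequences of possibly different lengths by minimizing cumulative distance over a warping path, which is why it tolerates the stretching that font size and style changes introduce into word-image profiles. Below is a generic textbook DTW on 1-D sequences, not the thesis's implementation; the two profiles are invented examples.

```python
# Classic dynamic time warping between two 1-D feature sequences
# (e.g. projection profiles of two word images). Generic textbook version.
import numpy as np

def dtw_distance(a, b):
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three allowed warping moves
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

profile_a = [0, 1, 3, 4, 3, 1, 0]
profile_b = [0, 1, 2, 3, 4, 4, 3, 1, 0]   # same shape, stretched
print(dtw_distance(profile_a, profile_b))
```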
Item Mining Insurance Data for Fraud Detection: The Case of Africa Insurance Share Company (Addis Ababa University, 2011-06) Adame, Tarku; Meshesha, Million (PhD)
The insurance industry has historically been a growing industry, and it plays an important role in ensuring the economic well-being of a country. But ever since its beginning as a commercial enterprise, the industry has faced difficulties with insurance fraud. Insurance fraud is very costly and has become a worldwide concern in recent years. Fraudulent claims account for a significant portion of all claims received by insurers, and cost billions of dollars annually. Nowadays, great efforts are being made to develop models that identify potentially fraudulent claims for special investigation using data mining technology. This study explores the potential applicability of data mining technology in developing models that can detect and predict fraud-suspicious insurance claims, with particular emphasis on Africa Insurance Company. The research first applies a clustering algorithm, followed by classification techniques, to develop the predictive model. The K-Means clustering algorithm is employed to find the natural grouping of the different insurance claims into fraud and non-fraud; the resulting clusters are then used for developing the classification model. The classification task is carried out using the J48 decision tree and Naïve Bayes algorithms in order to create the model that best classifies fraud-suspicious insurance claims. The experiments were conducted following the six-step Cios et al. (2000) process model. For the experiment, the collected insurance dataset was preprocessed to remove outliers, fill in missing values, select attributes, integrate data and derive attributes; this preprocessing phase took the largest portion of the study time. A total of 17,810 insurance claim records are used for training the models, while a separate 2,210 records are used for testing their performance. The model developed using the J48 decision tree algorithm showed the highest classification accuracy, 99.96%; tested on the 2,210-record test set, it scored a prediction accuracy of 97.19%. The results of this study show that data mining techniques are valuable for insurance fraud detection. Future research directions are pointed out to arrive at an applicable system in the area.
Item Predicting Infant Immunization Status in Ethiopia: The Case of Ethiopia Demographic and Health Survey 2011 (Addis Ababa University, 2014-06) Abebe, Hiwot; Meshesha, Million (PhD); Mekonnen, Wubegzier (PhD)
Background: Immunization is one of the most cost-effective and efficient interventions, saving the lives of millions of infants and children who would otherwise die of infectious yet preventable diseases. In 2007, approximately 27 million infants were not vaccinated against common childhood diseases; 2–3 million children die annually from easily preventable diseases and many more fall ill. Objective: The general objective of the research is to construct a predictive model, using data mining technology, that helps to predict infants' immunization status in Ethiopia. The result of the study is expected to be important for different parties such as infants, health professionals, policy makers, programmers and researchers. Methodology: This study is guided by a hybrid data mining model, a six-step knowledge discovery process comprising understanding of the problem, understanding of the data, preparation of the data, data mining, evaluation of the discovered knowledge, and use of the discovered knowledge. The study used 8,210 instances, 12 predictor variables and one outcome variable to run the experiments. Given the nature of the problem and the attributes contained in the dataset, the classification data mining task was selected to build the classifier models. The mining algorithms J48 decision tree, sequential minimal optimization (SMO) support vector machine, multilayer perceptron neural network and partial decision tree (PART) rule induction are used in all experiments due to their popularity in recent related work. Ten-fold cross-validation is used to train and test the classifier models, and performance is compared using accuracy, true positive rate, false positive rate, and the area under the Receiver Operating Characteristic (ROC) curve. Result: The J48 decision tree gave the best classification and predictive accuracy for infant immunization status in Ethiopia, generating a model with an accuracy of 62.5%, a weighted precision of 62.5% and a weighted ROC area of 67.6%. One sample rule reads: IF place-of-delivery = home AND region = Affar AND mother-education-level = no-education AND wealth-status = poor AND listening-to-radio = not-at-all AND mother-age = 25-29 AND parity = 6-7 THEN Unimmunised (10.0/1.0); hence awareness creation among women in pastoralist communities should be increased so as to enhance vaccine coverage. Conclusion: The results of this research indicate that data mining is useful for extracting relevant information from the large and complex EDHS dataset, and this information can be used for predicting infant immunization status and for decision making. The most important attributes determining infant immunization status were place of delivery, region, mother's educational level, listening to radio, father's educational level, residence, mother's age, wealth status, parity, distance to health facility and marital status.
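The evaluation protocol of the immunization study, ten-fold cross-validation over several classifiers compared on accuracy and ROC area, can be sketched with scikit-learn stand-ins: DecisionTreeClassifier for J48, SVC for SMO, and MLPClassifier for the multilayer perceptron (PART has no direct scikit-learn equivalent). The data and predictor count below are synthetic placeholders, not the EDHS records.

```python
# Ten-fold cross-validation comparing classifier stand-ins on synthetic data.
import numpy as np
from sklearn.model_selection import cross_validate, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
X = rng.random((500, 12))                    # 12 placeholder predictors
y = (X[:, 0] > 0.5).astype(int)              # placeholder immunization status

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
models = {
    "J48 stand-in (CART)": DecisionTreeClassifier(random_state=1),
    "SMO stand-in (SVC)": SVC(probability=True, random_state=1),
    "MLP": MLPClassifier(max_iter=500, random_state=1),
}

for name, model in models.items():
    res = cross_validate(model, X, y, cv=cv, scoring=["accuracy", "roc_auc"])
    print(f"{name}: accuracy {res['test_accuracy'].mean():.3f}, "
          f"AUC {res['test_roc_auc'].mean():.3f}")
```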
Item A Self-learning Knowledge Based System for Diagnosis and Treatment of Diabetes (Addis Ababa University, 2013-01) Geberemariam, Solomon; Meshesha, Million (PhD)
Diabetes is a permanent disease in which the human body's cells either do not respond properly to insulin or insulin production is insufficient. If the disease is not treated well and on time, it can lead to severe health problems such as heart disease, blindness, kidney failure and lower-extremity amputations. This chronic disease therefore needs dietary control, physical exercise and insulin management. However, in developing countries like Ethiopia, chronic diseases are growing to be major causes of death, a problem made worse by the scarcity of specialists, practitioners and health facilities. Ethiopia faces a threat of increased diabetes prevalence, and the number of deaths attributed to diabetes reached above 21,000 in 2007. In an effort to address this problem, this study attempts to design and develop a prototype self-learning knowledge-based system that can advise physicians and patients so as to facilitate the diagnosis and treatment of diabetic patients. To this end, knowledge was acquired through both structured and unstructured interviews with domain experts, selected by purposive sampling from the Black Lion Hospital Diabetes Center; document analysis was also used to capture explicit knowledge. The acquired knowledge is modeled using a decision tree that represents the concepts and procedures involved in the diagnosis and treatment of diabetes, production rules are used to represent the domain knowledge, and the knowledge-based system is developed with the SWI-Prolog editor tool. It uses backward chaining, which begins with possible solutions or goals and tries to gather information that verifies the solution. In testing and evaluating the prototype system, eighteen patient histories were selected to test its accuracy and to check whether it satisfies the requirements of its end users. The overall performance of the prototype system is 84.2%; it thus achieves good performance and meets the objectives of the study. However, to make the system applicable in the domain of diabetes diagnosis and treatment, additional work is needed, such as automatically updating the rules in the knowledge base, incorporating a well-designed user interface, and adding natural language processing (NLP) facilities. Keywords: Knowledge-Based System, Self-learning, Diabetes.
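The thesis implements its rules in SWI-Prolog, whose resolution strategy is backward chaining: start from a goal (a candidate diagnosis) and recursively try to prove the conditions of any rule that concludes it. The Python sketch below mimics that control strategy over a toy rule base; the rules and facts are invented for illustration and are not the system's medical knowledge.

```python
# Toy backward-chaining engine over production rules (goal-driven, as in
# Prolog). Each conclusion maps to the condition sets that can establish it.
# The rules and facts below are illustrative only, not medical knowledge.
RULES = {
    "suspect_diabetes": [["high_blood_glucose", "frequent_urination"]],
    "high_blood_glucose": [["fasting_glucose_above_126"]],
}
FACTS = {"fasting_glucose_above_126", "frequent_urination"}

def prove(goal):
    """Return True if goal is a known fact or some rule for it is provable."""
    if goal in FACTS:
        return True
    for conditions in RULES.get(goal, []):
        if all(prove(c) for c in conditions):
            return True
    return False

print(prove("suspect_diabetes"))   # True: both conditions can be derived
```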
Item A Two-step Approach for Tigrigna Text Categorization (Addis Ababa University, 2011-06) Assefa, Gebrehiwot; Meshesha, Million (PhD)
Tigrigna is a Semitic language spoken by the Tigray people in northern Ethiopia and Eritrea, with more than six million speakers worldwide. There are large collections of Tigrigna documents available on the web, in addition to hard-copy documents in libraries and documentation centers. As the number of documents grows, identifying those relevant to a specific topic becomes challenging, so a text categorization mechanism is required for finding, filtering and managing the rapid growth of online information. Several studies have addressed text categorization, especially news text classification, with the help of different machine learning approaches, and good results were found. However, as text corpora grow, classification against predefined categories becomes extremely costly and time-consuming; classifiers that can learn from unlabeled data are needed. Hence, this study attempts to design a two-step Tigrigna text categorization system. First, clustering is used to find the natural grouping of the unlabeled Tigrigna text documents: repeated bisection and direct k-means clustering algorithms are applied to obtain natural groups of the Tigrigna dataset, and the repeated bisection algorithm outperforms direct k-means, so its results are selected for the classification task. For the classification task, decision tree and support vector machine techniques are used in the present study. The SMO support vector machine classifier performs better than the J48 decision tree classifier, registering 82.4% correct classification. There remain challenges in designing a Tigrigna text categorization system, notably the mismatch encountered between clustering and classification algorithms and the ambiguity of the Tigrigna language, which demand further research applying ontology-based hierarchical text categorization.
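The two-step approach, clustering unlabeled documents to obtain categories and then training a supervised classifier on the cluster labels, can be sketched as below. Bisecting k-means (scikit-learn's BisectingKMeans) is used as a rough stand-in for the repeated-bisection clustering of the thesis, and LinearSVC stands in for the SMO classifier; the documents are English placeholders, not the Tigrigna corpus.

```python
# Two-step text categorization sketch: cluster unlabeled documents, then
# train a classifier on the resulting cluster labels. Placeholder documents.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import BisectingKMeans   # requires scikit-learn >= 1.1
from sklearn.svm import LinearSVC

docs = [
    "placeholder sport news text", "placeholder football match report",
    "placeholder economy news text", "placeholder market price report",
]

# Dense TF-IDF features for the tiny placeholder corpus.
X = TfidfVectorizer().fit_transform(docs).toarray()

# Step 1: bisecting k-means repeatedly splits the largest cluster in two,
# approximating the repeated-bisection grouping of the unlabeled documents.
labels = BisectingKMeans(n_clusters=2, random_state=0).fit_predict(X)

# Step 2: the cluster labels supervise an SVM classifier for new documents.
classifier = LinearSVC().fit(X, labels)
print(classifier.predict(X))
```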