School of Information Science
Browsing School of Information Science by Author "Abebe, Ermias (PhD)"
Analyzing the Outbreak Surveillance and Response System in Ethiopia using Data Mining Techniques (Addis Ababa University, 2012-11) Mohammed Yimer; Abebe, Ermias (PhD); Addisse Adamu (PhD)
The aim of this research was to show the applicability of data mining techniques for developing descriptive and predictive models from disease outbreak surveillance datasets in Ethiopia. To that end, three data mining tasks (classification, clustering and association rule mining) were applied to the datasets of the PHEM sector from different perspectives. A total of 18600 records covering the years 2004-2012 G.C. were collected from the data store of the surveillance system and assessed. After the preprocessing phase of the knowledge discovery process, a total of 8796 records were prepared for the data mining algorithms. Of these, 4703 records came from the IDSR system dataset (2004-2008 G.C.) and the remaining 4093 from the PHEM dataset (2009-2012 G.C.). The researcher compared two classification algorithms, the J48 decision tree and Naïve Bayes, for predicting Epidemic Typhus cases, and the better-performing algorithm was used for model development. In the experiments, the decision tree algorithm performed better at classifying disease cases by place and time. The accuracy of classifying Epidemic Typhus cases was 87.44% with the J48 decision tree and 83.70% with the Naïve Bayes classifier. Sensitivity and specificity tests were also carried out for the two classifiers. The researcher also applied association rule mining to find correlations or patterns among disease cases in the surveillance data.
The attributes were selected only from the occurrence and non-occurrence of disease cases, which were recorded by time and place. The Apriori association rule mining algorithm was run to find interesting patterns among the occurrence and co-occurrence of correlated disease cases. The researcher set a minimum support of 20% and a minimum confidence of 90% before applying the mining algorithm. For cluster analysis, the researcher used the combined (integrated) datasets, a total of 8796 records with 9 attributes. The Simple K-Means clustering algorithm was used on the combined datasets because it showed the grouping of disease cases with respect to time and place. In general, data mining techniques were important and applicable for developing classification, clustering and association rule models for emerging and re-emerging disease cases. However, the data has to be of good quality and include the important attributes for better predictive and descriptive model development. The results of the research, apart from their educational purpose, were also used by the domain experts for planning, preparedness, decision making, and disease control and prevention activities.

Applicability of Data Mining Techniques to Support Voluntary Counseling and Testing (VCT) for HIV: The Case of Center for Disease Control and Prevention (CDC) (Addis Ababa University, 2009-01) Asmare, Biru; Abebe, Ermias (PhD)
Data mining is emerging as an important tool in many areas of research and industry. Companies and organizations are increasingly interested in applying data mining tools to increase the value added by their data collection systems. Nowhere is this potential more important than in the healthcare industry. As medical records systems become more standardized and commonplace, data quantity increases, with much of it going unanalyzed.
Data mining can begin to turn some of this data into tools that help health organizations organize data and make decisions. Data related to HIV/AIDS are available in VCT centers. A major objective of this thesis is to evaluate the potential applicability of data mining techniques in VCT, with the aim of developing a model that could help make informed decisions. The study used a dataset collected from OSSA, which is supported by CDC, and followed CRISP-DM as the knowledge discovery process model; the findings are presented in graphs and tables. For the clustering task, the K-means and EM algorithms were tested using WEKA. The clusters generated by EM were appropriate for the problem at hand, grouping similar clients together. According to the results of these experiments, it was possible to identify similar groups among VCT clients. Gender, marital status, HIV test result and education showed patterns. For the classification task, decision tree (J48 and Random Tree) and neural network (ANN) classifiers were evaluated. Although ANN showed better accuracy than the decision tree classifiers, the J48 decision tree was appropriate for the dataset at hand and was used to build the classification model. Finally, cluster-derived classification models were tested for their cross-validation accuracy and compared with a classification model built without clustering. The outcomes of this research will serve users in the domain area, decision makers and planners of HIV intervention programs such as CDC and MOH.

The Application of Decision Tree for Part of Speech (POS) Tagging For Amharic (Addis Ababa University, 2009-09) Kebede, Gebeyehu; Abebe, Ermias (PhD)
Automatic understanding of natural languages requires a set of language processing tools. A POS tagger, which assigns the proper part of speech (noun, verb, adjective, etc.) to each word in a sentence, is one of these tools. This study investigates the possibility of applying a decision tree based POS tagger to Amharic.
The tagger was developed using the J48 decision tree classifier, which is Weka’s implementation of the C4.5 algorithm. A corpus developed by the ELRC annotation team was used to obtain the data required for training and testing the models. The dataset comprises 1065 news documents containing 210,000 words. A sample of some 800 sentences was selected and used for model development and evaluation. The dataset was preprocessed in line with the requirements of the Weka data mining tool. To support decision tree classification models, a table containing contextual and orthographic information was constructed semi-automatically and used as the training and testing dataset. The tags of the right and left neighboring words of each word are used as contextual information. In addition, orthographic information about the word, such as its first and last characters, its prefix and suffix, and the presence of numeric digits within the word, is included in the table to provide useful information about the word to be tagged. Performance tests were conducted at various stages using 10-fold cross-validation. Experimental results show that only the tags of the two successive words to the left and right provide useful contextual information; context beyond two words adds noise rather than useful information. Overall, including ambiguous and unknown words, an accuracy of 84.9% was obtained using 10-fold cross-validation. Although this accuracy is encouraging, further study is recommended to improve it to the level required for implementation.

Automatic Stemming For Amharic Text: An Experiment Using Successor Variety Approach (Addis Ababa University, 2009-01) Mezemir, Genet; Abebe, Ermias (PhD)
The extensive use of the World Wide Web and the increasing digital availability of information and documents have accelerated the demand for technologies and tools for online data retrieval and extraction applications.
Natural language research, with the aim of quick and reliable online information searching and access, is one major component of current advanced information technology development. In this research, an indexing system was developed using the Successor Variety Stemming Algorithm to find stems for Amharic words. The research set out to discover whether the Successor Variety Stemming Algorithm with the peak and plateau, entropy and complete word methods can be used for the Amharic language, and what its limitations would be. In addition, the peak and plateau method was compared with the entropy and complete word methods. Stemming is typically used in the hope of improving search accuracy and reducing the size of the index. A corpus of 6270 words was obtained from the Ethiopian News Agency (ENA) and Walta Information Center and used to train and test the methods. The experimental results showed that the peak and plateau method achieved an accuracy of 71.8%, while the entropy and complete word methods achieved 63.95% and 57.99% respectively. Based on these results, the successor variety algorithm performed better with the peak and plateau method than with the entropy method.

Automatic Thesaurus Construction For Tigrigna Text Retrieval (Addis Ababa University, 2011) Hietel, Hagos; Abebe, Ermias (PhD)
A thesaurus is a list of related terms that helps to solve the vocabulary problem in information retrieval, which arises because authors and indexers use different terms for the same concept. Searchers may lack skill in selecting good search terms. They may submit a query using vocabulary that differs from the vocabulary indexed in the system. As a result, they may not get good results even though there are related documents in the collection.
Therefore, it is reasonable to expand query terms with additional related terms drawn from a thesaurus. Tigrigna is a language in the Ethio-Semitic family spoken mainly in the Tigray region of Ethiopia and in Eritrea. Currently, the number of electronic documents in the Tigrigna language is increasing significantly. Robust retrieval tools are therefore needed in order to use these documents. As a thesaurus is an important component of information retrieval, studies have been conducted on automatic thesaurus construction for Tigrigna information retrieval. Even though automatic thesaurus construction has its own drawbacks, it is better than the alternative of manual construction. In this thesis, an automatic approach to constructing a Tigrigna thesaurus from a document collection, based on a term-to-term co-occurrence matrix, is introduced. An encouraging result was obtained when the system was tested on Tigrigna documents. The result on a random sample of terms shows that the system has an accuracy of 75.28%.

Automatic Thesaurus Construction From Wolaytta Text (Addis Ababa University, 2013-06) Beldados, Demewoz; Abebe, Ermias (PhD)
A thesaurus is a set of terms used for document classification during indexing and for query expansion during searching, with the aim of enhancing retrieval effectiveness. There is a major problem associated with information retrieval systems: on the one hand, users are required to describe their information need explicitly to the system; on the other, the system itself often retrieves irrelevant documents because of vocabulary mismatch between query terms and index terms. Because an information retrieval system compares query terms and index terms at a lexical level, the mismatch is pronounced enough to affect retrieval performance. A thesaurus is therefore a means of addressing the problem: it provides a precise, controlled vocabulary of terms for indexing and searching, thereby resolving vocabulary mismatch. Wolaytta is an official language of literacy in Ethiopia.
Since the introduction of the Latin script in the writing system in 1993, the language has evolved significantly from mere verbal communication to a means of instruction and then to a source of information. To use the language as a source of information, the retrieval system should be designed with enhanced capability for resolving whatever mismatches arise between query terms and index terms. This thesis develops an automatic association thesaurus from Wolaytta text, either as a starting point for an enhanced retrieval system or as a framework for the development of a cross-language retrieval system. The developed system is an association thesaurus constructed automatically from document corpora on the basis of term-to-term co-occurrence. To obtain reasonably good performance, the system incorporated manual approaches for compiling the stop word and suffix lists, and achieved a better result in generating related concepts.

Designing a Stemmer for Afaan Oromo Text: A Hybrid Approach (Addis Ababa University, 2010-06) Tesfaye, Debela; Abebe, Ermias (PhD)
Most natural language processing systems use a stemmer as a separate module in their architecture. It is especially significant for developing machine translators, speech recognizers and search engines. In linguistic morphology, stemming is the process of reducing inflected (or sometimes derived) words to their stem, base or root form. In this thesis, a stemming system for Afan Oromo is presented. The system takes a word as input and removes its affixes according to a rule-based algorithm. The rule-based stemmer alone cannot capture every rule applied in Afan Oromo word formation. Therefore, in the hybrid version of the stemmer, N-grams are integrated with the rules to handle cases that the rules do not cover.
The algorithm follows the well-known Porter algorithm for the English language and was developed according to the grammatical rules of Afan Oromo, as described in A Grammatical Sketch of Written Oromo (Mewis, 2001) and Caasluga Afaan Oromoo, Jildii-1 (Oromo, 1995). Afan Oromo morphology was studied and described in order to model the language and develop an automatic procedure for conflation. The inflectional and derivational morphologies of the language are discussed. The result of the study is a prototype context-sensitive iterative stemmer for Afan Oromo. An error counting technique was employed to evaluate the performance of the stemmer. For testing purposes, 198 sentences (with a total of 2458 words) were collected from different public Afaan Oromo newspapers and bulletins so that the test set addresses a variety of issues. An evaluation of the system shows that the algorithm outperforms earlier stemming algorithms for Afan Oromo, giving 95.73 percent correct results. Finally, possible extensions of the proposed system and further evaluation methods are briefly reviewed.

Designing A Stemmer For Ge’ez Text Using Rule Based Approach (Addis Ababa University, 2010-07) Belay, Abebe; Abebe, Ermias (PhD)
In this study, a stemmer for Ge’ez text was developed. In the design process, topics such as the background of the thesis, the literature on conflation and stemming algorithms, the morphological nature of the Ge’ez language, stemming techniques and other related matters were discussed in order to model and develop an automatic procedure for conflation. The discussion of the inflectional and derivational morphology of the language showed that affixation (prefixing, infixing and suffixing) is the main word formation process in Ge’ez. The language is morphologically complex, because many different words can be formed through the wide concatenation of affixes.
For the experiment, two techniques were used: affix removal and morphological analysis. To evaluate the stemmer, a manual error counting technique was used. Three types of errors were observed in the experiment: over-stemming (6%), under-stemming (4.27%) and structural problems (7.31%). When run on the sample texts, the stemmer achieved an accuracy of 82.42%. The stemmer reduced the dictionary size by 29.9% for stemmed words and 62.8% for root words. Lastly, recommendations for future work and improvements to this work were reported.

Designing a Web-Based Blood Bank Information Management System for the National Blood Bank of Ethiopia (Addis Ababa University, 2016-06) Kebede, Gadisa; Abebe, Ermias (PhD); Betre, Mulugeta (PhD)
Background: Many medical advances that have improved the treatment of serious illness and injuries have increased the need for blood transfusion for patients’ survival, to support them through recovery or to maintain their health. Demand for blood is driven by an array of factors that include obstetric hemorrhage, road traffic accidents, armed conflict, sickle cell disease and childhood anemia, malnutrition, Human Immunodeficiency Virus (HIV), malaria, and parasitic infections. A blood bank is a place where blood is collected from donors, typed and separated into components, stored, and prepared for transfusion to recipients. The blood bank information management system is used to control and manage the overall activities performed in blood bank centers. Objective: The main objective of this project is to design a web-based blood bank information management system for the National Blood Bank of Ethiopia. Methodology: This project was carried out at the National Blood Bank Center, Addis Ababa. The project follows a design science methodology and an object-oriented system analysis and design approach to analyze and design the system.
In-depth interviews, document review and an inventory were carried out to analyze the existing situation. Unified Modeling Language (UML) modeling techniques were used to model the analysis and design of the proposed system, and both HyperText Markup Language (HTML) and PHP: Hypertext Preprocessor (PHP) were used to develop the system prototype. The MySQL database management system was used to design the prototype database. Results: All the system’s processes and its boundary were identified and described using a use case diagram. Eight processes with their corresponding actors were identified for the system. The flow of the processes was presented using activity diagrams. The object model was described using a class diagram. Finally, the system prototype was developed for user interface testing. User testing of the system prototype showed that 75% of the participants in the evaluation responded positively regarding the system’s usability. Conclusion: This project presents only the system prototype of the blood bank information management system. The prototype can be developed further through an iterative process that incorporates users’ feedback. The user test of the system prototype identified some parts that need to be improved. Recommendation: Future researchers are recommended to implement the complete web-based blood bank information management system and enrich it with additional functionality. Such functionality may include integration with Smart Care, a knowledge-based component, and Short Message Service (SMS) based promotion.

Designing an Electronic Medical Record System for Amanuel Hospital (Addis Ababa University, 2014-06) Alem, Getnet; Abebe, Ermias (PhD); Deyassa, Nigusse (PhD)
Introduction: An EMR is a computerized system for accessing the history of a patient’s care within a single practice.
The content of an EMR is analogous to the paper record, but the electronic format creates usable data for medical outcome studies, improves the efficiency of care, and makes for more efficient communication among providers and easier management of health plans. Objective: The objective of this project was to design an Electronic Medical Record system for Amanuel Mental Specialized Hospital. Method: The project used a structured system analysis and design methodology with the incremental waterfall approach, which includes requirement collection, analysis and design phases. Requirements were collected through interviews, observation and review of relevant documents to gather sufficient data for the system to be developed. Analysis and design of the proposed system were performed using tools such as the data flow diagram, ER diagram and flow chart. Discussion of results: The current business process in the Registration, Outpatient, Laboratory and Pharmacy departments of the hospital was described in detail. According to the assessment, most of the staff had not received training in basic computer skills or the EMR system. The hospital’s infrastructure is inadequate to run the system; the internet connection is good but is interrupted most of the time. There is not enough hardware to run the system. The new system will have functionality to register system users and patients’ personal information, search patient information, record patient diagnosis and treatment data, and register patient laboratory and medication orders. The system will also have non-functional requirements such as security, availability, maintainability and user interface issues. Conclusion: Electronic medical records significantly reduce medical errors, solve the problem of illegible handwriting in records, improve the quality and completeness of data, and increase patient satisfaction by reducing patient waiting time.
The requirements of the new system were collected using data collection tools and techniques. The business process of the current system, the functional and non-functional requirements and the system requirements were described. The proposed system was analyzed using an analysis model (use case diagram and use case descriptions) and a process model (context diagram and DFD). The data model of the system was presented using an entity relationship diagram. Designing the EMR system at the hospital will therefore lead to a significant improvement in the quality of patient care and solve the current problems.

Designing an Information Extraction System for Amharic Vacancy Announcement Text (Addis Ababa University, 2011-06) Hirpassa, Sintayehu; Abebe, Ermias (PhD)
The number of Amharic documents on the Web is increasing as many newspaper publishers have started providing their services electronically. Tools effective enough to extract and exploit the valuable information in Amharic text to users’ satisfaction are unavailable, and manually extracting information from a large amount of unstructured text is a very tiresome and time-consuming job; this was the main motivation for this research. The overall objective of the research was to develop an information extraction system for Amharic vacancy announcement text. The system was developed using the Python and Visual Basic programming languages, and a rule-based technique was applied to address the problem of automatically deciding the correct candidate texts based on their surrounding context words. A total of 116 Amharic vacancy announcement texts containing 10,766 words were collected from the “Ethiopian Reporter” newspaper, which is published in Amharic twice a week.
For this study, nine candidate texts were selected from Amharic vacancy announcement text: organization, position, qualification, experience, salary, number of people required, work agreement, deadline and phone number. Experiments were carried out on each component of the system separately to evaluate its performance, which helps to identify drawbacks and gives some direction for future work. The experimental results show that an overall F-measure of 71.7% was achieved. To make the system applicable in the Amharic vacancy announcement domain, further work is required, such as incorporating additional rules, improving the speed of the system by modifying the algorithm, designing a better user interface and integrating other NLP facilities.

Developing an Enterprise Framework for Mental Health Information System in Addis Ababa (Addis Ababa University, 2014) Dejene, Hilina; Abebe, Ermias (PhD); Deyessa, Negussie (PhD)
Background: “Mental health information system is a system for collecting, processing, analyzing, disseminating and using information about a mental health service and mental health desires of the population”. All types of mental health organizations should have a clearly defined set of quality information that is gathered and consolidated into meaningful indicators for clinicians, managers and the executive. Objective: The general objective of this project was to develop an enterprise framework for mental health information systems in hospitals that provide mental health care in Addis Ababa. Methodology: For data collection, interviews, observation, and document and literature review were used. For the framework development, the Zachman and The Open Group architectural frameworks were used. An iterative system development methodology was used for the overall framework development.
Discussion of Results: Using the perspectives of the Zachman framework and The Open Group architecture template, different business, data and information architecture work was done. Taking the mental health organization’s mission, strategy and objectives into consideration, the investigator identified the architecture mission, the business and information principles, the information flow between different departments, and the different stakeholders that have an impact on mental health information systems. Conclusion: Information is a critical component of mental health institutions for many purposes, such as patient care, decision making and monitoring of outcomes. A proper and standardized information flow can therefore improve communication among the business organizations and different stakeholders. This mental health information system architecture framework will serve as a base for developing a more complete architecture framework for mental health institutions in the future.

Development of A Stemmer for Afaraf Text Retrieval (Addis Ababa University, 2015-11) Taha, Osman; Abebe, Ermias (PhD)
This study describes the design of a stemming algorithm for an Afaraf text retrieval system. Nowadays, a considerable amount of electronic information has been produced in Afaraf. An information retrieval system is a mechanism that enables users to retrieve relevant unstructured information from a large collection. The literature on Afaraf morphology was reviewed in order to develop the rule-based stemmer. Each natural language structures words in its own way, with different prefixes and suffixes, which requires special handling of affixes with specific rules. The proposed rule-based stemmer is based on the grammar and dictionary of Afaraf; its rules cover number (singular and plural), personal pronouns, adjectives, adverbs, verbal nouns, strong and weak verbs, indefinite pronouns, conditional and subjunctive moods, and linkage, and are used to remove suffixes and prefixes from a word and produce its stem.
For this study, the researcher prepared a corpus of 300 Afaraf text files collected from school textbooks, Samara University modules, Qusebaa Maca magazines and other online sources, and ran the experiment with eight different queries. VSM data pre-processing techniques were applied to both document indexing and query text. The evaluation of the stemmer shows an accuracy of 65.65%, with error rates of 4.50% for over-stemming and 29.85% for under-stemming. The information retrieval system registered a precision of 0.785 and a recall of 0.233. The most challenging task in developing a full-fledged Afaraf text retrieval system proved to be handling morphological word variations. The performance of the system may increase if the performance of the stemming algorithm is improved and if a standard test corpus is used.

Development of a web-based Information Management System for Communicable Disease Surveillance: for Dire Dawa regional health bureau. (Addis Ababa University, 2015-06) Shiferaw, Seble; Abebe, Ermias (PhD)
Background: Surveillance in the context of health is the ongoing systematic collection, analysis and interpretation of outcome-specific data for use in planning, implementing and evaluating public health policies and practices. Disease surveillance is considered a cornerstone of any disease prevention, eradication, elimination and control program. Surveillance data are crucial for monitoring the health status of the population, detecting diseases and triggering action to prevent further illness, and containing public health problems. Currently, all health care organizations in Dire Dawa that collect surveillance data use a manual system for reporting to the regional health bureau. The regional health bureau uses Excel spreadsheets to store and analyze the data. However, there is no organized, efficient surveillance system to detect and predict disease outbreaks.
Objective: The general objective of this project was to develop a web-based information management system for communicable disease surveillance for the Dire Dawa Regional Health Bureau. Methodology: This study used a qualitative study design to assess the existing surveillance system. Interviews and document analysis, along with observation, were the main tools used to capture the business system requirements. A phased development methodology, Rapid Application Development (RAD), was applied with an object-oriented approach to design the system. Unified Modeling Language (UML) was used for requirements capture, modeling of the organization’s business system, and design. Result: Using the RAD approach with the object-oriented methodology, different system requirements were identified. Based on those requirements, the designed information system incorporated weekly case registration, outbreak notification and report generation. Conclusion and Recommendation: Different problems were found in the existing system. There was a lack of organized outbreak information and of early detection for decision making. Appropriate and timely information is a critical component in all health organizations. A proper and standardized information flow can therefore improve communication among the business organizations and different stakeholders.

Exploring the Prevalence of Diarrheal Disease Using Data Mining Technology (A Case of Tikur Anbessa Hospital) (Addis Ababa University, 2011-06) Endalew, Muluneh; Abebe, Ermias (PhD); Seme, Assefa (PhD)
The amount of health-related data available to healthcare organizations for various diseases is massive and continues to grow. As a result, a huge amount of data is being stored in health care organizations and facilities.
Diarrheal disease is one of the causes of morbidity and mortality for many children, especially those under the age of five, and a large amount of data on it is being collected in both rural and urban health facilities of Ethiopia. This data represents a useful resource for making a wide variety of real-time decisions and determinations, from the quality of care delivered to trends in treatment modalities and staffing issues. The problem is to handle this huge amount of data and information in such a way that organizations can identify what is important and extract it from the accumulated data. The data is too complex and voluminous to be processed and analyzed by traditional methods. Nowadays, data mining technology is used as a tool that provides techniques for transforming these mounds of data into useful information, which in turn makes it possible to derive knowledge for decision making. A number of data mining techniques and tools are available to perform this task. The researcher selected techniques and tools to explore the prevalence of diarrheal disease and develop classification and prediction models. Thus, the purpose of this study is to investigate the potential applicability of data mining techniques in exploring the prevalence of diarrheal disease, using data collected from the diarrheal disease control and training center of African sub Region II in Tikur Anbessa Hospital. Records of patients aged five years (60 months) and under are included in the study. Two machine learning algorithms from the WEKA software, the J48 Decision Tree (DT) and Naïve Bayes (NB) classifiers, are used to classify diarrheal disease records on the basis of the values of the attributes ‘Treatment’ and ‘Type of Diarrhea’. Initially, a dataset of 5,572 records with 9 attributes was collected for the study.
However, the class labels for the selected target classes were not balanced, so the records were resampled using SMOTE (Synthetic Minority Oversampling TEchnique) from the Weka preprocess package. After this process, the number of records used for model building increased to 13,710 and 16,460 for the ‘Treatment’ and ‘Type of diarrhea’ target classes respectively. This was done in order to reduce the bias of the classifiers in the model building process. Results of the experiments have shown that the J48 DT classifier has better classification accuracy than the NB classifier. The two models selected in the performance evaluation of these classifiers showed that J48 DT and NB classified ‘treatment modalities’ and ‘diarrheal types’ with accuracies of 88.3%, 79.54%, 85.64% and 73.94% respectively. Overall, this study has shown that data mining techniques are valuable to support and scale up the efficacy of the health care service provision process.
Item Mining Art Data Set to Predict Cd4 Cells Count the Case of Jimma, Bonga and Aman Hospitals(Addis Ababa University, 2013-06) Tadesse, Misganaw; Abebe, Ermias (PhD); Deyasa, Nigussie (PhD)
Background: Recent reports from WHO and UNAIDS indicate that the number of people using ART is increasing over time. This number is increasing dramatically in sub-Saharan African countries, including Ethiopia. According to the WHO and UNAIDS report, as of the end of 2011, over 8 million people had access to ART in low- and middle-income countries. Objective: The purpose of this study is to apply data mining techniques to the ART records of patients maintained in the ART databases of Jimma, Bonga and Aman Hospitals in order to build a model capable of predicting the CD4 cell count of patients after six, twelve and eighteen months of treatment. Methodology: The overall activity of this thesis is guided by the Hybrid-DM model, a six-step knowledge discovery process model.
The study used 7,252 instances, with ten predictor and three outcome variables, to run the experiments. Due to the nature of the problem and the attributes contained in the dataset, the classification mining task was selected to build the classifier models. The mining algorithms J48, PART, SMO and MLP are used in all experiments due to their popularity in recent related works. In addition to the base classifiers, because of the imbalanced nature of the classes in each of the three outcome variables, a boosting algorithm (AdaBoostM1) is used to boost the classifiers’ predictive performance. Ten-fold cross-validation is used to train and test the classifier models. Performance of the models is compared using accuracy, TPR, FPR, mean absolute error, F-measure, and the area under the ROC curve. Results: The boosting algorithm gave the base classifiers better predictive accuracy, with the unpruned PART decision tree yielding a better model of the sixth and twelfth month CD4 count, while the pruned PART decision tree performed better for the eighteenth month CD4 count. The joined rules of the three models indicated that baseline CD4 count, drug regimen, age, family planning usage status, WHO clinical stage, and functional status of a patient are the most determinant attributes used to predict CD4 counts. Conclusion: A promising result is observed in applying data mining techniques to build a CD4 count predictive model using socio-demographic, clinical and biological features. Future work can validate the results using clinical trials and repeat the study with different source data or knowledge discovery techniques.
Item Phrasal Translation for Amharic English Cross Language Information Retrieval (Clir)(Addis Ababa University, 2010-06) Tesfaye, Fasika; Abebe, Ermias (PhD)
Amharic is the most widely used language in Ethiopia and serves as the official working language of the Federal Democratic Republic of Ethiopia.
Despite this fact, English serves as the medium of instruction and communication in academic environments and as a working language in some governmental and nongovernmental organizations in Ethiopia. This shows that there is a language barrier between the language most people in Ethiopia are familiar with and the one they are expected to use in their working and academic environments. Hence, experimenting on the applicability of a cross language information retrieval system for Amharic-English, which can break the language barrier, is important. This research is mainly conducted to break the language barrier that Amharic-speaking users face in obtaining and utilizing documents available in English. The experiment employed a corpus-based approach that makes use of phrasal query translation. This approach requires access to a large volume of parallel documents prepared in Amharic and English. News articles were used to conduct this research. The performance of the system was measured by average precision and recall. The experiment yielded a recall value of 0.248 for translated Amharic queries, 0.463 for Amharic queries and 0.436 for the baseline English queries. This showed that the result of the translated queries was low compared to the baseline queries. The performance of such a system is highly dependent on the phrase translation system. Hence, coming up with a good translation model will have a paramount impact on the performance of the system. Therefore, with the use of an adequately large and clean parallel Amharic-English corpus, it is possible to develop phrasal query translation for Amharic-English cross language information retrieval.
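The precision and recall measures named in this abstract can be illustrated with a minimal sketch. The function names, document identifiers and relevance judgments below are hypothetical and are not taken from the thesis; the sketch only shows how per-query precision and recall are computed and then averaged over a query set.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document ids returned by the system (hypothetical ids)
    relevant:  document ids judged relevant for the query
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)  # relevant documents that were retrieved
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


def mean_precision_recall(runs):
    """Average precision and recall over a list of (retrieved, relevant) pairs."""
    if not runs:
        return 0.0, 0.0
    scores = [precision_recall(retrieved, relevant) for retrieved, relevant in runs]
    mean_p = sum(p for p, _ in scores) / len(scores)
    mean_r = sum(r for _, r in scores) / len(scores)
    return mean_p, mean_r


# Illustrative data only: two queries with made-up judgments.
example_runs = [
    (["d1", "d2"], ["d1"]),
    (["d3", "d4"], ["d3", "d4", "d5", "d6"]),
]
example_means = mean_precision_recall(example_runs)
```

Under this reading, a recall of 0.248 for translated queries would mean that roughly a quarter of the relevant documents were retrieved on average across the query set.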
Key words: phrasal query translation, Cross Language Information Retrieval, phrase alignment
Item Privacy and Confidentiality Issues of AnEmr Application: Health Professionals, Health Managers and Patients’ Perception in Zewditu and Ras Desta Damtew Hospitals(Addis Ababa University, 2013-06) Tadesse, Mamush; Mitike, Getnet (PhD); Abebe, Ermias (PhD)
Introduction: Establishing a nationwide Electronic Health Record (EHR) system has become a primary objective for many countries around the world in order to improve the quality of healthcare while at the same time decreasing its cost. However, implementation of EHR systems is being hindered by several obstacles, among which are concerns about data privacy and trustworthiness. With the introduction of e-health, the right to data privacy became a primary concern for patients and health professionals. Objective: The objective of this study is to explore the perception of health professionals, managers and patients towards privacy and confidentiality issues of Electronic Medical Records (EMR) in Zewditu and Ras Desta Damtew memorial hospitals. Method: A hospital-based, cross-sectional, quantitative study was conducted among 420 health professionals and patients to assess the attitudes of health care professionals, health managers and patients towards privacy issues of EMR. The sample size was calculated using the single population proportion formula. The data was collected through a standardized questionnaire. Respondents were assessed by socio-demographic, knowledge, attitude and practice variables. Result: Among users of EMR, 229 (79.8%) were trained to use the software by NGOs or the government, or were self-trained. About 66 (28.8%) replied that the training prepared them fully to keep patient privacy, 120 (52.2%) mostly prepared, 41 (17.9%) somewhat prepared, and the remaining 2 (0.7%) not at all. Forty-eight (21%) of the respondents have no individual login ID and password.
They have no option for entering their patients’ data other than sharing others’ passwords. Nineteen (16.5%) of the patients do not even know that their health data will be entered into a computerized system. In this study, patients who are relatively literate (assessed by their educational status) showed concern about privacy issues, but illiterate patients did not know the risks and benefits of computerized medical records. Almost all patients want to be asked before their health data is taken for research or other purposes. Overall, managers, health professionals and patients are comfortable with the existing system except for privacy concerns. Conclusion and Recommendation: In this study, all respondents believe that the EMR system is likely to increase the quality of care. At the same time, respondents have significant concerns about the privacy of their medical records. There is no means (audit trail) to control who among employees can access patient data. Most respondents are obliged to share passwords because they have no individual login ID and password. This shows the presence of gaps in ensuring the privacy of patient information. Even though respondents are comfortable with the EMR system, privacy concerns are still there. Therefore, government and other responsible bodies should implement and enforce strategies to strengthen privacy. Health care providers should inform their patients how their health data is being stored in order to increase their awareness, avoid confusion and build public trust.
Item Syllable-Based Text-To- Speech Synthesis (Tts) for Amharic(Addis Ababa University, 2013-06) Wordofa, Mulualem; Abebe, Ermias (PhD)
We have experienced an exponential increase in electronic Amharic text information inside and outside of the organization. This accumulation of information makes archiving and searching challenging.
Due to that, an information retrieval (IR) system for the Amharic language has become indispensable, as it allows users to retrieve relevant documents that satisfy their information needs. Some Amharic IR systems were developed in the last couple of decades; however, the measured performance of these systems was not satisfactory. This happened for different reasons, but the major one was that they did not properly address the semantic nature of the language. Moreover, there was no attempt to make retrieval effective through document clustering. To solve these issues, integrating semantic indexing of documents and document clustering techniques with a generic IR system will improve retrieval performance. Methods: In this research, an Amharic IR system with semantic indexing and document clustering is developed. It comprises three basic components: indexing, clustering and searching. The system comprises all the processes that exist in a generic IR system, plus C-value multi-word term extraction, k-means document clustering, and a cluster-based searching strategy. Conclusion: The system was tested using tagged Amharic news documents of size 650 KB, and it registered an F-measure of 66%. This is considerably better than the latest work (Amanuel [31]). Nevertheless, the performance of the system is greatly affected by synonymy and polysemy, incorrect clustering, cluster representative problems, and the Amharic knowledge base. Keywords: Information Retrieval, Semantic indexing, Document Clustering, Amharic
Item Web- Based Medical Equipment Information Management System, The Case of Dire Dawa Administration Health Bureau(Addis Ababa University, 2015-06) Mohammed, Eptisam; Abebe, Ermias (PhD); Deyessa, Negusse (PhD)
Background: In delivering health care services, health professionals use different kinds of medical equipment to provide quality care.
Medical equipment is any instrument, apparatus, machine, appliance, implant, in vitro reagent or calibrator, software, material or other similar or related article which does not achieve its primary intended action in or on the human body by pharmacological, immunological or metabolic means. A medical equipment information management system is used to automate the documentation of all activities relating to medical equipment. Objective: The objective of this project is to design a web-based information system for medical equipment management, taking the Dire Dawa administration health bureau as a case. Methodology: This project was conducted in the Dire Dawa administration health bureau for the biomedical case team. The project follows the design science methodology. Object-oriented analysis and design methodology is used for requirement analysis and design. Purposive sampling was used to select the study area. In-depth interviews and document review were done to analyze the existing situation. UML techniques were used to model the analysis and design of the proposed system. The investigator used HTML, PHP and MySQL to build the system prototype. Discussion of Result: The use case diagram identifies all the processes and the system boundary of the proposed system. Eleven processes were identified with their corresponding actors. Fourteen objects of the system were identified and modeled using the class diagram. The flow and sequence of the processes were presented using sequence diagrams. A user prototype was modeled for system usability testing. Conclusion: This project does not present the final product; rather, it provides a system prototype for further continuous evaluation and development based on user feedback. In addition to the system prototype, the project identifies the points that need to be improved and the areas that need further investigation in the existing system.