Health Informatics
Browsing Health Informatics by Issue Date
Item Application of Multilayer Feed Forward Artificial Neural Network Perceptron in Prediction of Court Case’s Time Span: The Case of Federal Supreme Courts’ (Addis Ababa University, 2009-01) Mesfin, Eskinder; Shiferaw, Yehenew (Associate Professor)
This research examines the application of neural networks as a predictive tool. It was undertaken to give the Federal Supreme Courts an advance estimate of a court case's time span. The results could benefit plaintiffs and defendants, who would know the likely length of their case in advance, as well as the federal courts, which could use the estimates for courtroom monitoring and for ensuring transparency and work efficiency. A model to address these needs was constructed with the MATLAB 7.0 neural network toolbox: a feed-forward multilayer perceptron with 9 input neurons, one hidden layer of 20 neurons, and a single output neuron that gives the predicted duration of a case in months. The selected model was trained with the training and validation datasets (67% of the whole dataset) and then tested with the test set reserved for this purpose (33% of the dataset). In total, more than 33,000 records were used in building the model. Based on the performance function, the selected model performs well: the mean square error (MSE), computed from the difference between the target output and the network output, was minimized to 0.0033, with 94.44% of the errors falling within ±0.2 normalized months. This is a good indication that the developed model could be a reliable predictive model for court case time span, especially for criminal, civil and labor cases, on the assumption that the external factors affecting court case time span are constant and stable.
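As a rough illustration of the architecture described above (9 inputs, one hidden layer of 20 neurons, one output, evaluated by MSE), the following Python sketch runs a forward pass through randomly initialized weights. The tanh activation, the weight range, and the random input data are assumptions made for illustration only; the abstract does not specify them, and the original work used the MATLAB neural network toolbox rather than this code.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    # One weight row per neuron; the last entry of each row is the bias.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def forward(x, hidden, output):
    # Hidden layer with tanh activation (assumed, not stated in the abstract).
    h = [math.tanh(sum(w * v for w, v in zip(row, x)) + row[-1])
         for row in hidden]
    # Single linear output neuron: the predicted (normalized) case duration.
    row = output[0]
    return sum(w * v for w, v in zip(row, h)) + row[-1]


def mse(targets, predictions):
    # Mean of the squared differences between target and network output.
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


rng = random.Random(0)
hidden = init_layer(9, 20, rng)    # 9 inputs -> 20 hidden neurons
output = init_layer(20, 1, rng)    # 20 hidden neurons -> 1 output

# Hypothetical normalized case records and targets, for illustration only.
cases = [[rng.random() for _ in range(9)] for _ in range(5)]
targets = [rng.random() for _ in range(5)]
predictions = [forward(x, hidden, output) for x in cases]
print("MSE on untrained network:", mse(targets, predictions))
```

Training would adjust the weights to reduce this MSE on the training data; the sketch stops at the forward pass and the error measure.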
Finally, when the network is trained on a single court case type, it shows high predictive capability for criminal cases, with 95.65% of the residual errors within ±0.005, compared with 89.54% for civil cases and 91.55% for labor cases. This is a good indication that the developed predictive model can be a satisfactory alternative for predicting court case time span, especially for criminal cases.

Item Bayesian Network for Modeling Determinant Factors Influencing Offenders to Commit Crime (The Case of Addis Ababa Police Commission) (Addis Ababa University, 2009-01) Abrar, Mohammed; Bekele, Rahel (PhD)
The identification of causes and phenomena associated with crime is one of the most popular goals in criminology, especially in view of its practical value and the belief that such identifications are useful when seeking to correct or control criminal behavior. The utility of discovering causes must, however, be qualified. Understanding and processing offenders' records is one way to learn about both crime and the individuals involved in misdeeds, so that police can take crime prevention measures accordingly. Though data on criminals are continuously being gathered, they are not being used effectively to extract patterns that can support the management of crime. This is mainly because the human brain is ill-suited to searching for complex, multifactor dependencies in data, and because the lack of objectivity in such analysis calls for a computerized approach. Developments in information and communication technologies have made it possible for organizations to collect, store and manipulate massive amounts of data. One such development is the Bayesian network. The main objective of this study is to develop a predictive model, based on Bayesian network modeling techniques, for the factors that contribute to higher crime trends in Addis Ababa.
For this purpose, published literature in related areas was studied, together with a review of different Bayesian network modeling approaches. Different tools and techniques supporting this task were examined with regard to their applicability to the problem domain. In addition, an experiment was conducted to explore the potential of Bayesian networks for modeling factors that contribute to higher crime trends, using personal identification records of criminals. For the experiment, 1572 criminal records were collected from the Addis Ababa Police Commission. The records were further preprocessed, both manually and automatically, to make them compatible with the software used. Attributes considered relevant for constructing a predictive model of higher crime trends were selected. After preprocessing, a classifier was learned from the training data and then used to classify new data, and a model was constructed from the best model learned from the data. Based on the experimental data, a Bayesian prediction model was developed with an initial prediction accuracy of 73.25%. Further experiments and modification of the model increased the prediction accuracy to 75.78%. Finally, Three Phase Dependency Analysis in particular, and Bayesian networks in general, were found applicable for modeling determinant factors for higher crime trends.

Item Automatic Stemming For Amharic Text: An Experiment Using Successor Variety Approach (Addis Ababa University, 2009-01) Mezemir, Genet; Abebe, Ermias (PhD)
The extensive use of the World Wide Web and the increasing digital availability of information and documents have accelerated the demand for technologies and tools for online data retrieval and extraction.
Natural language research, which aims at quick and reliable online information searching and access, is one major component of current information technology development. In this research, an indexing system was developed using the Successor Variety Stemming Algorithm to find stems for Amharic words. The research set out to discover whether the Successor Variety Stemming Algorithm, with the peak and plateau, entropy and complete word methods, can be used for the Amharic language, and what its limitations would be. In addition, the peak and plateau method was compared with the entropy and complete word methods. Stemming is typically used in the hope of improving search accuracy and reducing the size of the index. A corpus of 6270 words was obtained from the Ethiopian News Agency (ENA) and Walta Information Center and used to train and test the methods. The experiment showed that the peak and plateau method achieved 71.8% accuracy, while the entropy and complete word methods achieved 63.95% and 57.99% accuracy, respectively. Based on these results, the successor variety algorithm with the peak and plateau method performed better than the successor variety algorithm with the entropy method.

Item Part-Of-Speech Tagging For Afaan Oromo Language (Addis Ababa University, 2009-01) Mamo, Getachew; Meshesha, Million (PhD)
Most natural language processing systems use a part-of-speech (POS) tagger as a separate module in their architecture. It is especially significant for developing parsers, machine translators, speech recognizers and search engines. Tagging is the process of assigning part-of-speech tags to the words of a text so that contextual information can be obtained from the word labels. The main aim of this study is to develop a part-of-speech tagger for the Afaan Oromo language.
After reviewing the literature on Afaan Oromo grammar and identifying the tagset and word categories, the study adopts a Hidden Markov Model (HMM) approach and implements unigram and bigram models with the Viterbi algorithm. HMM is a statistical approach, used in this study for part-of-speech tagging of Afaan Oromo words in a given corpus. The unigram model is used to understand word ambiguity in the language, while the bigram model is used for contextual analysis of words. For training and testing, a manually annotated sample corpus of 159 sentences (1621 words in total) is used. The corpus was collected from different public Afaan Oromo newspapers and bulletins to make the sample balanced. A database of lexical probabilities (LexProb) and transitional probabilities (TransProb) is built from this annotated corpus. The tagger learns from these two sets of probabilities and uses them to tag the sequence of words in a sentence. The Java programming language is used to develop the tagger prototype, based on the Viterbi algorithm with unigram and bigram models, and to compute both the lexical and the transitional probabilities. The performance of the prototype Afaan Oromo tagger is tested using ten-fold cross validation. The unigram and bigram models achieve 87.58% and 91.97% accuracy, respectively. Based on the experimental analysis, concluding remarks and recommendations are put forward.

Item Bayesian Approach for Analysis of Road Traffic Accident (The Case of Addis Ababa) (Addis Ababa University, 2009-01) Tabor, Alemayehu; Bekele, Rahel (PhD)
Road traffic accidents are among the leading causes of death and of injuries of various levels in the world. Ethiopia is one of the developing countries where the situation is getting worse from time to time. The country is experiencing a very high rate of such accidents, resulting in fatalities and high economic loss.
Addis Ababa, the capital city of Ethiopia, accounts for more than 21% of the fatal accidents, 42% of the injury accidents and 65% of the total accidents reported in the whole country. This thesis reports a study carried out to develop accident predictive models based on data collected on road accidents in Addis Ababa. The BN PowerPredictor and BN PowerConstructor tools were used for prediction and model construction, respectively. As a result, it was possible to find interrelations between the road and traffic flow explanatory variables and to build significant accident predictive models. In doing so, the potential of Bayesian networks to support traffic accident data analysis in the decision-making process was explored. The thesis explains the process of building a model from historical road accident records using Bayesian network tools and techniques. Different tools and techniques were also used for data analysis. The methodology consisted of three basic steps: data collection, in which the records were selected and extracted from the Addis Ababa traffic office; data preparation, which included data transformation, derivation of new attributes, and handling of missing values; and model building and validation using the selected tools and techniques. In the first experiment, the best learned model, one that classifies accidents as serious, crash or property damage with better accuracy, was selected and evaluated. A second experiment was conducted after the necessary input from domain experts was added. The experimental results reveal that models built with these techniques and tools are very helpful in identifying the potential contributors to, or causes of, this ever-growing challenge for road transport in Addis Ababa, and their interrelatedness.
The whole research process can be a good input for further research.

Item Applicability of Data Mining Techniques to Support Voluntary Counseling and Testing (VCT) for HIV: The Case of Center for Disease Control and Prevention (CDC) (Addis Ababa University, 2009-01) Asmare, Biru; Abebe, Ermias (PhD)
Data mining is emerging as an important tool in many areas of research and industry. Companies and organizations are increasingly interested in applying data mining tools to increase the value added by their data collection systems. Nowhere is this potential more important than in the healthcare industry. As medical records systems become more standardized and commonplace, the quantity of data increases, with much of it going unanalyzed. Data mining can begin to turn some of this data into tools that help health organizations organize data and make decisions. Data related to HIV/AIDS are available in VCT centers. A major objective of this thesis is to evaluate the potential applicability of data mining techniques in VCT, with the aim of developing a model that could help make informed decisions. The research uses a dataset collected from OSSA, which is supported by CDC, and follows CRISP-DM as the knowledge discovery process model; the findings are presented using graphs and tables. For the clustering task, the K-means and EM algorithms were tested using WEKA. The clusters generated by EM were appropriate for the problem at hand, in that they grouped similar clients together. According to the results of these experiments, it was possible to identify similar groups among VCT clients. Gender, marital status, HIV test result, and education showed patterns. For the classification task, decision tree (J48 and Random tree) and artificial neural network (ANN) classifiers were evaluated. Although the ANN shows better accuracy than the decision tree classifiers, the J48 decision tree is appropriate for the dataset at hand and is used to build the classification model.
Finally, cluster-derived classification models are tested for their cross-validation accuracy and compared with a classification model built without clustering. The outcomes of this research will serve users in the domain area, as well as decision makers and planners of HIV intervention programs such as CDC and MOH.

Item Possible Application of Data Mining Technology in Supporting Term Loan Risk Assessment: The Case if United Bank S.C. (Addis Ababa University, 2009-01) Tadesse, Samson; VNV, Manoj (Professor)
A commercial bank is a financial intermediary that holds deposits for individuals and businesses in the form of checking and savings accounts and certificates of deposit of varying maturities, while issuing personal and business loans as well as mortgages. Credit risk arises from a debtor's non-payment of a loan or other line of credit. To control and manage this risk, banks normally have a discipline called risk management. Hence it is very important to develop and implement effective technology that can support risk management. This research focused on the application of data mining techniques in supporting loan risk assessment, taking United Bank Share Company as a case study. It used two data mining techniques: decision trees and neural networks. Different decision tree models were constructed with the J48 algorithm during the experiments, and among them a tree with an overall accuracy of 95.65% and comprehensible rules was selected. The important attributes identified by the selected decision tree were: Net Working Capital, Current Ratio, Total Asset, TL/TA, Current Liability, Collateral Value, Years in Business, Number of prior term loans settled, Performance of term PriorLoans, Collateral Type, Credit Relationship with other bank, Trade Sector, Performance in other types of loan, and Current Asset.
Based on the attributes selected above, different neural network models were constructed with the multilayer perceptron algorithm, and the model that maximizes accuracy in predicting poor payment performance was selected, with an overall accuracy of 92.83%. In the evaluation, the overall accuracy of the decision tree was found to be better than that of the neural network, although further research is needed. In addition, the result of the decision tree is more interpretable than that of the neural network. In general, the results show that data mining can be applied to term loan risk assessment.

Item Application of Data Mining Technology to Support the Prioritization of Dangerous Crash Locations the Case of Addis Ababa Traffic Office (Addis Ababa University, 2009-01) Kiflu, Haleluya; Bekele, Rahel (PhD)
The development of the automotive industry, the slow improvement of roadways and the behavior of traffic participants have increased the number of road accidents. Traffic accidents result in loss of life, human injury and financial loss. Road traffic safety, currently one of the highest priorities, may be affected by a number of factors. One important group of bottlenecks in traffic safety is dangerous accident locations. Addis Ababa is a city where the number of traffic accidents is increasing from time to time. Identifying high crash locations in the city will help either prevent accidents or minimize the damage they cause. This paper reports the findings of research whose objective was to prioritize high crash locations and predict the exposure of society at different crash locations. The study used data obtained from the Addis Ababa Traffic Office. Different data mining tools and techniques were used to prioritize high crash locations. The data mining process in this research is divided into two major phases.
During the first phase, the data was prepared and converted into the format required by the data mining software used (Weka 3.5.8). The second phase consists of model building for prioritization using decision tree classification. In the classification phase, the J48 algorithm was employed to generate rules. Traffic accident locations were prioritized based on the degree and number of fatalities. The patterns obtained from the J48 algorithm separated these locations into three classes: death, severe injury, and light injury. The outcome of the study is highly useful for the traffic police office in developing a traffic management system, and for society, drivers and pedestrians, who can be informed in advance about accident occurrence at those black spots. It also provides valuable information for making effective decisions on road safety investment projects.

Item Uncertainity Management Technique to Support Biological Modeling for Conservation of Priority Tree Species (Addis Ababa University, 2009-03) Getachew, Behailu; Bekele, Rahel (PhD)
Bayesian belief networks (BBNs) are useful tools for modeling biological predictions, aiding species conservation and managing uncertainty in decision-making. This paper provides practical guidance for predicting with, building, testing, and eliciting BBNs. The primary steps in this process include preparing data for the experiment and predicting the hypothesized causal (dependency) relationships or conditional independence of the major biological factors affecting the target tree species or the biological outcome of interest. A total of 1200 cases and 9 attributes were used. The work proceeded in the following order: BN model prediction with 10-fold cross validation and building of the BBN model before the elicitation process; reinforcement of the model after obtaining experts' opinions; and testing and visualization of the model with example instances to inspect the conditional probabilities of the predictive inference, thereby evaluating the final application model.
The average prediction accuracy of the BN model is 75.76%, which is a promising indication for domain experts making decisions in their future endeavors. The paper also shows that the Bayesian network classifier has the potential to be used as a prediction tool in biological modeling to inform conservation actions in the field of forestry. In general, the whole research process can be a good input for further in-depth study and for sound pragmatic analysis of real-world situations.

Item Automatic Classification of Afaan Oromo News Text: The Case of Radio Fana (Addis Ababa University, 2009-03) Diriba, Abera; Ejigu, Dejene (PhD)
The vast growth of information and communication technology has resulted in a huge volume of information, a very large part of which is stored as unstructured text. The presence of so much text in electronic form is a challenge to natural language processing. As the volume of electronic information increases, there is growing interest in developing tools to help people better find, filter, and manage these resources. Arguably, the only way for humans to cope with the information explosion is to exploit computational techniques that can sift through huge bodies of text. Currently, news agencies in Ethiopia, which process a large amount of news from all available sources every day, use a manual classification system to categorize news items in their daily activities, despite using computerized systems to store and edit news items. Radio Fana is one of these agencies. The objective of this research is to develop and adopt processing tools for Afaan Oromo text classification and to investigate the application of machine learning techniques for automatic classification of Afaan Oromo news items. The data source for this research is the Afaan Oromo news items obtained from Radio Fana Share Company.
In this research, tools for pre-processing Afaan Oromo news items, covering tokenization, removal of extraneous characters, removal of stop-words and removal of affixes, are prepared to facilitate experimentation with the automatic classifiers. Among the automatic classifiers applicable to high-dimensional data, four were tried on the final data: the Sequential Minimal Optimization (SMO) algorithm from Support Vector Machines, NaiveBayesMultinomial (NBM) from the Bayesian classifiers, the J48 algorithm from the decision trees, and K-Nearest Neighbor (KNN) from the lazy learners. The data, the pre-processed Afaan Oromo news items, is organized into sets of four classes, seven classes and all eleven classes for experimentation, and the experiments use 10-fold stratified cross validation for training and test data. For the SMO and NBM classifiers, which have the best accuracy, the detailed accuracy by class is shown together with the confusion matrix, whereas for the J48 and KNN classifiers the average accuracy on each set of categories is presented in this thesis. The results are encouraging: the best accuracies for the SMO and NBM classifiers, 95.82% and 96.58% respectively, are obtained on the four-category set, where the number of documents is approximately equal across classes. The lowest accuracies are those of J48: 79.69% on the 7-class set and 82.05% on the 11-class set. SMO tends to have better accuracy than the other classifiers for Afaan Oromo news item classification.
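The 10-fold stratified cross validation mentioned above can be illustrated with a short Python sketch that assigns documents to folds so that each fold keeps roughly the same class proportions. This is a minimal sketch, not the procedure used in the thesis; the function name `stratified_folds` and the category labels in the example are hypothetical.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Assign document indices to k folds with similar class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    slot = 0
    # Deal the documents of each class round-robin across the folds, so that
    # every fold receives nearly the same number of documents per class.
    for label in sorted(by_class):
        for idx in by_class[label]:
            folds[slot % k].append(idx)
            slot += 1
    return folds


# Hypothetical label list: 20 "sport", 20 "economy" and 10 "politics" items.
labels = ["sport"] * 20 + ["economy"] * 20 + ["politics"] * 10
folds = stratified_folds(labels, 10)

for fold in folds:
    counts = defaultdict(int)
    for idx in fold:
        counts[labels[idx]] += 1
    print(dict(counts))
```

In each round of cross validation, one fold serves as the test set and the remaining nine as training data.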
For all the classifiers, an uneven distribution of documents across classes tends to decrease accuracy when the classes are taken together, as in the experiments on all eleven categories, while an increase in the number of documents in a given class tends to increase the accuracy for that class. Accordingly, the results of this research show that a machine learning approach can be applied to the Afaan Oromo news item classification task; nevertheless, additional work is recommended to reach the best results. Key Words: Natural language processing, machine learning, text classification, document indexing, Classifier algorithms, Afaan Oromo news

Item Application of Knowledge Based System for Woody Plant Species Identification (Addis Ababa University, 2009-04) Alemu, Dejen; Meshesha, Million (PhD)
Finding the correct identity of trees is the starting point of any inventory and management activity, as well as of any study of tree species. Identification of plant species in Ethiopia is conducted only in the National Herbarium. At present, the centre is not supported by information systems, which makes the identification process and the dissemination of information inefficient and difficult. The need for a knowledge based system (KBS) for technical information transfer and efficiency in dendrology can be established by recognizing the problems in using the current system for technical information transfer, and by showing that a KBS can help to overcome these problems and is feasible to develop. This study attempts to design a prototype KBS for woody plant species identification. Compared to the existing way of identification, the study arrives at new knowledge and rules that use a minimum number of features and register comparable performance. By using this system, users can get access to expert knowledge and will be able to identify woody plant species as taxonomists do.
Using a taxonomic KBS in different forestry research centers, in place of highly paid taxonomists, will reduce the costs of scientific research and will allow many researchers to conduct their research more independently (without going to the National Herbarium for identification). This research is conducted in a step-wise manner. After problem selection, the knowledge acquisition process is conducted. In this process, key informant interviews are held with experts (two taxonomists and one researcher). In addition to the key informant interviews, manuals and books used in woody plant species identification are consulted. The knowledge extracted from the experts and from relevant documents is modeled using a hierarchical, or laddering, technique. Based on the final knowledge modeled in the decision ladder, domain knowledge is represented as production rules in Prolog to construct the knowledge base. The system loads the knowledge base and infers from it based on the user's input facts. Prolog's built-in backward inference mechanism is used for the identification of the species. The user interface is designed in VB.NET. Finally, the system is tested and evaluated by users. The results show that the system identifies woody plant species correctly and is applicable to woody plant species identification. Key words: knowledge based system, prolog, tree species identification, knowledge acquisition, knowledge modeling, and KBS evaluation.

Item Applying Data Mining to Identify Determinant Factors of Drivers and Vehicles in Support of Reducing and Controlling Road Traffic Accident: In the Case of Addis Ababa City (Addis Ababa University, 2009-04) Mossie, Getnet; V.N.V, Manoj (PhD)
Road transport plays a vital role in the economic growth of society, especially in developing countries. An efficient transport system is a decisive factor in promoting the socio-economic development of Ethiopia.
Although the transport sector is important in facilitating economic growth and development, a very negative phenomenon, the road traffic accident, has increased, seriously threatening the safety of every traveler in Ethiopia and in Addis Ababa city in particular. Traditionally, simple manual and statistical techniques are used for traffic accident analysis at the Addis Ababa traffic control and investigation office. These methods become inefficient and impractical as the volume of road traffic accident data increases. This research therefore investigates the potential application of data mining tools and techniques to develop models that can help reduce and control road traffic accidents by identifying and predicting the major driver and vehicle risk factors (attributes) that cause them. The methodology had three basic steps: data collection, data preprocessing, and model building and evaluation. The dataset, 6107 road traffic accident records, was collected from the Addis Ababa traffic control and investigation office. Since the collected dataset was not suitable for experimentation as it was, data preprocessing was carried out, comprising data cleaning and data reduction. To build models, decision tree and rule induction techniques were employed using the Weka data mining tool, version 3-5-8. In the experiments, models were built and rules generated with the decision tree technique (using the J48 algorithm) and the rule induction technique (using the PART algorithm). The experiments show that the performance of the J48 algorithm is slightly better than that of the PART algorithm.
In this research, the variables LicenseGrade, VehicleServiceyear, Typesofvehicles and Experience were identified as the most important variables for predicting accident severity patterns. The researcher has shown that the road traffic accident database can be successfully mined to identify the determinant driver and vehicle risk factors that cause accidents.

Item Amharic Document Image Retrieval without Explicit Recognition (Addis Ababa University, 2009-06) Worku, Mesfin; Meshesha, Million (PhD)
Retrieval of stored information is a key issue. Image retrieval in particular needs emphasis, because the data is complex and difficult to retrieve. There are many problems to be studied in the area of image retrieval. Among these, document image retrieval is one that deserves attention. Document retrieval can use either a text-based retrieval system or an image-based retrieval system. Document image retrieval can in turn be done in two ways: recognition-based document image retrieval, or document image retrieval without explicit recognition. Currently, little has been done on Amharic document retrieval systems. The Amharic text retrieval systems studied by researchers so far give limited consideration to Amharic documents that are available only in hardcopy format. The proposed system takes document images and user queries as input. Each document image is preprocessed and segmented at word level, and the features of each word are extracted. The textual query is then rendered into an image query, which is preprocessed and segmented, and its features are extracted. The feature extraction technique is based on word shape analysis. The extracted features of the image query are matched with the features of the document images at word level, using Euclidean and cosine similarity measures. Finally, relevant document images are retrieved in ranked order in response to the given query.
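The word-level matching step described above can be sketched in Python as follows. The abstract names the two measures (Euclidean and cosine) but not how word matches are combined into a document score, so scoring each document by its best-matching word is an assumption made here, and the feature vectors and document names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_documents(query_features, documents):
    """Rank documents by the best cosine match among their word features.

    `documents` maps a document name to a list of word feature vectors.
    Taking the maximum over words is an illustrative assumption.
    """
    scores = {
        name: max(cosine_similarity(query_features, w) for w in words)
        for name, words in documents.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical word-shape feature vectors for a query and three documents.
query = [1.0, 0.0, 1.0]
documents = {
    "doc_a": [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]],
    "doc_b": [[0.0, 1.0, 0.0]],
    "doc_c": [[1.0, 0.0, 0.0]],
}
print(rank_documents(query, documents))
```

With Euclidean distance the ranking would instead sort by the smallest distance in ascending order.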
To verify the validity of the proposed approach, an experiment was carried out on 121 scanned Amharic documents selected from printed legal documents and news items. Retrieval effectiveness was measured using precision, recall and F-score. The experimental results confirmed the validity of the model for retrieving relevant document images from the collection of scanned document images.

Item Designing Pediatricians Communities of Practice: The Case of Government Hospitals in Addis Ababa (Addis Ababa University, 2009-07) Regassa, Abebe; Bekele, Rahel (PhD)
Social media tools are changing the way employees of an organization communicate, collaborate and interact in their day-to-day activities, especially in sharing tacit knowledge or experience. They are used for capturing and sharing knowledge resources, including the experiences of organizations or individuals, so that the dissemination or transfer of knowledge is facilitated. This research seeks to design communities of practice as a strategy for acquiring and sharing knowledge, using an appropriate social media tool, for government hospitals in Addis Ababa. A qualitative exploratory case study design was selected for this research. Structured interviews, a questionnaire survey and informal discussions were used to collect data to support the analysis. The data analysis shows that communities of practice are the best alternative for acquiring and sharing knowledge, using electronic media to communicate with peer physicians.
The results also show e-mail contact to be a necessary and efficient way to share ideas, as it is the means of communication that paediatricians have used with other paediatricians in the past.
Item Automatic Thesaurus Construction for Amharic Text Retrieval (Addis Ababa University, 2009-07) Mekonnen, Andargachew; Meshesha, Million (PhD)
Thesauri have been used for literary composition since their inception in 1852, but nowadays their primary use is in information retrieval. They are among the crucial components of retrieval systems, typically used to enhance indexing operations and query expansion during searching. Even though Amharic has been a written language for a couple of centuries and huge volumes of Amharic electronic documents have accumulated, not much has been done towards the development of effective and efficient Amharic retrieval systems. In this research, a thesaurus is generated automatically for text retrieval in order to support the development of an effective and efficient Amharic retrieval system. The automatic thesaurus generation system is based on the WORDSPACE model. The WORDSPACE model is derived from the inverted file index by applying the Random Projection algorithm for dimensionality reduction. A Nearest Neighbor clustering algorithm is then employed to generate the thesaurus automatically from the constructed WORDSPACE model. An encouraging result was obtained when the system was tested on Amharic Bible documents. During experimentation, the accuracy of the automatically generated thesaurus was evaluated. The result on a random sample of ten terms shows that the system has an accuracy of 58%. To further investigate its applicability to Amharic information retrieval, the thesaurus was integrated into an IR system for query expansion. The retrieval system was tested with and without the thesaurus in order to show the improvement in retrieval effectiveness.
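As a rough illustration of the pipeline described above (inverted index, then random projection, then nearest neighbours), the sketch below builds low-dimensional term vectors and lists the closest terms. The abstract gives no implementation details, so the ±1 random document vectors, the dimension of 16, the cosine measure and all function names are assumptions of this sketch rather than the thesis's method.

```python
import math
import random


def random_projection(inverted_index, dim=16, seed=0):
    """Project term-by-document counts into `dim` dimensions.

    inverted_index maps term -> {doc_id: count}. Each document gets a
    random +/-1 vector; a term's vector is the count-weighted sum of the
    vectors of the documents it occurs in.
    """
    rng = random.Random(seed)
    doc_ids = sorted({d for postings in inverted_index.values() for d in postings})
    doc_vecs = {d: [rng.choice((-1, 1)) for _ in range(dim)] for d in doc_ids}
    term_vecs = {}
    for term, postings in inverted_index.items():
        vec = [0] * dim
        for doc_id, count in postings.items():
            for i, r in enumerate(doc_vecs[doc_id]):
                vec[i] += count * r
        term_vecs[term] = vec
    return term_vecs


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def nearest_terms(term, term_vecs, k=3):
    """Return the k terms whose vectors are most similar to `term`."""
    sims = [(cosine(term_vecs[term], v), t) for t, v in term_vecs.items() if t != term]
    sims.sort(key=lambda pair: pair[0], reverse=True)
    return [t for _, t in sims[:k]]
```

Terms that occur in the same documents with the same counts receive identical projected vectors, so they come out as each other's nearest neighbours; the neighbour lists are what a thesaurus entry would be built from in this sketch.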
Performance analysis shows that the recall of the system with the thesaurus is superior to its recall without it. The average recall values are 73.34% with and 37.29% without the thesaurus for query expansion. Keywords: Amharic Thesaurus, WORDSPACE, Information Retrieval (IR)
Item Designing A Knowledgebase System For Vat Administration (Addis Ababa University, 2009-09) Kidane, Desalegn; Meshesa, Milion (PhD)
Value Added Tax (VAT) has become a major tax instrument worldwide. In Ethiopia, it is a new tax system, introduced in January 2003, and an essential component of the tax reform programs currently being undertaken. The existing VAT systems do not provide simplified access to, or clarification of, the tax laws, and taxpayers lack awareness of tax rules and regulations. These problems have a great impact on the practicability of the regulations. This paper examines VAT administration in Ethiopia and identifies key problems, including lack of public awareness, efficiency, effectiveness and performance, and gaps in the administration of VAT refunding, fraud and invoices. It is worth clarifying the Ethiopian VAT system and its rationale for business owners through easy, effective and accessible systems using knowledge-based system (KBS) technology. In this research, a KBS is designed to support VAT administration in order to solve the problems identified, especially in the area of VAT refunding. Both tacit and explicit knowledge for the KBS was acquired through interviews with domain experts and document analysis. The knowledge is modeled in a hierarchical tree structure and represented using a rule-based system. The Prolog programming language was used for prototype development. The performance of the prototype system was evaluated qualitatively using continual as well as summative evaluation techniques. The result encourages the design of a practical KBS for VAT administration.
The findings in terms of accuracy, efficiency and effectiveness are discussed, and areas for further research are recommended.
Item The Application of Decision Tree for Part of Speech (Pos) Tagging For Amharic (Addis Ababa University, 2009-09) Kebede, Gebeyehu; Abebe, Ermias (PhD)
Automatic understanding of natural languages requires a set of language processing tools. A POS tagger, which assigns the proper part of speech (noun, verb, adjective, etc.) to each word in a sentence, is one of these tools. This study investigates the possibility of applying a decision tree based POS tagger to Amharic. The tagger was developed using the J48 decision tree classifier, which is Weka’s implementation of the C4.5 algorithm. A corpus developed by the ELRC annotation team was used to obtain the data required for training and testing the models. The dataset comprises 1065 news documents (210,000 words). A sample of some 800 sentences was selected and used for model development and evaluation. The dataset was preprocessed in line with the requirements of the Weka data mining tool. To support decision tree classification models, a table containing contextual and orthographic information was constructed semi-automatically and used as the training and testing dataset. The tags of the right and left neighboring words of each word are used as contextual information. In addition, orthographic information about the word, such as its first and last character, its prefix and suffix, and the presence of a numeric digit within the word, is included in the table to provide useful information about the word to be tagged. Performance tests were conducted at various stages using the 10-fold cross-validation test option. Experimental results show that only the tags of the two successive words to the left and right provide useful contextual information; context beyond two words adds noise rather than useful information.
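One row of the feature table described above might be built as in the following sketch. The abstract does not give the exact attribute names, the affix lengths or the boundary markers, so `affix_len`, the `<s>` and `</s>` padding symbols and the dictionary keys are hypothetical choices made for illustration; tokens are assumed to be non-empty, and the neighbor tags are taken from an already annotated corpus.

```python
def token_features(tokens, tags, i, window=2, affix_len=2):
    """Build contextual and orthographic features for tokens[i].

    tokens and tags are parallel lists for one sentence. window=2 mirrors
    the finding that only two neighboring tags on each side are useful.
    """
    word = tokens[i]
    feats = {
        "first_char": word[0],
        "last_char": word[-1],
        "prefix": word[:affix_len],
        "suffix": word[-affix_len:],
        "has_digit": any(ch.isdigit() for ch in word),
    }
    for off in range(1, window + 1):
        left, right = i - off, i + off
        # Pad with boundary symbols when the window leaves the sentence.
        feats[f"tag-{off}"] = tags[left] if left >= 0 else "<s>"
        feats[f"tag+{off}"] = tags[right] if right < len(tokens) else "</s>"
    return feats
```

Rows produced this way, together with the gold tag of `tokens[i]` as the class label, could then be exported to a format that a decision tree learner such as J48 accepts.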
In the end, an overall accuracy of 84.9%, including ambiguous and unknown words, was obtained using the 10-fold cross-validation test option. Even though the accuracy obtained in this study is encouraging, further work to improve it to the level needed for implementation is recommended.
Item Information systems development outsourcing management in Ethiopia: the case of the Ethiopian Telecommunications Corporation. (Addis Ababa University, 2009-10) Atinaf, Muluneh; Beshah, Tibebe (PhD)
Information systems/technology (IS/IT) outsourcing is considered by chief information officers and organizations as an approach to information systems/technology management. It is given due attention by various organizations and their management because of the advantages it brings. Despite these advantages, IS/IT outsourcing faces problems, challenges and risks at any point in the process that may even lead to failure before maturity. However, these issues have not been studied in the case of Ethiopia. This study therefore aims to empirically assess how IS/IT outsourcing is managed in Ethiopia, specifically at the Ethiopian Telecommunications Corporation. A case study approach, considering a currently outsourced information systems development project at the corporation, is followed so that a better and more detailed understanding of the subject can be developed. Furthermore, the research tried to identify the problems and challenges that the corporation faced during outsourcing and their possible causes. The research employed a qualitative methodology, with interviews and document analysis as the major data collection methods. The data collected were analysed according to the research questions and objectives of the study. The data analysis clearly indicates that the outsourcing organization has failed to meet its schedule, cost, requirements, and customer service objectives.
The major causes of these problems were a lack of detailed requirements from the beginning, a lack of detail in the SLAs, and a lack of previous experience in software outsourcing management.
Item Exploring Trends, Challenges, and Opportunities in the Ethiopian Tourism Information System (Addis Ababa University, 2009-10) Hailu, Digajara; Beshah, Tibebe (PhD)
This research was conducted to explore the trends, challenges and opportunities of Tourism Information Systems (TIS) in Ethiopia, with special emphasis on tourism websites. Two surveys were conducted: the first dealt with tourism institutions and the second with tourists. In the first survey, data were collected mainly through questionnaires, with interviews used to fill gaps. In the second survey, only questionnaires were used because of the large number of visitors. The data gathered were analyzed using the Statistical Product and Service Solutions (SPSS), and simple descriptive statistics such as the mode (frequency) were used to explain the empirical findings. Tables were used for data presentation, and comparisons were made on that basis. The results of the study reveal that, having started with the then “Thirteen Months of Sunshine” manual ad sign, the Ethiopian TIS is now experiencing the opportunities presented by modern information technology such as websites. There is promising awareness of modern information systems among the businesses, but it is still in its infancy. Major challenges identified by the study include gaps between businesses’ and tourists’ perceptions of the local tourism websites, missing website components, and an inability to fully harness the Internet. Yet the Web has created some opportunities for such tourism businesses. The study also provides some recommendations on TIS for tourism businesses in the nation.
Keywords: Ethiopia, Tourism, Website, Information Systems
Item A General Approach for Amharic SPEECH-TO-TEXT Recognition (Addis Ababa University, 2009-10) Melese, Michael; Meshesha, Million (PhD)
In this paper, the researcher investigates the use of speech recognition techniques for converting Amharic speech to text, taking both native and non-native speakers of Amharic. For this, the Hidden Markov Model (HMM) is used as the model, along with the Hidden Markov Modeling Toolkit (HTK) as the tool for implementation and training. In the development process, a total of four hundred sentences were used for training, and one hundred data sets not included in the training set were used to test the performance of the speech recognizer. The primary data were collected from four different ethnic groups that do not speak Amharic as a mother tongue and from one group of mother-tongue speakers, with one hundred records from each group; secondary data were taken from a previous researcher. The primary data were then labeled, preprocessed, trained and realigned as required by HTK for training and testing the models. During the experiments, a number of challenging issues arose that required the researcher’s attention in order to overcome problems that hindered completion of the work. Despite these complications and constraints, the result obtained is promising and serves as evidence that it is possible to build a general speech recognition technique that converts Amharic speech to text using HMM. Once the experiments were completed, the results were analyzed, justified with reference to the work of other researchers, and compared with their results.
Finally, conclusions and recommendations for future research in the field are presented.