College of Natural and Computational Sciences

Permanent URI for this college

http://etd.aau.edu.et/handle/123456789/24

Possible Application of Data Mining Technology in Supporting Credit Risk Assessment: the Case of Nib International Bank S.C.
(Addis Ababa University, 204-07) Shawui, Meretework; Tadesse, Nigussie (PhD)
Financial institutions in a nation play a crucial role in the development of its economy. The banking sector as one type of financial institution is indisputably the new frontier of economic development in a country. In this respect, banking has to be sound and safe for its customers as well as for the stability of the currency and economy of a country. One factor that affects the well functioning of the banking sector is credit risk. This factor is also a general problem among commercial banks in Ethiopia. In order to deal with high default rates banks in other countries are making use of data mining. The possible application of data mining in the commercial banking sector of Ethiopia has also been tested by the use of neural network technique. As credit risk is a risk type that bank managers give more emphasis in the loan disbursement process because it is one of the major reasons that cause a bank to fail, the study of the possible application of data mining needed further investigation. To this end, the present study focuses on the application of data mining to support credit risk assessment taking as a case study Nib International Bank S.C. (NIB). In doing so the aim of this research was to assess the potential applicability of decision tree technique to help in the loan disbursement decision-making process of banks. The methodology used for this research had three basic steps. These were collecting of data, data preparation, and model building and testing. The required data was selected and extracted from Nib International Bank records. Then, data preparation tasks (such as data transformation, deriving of new fields, and handling of missing variables) were undertaken. Decision tree data mining technique was employed to build and test models.
Several decision tree models were built and tested for their classification accuracy and the model with encouraging results was taken to generate rules to support credit decision makers and the procedures adopted are described in this document. The performance of the developed model is validated using new datasets and its predictive accuracy is also tested. The result shows that the use of decision tree technique produces rules for justifiable credit decision-making and that it is the best technique that needs to be adopted for NIB bank as it presents a means of providing explanation for proposed decisions as compared to neural network techniques. All things considered, the existence of an electronic system to support the credit risk assessment of NIB bank will promote the services of the bank to its customers as well as minimize risk.
Develop an Audio Search Engine for Amharic Speech Web Resources
(Addis Ababa University, 10/10/2019) Hassen, Arega; Atnafu, Solomon (PhD)
Most general purpose search engines like Google and Yahoo are designed with the English language in mind. As content in under-resourced languages grows on the web, the number of online speakers of these languages is also growing enormously. Amharic, a morphologically rich language whose morphology has a strong impact on the effectiveness of information retrieval, is one of the under-resourced languages with rapidly growing content on the web in all forms of media such as text, speech, and video. With the increasing number of online radios, speech based reports and news, retrieving Amharic speech from the web is becoming a challenge that needs attention. As a result, the need to develop a speech search engine that handles the specific characteristics of users' Amharic language queries and retrieves Amharic language speech web documents becomes more apparent. In this research work, we develop an Audio Search Engine for Amharic Speech Web Resources that enables web users to find the speech information they need in the Amharic language. In doing so, we have enhanced the existing crawler for Amharic speech web resources, transcribed the Amharic speech, indexed the transcribed speech and developed query preprocessing components for users' text based queries. As baseline tools, we have used open source tools (JSpider and Datafari) for web document crawling, parsing, indexing, ranking and retrieving, and Sphinx for speech recognition and transcription. To evaluate the effectiveness of our Amharic speech search engine, precision and recall were measured on the retrieved speech web documents. The experimental results showed that the Amharic speech retrieval engine achieved 80% precision on the top 10 results and a recall of 92% of its corresponding retrieval engine. The overall evaluation results of the system are found to be promising.
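The evaluation measures named in this abstract can be illustrated with a minimal sketch. The function names and the toy document identifiers below are hypothetical and are not taken from the thesis; the code only shows how precision on the top 10 results and recall are conventionally computed from a ranked result list and a set of relevant documents.

```python
def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top-k retrieved documents that are relevant."""
    top_k = ranked_ids[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / len(top_k)


def recall(retrieved_ids, relevant_ids):
    """Fraction of all relevant documents that were retrieved."""
    if not relevant_ids:
        return 0.0
    found = set(retrieved_ids) & set(relevant_ids)
    return len(found) / len(relevant_ids)


# Toy example (hypothetical ids): 10 ranked results, 8 of them relevant.
ranked = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"]
relevant = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}
print(precision_at_k(ranked, relevant, k=10))
print(recall(ranked, relevant))
```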
Mosque Building Detection Using Deep Convolutional Neural Network
(Addis Ababa University, 10/10/2020) Ergete, Samrawit; Belay, Ayalew (PhD)
Object detection is a computer technology related to computer vision and image processing that detects and defines objects such as humans, buildings and cars in images and videos. Object detection is breaking into a wide range of industries, with use cases ranging from personal security to productivity in the workplace. Facial recognition or face detection is one example of object detection, which can be utilized as a security measure to let only certain people into a classified area of a building. It can also be used within a visual search engine to help consumers find a specific item they are looking for. In this work we propose a detection system that starts from collecting and preparing data and ends with detecting mosque buildings using a deep convolutional neural network (DCNN). Mosque building detection is done using the Faster RCNN model. Faster RCNN is trained on a dataset of 1848 images, collected from different websites and by directly taking pictures, and split into 90% for training and 10% for testing. Experimental results demonstrate the efficiency of the proposed technique, which achieved a mAP of 0.70.
Anomaly Based Peer-to-Peer Botnet Detection using Fuzzy-Neuronetwork
(Addis Ababa University, 10/10/2020) Worku, Tewodros; Gizaw, Solomon (PhD)
Peer-to-Peer (P2P) botnets are considered as one of the most significant contributors to various malicious activities on the Internet. The denial of service attacks, spamming, keylogging, click fraud, traffic sniffing, stealing personal user information, for example credit card numbers, and social security numbers, are some of the illegal activities based on botnets. P2P botnets are networks of infected computing devices, called zombies or bots. These bots are remotely controlled and instructed by malicious entities commonly referred to as Botmasters or hackers. In recent years, lots of researchers have proposed a number of P2P botnet detection models, but due to the evolving nature of botnets, there is still a need for new techniques to identify recent botnets. Due to that, we propose a model that is able to distinguish genuine network traffic from malicious one by analyzing the network flow data using Fuzzy-Neuro Network (FNN). The proposed model has the following components: Feature Extractor, Feature Selector, Dataset Constructor, Preprocessor, Classifier and P2P Botnet Detector. The feature extraction component extracts the network traffic-based feature vectors from the network traffic whereas the feature selection component selects vital features based on their information gain value. The next component which is the dataset constructor is used to convert the comma separated value (CSV) file into sets and help us to split the dataset as training (70%) and testing (30%) sets. Then, the major activities in the preprocessing component are data cleaning, data transformation and data reduction. Finally, the FNN classifier is utilized to classify the network traffic into P2P botnet and normal using the botnet detection module. The feasibility of our proposed model has been validated through experiments using network traffic records acquired from two publicly available P2P botnet datasets Bot-IoT and UNSW-NB15. 
The datasets include both genuine and malicious network traffic. The evaluation result shows that the proposed model is effective in detecting P2P botnets. Based on the evaluation results of our classifier, using the Bot-IoT dataset, the model scored 100% for all evaluation metrics. Using the UNSW-NB15 dataset, the model achieved a classification accuracy of 99.9%, precision of 99.9% and recall of 100%, with an F-measure of 99.9%.
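The feature selection step described in this abstract ranks features by their information gain. The following is a minimal sketch of that calculation, assuming features have already been discretized; the feature names and toy values are hypothetical and do not come from the Bot-IoT or UNSW-NB15 datasets.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def select_top_features(columns, labels, n):
    """Return the names of the n features with the highest information gain."""
    ranked = sorted(columns, key=lambda name: information_gain(columns[name], labels),
                    reverse=True)
    return ranked[:n]


# Hypothetical, already-discretized flow features for four traffic records.
labels = ["botnet", "botnet", "normal", "normal"]
columns = {
    "packet_rate_bucket": ["high", "high", "low", "low"],  # separates the classes
    "protocol": ["tcp", "tcp", "tcp", "tcp"],              # carries no information
}
print(select_top_features(columns, labels, 1))
```

A feature that splits the records into pure groups receives the full label entropy as gain, while a constant feature receives zero, which is why the first toy feature is ranked ahead of the second.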
PDTB Style Sentence Level Shallow Discourse Parser for Amharic
(Addis Ababa University, 10/10/2020) Arega, Robel; Libsie, Mulugets (PhD)
Research on natural language processing (NLP) applications is a very important topic in our daily life, as it enables computers to understand human languages. Such research has come a long way in foreign languages like English, Japanese, Chinese, Portuguese and Arabic. NLP applications such as machine translation, question answering, knowledge extraction and information retrieval are some of the fruits of such research. A discourse parser is one of the main components that enables the realization of such NLP applications. For foreign languages like English and Arabic, many discourse parsers have been developed using different approaches. However, in the case of Amharic, to the best of the researcher's knowledge, no work has been done on an Amharic discourse parser so far. In this study, a Penn Discourse Tree Bank (PDTB) style sentence level shallow discourse parser for Amharic is developed. We have used machine-learning algorithms to accomplish the subtasks of discourse parsing. The algorithms utilize lexical and positional features of the discourse marker and related words for segmentation and to identify the associated discourse relation. The parser is tested on test sentences, which are extracted from different sources. Encouraging results are observed from the experiments performed.
Semantic-Aware Amharic Text Classification Using Deep Learning Approach
(Addis Ababa University, 10/10/2020) Moges, Girma; Assabie, Yaregal (PhD)
We are now in the information era, in which information is stored, extracted, and used in different formats. Text can be an extremely rich source of information, but extracting insights from it can be hard and time-consuming due to its unstructured nature. Text classification is one of the methods used to organize massively available textual information in a meaningful context to maximize the utilization of information. Amharic text classification has been done using classical and traditional machine learning approaches, which are limited in semantic representation and rely on heavily engineered feature extraction. However, the newly emerged deep learning approach and the use of word embedding improve the performance of text classification by extracting features automatically and representing words semantically in sparse vector. Thus, we develop an LSTM model to train on our data and to perform the classification. The classification of Amharic text documents using the LSTM passes through the following steps: preprocessing, word embedding, deep network building, output determination, training the model, and classification. The semantics of a document is captured using word2vec, which maps similar words into a single vector using a neural network architecture. Thus, the vector representations of words are used as the input for the deep network building component. The model is evaluated using accuracy and loss on training, testing, and validation datasets and resulted in 92.13 testing accuracy and 86.71 validation accuracy.
Grid Based Node Deployment Approach Using Homogeneous Wireless Sensor Network
(Addis Ababa University, 10/10/2020) Atnafu, Kebede; Belay, Ayalew (PhD)
A wireless sensor network (WSN) is one of the mechanisms used to monitor different applications such as environmental and habitat monitoring, in which sensor nodes cooperatively pass their data through the network to a main location or sink where the data can be observed and analysed. Sensor placement is an important task in WSN applications. The number of sensors and their locations affect the performance, accuracy, and cost of the deployment. One of the important issues in WSN applications is node deployment, which decides where the sensor nodes should be placed in order to satisfy desired requirements such as maximizing the efficient coverage area ratio and minimizing the size of the network and its cost. The effectiveness of these networks is determined to a large extent by the coverage provided by the sensor deployment scheme. Determining the required number of sensors to be deployed is a critical decision for wireless sensor networks. In this thesis we develop a homogeneous sensor node deployment scheme, in which all of the sensor nodes have similar processing and hardware capabilities, using a grid based deployment scheme that places nodes in a hexagonal scheme, which covers the region with a minimum number of sensor nodes and reduces the cost of sensor nodes. The number of sensor nodes used is evaluated by comparison with a previous system; the proposed work uses fewer sensor nodes to cover the region than the previous one. The proposed system considers the shape of the monitored land and integrates PEGASIS, a hierarchical clustering routing protocol, where each sensor node transfers data to a neighbouring node. We used MATLAB for implementation of the proposed system and performance evaluation. From the evaluation result, the proposed homogeneous node deployment approach uses a minimum number of nodes, is cost effective, and provides better coverage than the existing shape based node deployment approach.
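A hexagonal grid placement of the kind described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the thesis's MATLAB implementation: the monitored region is assumed to be a rectangle, every node is assumed to have the same circular sensing radius, and the function names are hypothetical. Nodes sit at the centres of hexagonal cells whose corner-to-centre distance equals the sensing radius, which gives a horizontal spacing of sqrt(3) times the radius, a row spacing of 1.5 times the radius, and a half-step offset on alternate rows.

```python
from math import hypot, sqrt


def hexagonal_deployment(width, height, sensing_radius):
    """Place nodes at the centres of hexagonal cells covering a rectangle.

    Assumes a rectangular region [0, width] x [0, height] and identical
    sensing radii (a homogeneous network).
    """
    dx = sqrt(3) * sensing_radius   # spacing between nodes in one row
    dy = 1.5 * sensing_radius       # spacing between rows
    positions = []
    row = 0
    while row * dy < height + sensing_radius:
        offset = dx / 2 if row % 2 else 0.0  # shift alternate rows by half a cell
        col = 0
        while offset + col * dx < width + dx / 2:
            positions.append((offset + col * dx, row * dy))
            col += 1
        row += 1
    return positions


def is_covered(point, positions, sensing_radius):
    """True if the point lies within sensing range of at least one node."""
    px, py = point
    return any(hypot(px - x, py - y) <= sensing_radius + 1e-9 for x, y in positions)


nodes = hexagonal_deployment(width=10, height=10, sensing_radius=2)
samples = [(i * 0.5, j * 0.5) for i in range(21) for j in range(21)]
print(len(nodes), all(is_covered(p, nodes, 2) for p in samples))
```

The coverage check samples the region on a coarse grid, so it is a sanity check rather than a proof; an irregularly shaped region, as considered in the thesis, would additionally need nodes outside the shape to be filtered out.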
Automatic Soybean Quality Grading Using Image Processing and Supervised Learning Algorithms
(Addis Ababa University, 10/12/2021) Hassen, Muhammed; Assabie, Yaregal (PhD)
Soybean is one of the most important oilseed crops of the world, and it requires a temperature of 25 to 30°C for growth and proper modulation. Due to its high protein content and nutritional quality, soybean is widely used in food preparation, animal feed and the industrial sector. It is an input for food products for human consumption, such as soy milk, and an input for industrial products such as paper, plastic and cosmetics. The trading of soybean in Ethiopia is done through the Ethiopian Commodity Exchange, both internally and for export. Determining the quality grade of soybean is crucial in the trading process. It improves the production of quality soybeans and helps producers to be competitive in the market. This process is done manually in the Ethiopian Commodity Exchange, which makes it subject to several problems: it is less efficient, inconsistent and vulnerable to subjectivity. As a solution, in this thesis we propose automated quality grading of soybean using image processing techniques and supervised learning algorithms. Image acquisition, image pre-processing, image segmentation, soybean type prediction and grade determination are the major steps that are followed. For image preprocessing, a median filter is used to remove noise and a modified unsharp masking sharpening technique is used to enhance the quality of the acquired soybean image. In image segmentation, a modified Otsu's threshold segmentation method is applied to a color image. Nineteen typical characteristic parameters of the samples are extracted as soybean features: 7 morphological, 6 color and 6 texture features. Three different supervised learning classifiers are applied and compared: support vector machine, artificial neural network and convolutional neural network.
Experimental results show that the one dimensional convolutional neural network outperforms the others, with an accuracy of 93.71% on the test datasets collected from the Ethiopian Commodity Exchange. We concluded that the CNN is superior to the other supervised learning algorithms, and that using aggregated features is better than using a single type of feature.
Optimal Control and Qualitative Analysis Using Epidemiological Model for Lumpy Skin Disease (LSD)
(Addis Ababa University, 10/15/2017) Alemu, Abebe; Oseloka, Okey (Professor)
Lumpy skin disease (LSD) is an infectious, eruptive, occasionally fatal disease of cattle. It is a skin disease caused by a virus of the family Poxviridae. LSD damages cattle's hides, and because of this it is economically important. Seriously affected animals lose weight; inflammation causes a temporary or permanent reduction of milk production; orchitis causes temporary or permanent infertility or even sterility in bulls; and abortion occurs in approximately 10% of infected pregnant animals [10], (Birhanu, H. and Gezahign, A., [5]). The study was undertaken to investigate outbreaks of lumpy skin disease (LSD), based on research previously carried out in this area, and to develop a model for controlling the disease optimally. We used an epidemiological model and optimal control for the analysis of the transmission of the disease and the cost of control (vaccination). This thesis mainly focuses on the process of analyzing and computationally illustrating the optimal control of the disease and its factors.
Automatic Plant Species Identification Using Image Processing Techniques
(Addis Ababa University, 10/2/2018) Tsegaye, Dejene; Assabie, Yaregal (PhD)
Plants play an essential role for all living beings on Earth. Plants form a fundamental part of life on Earth, providing us with breathable oxygen, food, fuel, medicine and more. Plants also help to regulate the climate and provide habitats and food for insects and other animals. But due to lack of awareness and environmental deterioration, many plants are on the verge of extinction. Understanding plant behavior and ecology is very important for human beings and the entire planet. Plants possess unique features in their leaves that distinguish them from others. Taxonomists use these unique features to classify and identify plant species. However, there is a shortage of such skilled subject matter experts, as well as a limit on financial resources. Several leaf image based plant species identification methods have been proposed to address the plant identification problem. However, most methods are inaccurate. Invariant moments that are used for leaf shape feature extraction are inadequate. Hu moments are inadequate when leaves from different species have very similar shapes. The computation of Zernike moments involves a discrete approximation of a continuous integral term, which results in loss of information. Hence, it is extremely important to look for an improved method of plant species classification and identification using image processing techniques. In this work, a new method based on combined leaf shape and texture features, using an ensemble method called Random Forest, is proposed for the classification and identification of plant species. Morphological features and Radial Chebyshev moments are extracted from the leaf shape, and Gabor filter features are extracted from the leaf texture. These three feature types are combined, and important features are selected to form the feature set used to train the Random Forest classifier.
The Random Forest was trained with 1907 sample leaves of 32 different plant species taken from the Flavia dataset. The proposed approach is 97% accurate using the Random Forest classifier.
Amharic Information Retrieval Using Semantic Vocabulary
(Addis Ababa University, 10/2/2019) Getnet, Berihun; Assabie, Yaregal (PhD)
With the increase in large scale data available from different sources, users' need for access to information has made information retrieval a pressing issue these days. Information retrieval implies seeking relevant documents for the user's queries. But the way queries are provided and the way the system returns relevant results should be improved for better user satisfaction. This can be enhanced by expanding the original queries from semantic lexical resources that are constructed either manually or automatically from a text corpus. But manual construction is tedious and time-consuming when the data set is huge. The way semantic resources are built also affects retrieval performance. In formal semantics, meaning is built using the symbolic tradition and centered around the inferential properties of languages. It is also possible to automatically construct semantic resources based on the distribution of words in unstructured data, which applies unsupervised learning to automatically build semantics from a high dimensional vector space. This produces contextual similarity via the angular orientation of word vectors. There have been attempts to enhance information retrieval by expanding queries from semantic resources for non-Ethiopian languages. In this study, we propose Amharic information retrieval using a semantic vocabulary. It is realized through components including text preprocessing, word-space modeling, semantic word sense clustering, document indexing, and searching. After the Amharic documents are preprocessed, the words are vectorized in a multidimensional space using Word2vec, based on the notion that words surrounding another word can be contextually similar. Based on the angular orientation of the word vectors, the semantic vocabulary is constructed using cosine distance. The preprocessed Amharic documents are also indexed for later retrieval.
Then the user provides the queries and the system expands the original query from the semantic vocabulary. The queries are reformulated and words are searched in the indexed data, which returns more relevant documents for the user. A prototype of the system is developed and we have tested the performance of the system using Amharic documents collected from Ethiopian public media. The semantic vocabulary based on word analogy prediction using the cosine metric is promising. It is also compared against the semantic thesaurus constructed with latent semantic analysis, over which it improves accuracy by 17.2%. Information retrieval using the semantic vocabulary improves recall by 24.3% with ranked retrieval, and by 10.89% with unranked set retrieval.
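The query expansion idea in this abstract can be sketched in a few lines. Everything specific below is an assumption made for illustration: the two-dimensional toy vectors stand in for Word2vec embeddings, the English placeholder words stand in for Amharic terms, and the similarity threshold of 0.8 is arbitrary. The sketch only shows how a vocabulary of related words can be built from cosine similarity and then used to expand a query.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_semantic_vocabulary(vectors, threshold=0.8):
    """Map each word to the other words whose vectors point in a similar direction."""
    vocabulary = {}
    for word, vec in vectors.items():
        vocabulary[word] = sorted(
            other for other, other_vec in vectors.items()
            if other != word and cosine_similarity(vec, other_vec) >= threshold
        )
    return vocabulary


def expand_query(terms, vocabulary):
    """Append related words from the vocabulary to the original query terms."""
    expanded = list(terms)
    for term in terms:
        for related in vocabulary.get(term, []):
            if related not in expanded:
                expanded.append(related)
    return expanded


# Hypothetical toy embeddings; real vectors would come from a trained model.
toy_vectors = {
    "house": [1.0, 0.1],
    "home": [0.9, 0.2],
    "river": [0.0, 1.0],
}
vocabulary = build_semantic_vocabulary(toy_vectors)
print(expand_query(["house"], vocabulary))
```

The expanded term list would then be run against the document index in place of the original query.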
Mobile Based Expert System for Diagnosis of Cattle Skin Diseases With Image Processing Techniques
(Addis Ababa University, 10/2/2019) Lake, Bezawit; Getahun, Fekade (PhD)
The cattle population is a critical socioeconomic asset in a nation like Ethiopia, where society depends on farming and animal husbandry. However, there is a huge loss of livestock to disease, which undermines efforts towards achieving food security and poverty reduction. Many expert systems have been developed for the diagnosis of cattle disease. The diagnosis starts by collecting information about symptoms, signs and other related issues. In most of these systems, this information is obtained from the person through text dialogue. Every person has a different way of expressing the same thing, and the resulting inconsistency of description leads to incorrect diagnoses. To address this problem, we propose an approach for cattle disease diagnosis that integrates image processing using deep learning with an expert system. The proposed system has an expert system and an image processing component. Symptoms identified by the naked eye are represented using images, and their category is identified by the image processing component. The image processing component consists of a training phase and a classification phase. In the training phase, images collected from different sources are preprocessed and fed to the classification model. The classification model used is a convolutional neural network with three convolutional and two fully connected layers. In the classification phase, the trained model is used to classify the input images. The expert system has reasoning, knowledge base and user interface components. The user interface allows communication between the system and the user. The knowledge base contains the information and facts required for diagnosis. The reasoning component reaches a final diagnosis based on the classification results and other related information. The developed classification model is trained on a dataset of 3990 images collected from different sources. To increase the dataset we apply different augmentation techniques.
We split the dataset into 90% for training and 10% for testing. The model classifies the input symptom image with 95% accuracy. The entire system has been evaluated by veterinarians and cattle farmers, and the analysis shows that the system is effective in diagnosing cattle disease.
Amharic Document Image Retrieval Using Linguistic Features
(Addis Ababa University, 10/21/2011) Yeshambel, Tilahun; Assabie, Yaregal (PhD)
The advent of modern computers plays an important role in processing and managing electronic information found in the form of texts, images, audio, video, etc. With the rapid development of computer technology, digital documents have become popular options for storage, access and transmission. To meet the needs of fast evolving digital libraries, an increasing number of historical documents, newspapers, books, etc. are being digitized into an electronic format for easy archival and dissemination. Optical Character Recognition (OCR) and Document Image Retrieval (DIR), as part of the information retrieval paradigm, are the two means of accessing document images that have received attention in the IR community. Amharic has been the official language of Ethiopia since the 19th century, and as a result many religious and government documents are written in Amharic. Huge collections of machine printed Amharic documents are found in almost every institution of the country. It is observed that accessing those documents has become more and more difficult. To address this problem, very few research works have been attempted recently using OCR and DIR methods. The aim of this research is to develop a system model that enables users to find relevant Amharic document images from a corpus of digitized documents in an easy, accurate, fast and efficient manner. This work therefore presents the architecture of an Amharic DIR system which allows users to search scanned Amharic documents without the need for OCR. The proposed model is designed after a detailed analysis of the specific nature of the Amharic language. Amharic belongs to the Semitic languages and is a morphologically rich language. Surface word formation involves prefixation, suffixation, infixation, circumfixation and reduplication.
In this work a model for searching Amharic document images is proposed, and word image features are systematically extracted for automatic indexing, retrieval and ranking of document images stored in a database. A new approach that applies an NLP tool, an Amharic word generator, is incorporated in the proposed system model. By providing a given Amharic root word to this Amharic specific surface word synthesizer, a number of possible surface words are produced. Then, the descriptions of these surface word images are used for indexing and searching purposes. The system also passes through various phases such as noise removal, binarization, text line and word boundary identification, word segmentation and resizing to normalize different font types, sizes and styles, feature extraction and, finally, matching the query word image against document word images. The proposed method was tested on different real world Amharic documents from sources such as magazines, textbooks and newspapers with various font styles, types and sizes. Precision and recall were measured for sample queries on sample document images, and promising results have been achieved.
Semantic Relation Extraction for Amharic Text Using Deep Learning Approach
(Addis Ababa University, 10/22/2020) Abi, Aschenaki; Assabie, Yaregal (PhD)
Relation extraction is an important semantic processing task in the field of natural language processing. The task of relation extraction can be defined as follows. Given a sentence S with a pair of annotated entities e1 and e2, the task is to identify the semantic relation between e1 and e2 following a set of predefined relation types. Semantic relation extraction can support many applications such as text mining, question answering, information extraction, etc. Some state-of-the-art systems in foreign languages still rely on lexical resources such as WordNet and natural language processing tools such as dependency parsers and named entity recognizers to get high-level features. Another challenge is that important information can appear at any position in the sentence. To tackle these problems, we propose an Amharic semantic relation extraction system using a deep learning approach. From the existing deep learning approaches, the bidirectional long short-term memory network with an attention mechanism is used. It enables multi-level automatic feature representation learning from data and captures the most important semantic information in a sentence. The proposed model contains different components. The first is a word embedding that maps each word into a low dimensional vector. It is a feature learning technique to obtain new features across domains for relation extraction in Amharic text. The second is a BLSTM that helps to get high-level features from the embedding layer by exploiting information from both the past and the future direction, since a single direction may not reflect all information in context. The third is an attention mechanism that produces a weight vector and merges word-level features from each time step into a sentence-level feature vector by multiplying them by the weight vector. To evaluate our model, we conduct experiments on Amharic-RE-Dataset, which is prepared from Amharic text for this thesis.
The commonly used evaluation measures precision, recall, and F-score are used to measure the effectiveness of the proposed system. The proposed attention based bidirectional long short-term memory model yields an F1-score of 87.06%. It achieves good results with only word embeddings as input features, without using lexical resources or NLP systems.
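The attention step in this abstract, which turns word-level features into one sentence-level vector through a weight vector, can be illustrated with a small sketch. The scoring form used here (a learned vector applied to the tanh of each hidden state, followed by a softmax) is an assumption borrowed from common attention-based BLSTM models; the abstract does not state the exact formulation, and the function names and toy numbers are hypothetical.

```python
from math import exp, tanh


def softmax(scores):
    """Normalize scores into weights that are positive and sum to one."""
    highest = max(scores)
    exps = [exp(s - highest) for s in scores]  # subtract the max for stability
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, w):
    """Merge per-word feature vectors into a single sentence-level vector.

    hidden_states: one feature vector per time step (e.g. BLSTM outputs).
    w: attention parameter vector (assumed learned; hypothetical here).
    """
    scores = [sum(wi * tanh(hi) for wi, hi in zip(w, h)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    sentence = [sum(a * h[d] for a, h in zip(weights, hidden_states)) for d in range(dim)]
    return weights, sentence


# Toy input: three time steps with two features each.
states = [[0.2, 0.1], [0.9, 0.8], [0.1, 0.3]]
weights, sentence_vector = attention_pool(states, w=[1.0, 1.0])
print(weights)
print(sentence_vector)
```

Time steps with larger scores receive larger weights, so the sentence vector is dominated by the words the model scores as most informative for the relation.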
Afaan Oromo Named Entity Recognition Using Neural Word Embeddings
(Addis Ababa University, 10/26/2020) Kasu, Mekonini; Assabie, Yaregal (PhD)
Named Entity Recognition (NER) is one of the canonical examples of sequence tagging that assigns a named entity label to each of a sequence of words. This task is important for a wide range of downstream applications in natural languages processing. Two attempts have been conducted for Afaan Oromo NER that automatically identifies and classifies the proper names in text into predefined semantic types like a person, location, and organizations and miscellaneous. However, their work heavily relied on hand design feature. We proposed a deep neural network architecture for Afaan Oromo Named Entity Recognition, based on context encoder and decoder models using Bi-directional Long Short Term Memory and Conditional Random Fields respectively. In the proposed approach, initially, we generated neural word embeddings automatically using skip-gram with negative subsampling from an unsupervised corpus size of 50,284KB. The generated word embeddings represent words in semantic vectors which are further used as an input feature for encoder and decoder model. Likewise, character level representation is generated automatically using BiLSTM from the supervised corpus size of 768KB. Because of the use of character level representation, the proposed model is robust for the out-of-vocabulary words. In this study, we manually prepared annotated dataset size of 768KB for Afaan Oromo Named Entity Recognition. We split this dataset into 80% for training, 5% for testing and 15% for validation. We prepared totally 12,963 named entities from these 10,370.4 %, 648.15% and 1,944.45% are used for training, validation and test set respectively. Experimental results show that the combination of BiLSTM-CRF algorithms with pre-trained word embedding and character level representation and regularization techniques (dropout) perform better as compared to the other models such as Bi-LSTM, BiLSTM-CRF with only character level representation or word embeddings. 
Using the BiLSTM-CRF model with pre-trained word embeddings and character-level representations significantly improved Afaan Oromo Named Entity Recognition, with an average F-score of 93.26% and an accuracy of 98.87%.
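The split figures quoted in the abstract above (10,370.4, 648.15 and 1,944.45) are simply 80%, 5% and 15% of 12,963, which is why they are not whole numbers. The sketch below shows one way such shares could be turned into integer counts that still sum to the total. The largest-remainder rounding and the function name `split_counts` are assumptions made for illustration; the abstract does not say how the fractional counts were resolved.

```python
def split_counts(total, fractions):
    """Return integer counts per split that sum exactly to `total`.

    Assumption for illustration: largest-remainder rounding. Each share is
    floored, then leftover items go to the splits with the largest
    fractional parts.
    """
    raw = [total * f for f in fractions]
    counts = [int(r) for r in raw]  # floor each share
    leftover = total - sum(counts)
    by_remainder = sorted(
        range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


# 80% / 5% / 15% shares of the 12,963 named entities reported in the abstract
counts = split_counts(12963, [0.80, 0.05, 0.15])
print(counts)
```

Any other rounding rule would work equally well as long as the three counts add up to the full set of entities.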
Morphology Based Spell Checker for Kafi Noonoo Language
(Addis Ababa University, 10/3/2018) Tafesse, Fikru; Assabie, Yaregal (PhD)
A number of NLP tools are used to process text and other forms of human language. Among these tools, a spell checker is one that checks the validity of words in a document. A spell checker is an NLP application needed in every word processing program: it analyzes the input text for misspelled words and then provides possible suggestions for correcting them. There are two classes of spelling error: non-word errors and real-word errors. A non-word error is a misspelled word that has no meaning in the given language. A real-word error is a word that has a meaning in the given language but is semantically and syntactically incorrect. Real-word errors are difficult to detect and to provide suggestions for, and they require syntactic and semantic analysis of the text. Dictionary lookup and N-gram analysis are the most commonly used spelling error detection approaches. Edit distance, noisy channel model, neural network, rule-based, N-gram and phonetic-based techniques are applied to generate suggestions for error correction. In the area of spell checking, a lot of work has been done for English, Arabic and Asian languages. Kafi Noonoo is one of the languages spoken in the south-western part of Ethiopia by the Kaffecho people. It is a morphologically rich language. No spell checker is currently available for analyzing text written in Kafi Noonoo, which is the gap this work addresses. This thesis aims to design and implement a spell checker system for the Kafi Noonoo language. The proposed spell checker architecture contains four main components (tokenization, error detection, word suggestion and error correction) together with two backend components. Dictionary lookup and morphology-based approaches are used to implement the spell checker. A prototype of the system was developed to test and evaluate the functionality and performance of the spell checker.
To test and evaluate the system, we used 2,743 unique words collected from different sources. To measure the accuracy of the spell checker, the lexical recall, error recall and precision evaluation metrics were used. On these metrics we obtained promising results of 95.91% lexical recall, 100% error recall and 62.76% precision.
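As a rough illustration of two of the generic techniques named in the abstract above, the sketch below combines dictionary lookup for error detection with edit distance for ranking suggestions. It is not the thesis's morphology-based method. The English lexicon, the function names and the distance threshold are all hypothetical stand-ins chosen for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def check(text, lexicon, max_distance=2, max_suggestions=3):
    """Flag tokens missing from the lexicon and suggest close lexicon words."""
    report = {}
    for token in text.lower().split():
        word = token.strip(".,;:!?")
        if not word or word in lexicon:
            continue  # dictionary lookup succeeded: treated as correctly spelled
        ranked = sorted((edit_distance(word, w), w) for w in lexicon)
        report[word] = [w for d, w in ranked if d <= max_distance][:max_suggestions]
    return report


# Hypothetical toy lexicon (English stand-ins, not Kafi Noonoo vocabulary)
LEXICON = {"coffee", "forest", "language", "market", "water"}
result = check("cofee forest markte water", LEXICON)
print(result)
```

A plain lookup like this only catches non-word errors. Real-word errors, as the abstract notes, need syntactic and semantic analysis, and a morphologically rich language would additionally need affix handling before the lookup.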
Automatic Sediment Detection in Urine Micrograph
(Addis Ababa University, 10/3/2018) Worku, Ameha; Assabie, Yaregal (PhD)
Urine is one of the most complex fluid specimens found in the human body. The concentration of urine sediments is an indicator of various diseases. Medical facilities in our country invariably employ a manual approach to detect sediments under microscopes. However, results obtained with the manual approach are not always accurate and can vary from person to person. The approach is also time consuming and increases the workload of technicians. To mitigate these problems, scholars in the field recommend automated detection of sediment in urine. However, ensuring accurate detection of sediment in urine remains challenging due to variations in urine color, irregular shapes and non-uniform illumination. Hence, in this research an improved segmentation and feature extraction technique is proposed to detect urine sediments. The microscopic input image is enhanced through grayscale conversion, adaptive median filtering and image adjustment, which in turn yield uniform illumination for further analysis. This study proposes a fusion of adaptive thresholding, Canny edge detection and morphological operations to isolate the foreground from the background and remove tiny objects. A total of twenty-three shape, texture and color features are extracted to represent white blood cells, red blood cells, epithelial cells and crystals in the urine microscopic image. Finally, classification models are built using a Neural Network and a Multi-Class Support Vector Machine. The performance of each model is compared using ten-fold cross-validation.
Compared to other methods, this technique demonstrated acceptable detection performance, with an average sensitivity of 95.34%, specificity of 98.10%, precision of 90.22% and accuracy of 95.93% using the neural network classifier, and an average sensitivity of 90.38%, specificity of 98.01%, precision of 91.68% and accuracy of 97.40% using the multiclass support vector machine, across the white blood cell (WBC), red blood cell (RBC), epithelial cell (EP) and crystal classes. The proposed prototype is found to be effective for identifying sediment in urine samples, even where the sediment has an irregular shape or varied color and the microscopic images are poorly illuminated.
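The four metrics reported in the abstract above can each be derived from a multi-class confusion matrix by treating one class at a time as "positive" and the rest as "negative". The sketch below shows that calculation. The confusion-matrix counts are invented for illustration and are not the thesis's data, and the function names are hypothetical.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard one-vs-rest metrics from true/false positive/negative counts."""
    return {
        "sensitivity": tp / (tp + fn),  # recall on the positive class
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


def per_class_metrics(confusion, labels):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    total = sum(sum(row) for row in confusion)
    results = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp
        fp = sum(row[k] for row in confusion) - tp
        tn = total - tp - fn - fp
        results[label] = binary_metrics(tp, fp, fn, tn)
    return results


LABELS = ["WBC", "RBC", "EP", "Crystal"]
# Hypothetical counts for illustration only (rows: true class, columns: predicted)
CONFUSION = [
    [18, 1, 1, 0],
    [2, 17, 0, 1],
    [0, 1, 19, 0],
    [0, 0, 1, 19],
]
scores = per_class_metrics(CONFUSION, LABELS)
# Average (macro) sensitivity across the four classes
macro_sensitivity = sum(s["sensitivity"] for s in scores.values()) / len(scores)
print(scores["WBC"], macro_sensitivity)
```

Averaging each metric over the four classes in this way gives a single "average" figure per classifier, which is one plausible reading of the averages quoted in the abstract.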
E-Complaint Management System in Local Government: Case of Addis Ababa City Administration
(Addis Ababa University, 10/31/2018) Wolde, Endashaw; Getahun, Fekade (PhD)
Today, e-government technologies are changing the nature of the interactions between residents and the local government of a city by intensifying the speed and impact of citizen complaints. Handling residents' complaints is an important part of service delivery in local government. The current process by which residents of Addis Ababa can file a complaint with their local government and receive a response is time consuming and complex. Because the city lacks an effective web-based complaint management system, residents are questioning the absence of effective complaint handling. In this project, we developed a web-based complaint management system for the local government of Addis Ababa. To do so, we first studied the current system to gain a clear view of the existing complaint management method in the Addis Ababa City Administration (AACA). This was done through observation and a review of the documents that AACA currently uses to handle complaints. Based on the requirements gathered, analysis and design documents were prepared. The system enables city residents to participate in controlling the quality of the services provided in the city and to report their problems and complaints to their local governments in order to receive an effective and efficient response. Finally, the prototype of the web-based complaint management system was evaluated using a questionnaire administered to 73 participants. The results show that the web-based complaint management system is easy to use and saves time and resources.
Query Expansion for Tigrigna Information Retrieval
(Addis Ababa University, 10/4/2017) Zeray, Tsadu; Assabie, Yaregal (PhD)
This research aims to enhance the precision and recall of a Tigrigna IR system by integrating a query expansion mechanism. Query expansion is an effective mechanism for controlling the effect of the polysemous and synonymous nature of query terms. The main reason for integrating query expansion is to increase the retrieval of documents relevant to the user's query, based on the correct sense of the query terms. This study provides a way to discriminate between the various meanings of a polysemous term, based on word sense disambiguation (WSD), and to find synonymous terms for reformulating the user's query. The proposed algorithm determines the senses of synonymous and polysemous words in the user's query using a Tigrigna WordNet. In this study, we experiment with a root-form Tigrigna WordNet and Tigrigna morphological analysis in IR for the first time. Using the idea of the N-gram model, word sense disambiguation is performed by comparing ambiguous query terms with their associated synsets and related words in the Tigrigna WordNet. The purpose of WSD is to identify the correct sense of ambiguous terms in the user's query and select the synonyms of the word. The selected synonyms of the ambiguous query term are then added to reformulate the user's original query, and the modified query is used to retrieve the final results. The experiments were carried out in two ways: first, the prior IR system was tested with morphological analysis instead of a stemmer; second, the IR system was tested with the query expansion model integrated. The experiments show encouraging results: using morphological analysis before query expansion registers a performance of 9% precision and 1.6% recall, and expanding the query using synset expansion registers an improvement of 12% precision and 4% recall in overall performance. The number of words related to each polysemous term is limited because of the lack of resources.
Therefore, the query expansion terms used are limited to the information available in the WordNet.
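The expansion step described in the abstract above (pick a sense for each ambiguous query term from its context, then append that sense's synonyms to the query) can be sketched as follows. This is a simplified illustration under stated assumptions, not the thesis's algorithm: the sense is chosen by plain overlap between the other query terms and each sense's related words, and `TOY_WORDNET` is a hypothetical English stand-in for the Tigrigna WordNet.

```python
# Hypothetical toy WordNet: each term maps to a list of senses, and each sense
# carries its synonyms and a set of related words used for disambiguation.
TOY_WORDNET = {
    "bank": [
        {"synonyms": ["lender"], "related": {"money", "loan", "account"}},
        {"synonyms": ["shore"], "related": {"river", "water", "flood"}},
    ],
    "loan": [{"synonyms": ["credit"], "related": {"money", "bank"}}],
}


def pick_sense(term, context, wordnet):
    """Choose the sense whose related words overlap most with the context."""
    senses = wordnet.get(term, [])
    if not senses:
        return None
    # Ties are resolved in favour of the first-listed sense
    return max(senses, key=lambda sense: len(sense["related"] & context))


def expand_query(query, wordnet):
    """Append synonyms of each term's selected sense to the original query."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        context = set(terms) - {term}
        sense = pick_sense(term, context, wordnet)
        if sense is not None:
            for synonym in sense["synonyms"]:
                if synonym not in expanded:
                    expanded.append(synonym)
    return expanded


print(expand_query("bank river flood", TOY_WORDNET))
print(expand_query("bank loan", TOY_WORDNET))
```

Terms with no WordNet entry pass through unchanged, which mirrors the limitation noted in the abstract: expansion can only draw on what the WordNet contains.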
Development of Automatic Parser for Tigrigna Sentences Using Bottom-Up Probabilistic Chart Parser
(Addis Ababa University, 10/4/2017) Medhin, Yaynshet; Assabie, Yaregal (PhD)
Automatic parsing is the process of analyzing a given sentence into its grammatical structure. Parsing is useful for improving the performance of many NLP applications. Many research works have addressed automatic parsing for different languages. The aim of this research is to design and develop an automatic parser for Tigrigna sentences using a bottom-up probabilistic chart parser. We propose a system architecture to address the identified problem. The architecture has two parts: learning and parsing. The learning part contains the components through which supervised learning is accomplished. The corpus, collected from different sources, is preprocessed by a simple preprocessing component. The preprocessed sample corpus is manually tagged by two language experts. The tagged corpus is then parsed manually by the linguists. From the parsed sentences, Probabilistic Context-Free Grammar (PCFG) rules are extracted. From the tagged corpus, a lexicon is generated using the lexicon generation component. The parsing part contains the components that parse a given input sentence: sentence tokenization, morphological analysis and PCFG parsing. The first two components make the input sentence suitable for the PCFG chart parsing component. We then conducted several experiments on both simple and complex Tigrigna sentences. The experimental findings are reported and solutions to the identified problems are suggested. The experiments were conducted in three parts. The first test set was drawn from the training set, and the second was a test set from the sample corpora. The third set differed from the other two in that it was not drawn from the sample corpora used in the study. The accuracy on the first, second and third test sets was 95%, 94% and 85%, respectively, for the simple Tigrigna sentences.
For the complex Tigrigna sentences, the results achieved on the three test sets were 91%, 90% and 80%, respectively.
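The learning step described in the abstract above, extracting a PCFG from manually parsed sentences, is commonly done by relative-frequency estimation: each rule's probability is its count divided by the count of all rules sharing the same left-hand side. The sketch below illustrates that idea. The estimation method is an assumption (the abstract does not state how probabilities were computed), and the tiny English treebank and function names are hypothetical.

```python
from collections import Counter, defaultdict


def rules(tree):
    """Yield (lhs, rhs) productions from a nested-tuple tree (label, child, ...)."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        yield (label, (children[0],))  # lexical rule: tag -> word
        return
    yield (label, tuple(child[0] for child in children))
    for child in children:
        yield from rules(child)


def estimate_pcfg(trees):
    """Relative-frequency estimate: P(lhs -> rhs) = count(rule) / count(lhs)."""
    counts = Counter(rule for tree in trees for rule in rules(tree))
    lhs_totals = defaultdict(int)
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}


# Hypothetical two-sentence treebank (English stand-ins for illustration)
TREEBANK = [
    ("S", ("NP", ("N", "dogs")), ("VP", ("V", "bark"))),
    ("S", ("NP", ("N", "cats")), ("VP", ("V", "chase"), ("NP", ("N", "dogs")))),
]
pcfg = estimate_pcfg(TREEBANK)
for (lhs, rhs), p in sorted(pcfg.items()):
    print(f"{lhs} -> {' '.join(rhs)}  [{p:.2f}]")
```

A chart parser would then use these probabilities to score competing analyses of an input sentence and keep the most probable one.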
