Computer Science

Recent Submissions

Now showing 1 - 20 of 373
  • Item
    DEACT: Hardware Solution to Rowhammer Attacks
    (Addis Ababa University, 2024-05) Tesfamichael Gebregziabher; Mohammed Ismail
    Dynamic Random-Access Memory (DRAM) technology has advanced significantly, achieving faster access times and larger storage capacities by shrinking memory cells and packing them tightly on a chip. However, continued DRAM scaling presents new challenges that need to be addressed: smaller memory cells and the reduced distance between them have led to circuit disturbance errors such as the Rowhammer problem. Attackers can exploit these errors to induce bit flips and gain unauthorized access to systems, posing a significant security threat. In this research, we propose DEACT, a counter-based hardware mitigation approach designed to tackle the Rowhammer problem in DRAM. It moves all frequently accessed rows to a safety sub-array, preventing further activations of hot rows in the normal array and effectively eliminating the vulnerability. Furthermore, our counter implementation requires a smaller chip area than existing solutions. We also introduce DDRSHARP, a cycle-accurate DRAM simulator that simplifies the configuration and evaluation of various DRAM standards. DDRSHARP reduces simulation time by more than 1.8x compared to contemporary simulators; its performance is optimized by avoiding infeasible iterations, minimizing branch instructions, caching repetitive calculations, and other optimizations. A rough sketch of the counter-based idea follows.
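
    A minimal sketch of the kind of counter-based hot-row tracking the abstract describes. The threshold, sub-array capacity, and migration policy here are illustrative assumptions, not DEACT's actual hardware design.

    ```python
    HOT_THRESHOLD = 4096   # activations before a row is considered "hot" (assumed)

    class HotRowTracker:
        def __init__(self, safety_capacity=64):
            self.counters = {}            # row address -> activation count
            self.safety_subarray = set()  # rows migrated out of the normal array
            self.safety_capacity = safety_capacity

        def on_activate(self, row):
            if row in self.safety_subarray:
                return  # hot rows are served from the safety sub-array
            self.counters[row] = self.counters.get(row, 0) + 1
            if self.counters[row] >= HOT_THRESHOLD:
                self.migrate(row)

        def migrate(self, row):
            # Move the frequently activated row into the safety sub-array so
            # further activations no longer disturb its physical neighbours.
            if len(self.safety_subarray) < self.safety_capacity:
                self.safety_subarray.add(row)
                del self.counters[row]
    ```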
  • Item
    A Hybrid Deep Learning-Based ARP Attack Detection and Classification Method
    (Addis Ababa University, 2023-12) Yeshareg Muluneh; Solomon Gizaw
    To map Internet Protocol (IP) addresses to Media Access Control (MAC) addresses and vice versa in local area network communication, the Address Resolution Protocol (ARP) is the most crucial protocol. ARP, however, is an unauthenticated, stateless protocol that lacks security features. It is therefore vulnerable to many attacks and can be easily exploited to gain unauthorized access to sensitive data and to transmit bogus ARP messages that poison the ARP caches of hosts within the local area network. These attacks may result in a loss of data integrity, confidentiality, and availability of an organization's information. Many researchers have attempted to detect ARP attacks using different methods. However, some of these approaches are not time-effective, require considerable human effort and involvement, and incur high communication overhead. Other works use machine learning and deep learning methods, which are better suited to detecting ARP attacks; however, those approaches have a significant false alarm rate of 13%, a low attack detection rate, and a classification accuracy of 87%. This thesis aims to solve those problems using a hybrid deep learning-based ARP attack detection and classification method. We used a Sparse Autoencoder for feature extraction and dimensionality reduction of the input data and a Convolutional Neural Network for attack detection and classification, to achieve the highest attack detection rate and classification accuracy with a minimized false alarm rate. To evaluate the performance of the proposed model, we used the open-source benchmark NSL-KDD dataset for training and testing. The results are measured against a single Convolutional Neural Network model using different evaluation metrics. The proposed approach achieves an attack detection rate of 98.97%, a classification accuracy of 99.26%, and a minimum false alarm rate of 0.74%. A sketch of the hybrid pipeline follows.
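
    A sketch of the hybrid pipeline described in the abstract: a sparse autoencoder for feature extraction followed by a 1-D CNN classifier. Layer sizes, the L1 sparsity penalty, and the 41-feature NSL-KDD input are assumptions, not the thesis's exact configuration.

    ```python
    import tensorflow as tf

    n_features = 41  # NSL-KDD feature count before encoding (assumed)

    # Sparse autoencoder: L1 activity regularization encourages sparse codes.
    inputs = tf.keras.Input(shape=(n_features,))
    code = tf.keras.layers.Dense(
        16, activation="relu",
        activity_regularizer=tf.keras.regularizers.l1(1e-5))(inputs)
    decoded = tf.keras.layers.Dense(n_features, activation="sigmoid")(code)
    autoencoder = tf.keras.Model(inputs, decoded)
    autoencoder.compile(optimizer="adam", loss="mse")

    # CNN classifier operating on the learned low-dimensional codes.
    encoder = tf.keras.Model(inputs, code)
    clf = tf.keras.Sequential([
        tf.keras.layers.Reshape((16, 1), input_shape=(16,)),
        tf.keras.layers.Conv1D(32, 3, activation="relu"),
        tf.keras.layers.GlobalMaxPooling1D(),
        tf.keras.layers.Dense(2, activation="softmax"),  # attack vs. normal
    ])
    clf.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                metrics=["accuracy"])
    ```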
  • Item
    Design of Searchable Encryption with Refreshing Keyword Search Using Pairing-Based Cryptography
    (Addis Ababa University, 2024-10) Kuma Bekele; Minale Ashagrie
    To maintain data security and privacy, Public Key Encryption with Keyword Search (PEKS) schemes have been implemented; they offer search capabilities over encrypted data. However, because the Key Generating Center (KGC) knows the target users' private keys, existing PEKS schemes are vulnerable to the key-escrow problem. The Certificate-Less Public Key Encryption with Keyword Search (CL-PEKS) scheme was created to address the key-escrow problem in PEKS schemes. However, existing CL-PEKS schemes do not consider refreshing keyword searches; as a result, the target server can store search trapdoors for system keywords and launch keyword-guessing attacks. By appending date information to the encrypted data and keyword, we propose a certificate-less Searchable Encryption with Refreshing Keyword Search (SERKS) scheme. We designed the system model and algorithms for the proposed scheme using pairing-based cryptography, and developed a prototype for a web-based e-mail system using the Java Pairing-Based Cryptography (JPBC) library. The security of the proposed scheme rests on the hardness of the Bilinear Diffie-Hellman (BDH) problem. We assessed the scheme's performance with respect to time complexity in terms of both communication and computational costs. The experimental results demonstrate that the proposed SERKS scheme has a lower computational cost than two earlier related schemes during the key generation and test phases, as well as lower communication costs.
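
    A toy illustration of the "refreshing" idea only: binding the searchable tag to the current date means a stored trapdoor stops matching once the date changes. The real scheme uses bilinear pairings over the JPBC library, not plain hashes; everything below is a simplification.

    ```python
    import hashlib
    from datetime import date

    def keyword_tag(keyword: str, day: date) -> str:
        # Appending date information to the keyword before deriving the tag.
        return hashlib.sha256(f"{keyword}|{day.isoformat()}".encode()).hexdigest()

    tag_today = keyword_tag("urgent", date(2024, 10, 1))
    tag_later = keyword_tag("urgent", date(2024, 10, 2))
    assert tag_today != tag_later  # yesterday's trapdoor no longer matches
    ```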
  • Item
    Development of Text-to-Speech Synthesis Model for Afaan Oromoo Using Transformer Neural Network
    (Addis Ababa University, 2025-03) Bayisa Bedasa; Yaregal Assabie (PhD)
    Text-to-speech (TTS) is the process of converting written text into spoken words. A TTS system analyzes the incoming text, processes linguistic data, and produces audio output using algorithms. TTS systems are widely used in applications such as virtual assistants, accessibility tools for individuals with visual impairments, and language learning software. Afaan Oromoo is a Cushitic language spoken mostly in Ethiopia and other parts of Africa and serves as an essential means of communication for the Oromo people. Developing a TTS system for Afaan Oromoo is essential for enhancing accessibility and promoting the use of the language in digital environments. This study focuses on a transformer-based neural network model for Afaan Oromoo TTS. The model comprises an encoder-decoder architecture: the encoder converts input text into a contextualized representation, while the decoder generates speech from this representation. We enhanced the model with multi-head attention mechanisms to capture long-range dependencies in the input text, improving prosody. Additionally, we employed a HiFi-GAN-based vocoder to convert the model's output into high-fidelity audio waveforms, enhancing the overall quality of the synthesized speech. The implementation is carried out in Python. We produced a 17-hour audio dataset with corresponding text transcriptions from an Afaan Oromoo speech corpus recorded by a male speaker. We used the Mean Opinion Score (MOS) to assess naturalness and intelligibility subjectively. The transformer-based architecture outperformed the previous BLSTM-RNN-based model for Afaan Oromoo TTS, which scored 3.77 for intelligibility and 3.76 for naturalness: experimental results indicate that our system achieved a MOS of 4.21 for naturalness and 4.23 for intelligibility, reflecting a commendable performance level. Our model also supports prosody modeling with user input parameters to generate deterministic speech, positioning it as a state-of-the-art solution.
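
    A minimal sketch of the encoder side of a transformer TTS model as described above; the dimensions, layer counts, and token vocabulary size are illustrative assumptions, not the thesis's settings.

    ```python
    import torch
    import torch.nn as nn

    class TTSEncoder(nn.Module):
        def __init__(self, vocab_size=80, d_model=256, n_heads=4, n_layers=4):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)
            layer = nn.TransformerEncoderLayer(
                d_model=d_model, nhead=n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

        def forward(self, token_ids):
            # Multi-head self-attention captures long-range dependencies in the
            # input text, which is what helps prosody modelling.
            return self.encoder(self.embed(token_ids))

    enc = TTSEncoder()
    ctx = enc(torch.randint(0, 80, (1, 32)))  # (batch, seq_len, d_model)
    print(ctx.shape)                          # torch.Size([1, 32, 256])
    ```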
  • Item
    Development of Spell Checker for Guragina Language
    (Addis Ababa University, 2024-03) Mengistu Gebre; Yaregal Assabie
    A spell checker is an essential tool in Natural Language Processing (NLP). Its purpose is to identify and correct spelling errors in text, providing suggestions for correct spellings in a specific language. Spelling errors fall into two types: non-word errors and real-word errors. Non-word errors are misspelled words that have no meaning in the language, while real-word errors involve words that exist in the language but are used incorrectly in terms of semantics or syntax. This research focused on non-word error detection as a strategic decision, given the complexity of and limited resources available for the Gurage language, also known as Guragina. The language consists of over thirteen varieties with different orthographies, but a modern standard exists. Currently, there is no spell checker for any Guragina variety or for the standard. Addressing non-word errors first provides a solid foundation before tackling the more challenging task of real-word error detection and correction. This phased approach allows researchers to make meaningful progress on an under-resourced language rather than attempting to solve the entire spell-checking problem at once; the intention is to use the non-word spell checker as a starting point and then leverage that knowledge to progressively tackle real-word error handling. This work introduces a non-word spelling error checker for standard Guragina. The system detects errors using the Ratcliff/Obershelp algorithm and corrects them using distance-based similarity techniques. A prototype was developed in Python. We evaluated the system and obtained an accuracy of 98.27%, precision of 98.07%, recall of 97.75%, and F1 score of 95.45%. Future work includes enhancing rule definitions by incorporating word classes, handling exceptions, adding supplementary spell checker functionalities, and expanding the system to cover real-word errors.
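
    A sketch of the detect-and-suggest loop described above: dictionary lookup flags non-words, and Ratcliff/Obershelp similarity (which Python's difflib.SequenceMatcher implements) ranks suggestions. The tiny lexicon is a placeholder, not Guragina data.

    ```python
    from difflib import SequenceMatcher

    LEXICON = {"bet", "sab", "dengia"}  # placeholder word list

    def suggest(word, k=3):
        if word in LEXICON:
            return []  # correctly spelled: nothing to suggest
        # Rank dictionary entries by Ratcliff/Obershelp similarity to the input.
        scored = sorted(LEXICON,
                        key=lambda w: SequenceMatcher(None, word, w).ratio(),
                        reverse=True)
        return scored[:k]

    print(suggest("beet"))  # non-word: returns the closest dictionary entries
    ```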
  • Item
    Jamming Attack Detection and Classification Using Exponentially Weighted Moving Average and Random Forest in Wireless Sensor Networks
    (Addis Ababa University, 2024-10) Alemayehu Ebissa; Mulugeta Libsie
    Wireless sensor networks are widely used in environmental monitoring, industrial automation, healthcare, and smart cities for data collection, real-time monitoring, and automated decision-making. These networks, consisting of randomly distributed autonomous nodes, are vulnerable to jamming attacks, in which malicious entities disrupt network transmissions by emitting interfering signals. Existing detection methods typically rely on either statistical or machine learning-based approaches, each with significant limitations: statistical methods are prone to high false alarm rates, while machine learning methods impose computational overhead on resource-constrained nodes. To address these limitations, this thesis presents a two-level jamming attack detection and classification method that combines the strengths of both approaches. The method integrates an Exponentially Weighted Moving Average (EWMA) for lightweight detection with a Random Forest classifier for accurate jamming attack classification. The approach begins with feature selection, utilizing key features such as the Received Signal Strength Indicator (RSSI) and Packet Error Rate (PER), which can be obtained without adding significant overhead to sensor nodes. The method consists of a training phase and a testing phase. In the training phase, the dataset is processed through EWMA computation to smooth the time-series data, followed by threshold calculation; the EWMA-smoothed data is then used to train the Random Forest classifier. In the testing phase, the test data also passes through EWMA computation, and EWMA-based detection determines whether a jamming attack is occurring by comparing against the predefined threshold. Once potential jamming is detected, the system classifies it into one of three jamming types: constant, periodic, or reactive. Experimental evaluation demonstrates that our method achieves a 99.91% detection rate and 99.26% accuracy in jamming classification. These results show significant improvements over existing methods, particularly in reducing false positives while maintaining high detection accuracy.
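
    A sketch of the two-level method described above: EWMA smoothing with a threshold test for lightweight detection, then a Random Forest for classification. The smoothing factor, threshold rule, and placeholder data are assumptions, not the thesis's trained values.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def ewma(x, alpha=0.3):
        # s_t = alpha * x_t + (1 - alpha) * s_{t-1}
        s = np.empty_like(x, dtype=float)
        s[0] = x[0]
        for t in range(1, len(x)):
            s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]
        return s

    # Level 1: flag jamming when smoothed RSSI deviates beyond a threshold.
    rssi = np.random.normal(-70, 2, 500)          # placeholder RSSI trace (dBm)
    smooth = ewma(rssi)
    threshold = smooth.mean() + 3 * smooth.std()  # assumed threshold rule
    suspect = smooth > threshold

    # Level 2: classify the jamming type (constant / periodic / reactive).
    X_train = np.random.rand(300, 2)              # [RSSI, PER] features (placeholder)
    y_train = np.random.randint(0, 3, 300)        # 0=constant, 1=periodic, 2=reactive
    rf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
    ```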
  • Item
    A Framework for Detecting Multiple Cyberattacks in IoT Environment
    (Addis Ababa University, 2025-02-25) Yonas Mekonnen; Mesfin Kifle (PhD)
    The Internet of Things (IoT) refers to the growing trend of embedding ubiquitous and pervasive computing capabilities into everyday objects through sensor networks and internet connectivity. The growth of newly evolved cyberattacks, shifting network patterns, and the heterogeneous nature of attack trends have made it challenging to apply single-layer cyberattack detection techniques to the Internet of Things. This research identifies the lack of a detection framework as the major gap for detecting multiple cyberattacks, such as denial of service, distributed denial of service, and Mirai attacks, while accounting for multiple parameters at the same time. The proposed framework contains three modules: a data acquisition and preprocessing module, responsible for capturing and pre-processing data in preparation for model construction; an attack detection module, the core engine that orchestrates the detection of cyberattacks; and a third module that notifies and displays the results in a dashboard. The study used multiple parameters, including multiple attack classes, network packet patterns, and three scaler types (no scaler, MinMax, and Standard); regardless of the other parameters used, the MinMax scaler, followed by the Standard scaler, gives better detection performance than models trained with no scaler. The framework was trained and evaluated with CNN, hybrid, FFNN, and LSTM models, which achieve detection accuracies of 91.42%, 82.75%, 78.38%, and 74.83%, respectively; the CNN model performs best, followed by the hybrid and FFNN models.
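
    A sketch of the scaler comparison the study reports: the same model is trained on unscaled, MinMax-scaled, and Standard-scaled features. The data and the stand-in model here are placeholders for the framework's traffic features and networks.

    ```python
    import numpy as np
    from sklearn.preprocessing import MinMaxScaler, StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X = np.random.rand(1000, 20)          # placeholder traffic features
    y = np.random.randint(0, 3, 1000)     # e.g. normal / DoS / DDoS classes
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2)

    for name, scaler in [("no scaler", None),
                         ("MinMax", MinMaxScaler()),
                         ("Standard", StandardScaler())]:
        if scaler is not None:
            X_tr_s, X_te_s = scaler.fit_transform(X_tr), scaler.transform(X_te)
        else:
            X_tr_s, X_te_s = X_tr, X_te
        acc = LogisticRegression(max_iter=500).fit(X_tr_s, y_tr).score(X_te_s, y_te)
        print(f"{name}: {acc:.3f}")
    ```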
  • Item
    Transformer-Based Machine Translation System Model from Geez to Tigrigna
    (Addis Ababa University, 2024-06) Aberash Berhe; Yaregal Assabie
    This thesis presents the first attempt at building a transformer-based neural machine translation system for the Geez-Tigrigna language pair. Geez and Tigrigna are closely related Semitic languages: Geez is the liturgical language of the Eritrean and Ethiopian Orthodox churches, and Tigrigna is widely spoken in Eritrea and parts of Ethiopia. Due to the lack of publicly available parallel corpora for this language pair, the thesis describes the manual collection and curation of a new Geez-Tigrigna parallel dataset of 10,362 sentence pairs, a laborious and time-consuming task given the limited availability of translated text between the two languages. The architecture of the proposed system is based on the transformer model, which has shown state-of-the-art performance on many language pairs. To address the challenges of translating between low-resource languages like Geez and Tigrigna, an alignment-based approach is integrated into the standard transformer architecture; this alignment mechanism aims to better capture the relationships between source and target language elements during translation. The word-level alignments between the parallel sentences were produced manually. Experiments compare an attention-based recurrent neural network model, a standard transformer model, and the proposed alignment-augmented transformer model. The standard transformer achieved a BLEU score of 54%, outperforming the RNN model's 46%; integrating the alignment mechanism into the transformer yielded a further improvement, with the alignment-augmented model achieving a BLEU score of 63%. These findings demonstrate the feasibility of building neural machine translation systems for low-resource language pairs like Geez and Tigrigna, and show that the proposed alignment-based modifications can lead to significant improvements in translation quality over the standard transformer.
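
    A sketch of BLEU scoring as used to compare the three systems above; the token sequences are placeholders, and bigram weights keep this toy example from degenerating on such short sentences.

    ```python
    from nltk.translate.bleu_score import corpus_bleu

    references = [[["the", "boy", "went", "to", "school"]]]  # reference list per sentence
    hypotheses = [["the", "boy", "went", "to", "class"]]     # system output tokens
    score = corpus_bleu(references, hypotheses, weights=(0.5, 0.5))
    print(f"BLEU: {score:.3f}")
    ```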
  • Item
    Automatic Amharic Text Categorization
    (Addis Ababa University, 2007-03) Yohannes Afework; Mulugeta Libsie (PhD)
    Rapid developments in Information and Communication Technology are making available huge amounts of data and information. Much of this data is in electronic form (like the more than a billion documents on the World Wide Web). Usually these data do not have a standard structure like that of a relational database; much of the data is unstructured or semi-structured and can generally be considered a text database. Text databases are showing accelerated growth throughout the world. As a result, text mining has become an active field of study, facilitating the extraction of useful and relevant information from text databases. Text data in local languages is also increasing fast, requiring text-processing tools to be available in local languages. This is true for Amharic as well, as can be surmised from the recent boom of online newspapers, magazines, data in electronic storage, etc. To facilitate the retrieval of useful and relevant information from Amharic documents, a number of studies on automatic processing of Amharic text have recently been conducted. This research work on Automatic Amharic Text Categorization is an effort to contribute in this direction. Automatic classification of text data requires that documents be represented by feature words. Representing a document by relevant feature words is an important pre-processing step; it often determines the efficiency and accuracy of the classification. Standard pre-processing tools and methods are therefore very important for automatic classification. Because of the lack of a standard in the Amharic writing system and the unavailability of Amharic text processing tools, the focus of the research was on developing a document pre-processing scheme that facilitates efficient automatic classification of Amharic documents. To this end, much attention was given to processing the source data (Amharic news documents from ENA) by developing and enhancing the following tools (a normalization sketch follows this item):
    • A tool to correct word spelling variations, focusing on variations due to pronunciation differences.
    • An enhancement to the suffix and prefix removal tool developed in a previous study, so that it performs semantic analysis before stripping affixes from words.
    • A tool to correct word variations due to gender marker suffixes.
    • A tool to correct word variations due to number marker suffixes.
    • A tool to merge compound words written as separate words (when separating them may result in semantic loss).
    The use of these tools (which enabled 10 to 30% feature reduction), in addition to other tools and data reduction methods, helped to analyze the huge source data (69,684 news items after data cleaning) and measure classifier performance. Because of the high dimensionality of the source data, classifier algorithms suitable for high-dimensional data, Decision Tree and Support Vector Machine (SVM) classifiers, were selected for the experiments. The open-source Weka package was used for the automatic classification of the preprocessed data. Out of the many classifier algorithms available in Weka, the Logistic Model Tree (LMT) and the Library of SVM (LibSVM) classifiers were used for performance testing. Both showed good classification accuracy, correctly classifying 79.72% and 81.15% of the test instances into the 15 news categories, respectively. However, the computational cost of the automatic classification was very high, taking several hours on high-capacity computers (512 MB RAM, 3.7 GHz). The classification performance measures indicate the need for additional work in developing tools and methods for mining Amharic data.
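
    A sketch of the first tool in the list above: normalizing Amharic spelling variants that arise from pronunciation differences by mapping homophonous characters to a canonical form. The mapping shown is a small, commonly used subset, not the thesis's full rule set.

    ```python
    # Map homophonous Amharic characters to one canonical form (partial table).
    VARIANT_MAP = str.maketrans({
        "ሐ": "ሀ", "ኀ": "ሀ", "ሃ": "ሀ",   # h-family variants
        "ሠ": "ሰ",                        # s-family variants
        "ዐ": "አ",                        # glottal variants
        "ፀ": "ጸ",                        # ts-family variants
    })

    def normalize(word: str) -> str:
        return word.translate(VARIANT_MAP)

    assert normalize("ሠላም") == normalize("ሰላም")  # spelling variants collapse to one form
    ```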
  • Item
    Term Re-Weighting Based Query Expansion Approach for Amharic Information Retrieval
    (Addis Ababa University, 2014-02) Zelalem Addis; Dereje Teferi (PhD)
    This research aims to improve the precision of an Amharic IR system while preserving its original recall. The main reason for performing query expansion is to provide relevant documents that satisfy the user's information need; users mostly formulate weak queries to retrieve documents and thus end up frustrated with the results returned by an IR system. Among the causes of this problem are polysemous and synonymous terms, which require integrating a query reformulation strategy into the IR system. The present study explored term re-weighting based query expansion approaches that integrate term re-weighting with statistical co-occurrence analysis, bi-gram analysis, and bi-gram thesaurus methods. In this approach, the user's relevance feedback is represented as vectors, and the similarity between them is obtained by calculating vector similarity. Terms are then re-weighted over a single document and over the entire document set using Rocchio's re-weighting scheme, and the terms with the highest final weights are selected as expansion terms and fed to the query, regardless of their position. The three proposed term re-weighting based query expansion techniques were integrated into an information retrieval system. Test results showed that the bi-gram method outperformed the other two, scoring a 2% improvement in overall F-measure. The performance of the system could be further improved by designing ontology-based query expansion to control expansion terms that are themselves polysemous.
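
    A worked sketch of Rocchio's re-weighting scheme mentioned above, used to score candidate expansion terms; the alpha/beta/gamma values are conventional defaults, not the thesis's tuned parameters.

    ```python
    import numpy as np

    def rocchio(q, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
        # q: original query vector; relevant / non_relevant: document-term matrices.
        # q' = alpha*q + beta*mean(relevant) - gamma*mean(non_relevant)
        return (alpha * q
                + beta * relevant.mean(axis=0)
                - gamma * non_relevant.mean(axis=0))

    q = np.array([1.0, 0.0, 0.5])
    rel = np.array([[0.9, 0.1, 0.4], [0.8, 0.0, 0.6]])
    nrel = np.array([[0.1, 0.9, 0.2]])
    print(rocchio(q, rel, nrel))  # highest-weight terms become expansion candidates
    ```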
  • Item
    Coreference Resolution for Amharic Text Using Bidirectional Encoder Representation from Transformer
    (Addis Ababa University, 3/4/2022) Bantie, Lingerew; Assabie, Yaregal (PhD)
    Coreference resolution is the process of finding the expressions in a text that refer to the same entity. The task is to cluster all mentions of the same entity in a text based on word indices. Coreference resolution is used in several Natural Language Processing (NLP) applications, such as machine translation, information extraction, named entity recognition, and question answering, to increase their effectiveness. In this work, we propose coreference resolution for Amharic text using Bidirectional Encoder Representations from Transformers (BERT), a contextual language model that generates semantic vectors dynamically according to the context of the words. The proposed model has a training phase and a testing phase. The training phase includes preprocessing (cleaning, tokenization, and sentence segmentation), word embedding, feature extraction (Amharic vocabulary, entities, and mention pairs), and the coreference model. Likewise, the testing phase has its own steps: preprocessing (cleaning, tokenization, and sentence segmentation), coreference resolution, and Amharic mention prediction. Word embedding is used in the proposed model to represent each word as a low-dimensional vector; it is a feature learning technique for obtaining new features across domains for coreference resolution in Amharic text. The necessary information is extracted from the word embeddings, the processed data, and Amharic characters. After extracting the important features from the training data, we build the coreference model, in which BERT obtains basic features from the embedding layer by extracting information from both the left and right context of a given word. To evaluate the proposed model, we conducted experiments using an Amharic dataset prepared from various reliable sources for this study. The commonly used evaluation metrics for coreference resolution are MUC, B3, CEAF-m, CEAF-e, and BLANC. Experimental results demonstrate that the proposed model outperforms the state-of-the-art Amharic model, achieving F-measure values of 80%, 85.71%, 90.9%, 88.86%, and 81.7%, respectively, on the Amharic dataset.
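
    A heavily simplified sketch of mention-pair scoring with contextual BERT embeddings; the multilingual checkpoint stands in for whatever Amharic model the thesis used, the token indices are illustrative only, and cosine similarity is a stand-in for the full coreference model.

    ```python
    import torch
    from transformers import AutoTokenizer, AutoModel

    # Multilingual BERT as a stand-in for an Amharic checkpoint (assumption).
    tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
    bert = AutoModel.from_pretrained("bert-base-multilingual-cased")

    text = "አበበ ወደ ቤቱ ሄደ። እሱ ደከመ።"   # "Abebe went home. He was tired."
    enc = tok(text, return_tensors="pt")
    with torch.no_grad():
        vecs = bert(**enc).last_hidden_state[0]  # one contextual vector per token

    # Mention-pair scoring (simplified): link two mentions when their
    # contextual vectors are similar enough.
    def pair_score(i: int, j: int) -> float:
        return float(torch.cosine_similarity(vecs[i], vecs[j], dim=0))

    print(pair_score(1, 8))  # token indices chosen for illustration only
    ```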
  • Item
    Deep Learning Based Emotion Detection Model for Amharic Text
    (Addis Ababa University, 8/26/2021) Tesfu, Eyob; Belay, Ayalew (PhD)
    Emotions are so important that whenever we need to make a decision, we want to know others' emotions. This is true not only for individuals but also for organizations. Due to the rapid growth of the internet, people express their emotions through social media networks, reviews, blogs, and other online channels. The need for finding relevant sources, extracting related sentences with emotion, summarizing them, and organizing them into a useful form is becoming very high, and emotion detection can play an important role in satisfying these needs. Emotion detection involves categorizing emotional sentences into predefined categories, such as sadness, anger, disgust, and happiness, based on the emotional terms that appear in the text. Given the rapid growth of Amharic usage in social media, manually identifying the emotions of millions of users and aggregating them toward a rapid and efficient decision is quite challenging. In this research work, an emotion detection model is proposed for determining the emotion expressed in Amharic texts or comments. We propose a deep learning-based emotion detection model for Amharic text using a CNN with word embedding. The proposed model involves several tasks. The first is text pre-processing, which consists of the steps commonly used in many natural language processing applications. We perform text pre-processing on the Amharic text and train a word embedding model on the documents; the embedding provides contextually similar words for every word in the training set. We then implement our CNN model for emotion classification. Common evaluation metrics, such as accuracy, recall, F1 score, and precision, were used to measure the performance of the proposed model. A prototype of the model was developed and used to test system performance on the collected Amharic text comments. With four classification categories (sadness, anger, disgust, and happiness), the study achieves an accuracy of 71.11%; with two categories (positive and negative), it performs better, achieving an accuracy of 87.46%. We also evaluated an RNN model for comparison with our CNN model.
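
    A sketch of the CNN-with-word-embedding classifier over four emotion classes; the vocabulary size, embedding dimension, and filter settings are assumptions, not the thesis's configuration.

    ```python
    import tensorflow as tf

    vocab_size = 20000                      # assumed vocabulary size
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, 100),
        tf.keras.layers.Conv1D(128, 5, activation="relu"),
        tf.keras.layers.GlobalMaxPooling1D(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(4, activation="softmax"),  # sadness/anger/disgust/happiness
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.summary()
    ```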
  • Item
    Semantic Role Labeling for Amharic Text Using Deep Learning
    (Addis Ababa University, 8/17/2021) Meresa, Bemnet; Assabie, Yaregal (PhD)
    Semantic Role Labeling (SRL), the task of automatically finding the semantic role of each argument of each predicate in a sentence, is one of the essential problems in Natural Language Processing (NLP) research. SRL is a shallow semantic analysis task and an important intermediate step for many NLP applications, such as Question Answering, Machine Translation, Information Extraction, and Text Summarization. Feature-based approaches to SRL rely on parsing output, often use lexical resources, and require heavy feature engineering; errors in the parsing output can also propagate to the SRL output. Neural SRL systems, in contrast, can learn intermediate representations from raw text, bypassing manual feature extraction. Recent SRL studies using deep learning have shown improved performance over feature-based systems for English, Chinese, and other languages. Amharic exhibits typical Semitic behaviors that pose challenges to the SRL task, such as rich morphology and multiple subject-verb-object word orders. In this work, we approach SRL for Amharic using deep learning. The input is a raw sentence whose words are represented by a concatenation of word-, character-, and fastText-level neural word embeddings to capture the morphological, syntactic, and semantic information of the words, requiring no intermediate feature extraction. We use a bidirectional Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) to capture bidirectional context (for argument identification) and long-range dependencies (for argument boundary identification), and a conditional random field with Viterbi decoding to implement the SRL system. The system was trained on 8,000 instances and tested on 2,000 instances, achieving an accuracy of 94.96% and an F-score of 81.2%. We manually annotated the sentences with their corresponding semantic roles; future work can improve the quality of the data and experiment with contextual embeddings as feature representations for better performance.
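
    A sketch of the BiLSTM tagger at the core of the SRL system; the CRF layer with Viterbi decoding described above is omitted for brevity, and all dimensions are illustrative.

    ```python
    import torch
    import torch.nn as nn

    class BiLSTMTagger(nn.Module):
        def __init__(self, emb_dim=300, hidden=256, n_roles=20):
            super().__init__()
            # emb_dim stands in for the concatenated word/char/fastText embeddings.
            self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                                bidirectional=True)
            self.out = nn.Linear(2 * hidden, n_roles)

        def forward(self, embeddings):            # (batch, seq, emb_dim)
            h, _ = self.lstm(embeddings)          # both directions capture context
            return self.out(h)                    # per-token role scores

    tagger = BiLSTMTagger()
    scores = tagger(torch.randn(2, 12, 300))
    print(scores.shape)  # torch.Size([2, 12, 20])
    ```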
  • Item
    Open Source ESB Based Application Integration Case of Ethiopian Revenue and Customs Authority
    (Addis Ababa University, 12/6/2016) Tesfaye, Mihret; Getahun, Fekade (PhD)
    Nowadays, integration and interoperability have become key issues for organizations that work together. The Enterprise Service Bus (ESB) has become the ideal integration architecture for heterogeneous systems, facilitating integration between disparate applications on different hardware and software platforms. The aim of this work is to assess the services provided by the Ethiopian Revenue and Customs Authority for vehicle declaration that require integration, and to study the workflows of the existing system. The study developed an ESB product evaluation matrix, evaluated four open-source ESB products against it, and selected the most appropriate product for implementation. After discussing core ESB concepts, features, and benefits, proprietary and open-source ESB products are described briefly. The evaluation matrix was prepared by reviewing a variety of research papers by different professionals and organizations, and the products were evaluated based on the matrix. Based on the comparison results, WSO2 ESB was selected for developing the integration scenario; detailed information on the design and development is included. Finally, the integration scenario was designed and implemented using WSO2 ESB, and the integrated system was evaluated through functional testing. The results of the functional testing indicated a successful outcome for all test sets.
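
    A sketch of a weighted evaluation matrix of the kind described above; the criteria, weights, and scores below are invented placeholders, not the thesis's actual ratings of the four ESB products.

    ```python
    criteria = {"documentation": 0.2, "performance": 0.3,
                "connectors": 0.3, "community": 0.2}
    products = {
        "WSO2 ESB": {"documentation": 4, "performance": 5, "connectors": 5, "community": 4},
        "Mule ESB": {"documentation": 5, "performance": 4, "connectors": 4, "community": 4},
    }
    for name, scores in products.items():
        total = sum(criteria[c] * scores[c] for c in criteria)  # weighted sum
        print(f"{name}: {total:.2f}")
    ```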
  • Item
    Amharic Sentence Generation from Interlingua Representation
    (Addis Ababa University, 12/27/2016) Yitbarek, Kibrewossen; Assabie, Yaregal (PhD)
    Sentence generation is a part of Natural Language Generation (NLG), the process of deliberately constructing a natural language text in order to meet specified communicative goals. The major requirement of sentence generation in a natural language is producing full, clear, meaningful, and grammatically correct sentences. A sentence can be generated from different possible sources, including a representation that does not depend on any human language: an Interlingua. Generating a sentence from an Interlingua representation has numerous advantages. Since an Interlingua representation is unambiguous, universal, and independent of both the source and target languages, generation needs to be target-language-specific only, and likewise analysis source-language-specific. Among the different Interlinguas, Universal Networking Language (UNL) is commonly chosen in view of its various advantages over the others. Various works have generated sentences from UNL expressions for different languages of the world, but to the best of our knowledge none exists for Amharic. In this thesis, we present an Amharic sentence generator that automatically generates an Amharic sentence from a given UNL expression. The generator accepts a UNL expression as input and parses it to build a node-net; the parsed UNL expressions are stored in a data structure that can easily be modified in the subsequent processes. A UNL-to-Amharic word dictionary containing the root forms of Amharic words was also prepared. The Amharic equivalent root word and the attributes of each node in a parsed UNL expression are fetched from the dictionary to update the head word and attributes of the corresponding node. The translated Amharic root words are then locally reordered and marked based on Amharic grammar rules. When the nodes are ready for morphology generation, the proposed system makes use of Amharic morphology datasets to generate noun, adjective, pronoun, and verb morphology. Finally, function words are inserted into the inflected words so that the output matches a natural language sentence. The proposed system was evaluated on a dataset of 142 UNL expressions. Subjective tests of adequacy and fluency were performed, and a quantitative error analysis was carried out by calculating the Word Error Rate (WER). The analysis shows that the proposed system generates sentences that are 71.4% intelligible and 67.8% faithful to the original UNL expressions. The system achieved a fluency score of 3.0 and an adequacy score of 2.9 (each on a 4-point scale), with a word error rate of 28.94%. These scores can be improved further by improving the rule base and the lexicon.
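
    A sketch of parsing UNL binary relations, such as "agt(go.@past, boy)", into a node-net; the expression format is simplified and the example relations are hypothetical, not taken from the thesis's dataset.

    ```python
    import re

    REL = re.compile(r"(\w+)\(([^,]+),\s*([^)]+)\)")

    def parse_unl(expression: str):
        graph = {}  # head word -> list of (relation, dependent)
        for rel, head, dep in REL.findall(expression):
            graph.setdefault(head.strip(), []).append((rel, dep.strip()))
        return graph

    unl = "agt(go.@past, boy) plc(go.@past, school)"
    print(parse_unl(unl))
    # {'go.@past': [('agt', 'boy'), ('plc', 'school')]}
    ```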
  • Item
    Integrated Caching and Prefetching on Dynamic Replication to Reduce Access Latency for Distributed Systems
    (Addis Ababa University, 7/13/2021) Binalf, Yilkal; Libsie, Mulugeta (PhD)
    Distributed computing is a rapidly developing IT technology in which systems connect to one another via the network to improve performance. Thanks to distributed systems technology, workers from all over the world can collaborate for a single company, and customers can access data and receive service as if they were in the same location. However, as the number of users and organizations requesting and delivering these services grows, access latency becomes a problem; response time latency is one of the major problems of distributed systems. We therefore developed the integrated Caching and Prefetching on Dynamic Replication (CPDR) algorithm, which reduces access latency in distributed computing environments. The Cacher, Prefetcher, and Replicator are the three main components of the developed system. The Cacher contains one additional component, the Notifier, which tracks the Prefetcher's status and saves time when the Prefetcher is not active and the requested data is unavailable. Furthermore, the Cacher, Prefetcher, and Replicator each have a manager component containing algorithms for controlling cache storage, prefetching data, replicating data, and determining where data should be placed. Considering various scenarios depicting the minimum and maximum capacity of the computing environment as well as different requirements of incoming jobs, we evaluated our algorithm against caching, prefetching, dynamic replication, integrated caching and prefetching, integrated caching and dynamic replication, and integrated prefetching and dynamic replication algorithms. The proposed algorithm outperforms these counterparts in terms of response time and storage utilization.
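
    A skeleton of the integrated design described above: a cache that checks the Prefetcher's status (the Notifier's role) before falling back to replica lookup. The component interfaces are assumptions distilled from the abstract, not the thesis's actual APIs.

    ```python
    from collections import OrderedDict

    class Prefetcher:
        def __init__(self): self.active, self.buf = True, {}
        def has(self, key): return key in self.buf
        def take(self, key): return self.buf.pop(key)

    class Replicator:
        def fetch(self, key): return f"data:{key}"   # nearest-replica lookup (stub)

    class Cache:
        def __init__(self, capacity, prefetcher, replicator):
            self.store = OrderedDict()               # LRU order
            self.capacity = capacity
            self.prefetcher = prefetcher
            self.replicator = replicator

        def get(self, key):
            if key in self.store:
                self.store.move_to_end(key)          # cache hit
                return self.store[key]
            # Notifier role: skip the prefetcher entirely when it is inactive.
            if self.prefetcher.active and self.prefetcher.has(key):
                value = self.prefetcher.take(key)
            else:
                value = self.replicator.fetch(key)
            self.put(key, value)
            return value

        def put(self, key, value):
            self.store[key] = value
            if len(self.store) > self.capacity:
                self.store.popitem(last=False)       # evict least recently used
    ```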
  • Item
    Automatic Soybean Quality Grading Using Image Processing and Supervised Learning Algorithms
    (Addis Ababa University, 10/12/2021) Hassen, Muhammed; Assabie, Yaregal (PhD)
    Soybean is one of the most important oilseed crops of the world, requiring temperatures of 25 to 30°C for growth and proper nodulation. Due to its high protein content and nutritional quality, soybean is commonly used in food preparation, animal feed, and industry: it is an input for food products like soy milk for human consumption and for industrial products like paper, plastics, and cosmetics. The trading of soybean in Ethiopia, both domestic and export, is done through the Ethiopian Commodity Exchange. Determining the quality grade of soybean is crucial in the trading process; it improves the production of quality soybeans and helps traders become competitive in the market. This process is done manually at the Ethiopian Commodity Exchange and is therefore subject to several problems: it is inefficient, inconsistent, and vulnerable to subjectivity. As a solution, this thesis proposes automated quality grading of soybean using image processing techniques and supervised learning algorithms. Image acquisition, image pre-processing, image segmentation, soybean type prediction, and grade determination are the major steps followed. For image pre-processing, a median filter removes noise and a modified unsharp masking technique sharpens the acquired soybean image. For segmentation, a modified Otsu threshold segmentation method is applied to the color image. Nineteen characteristic parameters are extracted from each sample: 7 morphological, 6 color, and 6 texture features. Three supervised learning classifiers are applied and compared: a support vector machine, an artificial neural network, and a convolutional neural network. Experimental results show that a one-dimensional convolutional neural network outperforms the others, with an accuracy of 93.71% on test datasets collected from the Ethiopian Commodity Exchange. We conclude that the CNN is superior to the other supervised learning algorithms and that using aggregated features is better than using a single type of feature.
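
    A sketch of the preprocessing and segmentation steps named above using OpenCV: median filtering, unsharp masking, and Otsu thresholding applied per colour channel (one way to adapt Otsu to a colour image). The file name and kernel sizes are placeholders.

    ```python
    import cv2

    img = cv2.imread("soybean_sample.jpg")  # placeholder path

    # 1. Median filter to remove noise.
    denoised = cv2.medianBlur(img, 5)

    # 2. Unsharp masking: add back the difference from a Gaussian blur.
    blur = cv2.GaussianBlur(denoised, (9, 9), 10)
    sharp = cv2.addWeighted(denoised, 1.5, blur, -0.5, 0)

    # 3. Otsu threshold per channel, then combine the channel masks.
    channels = [cv2.threshold(c, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
                for c in cv2.split(sharp)]
    mask = cv2.bitwise_and(cv2.bitwise_and(channels[0], channels[1]), channels[2])
    ```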
  • Item
    Automatic Fraud Detection Model from Customs Data in Ethiopian Revenues and Customs Authority
    (Addis Ababa, Ethiopia, 2013-03) Muhammed, Meriem; Hailemariam, Sebsibe(PhD)
    Customs, one of the three wings of the Ethiopian Revenues and Customs Authority (ERCA), is established to secure national revenues by controlling imports and exports as well as collecting governmental taxes and duties. This research focuses on the identification, modeling, and analysis of various conflicting issues that Ethiopian customs faces. One of the major problems identified during problem understanding is controlling and managing the fraudulent behavior of foreign traders. Declarants engage in various types of fraudulent activities, which creates a need for serious inspection of declarations; at the same time, the huge number of declarations per day demands significant human resources and time. Recognizing this critical problem of the government, ERCA adopted the Automated System for Customs Data (ASYCUDA). ASYCUDA attempts to minimize the problem by recommending a risk level for each declaration using a selectivity method based on five parameters from the declarants' information. The fundamental problem with ASYCUDA risk leveling is that it restricts the variables used to assign risk levels, which may direct a declaration into an incorrect channel. This research proposes a machine learning approach to model the fraudulent behavior of importers by identifying appropriate parameters from the observed data, in order to improve the quality of service at Customs, ERCA. The proposed automated fraud detection models predict the fraud behavior of importing cargo, minimizing the problems associated with ASYCUDA risk leveling. The models were built using machine learning techniques on historical customs data from ERCA; the analysis was done on inspected cargo records comprising 74,033 instances and 24 attributes. Four prediction models were proposed. The first is a fraud prediction model, which predicts whether incoming cargo is fraudulent or not. The second is a fraud category prediction model, which identifies the specific fraud category among the ten identified categories. The third is a fraud level prediction model, which classifies the fraud level as high or low. The last is a fraud risk level prediction model, which classifies the risk level of importing cargo as high, medium, or low. Following IEEE recommendations, four machine learning approaches were tested for each prediction model: C4.5, CART, KNN, and Naive Bayes. Based on the results obtained through various experimental analyses, C4.5 was found to be the best algorithm for building all of the prediction models, with accuracies of 93.4%, 84.4%, 89.4%, and 86.8% in the first, second, third, and fourth scenarios, respectively. The next best algorithm, Classification and Regression Tree (CART), achieved accuracies of 92.9%, 80.1%, 89.4%, and 85.3% for the four scenarios, respectively. Both C4.5 and CART perform better for fraud prediction and fraud level classification than for fraud category and risk level prediction, while the Naive Bayes statistical approach performed very poorly. Keywords: fraud prediction, fraud category prediction, fraud level prediction, fraud risk level prediction, classification, machine learning algorithm, ASYCUDA.
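
    A sketch of a decision-tree fraud predictor in the spirit of the first model above; note that scikit-learn's DecisionTreeClassifier implements CART rather than C4.5, and the random features stand in for the 24 customs attributes.

    ```python
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import train_test_split

    X = np.random.rand(5000, 24)          # placeholder declaration features
    y = np.random.randint(0, 2, 5000)     # 1 = fraudulent cargo, 0 = clean
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2)

    # "entropy" mimics C4.5's information-gain split measure within CART.
    tree = DecisionTreeClassifier(criterion="entropy")
    tree.fit(X_tr, y_tr)
    print(f"accuracy: {tree.score(X_te, y_te):.3f}")
    ```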
  • Item
    Design and Implementation of Afaan Oromo Spell Checker
    (Addis Ababa, Ethiopia, 2013-06) Olani, Gaddisa; Midekso, Dida (PhD)
    Developing language applications or localizing software is a resource-intensive task that requires the active participation of stakeholders with various backgrounds (i.e., from linguistic and computational perspectives). With a constant increase in the amount of electronic information and the diversity of languages used to produce it, these challenges get compounded. Various studies in the fields of computational linguistics and computer science have been carried out, while still many more are on their way, to alleviate such problems; a spell checker is one potential candidate. Using computers for document preparation is one of the many tasks undertaken by different organizations, and entering text into word processing tools may result in spelling errors; hence, text processing application software includes spell checkers. Integrating a spell checker into a word processor reduces the time and energy spent finding and correcting misspelled words. However, these tools are not available for Afaan Oromo, a Lowland East Cushitic language of the Afro-Asiatic super-phylum spoken in Ethiopia. In this thesis, we describe the design and implementation of an Afaan Oromo spell checker. A morphology-based computational model (i.e., dictionary look-up with morphological rules) was employed to design and develop the Afaan Oromo Spell Checker (AOSC). Algorithms that take the morphological properties of Afaan Oromo into consideration were developed from scratch and applied, as there were no previous such attempts. The proposed system was evaluated using two datasets of different sizes. The experimental results show that the lexicon size and the rules in the knowledge base play a vital role in recognizing valid input words, flagging invalid words, and generating correct suggestions for misspelled words. In general, the algorithms and techniques used in this study obtained good performance compared to resource-rich languages like English. The results obtained encourage further research in the area, especially with the aim of developing a full-fledged Afaan Oromo spell checker.
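
    A sketch of the dictionary-look-up-with-morphological-rules approach named above: strip known suffixes before lookup so inflected forms of lexicon roots are accepted. The suffix list and lexicon are small illustrative samples, not the thesis's actual rule base.

    ```python
    LEXICON = {"barat", "deem", "mana"}            # root forms (placeholder)
    SUFFIXES = ["uu", "an", "a", "e", "na"]        # sample inflectional suffixes

    def is_valid(word: str) -> bool:
        if word in LEXICON:
            return True
        # Morphological rule: accept root + known suffix combinations.
        for suf in SUFFIXES:
            if word.endswith(suf) and word[:-len(suf)] in LEXICON:
                return True
        return False

    print(is_valid("barata"))   # True: "barat" + "a"
    print(is_valid("barzta"))   # False: flagged as misspelled
    ```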
  • Item
    Protocol of a Systematic Mapping Study of Requirements Engineering Approaches for Big Data Project Development
    (Addis Ababa University, 2021) Regane, Belachew; Beecham, Sarah; Lemma, Dagmawi; Power, Norah