School of Information Technology and Engineering
Recent Submissions
Item
Deep Learning-Based Amharic Keyword Extraction for Open-Source Intelligence Analysis (Addis Ababa University, 2025-06) Alemayehu Gutema; Henok Mulugeta (PhD)
In today's digital age, information overload has become a pressing concern, especially in the field of open-source intelligence (OSINT). With vast amounts of data available on the internet, it is challenging to separate relevant and credible information from the noise. OSINT involves gathering intelligence from publicly available sources, but the increasing volume and diversity of online content make it difficult to extract actionable intelligence from enormous amounts of data. Deep learning can help identify patterns in large amounts of data and automate decision-making processes, yet the problem of information overload persists. One approach to addressing it is to develop an effective deep learning model that extracts the relevant information; leveraging machine and deep learning algorithms together with natural language processing (NLP) can automatically classify and categorize information. The purpose of this study is to design a deep learning model for keyword extraction that draws intelligence from a large Amharic dataset. Keyword extraction is the process of identifying important words or phrases that capture the essence of a given piece of text, a task critical for many NLP applications, including document summarization, information retrieval, and search engine optimization. In recent years, deep learning algorithms have shown great promise in this field, largely due to their ability to learn from vast amounts of data and extract complex patterns. In this paper, we propose a novel keyword extraction approach based on deep learning methods. We explore different algorithms, such as recurrent neural networks (RNNs) and transformer models, to learn the relevant features from the input text and predict the most salient keywords. We evaluate the proposed method on datasets containing Amharic content and show that it outperforms state-of-the-art methods. Our results suggest that deep learning-based approaches have the potential to significantly improve keyword extraction accuracy and scalability in real-world applications.

Item
Multimodal Unified Bidirectional Cross-Modal Audio-Visual Saliency Prediction (Addis Ababa University, 2025-06) Tadele Melesse; Natnael Argaw (PhD); Beakal Gizachew (PhD)
Human attention in dynamic environments is inherently multimodal, shaped by the interplay of auditory and visual cues. Existing saliency prediction methods predominantly focus on visual semantics and neglect audio as a critical modulator of gaze behavior. Recent audiovisual approaches attempt to address this gap but remain limited by temporal misalignment between modalities and inadequate retention of spatio-temporal information, which is key to resolving both the location and timing of salient events, ultimately yielding suboptimal performance. Inspired by recent breakthroughs in cross-attention transformers with convolutions for joint global-local representation learning, and in conditional denoising diffusion models for progressive refinement, we introduce a novel multimodal framework for efficient bidirectional audiovisual saliency prediction.
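As an illustration of the kind of bidirectional cross-modal attention this entry builds on (a minimal PyTorch sketch under assumed dimensions and module names, not the thesis implementation):

```python
import torch
import torch.nn as nn

class BidirectionalCrossModalAttention(nn.Module):
    """One bidirectional audio-visual cross-attention exchange with fusion."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.v_from_a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.a_from_v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)  # projection into a shared latent space

    def forward(self, vis, aud):
        # vis: (B, Nv, dim) visual tokens; aud: (B, Na, dim) audio tokens
        v_aligned, _ = self.v_from_a(vis, aud, aud)  # visual queries attend to audio
        a_aligned, _ = self.a_from_v(aud, vis, vis)  # audio queries attend to visuals
        vis = vis + v_aligned                        # residuals keep unimodal detail
        aud = aud + a_aligned
        # Pool the audio stream and fuse it with every visual token.
        aud_ctx = aud.mean(dim=1, keepdim=True).expand_as(vis)
        return self.fuse(torch.cat([vis, aud_ctx], dim=-1))

block = BidirectionalCrossModalAttention()
fused = block(torch.randn(2, 196, 256), torch.randn(2, 32, 256))
print(fused.shape)  # torch.Size([2, 196, 256])
```

Each modality queries the other, residual connections retain unimodal detail, and a linear projection stands in for the unified latent space described below.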
The framework employs dual-stream encoders to process video and audio independently, coupled with separate efficient cross-modal attention pathways that model mutual modality influence: one pathway aligns visual features with audio features, while the other adjusts audio embeddings to visual semantics. Critically, these pathways converge into a unified latent space, ensuring coherent alignment of transient audiovisual events through iterative feature fusion. To preserve fine-grained details, residual connections propagate multiscale features across stages. For saliency generation, a conditional diffusion decoder iteratively denoises a noise-corrupted ground-truth map, conditioned at each timestep on the fused audiovisual features through a hierarchical decoder that enforces spatio-temporal coherence via multiscale refinement. Extensive experiments demonstrate that our model outperforms state-of-the-art methods, achieving improvements of up to 11.52% (CC), 20.04% (SIM), and 3.79% (NSS) over DiffSal on the AVAD dataset.

Item
A Structured Framework for Email Forensic Investigations (Addis Ababa University, 2025) Biruk Bekele; Henok Mulugeta (PhD)
Email forensic investigations have become vital in legal, cybersecurity, and corporate contexts. However, most existing frameworks suffer from inefficiency, weak data-integrity guarantees, and difficulty handling diverse and complex data sources such as encrypted emails and metadata. This thesis applied the Design Science Methodology to develop a structured framework that enhances the efficiency and effectiveness of email forensic investigations, specifically addressing data quality, diversity in data management, and integrity of evidence. The framework comprises key phases: case management, governance, identification, preservation, classification, analysis, presentation, and compliance. Case management forms the core of the proposed framework, organizing and tracking the investigation from start to finish so that evidence is handled properly and every phase is executed systematically. The framework integrates open-source tools, varied case studies, and best practices so as to remain relevant to different real-world scenarios. The effectiveness of the artifact is demonstrated in practical application, with performance measured in terms of investigation speed, data quality, accuracy, and user satisfaction, among other metrics. This research shows that the suggested framework decreases investigation time, reduces error rates, improves the quality of data management, and guarantees effective access to diverse data sources. The thesis contributes on both practical and theoretical levels, comprehensively guiding practitioners and researchers in digital forensics toward more efficient, accountable, and adaptable email forensic investigations.

Item
Cybersecurity Incident Management Framework for Smart Grid Systems in Ethiopia (Addis Ababa University, 2024-06) Getinet Admassu; Henock Mulugeta (PhD)
Merging OT and IT into smart grid systems brings new advantages.
Smart grids can use this amalgamation to manage energy generation and transmission with minimal energy loss, resulting in high efficiency, and the integration also enables real-time monitoring of infrastructure. On the other hand, this digital change exposes smart grids to many cybersecurity threats, and securing these key infrastructures requires developing and implementing robust cybersecurity incident management systems. Based on evidence from the existing literature and expert judgments, this paper enumerates the principal challenges power utilities face in managing cybersecurity incidents and then outlines a comprehensive cybersecurity incident management framework. The framework enables power utilities to take an active role in handling cybersecurity incidents, and it ensures that cybersecurity is addressed systematically across strategic, engineering, procurement, construction, and operational aspects, involving all parties and resources concerned. A qualitative design science approach guided the development of the framework. It organizes sophisticated threat detection techniques and counter-threat strategies and correlates them with risk management, threat analysis, security controls, operational models, and management; these include real-time network traffic and system log monitoring, anomaly detection algorithms, and intrusion detection and prevention systems. With the framework, power utilities can significantly improve their ability to detect and respond to cybersecurity-related events. Threat scenarios, including coordinated DDoS and ransomware attacks mapped as a taxonomy against the components of the proposed framework, show how the framework can be used to develop effective responses to cybersecurity incidents. The recommendations target particular challenge areas within the electric power industry and underpin its cybersecurity posture, so that critical energy infrastructure remains reliable and dependable. This research encourages sustainable development and social welfare through cybersecurity resilience for smart grid systems.

Item
Framework for PKI Implementation: Optimizing Project Management in Ethiopia (Addis Ababa University, 2024-09) Binyam Ayele; Henock Mulugeta (PhD)
In today's increasingly digital world, the security of online communications and transactions is paramount. Public Key Infrastructure (PKI) has emerged as a cornerstone technology for ensuring secure, authenticated, and confidential digital interactions. However, implementing PKI projects remains challenging due to inherent complexities, including certificate management, key distribution, system integration, contradictions and limitations in national legal frameworks, and lack of interoperability. The absence of a standardized implementation framework further exacerbates these challenges, leading to inconsistent and often flawed deployments that fail to leverage the full potential of PKI. This study investigates how to optimize a PKI project implementation framework that supports the establishment of a PKI project at the national or organizational level, by developing a comprehensive framework that mitigates PKI project implementation challenges.
The study addresses the critical need for a comprehensive PKI project implementation framework that can guide organizations in navigating the complexities of PKI deployment. The problem under investigation is the absence of a standardized, generic framework and best practices for PKI implementation, which has resulted in varied levels of security and effectiveness across sectors. The study aims to develop a framework adaptable to diverse organizational contexts, ensuring that PKI systems are implemented in a manner that is both secure and scalable. To achieve this goal, a systematic literature review (SLR) is employed as the primary research method. The SLR systematically identifies, evaluates, and synthesizes existing research on PKI implementation, focusing on the challenges, best practices, and potential solutions proposed in the literature. By analyzing a wide range of studies, the SLR provides a comprehensive understanding of the current state of PKI implementation and identifies gaps that the proposed framework can address, ensuring a rigorous, evidence-based approach to developing a framework that assists PKI project management. A case study and key performance indicators (KPIs) are incorporated to evaluate the proposed framework. As a direct outcome of this study, stakeholders planning to implement PKI in Ethiopia or elsewhere will gain a proactive understanding of the implementation considerations that should be taken into account.

Item
Leveraging Intel SGX and Hybrid Design for Secure National ID Systems (Addis Ababa University, 2025-01) Tesfalem Fekadu; Sileshi Demesie (PhD)
Globally, 1.1 billion individuals, including 21 million refugees, lack proof of legal identity, disproportionately affecting children and women in rural areas of Asia and Africa. Without official identification, access to essential services such as education, healthcare, banking, and public distribution systems becomes nearly impossible. The increasing reliance on digital identity management systems demands robust security measures to safeguard sensitive personal data. The Modular Open-Source Identity Platform (MOSIP) is a widely adopted solution due to its flexibility and scalability. However, protecting sensitive data during national ID enrollment, registration, and authentication remains a significant challenge. Specifically, decrypting biometric data before feature comparison in server environments exposes this data to critical vulnerabilities, increasing the risk of attack. Reliance on software-based Software Development Kits (SDKs) for biometric matching exacerbates the issue: these SDKs often operate alongside other software modules, expanding the attack surface, and software-based approaches are inherently risky due to the high likelihood of exploitable bugs, which attackers can use to compromise data integrity or gain unauthorized access. This study addresses these security challenges by integrating Trusted Execution Environments (TEEs) to protect data during processing. A hybrid architecture is proposed, incorporating an SGX-based solution named SGX-BioShield for security, with the hybrid design providing performance enhancement.
A prototype of the proposed security solution has been developed and tested, demonstrating that SGX-BioShield significantly reduces the risk of unauthorized access and data breaches by isolating sensitive operations within a hardware-protected environment. Intel SGX ensures that data remains secure even if the operating system or hypervisor is compromised. This research contributes to the field of identity management by presenting a novel approach to securing platforms such as MOSIP, and it provides practical insights into improving data security and overall system performance through a hybrid architecture for digital identity systems.

Item
Modular Federated Learning for Non-IID Data (Addis Ababa University, 2025-06) Samuel Hailemariam; Beakal Gizachew (PhD)
Federated Learning (FL) promises privacy-preserving collaboration across distributed clients but is hampered by three key challenges: severe accuracy degradation under non-IID data, high communication and computational demands on edge devices, and a lack of built-in explainability for debugging, user trust, and regulatory compliance. To bridge this gap, we propose two modular FL pipelines, SPATL-XL and SPATL-XLC, that integrate SHAP-driven pruning with, in the latter, dynamic client clustering. SPATL-XL applies SHAP-based pruning to the largest layers, removing low-impact parameters to both reduce model size and sharpen interpretability, whereas SPATL-XLC further groups clients via lightweight clustering to reduce communication overhead and smooth convergence in low-bandwidth, high-client settings. In experiments on CIFAR-10 and Fashion-MNIST over 200 communication rounds under IID and Dirichlet non-IID splits, our pipelines lower per-round communication to 13.26 MB, speed up end-to-end training by 1.13×, raise explanation fidelity from 30–50% to 89%, match or closely approach SCAFFOLD's 70.64% top-1 accuracy (SPATL-XL: 70.36%), and maintain stable clustering quality (Silhouette, CHI, DBI) even when only 40–70% of clients participate. These results demonstrate that combining explainability-driven pruning with adaptive clustering yields practical, communication-efficient, and regulation-ready FL pipelines that simultaneously address non-IID bias, resource constraints, and transparency requirements.

Item
Optimizing Explainable Deep Q-Learning via SHAP, LIME, & Policy Visualization (Addis Ababa University, 2025-06) Tesfahun Yemisrach; Beakal Gizachew (PhD); Natnael Argaw (PhD), Co-Advisor
Reinforcement learning (RL) has demonstrated remarkable promise in sequential decision-making tasks; however, its interpretability issues continue to be a hindrance in high-stakes domains that demand regulatory compliance, transparency, and trust. Post-hoc explainability has been investigated in recent research using techniques such as SHAP and LIME, but these methods are frequently isolated from the training process and lack cross-domain evaluation. To fill this gap, we propose an explainable Deep Q-Learning (DQL) framework that incorporates explanation-aligned reward shaping and model-agnostic explanation techniques into the agent's learning pipeline. The framework exhibits broad applicability, as it is tested in both financial settings and traditional control environments. Experimental findings show that the explainable agent consistently outperforms the baseline in explanation fidelity, average reward, and convergence speed.
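One plausible way to realize the explanation-aligned reward shaping this entry describes (a hedged sketch: the fidelity proxy and the `explain_fn` hook are assumptions for illustration, not the thesis formulation):

```python
import numpy as np

def shaped_reward(env_reward, state, explain_fn, lam=0.1):
    """Add an explanation-fidelity bonus to the environment reward.

    explain_fn stands in for a SHAP/LIME attribution call on the current
    policy network; it returns per-feature importances for the state.
    """
    phi = np.abs(explain_fn(state))
    p = phi / (phi.sum() + 1e-8)
    # Simple fidelity proxy: sparse, concentrated attributions score higher.
    entropy = -(p * np.log(p + 1e-8)).sum()
    fidelity = 1.0 - entropy / np.log(len(p))  # 1 = focused, 0 = diffuse
    return env_reward + lam * fidelity

# Usage with a made-up attribution vector for a 4-feature state:
r = shaped_reward(1.0, np.zeros(4), lambda s: np.array([0.9, 0.05, 0.03, 0.02]))
```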
In CartPole, the agent obtained a LIME fidelity score of 87.2% versus 63.5% and an average reward of 190 versus 130 for the baseline. In the financial domain, it produced an 89.10% win ratio, a Sharpe ratio of 0.4782, and a return of 154.32%. These results aid the development of transparent and reliable reinforcement learning systems, demonstrating that incorporating explainability into RL enhances interpretability as well as stability and performance across domains.

Item
Provenance Blockchain with Predictive Auditing Framework for Mitigating Cloud Manufacturing Risks in Industry 4.0 (Addis Ababa University, 2025-06) Mifta Ahmed; Gouveia, Luis Borges (PhD); Elefelious Getachew (PhD)
Cloud manufacturing is an evolving concept that enables various manufacturers to connect and address shared demand streams regardless of their geographical location. Although this transformation facilitates operational flexibility and resource optimization, it introduces critical challenges related to continuous visibility, traceability, and proactive security management within Industrial Internet of Things (IIoT)-enabled cloud manufacturing environments. Notably, the absence of real-time insight into device states and operational behaviors increases susceptibility to unauthorized access, latent security breaches, and operational disruptions. Existing blockchain-based solutions predominantly emphasize initial authentication and transactional integrity but lack mechanisms for ongoing device verification and continuous provenance tracking, while artificial intelligence (AI)-driven predictive auditing techniques have evolved in isolation, without harnessing the immutability, accountability, and policy enforcement capabilities afforded by blockchain technology. This fragmentation results in limited traceability and weakened system integrity, particularly in dynamic IIoT ecosystems where timely, data-driven decision making is imperative. This study aims to address these gaps through three primary objectives: (i) optimize blockchain architectures to support continuous monitoring, traceability, and visibility in IIoT environments; (ii) develop and integrate AI-based predictive auditing mechanisms with blockchain to proactively detect and mitigate security risks in IIoT-based cloud manufacturing; and (iii) evaluate the effectiveness of the integrated blockchain and predictive auditing framework in addressing security, traceability, and real-time visibility challenges while maintaining operational continuity. Adopting a Design Science Research Methodology (DSRM), this study develops and rigorously evaluates an integrated framework that combines dynamic blockchain-based provenance logging with AI-driven anomaly detection. The experimental evaluation used a scenario-based setup in a cloud-simulated multizone warehouse environment involving IIoT-enabled forklifts operating under three behavioral scenarios: fully compliant, partially compliant, and rogue. Key evaluation metrics included validation accuracy (94%), prediction precision (up to 99.7%), F1 score (90%), traceability rate (82% to 85%), average system latency (3.95 seconds), transaction rejection rate (100% for rogue inputs), and operational uptime (100%, with no downtime).
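A minimal sketch of hash-chained provenance logging with a predictive-auditing gate of the kind this entry evaluates (record fields, the anomaly score, and the rejection threshold are illustrative assumptions):

```python
import hashlib, json, time

class ProvenanceChain:
    """Append-only, hash-linked log of IIoT device events (simplified)."""
    def __init__(self):
        self.blocks = [{"index": 0, "prev": "0" * 64, "event": "genesis", "hash": None}]
        self.blocks[0]["hash"] = self._digest(self.blocks[0])

    @staticmethod
    def _digest(block):
        # Hash everything except the hash field itself, in a stable order.
        payload = json.dumps({k: v for k, v in block.items() if k != "hash"},
                             sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def append(self, device_id, event, anomaly_score, threshold=0.8):
        # Predictive-auditing gate: rogue inputs are rejected before logging.
        if anomaly_score >= threshold:
            raise ValueError(f"rejected event from {device_id}: score {anomaly_score:.2f}")
        block = {"index": len(self.blocks), "prev": self.blocks[-1]["hash"],
                 "ts": time.time(), "device": device_id, "event": event, "hash": None}
        block["hash"] = self._digest(block)
        self.blocks.append(block)

    def verify(self):
        # Any tampering with a logged event breaks the hash chain.
        return all(b["prev"] == p["hash"] and b["hash"] == self._digest(b)
                   for p, b in zip(self.blocks, self.blocks[1:]))
```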
The results substantiate the framework's ability to provide real-time responsiveness, robust security, and continuous traceability while maintaining operational continuity, even under adversarial or non-compliant conditions. This study contributes to the body of knowledge by bridging the gap between blockchain technology and AI in IIoT-enabled cloud-manufacturing security. The findings have practical implications for the secure deployment of IIoT technologies across smart manufacturing ecosystems.

Item
Collatz Sequence-Based Weight Initialization for Enhanced Convergence and Gradient Stability in Neural Networks (Addis Ababa University, 2025-06) Zehara Eshetu; Beakal Gizachew (PhD); Adane Letta (PhD)
Deep neural networks have achieved state-of-the-art performance in tasks ranging from image classification to regression. However, their training dynamics remain highly sensitive to weight initialization, a fundamental factor that influences both convergence speed and model performance. Traditional initialization methods such as Xavier and He rely on fixed statistical distributions and often underperform when applied across diverse architectures and datasets. This study introduces Collatz Sequence-Based (CSB) Weight Initialization, a novel deterministic approach that leverages the structured chaos of Collatz sequences to generate initial weights. CSB applies systematic transformations and scaling strategies to improve gradient flow and enhance training stability. It is evaluated against seven baseline initialization techniques using a CNN on the CIFAR-10 dataset and an MLP on the California Housing dataset. Results show that CSB consistently outperforms conventional methods in both convergence speed and final performance. Specifically, CSB achieves up to 55.03% faster convergence than Xavier and 18.49% faster than He on a 1,000-sample subset, and maintains a 20.64% speed advantage over Xavier on the full CIFAR-10 dataset. On the MLP, CSB shows a 58.12% improvement in convergence speed over He. Beyond convergence, CSB achieves a test accuracy of 78.12% on CIFAR-10, outperforming Xavier by 1.53% and He by 1.34%. On the California Housing dataset, CSB attains an R² score of 0.7888, a 2.35% improvement over Xavier. Gradient analysis reveals that CSB-initialized networks maintain balanced L2 norms across layers, effectively reducing vanishing and exploding gradient issues. This stability contributes to more reliable training dynamics and improved generalization. However, this study is limited by its focus on shallow architectures and lacks a robustness analysis across diverse hyperparameter settings.

Item
A Lightweight Model for Balancing Efficiency and Precision in PEFT-Optimized Java Unit Test Generation (Addis Ababa University, 2025-06) Sintayehu Zekarias; Beakal Gizachew (PhD)
Software testing accounts for nearly 50% of development costs while being critical for ensuring software quality, creating an urgent need for more efficient testing solutions. This work addresses the challenge by developing an innovative framework that combines Parameter-Efficient Fine-Tuning (PEFT) techniques with transformer models to automate Java unit test generation.
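A minimal sketch of the low-rank adaptation idea behind the PEFT variants evaluated below (dimensions, rank, and scaling are assumed values, not the thesis configuration):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained weight W plus a trainable low-rank update B @ A."""
    def __init__(self, base: nn.Linear, rank=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                 # only the adapters are trained
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# ~12k trainable adapter parameters versus ~590k frozen base parameters:
layer = LoRALinear(nn.Linear(768, 768))
```

QLoRA applies the same low-rank update on top of a quantized base weight, which is where the additional memory savings reported below come from.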
The study systematically evaluates three PEFT approaches, Low-Rank Adaptation (LoRA), Quantized LoRA (QLoRA), and Adapters, through a rigorous methodology involving specialized assertion pretraining on the Atlas dataset (1.2M Java method-assertion pairs), PEFT optimization, targeted fine-tuning with Methods2Test (780K test cases), and comprehensive validation on the unseen Defects4J benchmark to assess cross-project generalization. Experimental results demonstrate that LoRA maintains 92% of full fine-tuning effectiveness (38.12% correct test cases) while reducing GPU memory requirements by 17% and improving generation speed by 23%. QLoRA achieves even greater efficiency with a 36% memory reduction, making it particularly suitable for resource-constrained environments. On Defects4J, however, LoRA achieved 43.1% correct assertions, compared with a full fine-tuning baseline of 46.0%, indicating a minor reduction in generalization alongside the efficiency gains. These findings are contextualized by the Java programming language and the specific datasets employed in our experiments; they nonetheless provide valuable insights for implementing AI-powered test generation in practice, highlighting both the potential of PEFT techniques to reduce testing costs and the need for further research on maintaining test quality across diverse projects.

Item
Framework for Identifying Forensic Artifacts from Ride-hailing Android Applications (Addis Ababa University, 2025-03) Munir Kemal; Fitsum Assamnew (PhD)
The increasing use of smartphones has made many services available through our mobile devices. One such service is ride-hailing, in which taxi transportation is coordinated from a common operations center via driver and passenger applications installed on users' smartphones. In Ethiopia, many companies offer this service, including Ride, Feres, ZayRide, Seregela, Safe, and Taxiye. Crimes such as theft and murder are committed against drivers and riders using this transportation service, yet current research focuses mainly on the forensic investigation of social networks and banking applications. K. Kiptoo proposed a forensic investigation framework for identifying forensic artifacts from Android on-demand ride applications such as Uber, Little, and Bolt that operate in Kenya. In this research, we propose a forensic framework that customizes Kiptoo's framework to enhance the identification of forensic artifacts from Android-based ride-hailing applications, based on experimentation with the Ride, ZayRide, and Feres applications. The proposed framework involves six phases: collection; setting up and configuration; extraction and preservation; application database location; examination and analysis; and reporting. During experimentation, we recovered valuable artifacts such as passenger profile information, passenger device details, location data, time information, and driver-related data, which are crucial digital evidence in the investigation of digital crimes.
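For illustration, a hedged sketch of the kind of query run during the application database location and examination phases (the table and column names are hypothetical; real schemas vary by app and version and must be confirmed during analysis):

```python
import sqlite3

def extract_trip_artifacts(db_path):
    """Pull trip records from an app database recovered during extraction."""
    con = sqlite3.connect(db_path)
    con.row_factory = sqlite3.Row
    try:
        rows = con.execute(
            # Hypothetical schema for a ride-hailing app's local store:
            "SELECT trip_id, pickup_lat, pickup_lon, started_at, driver_name "
            "FROM trips ORDER BY started_at"
        ).fetchall()
        return [dict(r) for r in rows]
    finally:
        con.close()
```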
This research also investigated, through a purpose-designed questionnaire distributed to Ethiopian law enforcement agencies, the role digital forensic evidence plays in closing criminal cases and the challenges of using it. The findings show that although the evidentiary use of such data is increasing, major issues remain: legal and procedural inconsistencies, lack of expertise, resource limitations, and the absence of clear forensic standards, all of which may hinder the use of digital evidence obtained from systems such as ride-hailing applications in a world of increasingly complex digital crime.

Item
Lightweight IOT Security With Deep Learning-Driven Biometric for Human Authentication (Addis Ababa University, 2025-02) Girma Alemu; Henock Mulugeta (PhD)
The number of Internet of Things (IoT) devices is growing rapidly, and so is the risk that comes with them. IoT devices have a great impact on daily life: huge amounts of data are stored, transmitted, and used through them, some of it sensitive and vulnerable to various attacks. Previous research has proposed a range of countermeasures to protect IoT devices from these attacks. Conventional authentication methods, possession-based (tokens) and knowledge-based (passwords/PINs), address access control but are prone to loss, duplication, guesswork, and forgetfulness. Single-modality biometric identification, such as fingerprint or facial recognition alone, is likewise insufficient due to its susceptibility to spoofing attacks. Multi-biometric systems improve security, but merging and comparing large amounts of biometric data requires accounting for variations in the quantity and quality of data sources. Our proposed solution combines a lightweight deep learning algorithm designed for IoT devices with multimodal biometrics using fingerprint and face. In experiments on both training and unseen datasets, the model demonstrated good classification ability, with 99.3% training accuracy and 82.5% validation accuracy. The suggested solution addresses IoT security issues through modeling and experimental validation, and hands-on testing showed a robust IoT security solution. Overall, the combination of deep learning algorithms and dual biometric modalities significantly advances secure authentication for IoT applications.

Item
Investigating Malicious Capabilities of Android Malwares that Utilize Accessibility Services (Addis Ababa University, 2025-02) Tekeste Fekadu; Fitsum Assamnew (PhD)
The Android accessibility service provides a range of powerful capabilities, including observing user actions, reading on-screen content, and executing actions on behalf of the user. Although these features are designed to enhance the user experience for individuals with disabilities, they introduce design vulnerabilities that make the accessibility service susceptible to malicious exploitation. This research investigates how Android malware leverages accessibility services for malicious purposes.
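A toy sketch of the pattern-classification task described below, using scikit-learn's gradient boosting as a stand-in for XGBoost (the call-sequence strings and labels are invented for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import make_pipeline

# Invented source->sink API-call patterns of the kind mined from callgraphs.
patterns = [
    "getRootInActiveWindow findAccessibilityNodeInfosByText performAction",
    "onAccessibilityEvent getEventText sendTextMessage",
    "getRootInActiveWindow performGlobalAction GLOBAL_ACTION_BACK",
]
labels = ["ContentEavesdrop", "ContentEavesdrop", "BlockAccess"]

# Tokenize call sequences into n-grams, then classify each pattern into a
# malicious-functionality label.
clf = make_pipeline(
    CountVectorizer(token_pattern=r"[A-Za-z_]+", ngram_range=(1, 2)),
    GradientBoostingClassifier(n_estimators=50),
)
clf.fit(patterns, labels)
print(clf.predict(["getRootInActiveWindow performGlobalAction GLOBAL_ACTION_HOME"]))
```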
By analyzing a dataset of malicious applications, we identified common patterns of accessibility service abuse and developed a machine learning-based detection approach using TinyBERT and XGBoost models. We first manually compiled a base dataset of 134 accessibility service event patterns comprising source and sink API calls, labeled according to specific malicious functionalities: BlockAccess, ManipulateUI, and ContentEavesdrop. To address data limitations, we generated callgraphs from 121 malware samples using FlowDroid taint analysis and applied agglomerative clustering and fuzzy matching, ultimately expanding the dataset to 1,497 patterns. Our classification experiments compared TinyBERT, a transformer-based model, against XGBoost, a gradient-boosted decision tree model, in classifying malicious functionalities. TinyBERT performed best, achieving 97.7% accuracy and a 97.6% F1 score over ten-fold cross-validation, compared with XGBoost's 90.4% accuracy and 90.0% F1 score. This study demonstrates the potential of transformer-based models to capture sequential dependencies and contextual characteristics in API call patterns, enabling robust detection of accessibility service misuse. Our findings contribute a novel approach to detecting malicious behavior in Android malware and a valuable dataset that may aid similar research.

Item
Lightweight Intrusion Detection System for IoT with Improved Feature Engineering and Advanced Dynamic Quantization (Addis Ababa University, 2024-11) Semachew Fasika; Henock Mulugeta (PhD)
In recent years, the number of Internet of Things (IoT) devices and applications has surged globally, owing to their advantages in enhancing business and industry as well as individuals' daily routines. Nevertheless, IoT devices are not immune to malicious attacks, which can cause negative consequences and device malfunction, so attack detection and classification is an important problem for IoT. This research proposes a lightweight hybrid deep learning model (DNN-BiLSTM) to detect and classify attacks in an IoT system, with improved feature engineering and advanced quantization. Although a hybrid model combining a DNN with a BiLSTM facilitates the extraction of intricate network features in a nonlinear and bidirectional manner, aiding the identification of complex attack patterns and behaviors, its implementation on IoT devices remains challenging. To mitigate the constraints inherent in IoT devices, this research incorporates improved feature engineering, specifically Redundancy-Adjusted Logistic Mutual Information Feature Selection (RAL-MIFS) combined with a two-stage IPCA algorithm. Additionally, advanced quantization (QAT + PTDQ) techniques, alongside Optuna for hyperparameter optimization, are used to enhance computational efficiency without compromising detection accuracy. Experimental evaluations on the CIC IDS2017 and CICIoT2023 datasets assessed the performance of the quantized DNN-BiLSTMQ model, which demonstrated superior detection accuracy and computational efficiency compared with state-of-the-art methods: 99.73% detection accuracy with a 25.6 KB model on CIC IDS2017, and 93.95% detection accuracy with a 31.3 KB model on CICIoT2023.
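For concreteness, a minimal PyTorch sketch of a DNN-BiLSTM hybrid followed by post-training dynamic quantization, one of the steps this entry combines (layer sizes and class counts are assumed, not the thesis configuration):

```python
import torch
import torch.nn as nn

class DNNBiLSTM(nn.Module):
    """Dense feature extractor followed by a bidirectional LSTM head."""
    def __init__(self, n_features=40, hidden=64, n_classes=8):
        super().__init__()
        self.dnn = nn.Sequential(nn.Linear(n_features, 128), nn.ReLU(),
                                 nn.Linear(128, 64), nn.ReLU())
        self.bilstm = nn.LSTM(64, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                # x: (batch, time, n_features)
        b, t, f = x.shape
        z = self.dnn(x.reshape(b * t, f)).reshape(b, t, -1)
        out, _ = self.bilstm(z)
        return self.head(out[:, -1])     # classify from the last timestep

model = DNNBiLSTM()
# Post-training dynamic quantization shrinks Linear/LSTM weights to int8:
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear, nn.LSTM}, dtype=torch.qint8)
```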
The reported results highlight the potential of the quantized DNN-BiLSTMQ model for efficient and accurate cyber attack detection on IoT devices.

Item
A Hybrid Approach to Strike a Balance of Sampling Time and Diversity in Floorplan Generation (Addis Ababa University, 2024-05) Azmeraw Bekele; Beakal Gizachew (PhD)
Generative models have revolutionized various industries by enabling the generation of diverse outputs, and floorplan generation is one such application. Different methods, including GANs and diffusion models, have been employed for floorplan generation. However, each method faces specific challenges, such as mode collapse in GANs and long sampling times in diffusion models. Efforts to mitigate these issues have explored techniques such as regularization methods, architectural modifications, knowledge distillation, and adaptive noise schedules, yet existing methods often struggle to balance sampling time and diversity simultaneously. In response, this thesis proposes a novel hybrid approach that amalgamates GANs and diffusion models to address the dual challenges of diversity and sampling time in floorplan generation. To the best of our knowledge, this work is the first to introduce a solution that not only balances sampling time and diversity but also enhances the realism of the generated floorplans. The proposed method is trained on the RPLAN dataset and combines the advantages of GANs and diffusion models, incorporating regularization methods and architectural modifications to optimize our objectives. To evaluate the effect of the denoising step, we experimented with different time steps and found the best diversity at T=20. The diversity of generated floorplans was evaluated using FID across the number of rooms, and our model demonstrates an average 15.5% improvement over the state-of-the-art HouseDiffusion model while reducing generation time by 41%. Despite these advancements, the proposed approach may be limited in generating non-Manhattan floorplans and in dealing with complex layouts.

Item
Enhancing Neural Machine Translation Through Incorporation of Unsupervised Language Understanding and Generation Techniques: The Case of English-Afaan Oromo Translation (2024-05) Chala Bekabil; Fantahun Bogale (PhD)
Breaking down language barriers is a paramount pursuit in the realm of Artificial Intelligence. Machine Translation (MT), a domain within Natural Language Processing (NLP), holds the potential to bridge linguistic gaps and foster global communication. Enhancing cross-cultural communication through MT will be realized only if we succeed in developing accurate and adaptable techniques, which in turn demands adequate linguistic resources. Unluckily, under-resourced languages face challenges due to limited linguistic resources and sparse parallel data. Previous studies tried to solve this problem using monolingual pre-training techniques, but they rely solely on either Language Understanding (LU) or Language Generation (LG) techniques, resulting in skewed translation. This study aims to enhance translation outcomes beyond the capabilities of previous studies by marrying the concepts of LU and LG, boosting the quality of MT in both directions.
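A minimal sketch of warm-starting a translation model from separately pretrained LU and LG checkpoints, in the spirit of the model described below (the Hugging Face checkpoints here are English stand-ins, not the monolingually pretrained models the thesis uses):

```python
from transformers import BertTokenizerFast, EncoderDecoderModel

# LU side (BERT) initializes the encoder; LG side (GPT-2) initializes the
# decoder. The newly added cross-attention weights still require fine-tuning
# on parallel data before translations are meaningful.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "gpt2")
tok = BertTokenizerFast.from_pretrained("bert-base-uncased")

batch = tok(["Breaking down language barriers is a paramount pursuit."],
            return_tensors="pt")
ids = model.generate(batch["input_ids"],
                     decoder_start_token_id=model.config.decoder.bos_token_id,
                     max_length=20)
```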
Our proposed model, the BERT-GPT incorporated Transformer, combines the SOTA language models BERT and GPT, trained on monolingual data, with the original Transformer model and demonstrates substantial improvements. Experimental results show that translation quality leaps forward, as evidenced by a significant increase in BLEU score on the test dataset: from a baseline of 35.75 to 42.09 for English to Afaan Oromo translation, and from 40.35 to 44.51 for Afaan Oromo to English translation. Notably, our model unveils a deep understanding of Afaan Oromo's linguistic nuances, producing translations that are precise, contextually appropriate, and faithful to the original intent. By leveraging the power of unsupervised pre-training and incorporating unsupervised LU and LG techniques into the Transformer model, we pave the way for enhanced cross-cultural communication, advanced understanding, and inclusivity in our interconnected world.

Item
Integrating Hierarchical Attention and Context-Aware Embedding For Improved Word Sense Disambiguation Performance Using BiLSTM Model (Addis Ababa University, 2024-06) Robbel Habtamu; Beakal Gizachew (PhD)
Word sense disambiguation (WSD) is a fundamental task in natural language processing, aiming to determine the correct sense of a word based on its context. Word sense ambiguity, such as polysemy and semantic ambiguity, poses significant challenges for WSD. Recent research has focused on deep contextual models to address these challenges; despite this progress, semantic ambiguity remains a challenge, especially for polysemous words. This research introduces a new approach that integrates hierarchical attention mechanisms and BERT embeddings to enhance WSD accuracy. Our model, incorporating both local and global attention, demonstrates significant improvements in accuracy, particularly on complex sentence structures. To the best of our knowledge, our model is the first to incorporate hierarchical attention mechanisms integrated with contextual embedding; this integration enhances performance, especially when combined with the contextual model BERT as the word embedding. Through extensive experimentation, we demonstrate the effectiveness of the proposed model. Our research highlights several key points. First, we showcase the effectiveness of hierarchical attention and contextual embeddings for WSD. Second, we adapted the model to Amharic word sense disambiguation, demonstrating strong performance: despite the lack of a standard benchmark dataset for Amharic WSD, our model achieves 92.4% accuracy on a self-prepared dataset. Third, our findings emphasize the importance of linguistic features in capturing relevant contextual information; part-of-speech (POS) tagging has a less significant impact on our English data, while the choice of word embeddings significantly affects model performance. Furthermore, applying local and global attention leads to better results, with word-level local attention showing particular promise. Overall, our model achieves state-of-the-art results in WSD within the same framework, improving F1 score by 1.8% to 2.9% over baseline models, and achieves state-of-the-art performance on Italian with a 0.5% to 0.7% F1 improvement over baseline papers.
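A minimal sketch of combining word-level (local) and sentence-level (global) attention over contextual embeddings, as this entry describes (window size, dimensions, and pooling choices are assumptions, not the thesis configuration):

```python
import torch
import torch.nn as nn

class HierarchicalAttentionWSD(nn.Module):
    """BiLSTM over contextual embeddings with local attention around the
    target word and global attention over the whole sentence."""
    def __init__(self, emb_dim=768, hidden=128, n_senses=5, window=3):
        super().__init__()
        self.window = window
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.local_att = nn.Linear(2 * hidden, 1)
        self.global_att = nn.Linear(2 * hidden, 1)
        self.cls = nn.Linear(4 * hidden, n_senses)

    def forward(self, emb, target_idx):   # emb: (B, T, emb_dim), e.g. BERT outputs
        h, _ = self.bilstm(emb)
        lo, hi = max(0, target_idx - self.window), target_idx + self.window + 1
        local = self._attend(self.local_att, h[:, lo:hi])   # context window only
        global_ = self._attend(self.global_att, h)          # whole sentence
        return self.cls(torch.cat([local, global_], dim=-1))

    @staticmethod
    def _attend(score, h):
        w = torch.softmax(score(h), dim=1)   # attention weights over tokens
        return (w * h).sum(dim=1)            # weighted sum: one context vector

logits = HierarchicalAttentionWSD()(torch.randn(2, 16, 768), target_idx=5)
```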
Taken together, these findings underscore the importance of considering contextual information in WSD, paving the way for more sophisticated and context-aware natural language processing systems.

Item
Reinforcement Learning Based Layer Skipping Vision Transformer for Efficient Inference (Addis Ababa University, 2023-05) Amanuel Negash; Sammy Assefa (PhD)
Recent advancements in language and vision tasks owe their success largely to the Transformer architecture. However, the computational requirements of these models have limited their applicability in resource-constrained environments. Various techniques, such as weight pruning, have proven effective in reducing the deployment cost of such models, and transformer-specific methods, such as linear self-attention and token early exiting, have shown promise in making transformers more cost-effective. Nevertheless, these techniques often come with drawbacks such as decreased performance or additional training costs. This thesis proposes a layer-skipping dynamic vision transformer (ViT) that skips layers depending on the given input, based on decisions made by a reinforcement learning (RL) agent. To the best of our knowledge, this work is the first to introduce such a model that not only significantly reduces the computational demands of transformers but also improves performance. The proposed technique is extensively tested on various model sizes and three standard benchmark datasets: CIFAR-10, CIFAR-100, and Tiny-ImageNet. First, we show that the dynamic models improve performance compared with their state-of-the-art static counterparts. Second, compared with these static models, they achieve an average inference speed boost of 53% across model sizes, datasets, and batch sizes. Similarly, the technique lowers working-memory consumption by 53%, enabling larger inputs to be processed at a time without imposing an accuracy-speed trade-off. In addition, these models achieve very high accuracy when tested in transfer learning scenarios. We then show that, although these models already have high accuracy, they can be optimized further post-training using genetic algorithms (NSGA-II). We therefore propose a joint RL-NSGA-II optimization technique in which the GA is aware of the skipping dynamics through the RL reward. The optimized models achieve competitive performance compared with the already high-performing dynamic models while reducing the number of layers by 33%. In real-world applications, the technique translates to an average of 53% faster throughput, reduced power consumption, or lower computing costs, without loss of accuracy.

Item
Improving Knowledge Distillation For Smaller Networks Via Reducing Regularization (Addis Ababa University, 2023-05) Mubarek Mohammed; Beakal Gizachew (PhD)
Knowledge Distillation (KD) is one of numerous model compression methods that reduce model size to address the problems that come with large models. In KD, a bigger model, termed the teacher, transfers its knowledge, referred to as the Dark Knowledge (DK), to a smaller network, usually termed the student. The key part of the mechanism is a distillation loss added to the loss term, which plays a dual role: as a regularizer and as the carrier of the categorical information transferred from the teacher to the student, sometimes termed DK [1]. It is known that conventional KD does not produce high compression rates.
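For reference, the conventional distillation loss whose entangled regularization this entry targets (a standard Hinton-style formulation in PyTorch; the proposed Dark Knowledge Pruning term itself is not reproduced here):

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.9):
    """Soft targets carry the dark knowledge and simultaneously regularize
    the student; the thesis adds a further term to weaken that regularization
    when distilling into very small students."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * T * T   # rescale soft-target gradients
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1 - alpha) * hard
```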
Existing works focus on improving the general mechanism of KD and neglect the strong regularization entangled with the DK in the KD mechanism; the impact of reducing this regularization effect has remained unexplored. This research proposes a novel approach, which we term Dark Knowledge Pruning (DKP), to lower the regularization effect via a newly added term on the distillation loss. Experiments across representative benchmark datasets and models demonstrate the effectiveness of the proposed mechanism. We find that it can improve the performance of a student against the baseline KD even in extreme compression, a regime normally considered ill-suited for KD. An increment of 3% in performance is achieved with a less regularized network on the CIFAR-10 dataset with ResNet teacher and student models against the baseline KD. It also improves the currently reported result for the smallest student, ResNet-8, on the CIFAR-100 dataset from 61.82% to 62.4%. To the best of our knowledge, we are also the first to study the effect of reducing the regularizing nature of the distillation loss when distilling into very small students. Beyond bridging pruning and KD in an entirely new way, the proposed approach improves the understanding of knowledge transfer, helps achieve better performance from very small students via KD, and poses questions for further research in model efficiency and knowledge transfer. Furthermore, it is model-agnostic, shows interesting properties, and can potentially be extended to other research directions, such as quantifying DK.