School of Information Technology and Engineering
Browsing School of Information Technology and Engineering by Author "Beakal Gizachew (PhD)"
Item: A Lightweight Model for Balancing Efficiency and Precision in PEFT-Optimized Java Unit Test Generation (Addis Ababa University, 2025-06)
Sintayehu Zekarias; Beakal Gizachew (PhD)

Software testing accounts for nearly 50% of development costs while being critical for ensuring software quality, creating an urgent need for more efficient testing solutions. This work addresses the challenge by developing a framework that combines Parameter-Efficient Fine-Tuning (PEFT) techniques with transformer models to automate Java unit test generation. The study systematically evaluates three PEFT approaches, namely Low-Rank Adaptation (LoRA), Quantized LoRA (QLoRA), and Adapters, through a methodology involving specialized assertion pretraining on the Atlas dataset (1.2M Java method-assertion pairs), PEFT optimization, targeted fine-tuning with Methods2Test (780K test cases), and validation on the unseen Defects4J benchmark to assess cross-project generalization. Experimental results demonstrate that LoRA retains 92% of full fine-tuning effectiveness (38.12% correct test cases) while reducing GPU memory requirements by 17% and improving generation speed by 23%. QLoRA achieves even greater efficiency with a 36% memory reduction, making it particularly suitable for resource-constrained environments. However, the Defects4J evaluation of cross-project generalization showed that LoRA achieved 43.1% correct assertions, compared to a full fine-tuning baseline of 46.0%, indicating a minor loss of generalization alongside the efficiency gains. These findings are contextualized by the Java programming language and the specific datasets employed in the experiments; they nonetheless provide valuable insights for putting AI-powered test generation into practice, highlighting both the potential of PEFT techniques to reduce testing costs and the need for further research on maintaining test quality across diverse projects.
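For readers unfamiliar with PEFT, the following minimal sketch shows how a LoRA adapter might be attached to a sequence-to-sequence code model with the Hugging Face peft library. The base model name, rank, and target modules are illustrative assumptions, not the configuration used in the thesis.

```python
# Minimal sketch: attaching a LoRA adapter to a seq2seq code model with Hugging Face peft.
# Model name, rank, and target modules are illustrative assumptions, not the thesis setup.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, TaskType

base_model_name = "Salesforce/codet5-base"  # hypothetical choice of base model
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(base_model_name)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16,                       # low-rank dimension (assumed)
    lora_alpha=32,              # scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=["q", "v"],  # T5-style attention projections to adapt (assumed)
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA parameters remain trainable

# The wrapped model can then be fine-tuned on (focal method, unit test) pairs
# with a standard Trainer or training loop.
```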
Item: Attribution Methods for Explainability of Predictive and Deep Generative Diffusion Models (Addis Ababa University, 2025-06)
Debela Desalegen; Beakal Gizachew (PhD)

As machine learning models grow in complexity and their deployment in high-stakes domains becomes more common, the demand for transparent and faithful explainability methods has become increasingly urgent. However, most existing attribution techniques remain fragmented, targeting either predictive or generative models, and lack a hybrid approach that offers coherent interpretability across both domains. While predictive modeling faces challenges such as faithfulness, sparsity, stability, and reliability, generative diffusion models introduce additional complexity due to their temporal dynamics, token-to-region interactions, and diverse architectural designs. This work presents a hybrid attribution approach designed to improve explainability for both predictive black-box models and generative diffusion models. We propose two novel methods: FIFA (Firefly-Inspired Feature Attribution), an optimization-based approach for sparse and faithful attribution in tabular models, and DiffuSAGE (Diffusion Shapley Attribution with Gradient Explanations), a temporally and spatially grounded method that attributes generated image content to individual prompt tokens using Aumann-Shapley values, Integrated Gradients, and cross-attention maps. FIFA was applied to Random Forest, XGBoost, CatBoost, and TabNet models on three benchmark datasets (Adult Income, Breast Cancer, and Diabetes), outperforming SHAP and LIME on key metrics: +6.24% sparsity, +9.15% Insertion AUC, -8.65% Deletion AUC, and +75% stability. DiffuSAGE was evaluated on Stable Diffusion v1.5 trained on the LAION-5B dataset, yielding a 12.4% improvement in Insertion AUC and a 9.1% reduction in Deletion AUC compared to DF-RISE and DF-CAM. A qualitative user study further validated DiffuSAGE's alignment with human perception. Overall, these contributions establish the first hybrid attribution methods for both predictive and generative models, addressing fundamental limitations in current XAI approaches and enabling more interpretable, robust, and human-aligned AI systems.

Item: Collatz Sequence-Based Weight Initialization for Enhanced Convergence and Gradient Stability in Neural Networks (Addis Ababa University, 2025-06)
Zehara Eshetu; Beakal Gizachew (PhD); Adane Letta (PhD)

Deep neural networks have achieved state-of-the-art performance in tasks ranging from image classification to regression. However, their training dynamics remain highly sensitive to weight initialization, a fundamental factor that influences both convergence speed and model performance. Traditional initialization methods such as Xavier and He rely on fixed statistical distributions and often underperform when applied across diverse architectures and datasets. This study introduces Collatz Sequence-Based (CSB) weight initialization, a novel deterministic approach that leverages the structured chaos of Collatz sequences to generate initial weights. CSB applies systematic transformations and scaling strategies to improve gradient flow and enhance training stability. It is evaluated against seven baseline initialization techniques using a CNN on the CIFAR-10 dataset and an MLP on the California Housing dataset. Results show that CSB consistently outperforms conventional methods in both convergence speed and final performance. Specifically, CSB achieves up to 55.03% faster convergence than Xavier and 18.49% faster than He on a 1,000-sample subset, and maintains a 20.64% speed advantage over Xavier on the full CIFAR-10 dataset. On the MLP, CSB shows a 58.12% improvement in convergence speed over He. Beyond convergence, CSB achieves a test accuracy of 78.12% on CIFAR-10, outperforming Xavier by 1.53% and He by 1.34%. On the California Housing dataset, CSB attains an R² score of 0.7888, a 2.35% improvement over Xavier. Gradient analysis reveals that CSB-initialized networks maintain balanced L2 norms across layers, effectively reducing vanishing and exploding gradient issues. This stability contributes to more reliable training dynamics and improved generalization. However, the study is limited by its focus on shallow architectures and lacks a robustness analysis across diverse hyperparameter settings.
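The abstract does not spell out the exact initialization routine, but a minimal sketch of a Collatz-based initializer might look like the following, assuming weights are derived from Collatz stopping times, centered, and rescaled to a He-like variance; the normalization and scaling choices are illustrative assumptions rather than the thesis method.

```python
# Minimal sketch of a Collatz-sequence-based weight initializer (assumptions: weights come
# from Collatz stopping times, centered to zero mean, then rescaled He-style by fan-in).
import torch

def collatz_length(n: int) -> int:
    """Number of steps for n to reach 1 under the Collatz map."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

def collatz_init_(weight: torch.Tensor, fan_in: int) -> torch.Tensor:
    """Fill `weight` deterministically from Collatz stopping times."""
    numel = weight.numel()
    lengths = torch.tensor([collatz_length(i) for i in range(2, numel + 2)],
                           dtype=torch.float32)
    values = (lengths - lengths.mean()) / (lengths.std() + 1e-8)  # zero mean, unit variance
    values = values * (2.0 / fan_in) ** 0.5                       # He-like rescaling (assumed)
    with torch.no_grad():
        weight.copy_(values.reshape(weight.shape))
    return weight

# Usage on a linear layer:
layer = torch.nn.Linear(128, 64)
collatz_init_(layer.weight, fan_in=128)
```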
Item: Ensemble Learning with Attention and Audio for Robust Video Classification (Addis Ababa University, 2025-06)
Dereje Tadesse; Beakal Gizachew (PhD)

The classification of video scenes is a fundamental task for many applications, such as content recommendation, indexing, and broadcast monitoring. Current methods often depend on annotation-dependent object detection models, which restricts their generalizability across different types of broadcast content, particularly where visual cues such as logos or brands are not clearly defined or present. This thesis addresses these problems with a two-stage classification framework that integrates visual and audio information to improve classification accuracy and robustness. The first stage uses a detection model built on pretrained object detection models and enhanced spatial attention to detect explicit visual markers (such as a program logo or branded intro sequence) in video content. However, individual visual indicators are sometimes not reliable enough on their own, especially in content such as situational comedies where logos are absent. The second stage therefore employs an early fusion ensemble of convolutional neural network-based visual features and recurrent neural network-based audio features; the two modalities contribute complementary properties, enabling more robust classification. Experiments were conducted on a dataset of approximately 19 hours of content from 13 TV programs across three channels, focused on intro, credit, and outro segments. The visual-only model achieved 96.83% accuracy, while the audio-only model achieved 90.91%. The proposed early fusion ensemble achieved 94.13% accuracy and proved more robust in difficult situations where the visual data was of low quality or ambiguous. Ablation studies contrasting different ensemble methods confirmed the utility of early fusion and its ability to capture cross-modal interactions. The system is also designed to be computationally efficient, allowing deployment in broadcast media settings. Beyond demonstrating a methodical approach to video classification, this work fills a significant gap in scalable and generalizable video classification through multimodal learning, especially for large amounts of content lacking controlled annotations, which has previously been a hurdle for more typical models.
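As an illustration only, an early fusion ensemble of this kind could be sketched as below, where a CNN backbone encodes video frames and a recurrent network encodes audio features before the two embeddings are concatenated for classification; the specific backbones, feature sizes, and pooling are assumptions rather than the thesis architecture.

```python
# Illustrative early-fusion sketch: CNN visual embedding + GRU audio embedding, concatenated.
# Backbones, feature dimensions, and pooling are assumptions, not the thesis architecture.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class EarlyFusionClassifier(nn.Module):
    def __init__(self, num_classes: int, audio_feat_dim: int = 40, hidden: int = 128):
        super().__init__()
        cnn = resnet18(weights=None)
        cnn.fc = nn.Identity()                       # 512-d per-frame embedding
        self.visual_encoder = cnn
        self.audio_encoder = nn.GRU(audio_feat_dim, hidden, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(512 + hidden, 256), nn.ReLU(), nn.Linear(256, num_classes)
        )

    def forward(self, frames, audio):
        # frames: (B, T, 3, H, W) video frames; audio: (B, T_a, audio_feat_dim), e.g. MFCCs
        b, t = frames.shape[:2]
        v = self.visual_encoder(frames.flatten(0, 1)).view(b, t, -1).mean(dim=1)  # temporal average
        _, h = self.audio_encoder(audio)             # final GRU hidden state
        fused = torch.cat([v, h.squeeze(0)], dim=1)  # early fusion by concatenation
        return self.classifier(fused)

model = EarlyFusionClassifier(num_classes=13)        # e.g. one class per program (assumed)
logits = model(torch.randn(2, 8, 3, 224, 224), torch.randn(2, 100, 40))
```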
Item: Integrating Hierarchical Attention and Context-Aware Embedding For Improved Word Sense Disambiguation Performance Using BiLSTM Model (Addis Ababa University, 2024-06)
Robbel Habtamu; Beakal Gizachew (PhD)

Word Sense Disambiguation (WSD) is a fundamental task in natural language processing, aiming to determine the correct sense of a word based on its context. Word sense ambiguity, such as polysemy and semantic ambiguity, poses significant challenges for WSD. Recent research has focused on deep contextual models to address these challenges; despite this progress, semantic ambiguity remains a challenge, especially when dealing with polysemous words. This research introduces a new approach that integrates hierarchical attention mechanisms and BERT embeddings to enhance WSD accuracy. Our model, incorporating both local and global attention, demonstrates significant improvements in accuracy, particularly on complex sentence structures. To the best of our knowledge, our model is the first to incorporate hierarchical attention mechanisms integrated with contextual embedding, and this integration is especially effective when BERT is used for the word embeddings. Through extensive experimentation, we demonstrate the effectiveness of the proposed model. Our research highlights several key points. First, we showcase the effectiveness of hierarchical attention and contextual embeddings for WSD. Second, we adapted the model to Amharic word sense disambiguation, demonstrating strong performance: despite the lack of a standard benchmark dataset for Amharic WSD, our model achieves 92.4% accuracy on a self-prepared dataset. Third, our findings emphasize the importance of linguistic features in capturing relevant contextual information for WSD. We also note that Part-of-Speech (POS) tagging has a less significant impact on our English data, while word embeddings significantly affect model performance. Furthermore, applying local and global attention leads to better results, with word-level local attention showing particular promise. Overall, our model achieves state-of-the-art results in WSD within the same framework, improving on baseline models by 1.8% to 2.9% F1 score, and achieves state-of-the-art performance on Italian with a 0.5% to 0.7% F1 improvement over baseline papers. These findings underscore the importance of considering contextual information in WSD, paving the way for more sophisticated and context-aware natural language processing systems.
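A minimal sketch of a hierarchical (local plus global) attention layer over BiLSTM states is given below; the attention formulation, dimensions, and the way BERT embeddings are fed in are assumptions for illustration, not the thesis implementation.

```python
# Illustrative sketch: BiLSTM over contextual (e.g. BERT) embeddings with word-level (local)
# attention followed by sentence-level (global) attention. Dimensions and the exact attention
# formulation are assumptions, not the thesis implementation.
import torch
import torch.nn as nn

class HierarchicalAttentionWSD(nn.Module):
    def __init__(self, emb_dim: int = 768, hidden: int = 256, num_senses: int = 10):
        super().__init__()
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.local_attn = nn.Linear(2 * hidden, 1)     # scores each word (local attention)
        self.global_attn = nn.MultiheadAttention(2 * hidden, num_heads=4, batch_first=True)
        self.out = nn.Linear(2 * hidden, num_senses)

    def forward(self, bert_embeddings):                # (B, T, emb_dim)
        h, _ = self.bilstm(bert_embeddings)            # (B, T, 2*hidden)
        local_w = torch.softmax(self.local_attn(h), dim=1)          # (B, T, 1)
        h_local = local_w * h                          # re-weight word representations
        h_global, _ = self.global_attn(h_local, h_local, h_local)   # sentence-level context
        sent = h_global.mean(dim=1)                    # pooled context vector
        return self.out(sent)                          # sense logits for the target word

model = HierarchicalAttentionWSD()
logits = model(torch.randn(2, 20, 768))
```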
Item: Modular Federated Learning for Non-IID Data (Addis Ababa University, 2025-06)
Samuel Hailemariam; Beakal Gizachew (PhD)

Federated Learning (FL) promises privacy-preserving collaboration across distributed clients but is hampered by three key challenges: severe accuracy degradation under non-IID data, high communication and computational demands on edge devices, and a lack of built-in explainability for debugging, user trust, and regulatory compliance. To bridge this gap, we propose two modular FL pipelines, SPATL-XL and SPATL-XLC, that integrate SHAP-driven pruning with, in the latter, dynamic client clustering. SPATL-XL applies SHAP-based pruning to the largest layers, removing low-impact parameters to both reduce model size and sharpen interpretability, whereas SPATL-XLC additionally groups clients via lightweight clustering to reduce communication overhead and smooth convergence in low-bandwidth, high-client settings. In experiments on CIFAR-10 and Fashion-MNIST over 200 communication rounds under IID and Dirichlet non-IID splits, our pipelines lower per-round communication to 13.26 MB, speed up end-to-end training by 1.13×, raise explanation fidelity from 30–50% to 89%, match or closely approach SCAFFOLD's 70.64% top-1 accuracy (SPATL-XL: 70.36%), and maintain stable clustering quality (Silhouette, CHI, DBI) even when only 40–70% of clients participate. These results demonstrate that combining explainability-driven pruning with adaptive clustering yields practical, communication-efficient, and regulation-ready FL pipelines that simultaneously address non-IID bias, resource constraints, and transparency requirements.
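As a rough illustration of attribution-guided pruning (not the SPATL-XL implementation), the sketch below scores input features with a simple gradient-times-input proxy standing in for SHAP values and zeroes the weight columns attached to the lowest-impact features; the scoring granularity and pruning ratio are assumptions.

```python
# Rough illustration of attribution-guided pruning (a stand-in for SHAP-driven pruning, not the
# SPATL-XL implementation): score first-layer input features by mean |gradient x input| and zero
# the weight columns attached to the lowest-impact features. Ratio and granularity are assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
x = torch.randn(256, 32, requires_grad=True)          # a batch of client data (synthetic here)

# Attribution proxy: gradient x input, averaged over the batch and output classes.
out = model(x).sum()
out.backward()
feature_impact = (x.grad * x).abs().mean(dim=0)       # (32,) per-input-feature impact score

# Prune: zero the first-layer weight columns for the 50% least impactful features.
prune_ratio = 0.5
k = int(prune_ratio * feature_impact.numel())
low_impact = feature_impact.argsort()[:k]
with torch.no_grad():
    model[0].weight[:, low_impact] = 0.0               # structured zeroing of low-impact inputs

print(f"Zeroed {k} of {feature_impact.numel()} input feature columns")
```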
Item: Multimodal Unified Bidirectional Cross-Modal Audio-Visual Saliency Prediction (Addis Ababa University, 2025-06)
Tadele Melesse; Natnael Argaw (PhD); Beakal Gizachew (PhD)

Human attention in dynamic environments is inherently multimodal and is shaped by the interplay of auditory and visual cues. Although existing saliency prediction methods predominantly focus on visual semantics, they neglect audio as a critical modulator of gaze behavior. Recent audiovisual approaches attempt to address this gap but remain limited by temporal misalignment between modalities and inadequate retention of spatio-temporal information, which is key to resolving both the location and timing of salient events, ultimately yielding suboptimal performance. Inspired by recent breakthroughs in cross-attention transformers with convolutions for joint global-local representation learning, and in conditional denoising diffusion models for progressive refinement, we introduce a novel multimodal framework for efficient bidirectional audiovisual saliency prediction. It employs dual-stream encoders to process video and audio independently, coupled with separate efficient cross-modal attention pathways that model mutual modality influence: one pathway aligns visual features with audio features, while the other adjusts audio embeddings to visual semantics. Critically, these pathways converge into a unified latent space, ensuring coherent alignment of transient audiovisual events through iterative feature fusion. To preserve fine-grained details, residual connections propagate multiscale features across stages. For saliency generation, a conditional diffusion decoder iteratively denoises a noise-corrupted ground-truth map, conditioned at each timestep on the fused audiovisual features through a hierarchical decoder that enforces spatio-temporal coherence via multiscale refinement. Extensive experiments demonstrate that our model outperforms state-of-the-art methods, achieving improvements of up to 11.52% (CC), 20.04% (SIM), and 3.79% (NSS) over DiffSal on the AVAD dataset.

Item: Optimizing Explainable Deep Q-Learning via SHAP, LIME, & Policy Visualization (Addis Ababa University, 2025-06)
Tesfahun Yemisrach; Beakal Gizachew (PhD); Natnael Argaw (PhD), Co-Advisor

Reinforcement learning (RL) has demonstrated remarkable promise in sequential decision-making tasks; however, its interpretability issues continue to be a hindrance in high-stakes domains that demand regulatory compliance, transparency, and trust. Post-hoc explainability has been investigated in recent research using techniques such as SHAP and LIME, but these methods are frequently isolated from the training process and lack cross-domain evaluation. To fill this gap, we propose an explainable Deep Q-Learning (DQL) framework that incorporates explanation-aligned reward shaping and model-agnostic explanation techniques into the agent's learning pipeline. The framework exhibits broad applicability, as it is tested in both financial settings and traditional control environments. According to the experimental findings, the explainable agent consistently outperforms the baseline in explanation fidelity, average reward, and convergence speed. In CartPole, the agent obtained a LIME fidelity score of 87.2% versus 63.5% and an average reward of 190 versus 130 for the baseline. In the financial domain, it produced an 89.10% win ratio, a Sharpe ratio of 0.4782, and a return of 154.32%. These results show that incorporating explainability into RL enhances interpretability as well as stability and performance across domains, supporting the development of transparent and reliable reinforcement learning systems.
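For illustration only, the following sketch shows how post-hoc SHAP explanations might be attached to a small DQN-style Q-network on CartPole; the network, explainer choice, and background sampling are assumptions and do not reproduce the thesis's explanation-aligned reward shaping.

```python
# Illustration only: post-hoc SHAP explanations for a small DQN-style Q-network on CartPole.
# The network, explainer choice, and background data are assumptions; this does not reproduce
# the thesis's explanation-aligned reward shaping.
import numpy as np
import torch
import torch.nn as nn
import shap

q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))  # 4 state dims, 2 actions

def greedy_q(states: np.ndarray) -> np.ndarray:
    """Return the Q-value of the greedy action for each state (single output for SHAP)."""
    with torch.no_grad():
        q = q_net(torch.as_tensor(states, dtype=torch.float32))
    return q.max(dim=1).values.numpy()

background = np.random.uniform(-1, 1, size=(100, 4))   # stand-in for visited states
explainer = shap.KernelExplainer(greedy_q, background)

state = np.array([[0.02, 0.1, -0.03, 0.2]])            # one CartPole observation
shap_values = explainer.shap_values(state)             # per-feature contribution to the greedy Q-value
print(dict(zip(["cart_pos", "cart_vel", "pole_angle", "pole_vel"], shap_values[0])))
```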