School of Information Technology and Engineering
Browsing School of Information Technology and Engineering by Author "Adane Letta (PhD)"
Now showing 1 - 4 of 4
Item
Attention-Guided Dual Deep Neural Networks for Robust Blind Denoising of Medical X-ray Images (Addis Ababa University, 2025-10) Fikir Awoke; Adane Letta (PhD)
Medical image denoising is the process of reducing unwanted noise in medical images such as X-rays, MRIs, and CT scans to improve diagnostic accuracy and clarity. Accurate diagnosis in medical imaging, particularly in radiology, depends heavily on the clarity and quality of visual data. In X-ray imaging, the presence of noise can obscure critical anatomical details, potentially leading to misinterpretation or delayed diagnosis. While previous methods such as BM3D, DnCNN, and domain-specific architectures like X-ReCNN and X-BDCNN have shown strong performance on denoising tasks, they often rely on predefined noise assumptions or lack mechanisms to adaptively attend to varying noise patterns in different image regions. To address these limitations, we propose an attention-guided dual-path deep neural network designed for blind denoising of real-world medical X-ray images. Unlike standard attention modules, we integrate spatial and channel noise-aware attention mechanisms for medical X-ray denoising, enabling the network to focus dynamically on important features while effectively distinguishing structural details from noise. Our architecture combines a U-Net for capturing detailed spatial features with a dilated CNN for extracting broader contextual information. We train our model on the ChestX-ray8 dataset, where it achieves an SNR of 37.23 dB, a PSNR of 42.08 dB, and an SSIM of 0.9736. These results demonstrate the model's effectiveness in denoising X-ray images while preserving structural integrity. The main contributions include the introduction of a noise-aware attention mechanism and a multi-scale dual-branch architecture for complementary feature learning.
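As a rough illustration of how spatial and channel attention can gate two complementary branches before fusion, here is a minimal NumPy sketch; the function names, the sigmoid gating heuristics, and the simple averaging fusion are illustrative assumptions, not the thesis implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    # feat: (C, H, W). Weight each channel by a gate derived
    # from its global average activation (hypothetical heuristic).
    gap = feat.mean(axis=(1, 2))            # (C,)
    gate = sigmoid(gap - gap.mean())        # crude per-channel gate
    return feat * gate[:, None, None]

def spatial_attention(feat):
    # Gate each pixel by the channel-averaged activation there.
    smap = sigmoid(feat.mean(axis=0))       # (H, W)
    return feat * smap[None, :, :]

def dual_path_denoise(feat_unet, feat_dilated):
    # Fuse the detail branch (U-Net-like) and the context branch
    # (dilated-CNN-like) after attention gating on each.
    a = channel_attention(spatial_attention(feat_unet))
    b = channel_attention(spatial_attention(feat_dilated))
    return 0.5 * (a + b)                    # simple averaging fusion
```

The point of the sketch is the structure: two parallel feature maps, each reweighted spatially and per-channel before they are combined.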
Nevertheless, the model's ability to generalize to other imaging modalities such as MRI or CT remains limited.

Item
BWAF-Net: Enhanced Human Promoter Identification via Biologically Weighted Attention Fusion of Transformer and Graph Attention Networks (Addis Ababa University, 2025-10) Zemedkun Abebe; Adane Letta (PhD)
The identification of gene promoter regions is crucial for understanding transcriptional regulation, yet computational methods often struggle to integrate the diverse biological signals involved effectively. Existing approaches typically focus on a single data modality, such as the DNA sequence, or employ simple fusion techniques that fail to leverage explicit biological knowledge. To address these limitations, we present BWAF-Net, a novel multi-modal deep learning framework for the identification of human promoters. BWAF-Net integrates three data streams: DNA sequences processed by a Transformer to capture long-range dependencies; gene regulatory context from 36 tissue-specific networks modeled by a Graph Attention Network (GAT); and explicit domain knowledge in the form of 11 quantified biological motif counts (priors). The framework's central innovation is the Biologically Weighted Attention Fusion (BWAF) layer, which uses the biological priors to learn dynamic attention weights that modulate the fusion of the sequence and network representations. Evaluated on a balanced dataset of 40,056 human promoter and non-promoter sequences, BWAF-Net achieved outstanding performance, with 99.87% accuracy, 99.99% AUC-ROC, and 100% precision on the held-out test set. The proposed framework significantly outperformed a replicated state-of-the-art, sequence-only baseline as well as a series of ablated models. Our ablation studies confirm that naive feature concatenation is a suboptimal fusion strategy, validating the necessity of the BWAF mechanism.
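A minimal sketch of prior-modulated fusion in the style of the BWAF layer, assuming the 11 motif priors are linearly projected to one attention logit per modality; the shapes, the projection `W_prior`, and the convex-combination fusion are hypothetical, not taken from the thesis:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def bwaf_fuse(seq_emb, graph_emb, priors, W_prior):
    # seq_emb, graph_emb: (d,) modality embeddings (Transformer / GAT).
    # priors: (11,) quantified motif counts.
    # W_prior: (2, 11) hypothetical projection mapping priors to one
    # logit per modality.
    logits = W_prior @ priors          # (2,)
    w = softmax(logits)                # attention over the two streams
    return w[0] * seq_emb + w[1] * graph_emb
```

With zero (untrained) projection weights the fusion degenerates to an equal-weight average; training `W_prior` lets the biological priors shift weight between the sequence and network streams per input.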
By providing a framework that is highly accurate, parameter-efficient, and interpretable, this work presents a significant advance in multi-modal AI for regulatory genomics.

Item
Collatz Sequence-Based Weight Initialization for Enhanced Convergence and Gradient Stability in Neural Networks (Addis Ababa University, 2025-06) Zehara Eshetu; Beakal Gizachew (PhD); Adane Letta (PhD)
Deep neural networks have achieved state-of-the-art performance in tasks ranging from image classification to regression. However, their training dynamics remain highly sensitive to weight initialization, a fundamental factor that influences both convergence speed and model performance. Traditional initialization methods such as Xavier and He rely on fixed statistical distributions and often underperform when applied across diverse architectures and datasets. This study introduces Collatz Sequence-Based (CSB) weight initialization, a novel deterministic approach that leverages the structured chaos of Collatz sequences to generate initial weights. CSB applies systematic transformations and scaling strategies to improve gradient flow and enhance training stability. It is evaluated against seven baseline initialization techniques using a CNN on the CIFAR-10 dataset and an MLP on the California Housing dataset. Results show that CSB consistently outperforms conventional methods in both convergence speed and final performance. Specifically, CSB achieves up to 55.03% faster convergence than Xavier and 18.49% faster than He on a 1,000-sample subset, and maintains a 20.64% speed advantage over Xavier on the full CIFAR-10 dataset. On the MLP, CSB shows a 58.12% improvement in convergence speed over He. Beyond convergence, CSB achieves a test accuracy of 78.12% on CIFAR-10, outperforming Xavier by 1.53% and He by 1.34%. On the California Housing dataset, CSB attains an R² score of 0.7888, marking a 2.35% improvement over Xavier.
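A minimal sketch of a Collatz-based initializer: sequence values are concatenated, standardized, and rescaled He-style. The starting seed, the standardization step, and the He-style scaling are illustrative assumptions, not the CSB scheme from the thesis:

```python
import numpy as np

def collatz_sequence(n):
    # Standard Collatz iteration: n -> n/2 if even, 3n+1 if odd,
    # until reaching 1.
    seq = [n]
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        seq.append(n)
    return seq

def collatz_init(fan_in, fan_out, seed=27):
    # Fill the weight matrix deterministically from concatenated
    # Collatz sequences, then center and rescale the values to a
    # He-like standard deviation (hypothetical scaling choice).
    vals = []
    n = seed
    while len(vals) < fan_in * fan_out:
        vals.extend(collatz_sequence(n))
        n += 1
    w = np.array(vals[: fan_in * fan_out], dtype=float)
    w = (w - w.mean()) / (w.std() + 1e-8)   # zero mean, unit variance
    w *= np.sqrt(2.0 / fan_in)              # He-style scaling
    return w.reshape(fan_in, fan_out)
```

Because the sequence is deterministic, the same seed always yields the same weights, which is what distinguishes this family of initializers from the random draws used by Xavier and He.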
Gradient analysis reveals that CSB-initialized networks maintain balanced L2 norms across layers, effectively reducing vanishing and exploding gradient issues. This stability contributes to more reliable training dynamics and improved generalization. However, this study is limited by its focus on shallow architectures and lacks a robustness analysis across diverse hyperparameter settings.

Item
Multimodal Contextual Transformer Augmented Fusion for Emotion Recognition (Addis Ababa University, 2025-06) Wesagn Dawit; Adane Letta (PhD)
As emotionally intelligent systems become increasingly integral to human-centered Artificial Intelligence (AI), precise recognition of emotions in conversational settings remains a fundamental challenge, arising from the context-sensitive and evolving character of emotional expression. Although most Multimodal Emotion Recognition (MER) systems utilize speech and text features, they often overlook conversational context, such as prior dialogue exchanges, speaker identity, and interaction history, which is crucial for discerning nuanced or ambiguous emotions, particularly in dyadic and multiparty interactions. This study presents Multimodal Contextual Transformer Augmented Fusion (MCTAF), a lightweight, context-sensitive framework for MER. MCTAF explicitly represents context as a third modality, integrating the prior K utterances (dialogue history including text and audio), speaker characteristics, and turn-level temporal structure. The contextual features are processed by a Bidirectional Gated Recurrent Unit (BiGRU)-based context encoder that operates concurrently with distinct BiGRU encoders for the textual and audio features. All three modality-specific representations are integrated using a transformer-based self-attention mechanism to capture both intra- and inter-modal interdependence across conversation turns.
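A minimal sketch of treating conversational context as a third modality token fused by self-attention, in the spirit of MCTAF; the identity projections (no learned Q/K/V weights) and mean-pooling are simplifying assumptions, not the actual architecture:

```python
import numpy as np

def self_attention(X):
    # Single-head scaled dot-product self-attention over the three
    # modality tokens, with identity projections for brevity.
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                     # (3, 3)
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    A = e / e.sum(axis=-1, keepdims=True)             # row-wise softmax
    return A @ X

def mctaf_fuse(text_emb, audio_emb, context_emb):
    # Stack text, audio, and dialogue-context embeddings as three
    # tokens, attend across them, then mean-pool into one vector.
    X = np.stack([text_emb, audio_emb, context_emb])  # (3, d)
    return self_attention(X).mean(axis=0)
```

The key structural idea is that context enters the attention computation as a peer of text and audio, rather than being bolted on after fusion.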
To our knowledge, this is the first study to explicitly conceptualize conversational history as a key modality inside a unified transformer architecture, processing it concurrently with speech and text before a dynamic, attention-driven fusion. MCTAF surpasses robust baselines when evaluated on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) and Multimodal EmotionLines Dataset (MELD) benchmarks, achieving 89.9% accuracy on IEMOCAP and an 88.3% weighted F1-score on MELD, performance gains of up to +4.0 percentage points in accuracy and +3.0 points in F1-score over preceding state-of-the-art models. Ablation experiments further validate the significance of context modeling, demonstrating a 3-4 point decline in F1 when the context module is removed. In terms of efficiency, MCTAF decreases training time by 8% per epoch and uses 12% fewer parameters than comparable transformer-based baselines, with an average inference time of 26.1 ms per syllable. These findings demonstrate the potential of MCTAF for scalable and resource-efficient deployment.