School of Information Technology and Engineering
Browsing School of Information Technology and Engineering by Author "Adane Letta (PhD)"
Item: BWAF-Net: Enhanced Human Promoter Identification via Biologically Weighted Attention Fusion of Transformer and Graph Attention Networks (Addis Ababa University, 2025-10)
Zemedkun Abebe; Adane Letta (PhD)

The identification of gene promoter regions is crucial for understanding transcriptional regulation, yet computational methods often struggle to effectively integrate the diverse biological signals involved. Existing approaches typically focus on a single data modality, such as the DNA sequence, or employ simple fusion techniques that fail to leverage explicit biological knowledge. To address these limitations, we present BWAF-Net, a novel multi-modal deep learning framework for the identification of human promoters. BWAF-Net integrates three data streams: DNA sequences processed by a Transformer to capture long-range dependencies; gene regulatory context from 36 tissue-specific networks modeled by a Graph Attention Network (GAT); and explicit domain knowledge in the form of 11 quantified biological motif counts (priors). The framework's central innovation is the Biologically Weighted Attention Fusion (BWAF) layer, which uses the biological priors to learn dynamic attention weights that modulate the fusion of the sequence and network representations. Evaluated on a balanced dataset of 40,056 human promoter and non-promoter sequences, BWAF-Net achieved outstanding performance, with 99.87% accuracy, 99.99% AUC-ROC, and 100% precision on the held-out test set. The proposed framework significantly outperformed a replicated state-of-the-art, sequence-only baseline as well as a series of ablated models. Our ablation studies confirm that naive feature concatenation is a suboptimal fusion strategy, validating the necessity of the intelligent BWAF mechanism. By providing a framework that is highly accurate, parameter-efficient, and interpretable, this work presents a significant advance in multi-modal AI for regulatory genomics.
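The abstract describes the BWAF layer only at the level of its role in the architecture. As a rough illustration of the idea, the sketch below shows one plausible way a small prior-conditioned gate could weight the fusion of a Transformer-derived sequence embedding and a GAT-derived network embedding. The module name BiologicallyWeightedFusion, the embedding dimensions, and the softmax gating formula are assumptions made for this sketch, not the authors' implementation.

# Minimal PyTorch sketch of prior-conditioned fusion in the spirit of BWAF.
# Dimensions, names, and the gating formula are illustrative assumptions.
import torch
import torch.nn as nn

class BiologicallyWeightedFusion(nn.Module):
    """Fuse a sequence embedding and a graph embedding using attention
    weights predicted from quantified biological motif priors."""

    def __init__(self, seq_dim=256, graph_dim=128, prior_dim=11, fused_dim=256):
        super().__init__()
        # Project both modalities into a shared space.
        self.seq_proj = nn.Linear(seq_dim, fused_dim)
        self.graph_proj = nn.Linear(graph_dim, fused_dim)
        # Map the 11 motif counts to two attention weights (one per modality).
        self.prior_gate = nn.Sequential(
            nn.Linear(prior_dim, 32),
            nn.ReLU(),
            nn.Linear(32, 2),
        )

    def forward(self, seq_emb, graph_emb, priors):
        # priors: (batch, 11) motif counts; softmax gives per-sample modality weights.
        weights = torch.softmax(self.prior_gate(priors), dim=-1)  # (batch, 2)
        s = self.seq_proj(seq_emb)      # (batch, fused_dim)
        g = self.graph_proj(graph_emb)  # (batch, fused_dim)
        # Weighted sum of the two modality representations.
        return weights[:, 0:1] * s + weights[:, 1:2] * g

if __name__ == "__main__":
    layer = BiologicallyWeightedFusion()
    seq, graph, priors = torch.randn(4, 256), torch.randn(4, 128), torch.rand(4, 11)
    print(layer(seq, graph, priors).shape)  # torch.Size([4, 256])

In this toy version, the motif counts determine a per-sample weight for each modality, so an input with strong motif evidence can lean more heavily on one branch than the other; the actual BWAF layer may condition the fusion quite differently.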
Item: Collatz Sequence-Based Weight Initialization for Enhanced Convergence and Gradient Stability in Neural Networks (Addis Ababa University, 2025-06)
Zehara Eshetu; Beakal Gizachew (PhD); Adane Letta (PhD)

Deep neural networks have achieved state-of-the-art performance in tasks ranging from image classification to regression, yet their training dynamics remain highly sensitive to weight initialization, a fundamental factor that influences both convergence speed and model performance. Traditional initialization methods such as Xavier and He rely on fixed statistical distributions and often underperform when applied across diverse architectures and datasets. This study introduces Collatz Sequence-Based (CSB) weight initialization, a novel deterministic approach that leverages the structured chaos of Collatz sequences to generate initial weights. CSB applies systematic transformations and scaling strategies to improve gradient flow and enhance training stability. It is evaluated against seven baseline initialization techniques using a CNN on the CIFAR-10 dataset and an MLP on the California Housing dataset. Results show that CSB consistently outperforms conventional methods in both convergence speed and final performance. Specifically, CSB achieves up to 55.03% faster convergence than Xavier and 18.49% faster than He on a 1,000-sample subset, and maintains a 20.64% speed advantage over Xavier on the full CIFAR-10 dataset. On the MLP, CSB shows a 58.12% improvement in convergence speed over He. Beyond convergence, CSB achieves a test accuracy of 78.12% on CIFAR-10, outperforming Xavier by 1.53% and He by 1.34%. On the California Housing dataset, CSB attains an R² score of 0.7888, a 2.35% improvement over Xavier. Gradient analysis reveals that CSB-initialized networks maintain balanced L2 norms across layers, effectively reducing vanishing and exploding gradient issues. This stability contributes to more reliable training dynamics and improved generalization. However, this study is limited by its focus on shallow architectures and lacks a robustness analysis across diverse hyperparameter settings.
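The abstract does not spell out how Collatz sequences are turned into weights. The sketch below shows one deterministic scheme in that spirit: Collatz stopping times indexed by weight position are standardized and then rescaled to a He-style variance. The function names, the choice of stopping times as the transformation, and the He-style scaling are assumptions for illustration only, not the CSB procedure evaluated in the thesis.

# Minimal NumPy sketch of a Collatz-derived initializer.
# The transformation and scaling choices are illustrative assumptions.
import numpy as np

def collatz_length(n: int) -> int:
    """Number of steps for n to reach 1 under the Collatz map."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

def collatz_init(fan_in: int, fan_out: int, seed_offset: int = 2) -> np.ndarray:
    """Deterministic weight matrix built from Collatz stopping times,
    centered and rescaled to a He-style standard deviation."""
    size = fan_in * fan_out
    lengths = np.array(
        [collatz_length(i + seed_offset) for i in range(size)], dtype=np.float64
    )
    # Standardize the structured sequence ...
    lengths = (lengths - lengths.mean()) / (lengths.std() + 1e-8)
    # ... then scale to the variance commonly used for ReLU layers (He-style).
    return (lengths * np.sqrt(2.0 / fan_in)).reshape(fan_in, fan_out)

if __name__ == "__main__":
    w = collatz_init(64, 32)
    print(w.shape, round(float(w.std()), 4))  # (64, 32), roughly sqrt(2/64)

Because every entry is a deterministic function of its index, a layer initialized this way is exactly reproducible across runs, which is one plausible motivation for a sequence-based scheme; the thesis's actual transformations and scaling strategies may differ.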