Collatz Sequence-Based Weight Initialization for Enhanced Convergence and Gradient Stability in Neural Networks
Date
2025-06
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Addis Ababa University
Abstract
Deep neural networks have achieved state-of-the-art performance in tasks ranging from image
classification to regression. However, their training dynamics remain highly sensitive to
weight initialization, a fundamental factor that influences both convergence speed
and model performance. Traditional initialization methods such as Xavier and He rely on
fixed statistical distributions and often underperform when applied across diverse architectures
and datasets. This study introduces Collatz Sequence-Based Weight Initialization (CSB),
a novel deterministic approach that leverages the structured chaos of Collatz sequences to
generate initial weights. CSB applies systematic transformations and scaling strategies to
improve gradient flow and enhance training stability. It is evaluated against seven baseline
initialization techniques using a CNN on the CIFAR-10 dataset and an MLP on the California
Housing dataset. Results show that CSB consistently outperforms conventional methods
in both convergence speed and final performance. Specifically, CSB achieves up to 55.03%
faster convergence than Xavier and 18.49% faster than He on a 1,000-sample subset, and
maintains a 20.64% speed advantage over Xavier on the full CIFAR-10 dataset. On the MLP,
CSB shows a 58.12% improvement in convergence speed over He. Beyond convergence, CSB
achieves a test accuracy of 78.12% on CIFAR-10, outperforming Xavier by 1.53% and He
by 1.34%. On the California Housing dataset, CSB attains an R² score of 0.7888, marking
a 2.35% improvement over Xavier. Gradient analysis reveals that CSB-initialized networks
maintain balanced L2 norms across layers, effectively reducing vanishing and exploding gradient
issues. This stability contributes to more reliable training dynamics and improved
generalization. However, this study is limited by its focus on shallow architectures and lacks
a robustness analysis across diverse hyperparameter settings.
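The abstract describes the approach only at a high level, so the following is a minimal sketch of the idea rather than the thesis's exact method: it assumes Collatz trajectories concatenated from increasing seeds, a log transformation to tame the chaotic magnitudes, and He-style variance scaling. The function names (collatz_sequence, collatz_init) and these specific transformation and scaling choices are illustrative assumptions.

```python
# Hypothetical sketch of a Collatz-sequence-based weight initializer.
# The transformation and scaling choices below are assumptions, not the
# authors' documented procedure.
import numpy as np

def collatz_sequence(n, max_len=64):
    """Return the Collatz trajectory starting from n (capped at max_len)."""
    seq = []
    while n != 1 and len(seq) < max_len:
        seq.append(n)
        n = n // 2 if n % 2 == 0 else 3 * n + 1
    seq.append(n)
    return seq

def collatz_init(shape, fan_in, seed_start=2):
    """Fill a weight tensor from concatenated Collatz trajectories,
    standardized and rescaled to a He-style variance of 2 / fan_in."""
    size = int(np.prod(shape))
    values = []
    n = seed_start
    while len(values) < size:
        values.extend(collatz_sequence(n))
        n += 1
    w = np.array(values[:size], dtype=np.float64)
    w = np.log1p(w)                        # assumed squashing transformation
    w = (w - w.mean()) / (w.std() + 1e-8)  # zero mean, unit variance
    w *= np.sqrt(2.0 / fan_in)             # He-style scaling
    return w.reshape(shape)

# Example: initialize a 256x128 dense layer deterministically.
W = collatz_init((256, 128), fan_in=256)
print(W.mean(), W.std())
```

Because the sequences are fully determined by their starting seeds, an initializer of this form is reproducible without a random number generator, which is consistent with the abstract's emphasis on a deterministic alternative to Xavier and He initialization.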
Description
Keywords
Weight initialization, Collatz conjecture, Neural networks, Convergence stability, Deep learning.