Collatz Sequence-Based Weight Initialization for Enhanced Convergence and Gradient Stability in Neural Networks

Date

2025-06

Publisher

Addis Ababa University

Abstract

Deep neural networks have achieved state-of-the-art performance in tasks ranging from image classification to regression. However, their training dynamics remain highly sensitive to weight initialization, a fundamental factor that influences both convergence speed and model performance. Traditional initialization methods such as Xavier and He rely on fixed statistical distributions and often underperform when applied across diverse architectures and datasets. This study introduces Collatz Sequence-Based (CSB) Weight Initialization, a novel deterministic approach that leverages the structured chaos of Collatz sequences to generate initial weights. CSB applies systematic transformations and scaling strategies to improve gradient flow and enhance training stability. It is evaluated against seven baseline initialization techniques using a CNN on the CIFAR-10 dataset and an MLP on the California Housing dataset. Results show that CSB consistently outperforms conventional methods in both convergence speed and final performance. Specifically, CSB achieves up to 55.03% faster convergence than Xavier and 18.49% faster than He on a 1,000-sample subset, and maintains a 20.64% speed advantage over Xavier on the full CIFAR-10 dataset. On the MLP, CSB shows a 58.12% improvement in convergence speed over He. Beyond convergence, CSB achieves a test accuracy of 78.12% on CIFAR-10, outperforming Xavier by 1.53% and He by 1.34%. On the California Housing dataset, CSB attains an R² score of 0.7888, a 2.35% improvement over Xavier. Gradient analysis reveals that CSB-initialized networks maintain balanced L2 norms across layers, effectively reducing vanishing and exploding gradient issues. This stability contributes to more reliable training dynamics and improved generalization. However, this study is limited by its focus on shallow architectures and lacks a robustness analysis across diverse hyperparameter settings.
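
The abstract does not disclose CSB's exact transformations or scaling, so the following is only a minimal illustrative sketch of the general idea under one plausible assumption: each weight is derived from the Collatz stopping time of a distinct seed integer, then centered and rescaled to a He-style variance. The names collatz_length and collatz_init, the stopping-time mapping, and the scaling choice are illustrative assumptions, not the thesis's actual method.

import numpy as np

def collatz_length(n: int) -> int:
    # Number of steps for n to reach 1 under the Collatz map (3n+1 if odd, n/2 if even).
    steps = 0
    while n != 1:
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return steps

def collatz_init(fan_in: int, fan_out: int, seed_start: int = 2) -> np.ndarray:
    # Hypothetical Collatz-based initializer (illustration only):
    # map deterministic Collatz stopping times to a zero-mean, unit-variance
    # matrix, then apply He-style scaling so gradient magnitudes stay balanced.
    size = fan_in * fan_out
    lengths = np.array(
        [collatz_length(seed_start + i) for i in range(size)], dtype=np.float64
    )
    w = lengths - lengths.mean()      # center the deterministic values
    w /= (w.std() + 1e-12)            # normalize to unit variance
    w *= np.sqrt(2.0 / fan_in)        # He-style scaling, suited to ReLU layers
    return w.reshape(fan_in, fan_out)

# Example: initialize a 784 -> 256 dense layer
W = collatz_init(784, 256)
print(W.shape, W.mean(), W.std())

Because the seed integers are fixed, the resulting weights are fully deterministic and reproducible across runs, which is the property the abstract emphasizes in contrast to the randomly sampled Xavier and He initializations.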

Keywords

Weight initialization, Collatz conjecture, Neural networks, Convergence stability, Deep learning.
