Calibrated Prediction Intervals with Deep Transformer-Based Ensembles for Long-Term Time Series Forecasting

Date

2026-02

Publisher

Addis Ababa University

Abstract

Long-term time series forecasting (LTSF) is crucial for decision-making in domains such as energy and finance. However, deep learning models, including Transformers, often produce overconfident point predictions with miscalibrated uncertainty estimates. Prior quality-driven approaches, including QD, Dual-AQD, and Sum-K, have advanced prediction interval (PI) generation through gradient-based optimization, but they depend on manual hyperparameter tuning, which limits their adaptability across datasets. This study addresses that limitation by introducing the Self-Adaptive Quality-Driven (SA-QD) loss function, which uses learnable parameters to balance PI coverage and sharpness dynamically, without manual tuning. The loss is integrated with deep Transformer-based ensembles built on the PatchTST architecture, and the resulting intervals receive a final calibration step using Horizon-Specific Conformal Prediction (HSCP). Together, these components capture both aleatoric and epistemic uncertainty across varying forecast horizons. The proposed method was evaluated on the Exchange Rate (financial) and ETTh2 (energy) benchmark datasets over horizons of 96, 192, 336, and 720 steps and compared against state-of-the-art baselines (QD, Dual-AQD, Sum-K). Results show that SA-QD achieves high coverage, measured by Prediction Interval Coverage Probability (PICP: 0.961 ± 0.009 on Exchange Rate; 0.987 ± 0.008 on ETTh2), close to the nominal 95% level, while maintaining a competitive Mean Prediction Interval Width (MPIW). It also outperforms the baselines on Interval Score (IS) and Continuous Ranked Probability Score (CRPS), with reductions of 56–62% in combined metrics; the gains are largest against Dual-AQD and on ETTh2. Per-horizon analyses confirm robust adaptation to the uncertainty that grows with forecast horizon, along with stable training and reduced overfitting. This work advances automated uncertainty quantification in LTSF, mitigating overconfidence and improving reliability in high-stakes applications. Future extensions may incorporate multimodal data and explore additional domains, such as biomedicine.
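
For readers unfamiliar with the evaluation quantities named in the abstract, the following is a minimal sketch, not the thesis implementation. It assumes the standard definitions of PICP, MPIW, and the interval (Winkler) score, and it illustrates one plausible reading of horizon-specific conformal calibration: a split-conformal margin computed independently for each forecast step from held-out calibration data. All function names are hypothetical, and the actual SA-QD loss and HSCP procedure may differ.

```python
import math
from typing import List, Sequence


def picp(y: Sequence[float], lower: Sequence[float], upper: Sequence[float]) -> float:
    """Prediction Interval Coverage Probability: share of targets inside their interval."""
    hits = sum(1 for yi, lo, hi in zip(y, lower, upper) if lo <= yi <= hi)
    return hits / len(y)


def mpiw(lower: Sequence[float], upper: Sequence[float]) -> float:
    """Mean Prediction Interval Width."""
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)


def interval_score(y: Sequence[float], lower: Sequence[float],
                   upper: Sequence[float], alpha: float = 0.05) -> float:
    """Mean interval score: width plus a 2/alpha penalty for each miss (lower is better)."""
    total = 0.0
    for yi, lo, hi in zip(y, lower, upper):
        total += hi - lo
        if yi < lo:
            total += (2.0 / alpha) * (lo - yi)
        elif yi > hi:
            total += (2.0 / alpha) * (yi - hi)
    return total / len(y)


def conformal_margin(y: Sequence[float], lower: Sequence[float],
                     upper: Sequence[float], alpha: float = 0.05) -> float:
    """Split-conformal margin for a single horizon step.

    The score max(lower - y, y - upper) is positive when y falls outside the
    interval. The margin is the ceil((n + 1)(1 - alpha))-th smallest score.
    Assumption: the rank is clipped to n when the calibration set is too
    small, which gives up the exact finite-sample guarantee.
    """
    scores = sorted(max(lo - yi, yi - hi) for yi, lo, hi in zip(y, lower, upper))
    n = len(scores)
    k = min(n, math.ceil((n + 1) * (1.0 - alpha)))
    return scores[k - 1]


def horizon_margins(y_rows: Sequence[Sequence[float]],
                    lower_rows: Sequence[Sequence[float]],
                    upper_rows: Sequence[Sequence[float]],
                    alpha: float = 0.05) -> List[float]:
    """One margin per forecast step; each row is one calibration sample of length H."""
    columns = zip(zip(*y_rows), zip(*lower_rows), zip(*upper_rows))
    return [conformal_margin(y, lo, hi, alpha) for y, lo, hi in columns]
```

At test time, each interval at step h would be widened (or narrowed, if the margin is negative) to [lower_h - margin_h, upper_h + margin_h], so that later horizons with larger calibration errors receive larger adjustments.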

Keywords

Long-Term Time Series Forecasting, Prediction Interval Coverage Probability, Self-Adaptive Quality-Driven, Mean Prediction Interval Width, Continuous Ranked Probability Score, Transformer Ensembles.
