
Sundial: Foundation Model for Forecasting

Updated 2 December 2025
  • Sundial Foundation Model is a transformer-based architecture that unifies heterogeneous temporal imaging and remote-sensing data for applications in solar forecasting and diverse time series domains.
  • It leverages innovations like spatiotemporal transformer backbones, spectral gating, and long–short attention to facilitate zero-shot predictions and rapid fine-tuning.
  • The model demonstrates state-of-the-art performance with efficient inference, scalability across parameter sizes, and significant improvements in forecasting accuracy across solar, meteorological, and financial tasks.

The Sundial Foundation Model designates a class of large-scale, transformer-based architectures for solar and time series forecasting, unifying heterogeneous temporal imaging and remote-sensing data sources into flexible, general-purpose representations. The most cited instantiations arise in heliophysics, integrating multi-instrument full-disk data from the Solar Dynamics Observatory (SDO), and in broad time-series applications spanning meteorology, environmental monitoring, and finance. Sundial leverages innovations in spatiotemporal transformer backbones, spectral gating, long–short attention, and flow-based probabilistic forecasting to enable zero-shot prediction, rapid fine-tuning, and downstream scientific modeling tasks.

1. Model Architecture: Spatiotemporal Transformer Backbones

Sundial extends the Surya spatiotemporal transformer framework by incorporating patch-based tokenization, dual-scale attention, and spectral features. Given paired input frames $\mathbf{X}_{t-1}, \mathbf{X}_t \in \mathbb{R}^{C\times H\times W}$ (e.g., $C=13$, $H=W=4096$ for SDO), Sundial applies signum-log normalization, divides each frame into non-overlapping $P\times P$ patches, and projects each patch to a $D$-dimensional token. Learned Fourier positional embeddings preserve spatial topology (Roy et al., 18 Aug 2025). Model layers alternate spectral gating (frequency-space filtering via FFT) with dual-range attention blocks:

  • Spectral Gating: Frequency coefficients $\widetilde{\mathbf{X}}$ are modulated by complex learnable weights $W_c$:

$$\widetilde{\mathbf{X}}' = \widetilde{\mathbf{X}} \odot W_c$$

followed by the inverse transform and an MLP residual update.

  • Long–Short Attention: Short-range heads operate on local windows $\Omega$, while long-range heads apply rank-reduced global projections:

$$\mathrm{Attn}_{\mathrm{short}}(Q_\Omega, K_\Omega, V_\Omega), \qquad \mathrm{Attn}_{\mathrm{long}}(Q, \bar K, \bar V)$$

with the outputs concatenated and processed via a residual MLP. Outputs are mapped back to image space by a linear decoder; both block types are sketched in code below.
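
The following PyTorch sketch illustrates the two block types under stated assumptions: the module names, the rFFT-over-tokens layout, the window size, and the Linformer-style rank reduction are illustrative choices, not the published implementation.

```python
import torch
import torch.nn as nn


class SpectralGating(nn.Module):
    """Frequency-space filtering: rFFT over the token axis, complex gating,
    inverse transform, then an MLP residual update."""

    def __init__(self, num_tokens: int, dim: int):
        super().__init__()
        # Complex learnable weights W_c over the rFFT coefficients.
        self.w_c = nn.Parameter(
            0.02 * torch.randn(num_tokens // 2 + 1, dim, dtype=torch.cfloat)
        )
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, D) patch tokens
        x_f = torch.fft.rfft(x, dim=1)                     # frequency coefficients
        x_f = x_f * self.w_c                               # elementwise gating by W_c
        x = x + torch.fft.irfft(x_f, n=x.shape[1], dim=1)  # inverse transform, residual
        return x + self.mlp(self.norm(x))                  # MLP residual update


class LongShortAttention(nn.Module):
    """Short-range heads attend within local windows; long-range heads attend
    to rank-reduced global keys/values; outputs are concatenated and merged."""

    def __init__(self, dim: int, num_tokens: int, num_heads: int = 8,
                 window: int = 64, rank: int = 128):
        super().__init__()
        self.window = window
        self.short_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.long_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Learned projection compressing N tokens into `rank` global summaries.
        self.compress = nn.Linear(num_tokens, rank)
        self.merge = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.GELU(), nn.Linear(dim, dim)
        )
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, D), with N divisible by the window size
        b, n, d = x.shape
        h = self.norm(x)
        # Short range: attention within non-overlapping local windows.
        w = h.reshape(b * (n // self.window), self.window, d)
        short, _ = self.short_attn(w, w, w)
        short = short.reshape(b, n, d)
        # Long range: full-length queries against rank-reduced keys/values.
        kv = self.compress(h.transpose(1, 2)).transpose(1, 2)  # (B, rank, D)
        long, _ = self.long_attn(h, kv, kv)
        # Concatenate both ranges and apply a residual MLP merge.
        return x + self.merge(torch.cat([short, long], dim=-1))
```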

Sundial supports parameter-efficient adaptation via Low-Rank Adaptation (LoRA), learning low-rank updates on frozen transformer weights:

$$W = W_0 + \frac{\alpha}{r} B A$$

for fine-tuning on downstream tasks (Roy et al., 18 Aug 2025).
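
A minimal sketch of this LoRA update applied to a single frozen linear layer; the wrapper name, the initialization scheme, and the default $r$ and $\alpha$ are illustrative assumptions.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen base weight W0 plus a trainable low-rank update (alpha / r) * B A."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                           # W0 stays frozen
        out_features, in_features = base.weight.shape
        self.A = nn.Parameter(0.01 * torch.randn(r, in_features))
        self.B = nn.Parameter(torch.zeros(out_features, r))   # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # W x = W0 x + (alpha / r) * B (A x)
        return self.base(x) + self.scale * ((x @ self.A.T) @ self.B.T)
```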

2. Pretraining Objectives and Probabilistic Forecasting

Sundial is pretrained in two phases, both sketched in code after the list:

  1. One-Step Supervised Forecasting: Minimizing the MSE between predicted and observed future frames:

$$\mathcal{L}_{\text{1-step}} = \frac{1}{B}\sum_{b=1}^{B} \left\| \mathbf{X}_{t+1}^{(b)} - f_\theta\!\left(\mathbf{X}_t^{(b)}, \mathbf{X}_{t-1}^{(b)}\right) \right\|_2^2$$

  2. Autoregressive Rollout Tuning: Successive prediction of $T$ future frames, with cumulative MSE loss:

$$\mathcal{L}_{\mathrm{rollout}} = \frac{1}{BT}\sum_{b=1}^{B}\sum_{k=1}^{T} \left\| \mathbf{X}_{t+k}^{(b)} - \hat{\mathbf{X}}_{t+k}^{(b)} \right\|^2$$
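
A schematic of the two training phases, assuming a model signature $f_\theta(\mathbf{X}_t, \mathbf{X}_{t-1})$ as above; batching, normalization, and averaging details are simplified relative to the exact losses.

```python
import torch
import torch.nn.functional as F


def one_step_loss(model, x_prev, x_t, x_next):
    """Phase 1: MSE between the predicted and observed next frame
    (mean over batch and pixels rather than an explicit 1/B sum)."""
    return F.mse_loss(model(x_t, x_prev), x_next)


def rollout_loss(model, x_prev, x_t, future_frames):
    """Phase 2: autoregressive rollout over T frames with cumulative MSE,
    feeding each prediction back in as the next input."""
    loss = 0.0
    for x_true in future_frames:             # list of T ground-truth frames
        x_hat = model(x_t, x_prev)
        loss = loss + F.mse_loss(x_hat, x_true)
        x_prev, x_t = x_t, x_hat             # prediction becomes the new input
    return loss / len(future_frames)
```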

Probabilistic forecasting is realized via a flow-matching objective (the "TimeFlow" loss), which pretrains the model to predict transport velocities between source noise and true future patches without explicit parametric output densities (Liu et al., 2 Feb 2025). At inference, Sundial samples likely trajectories by integrating the learned velocity field over multiple steps, yielding a distribution over forecasts.
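
A sketch of how such flow-matching sampling could work at inference, assuming a hypothetical `velocity_net(x, t, context)` interface and illustrative horizon, sample, and step counts; the published TimeFlow sampler may differ.

```python
import torch


@torch.no_grad()
def sample_forecasts(velocity_net, context, horizon=24, num_samples=32, num_steps=16):
    """Draw forecast samples by Euler-integrating a learned velocity field
    from Gaussian noise (t = 0) toward the data distribution (t = 1)."""
    b, _, d = context.shape                              # context: (B, L, D) history
    ctx = context.repeat_interleave(num_samples, dim=0)  # one context copy per sample
    x = torch.randn(b * num_samples, horizon, d)         # source noise
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((x.shape[0],), i * dt)
        x = x + dt * velocity_net(x, t, ctx)             # dx/dt = v_theta(x, t, context)
    # (B, num_samples, horizon, D): an empirical distribution over forecasts
    return x.reshape(b, num_samples, horizon, d)
```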

3. Data Modalities, Preprocessing, and Instrument Fusion

Sundial operates on comprehensive, multi-instrument solar datasets (e.g., SDO AIA, HMI, EVE), aggregating:

  • EUV and UV imaging (AIA) at 0.6″/px, channels 94–335 Å, 12 s native cadence rebinned/stacked to 12 min.
  • Full-disk vector magnetograms and Doppler maps (HMI).
  • Sun-integrated EUV spectra (EVE).

Alignment employs limb- and WCS-based co-registration, exposure normalization, and solar-disk masking. Inputs are rescaled per channel to zero mean and unit variance across the training corpus. Channel stacking and co-temporal pairing fuse all modalities into an 11-channel tensor for model ingestion (Walsh et al., 3 Oct 2024). For non-solar time series, Sundial is trained on the TimeBench corpus ($\sim 1$ trillion time points), covering meteorology, finance, sensor streams, and synthetic benchmarks (Liu et al., 2 Feb 2025). A sketch of this normalization and patching pipeline appears below.
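
A minimal sketch of the preprocessing described above, combining the signum-log transform from Section 1 with per-channel standardization and patch tokenization; the function names are hypothetical and per-channel corpus statistics are assumed precomputed.

```python
import torch


def signum_log(x: torch.Tensor) -> torch.Tensor:
    """Signed log transform, sign(x) * log(1 + |x|), compressing dynamic range."""
    return torch.sign(x) * torch.log1p(torch.abs(x))


def preprocess(frame: torch.Tensor, mean: torch.Tensor, std: torch.Tensor,
               patch: int = 16) -> torch.Tensor:
    """Normalize a stacked multi-instrument frame (C, H, W) and cut it into
    non-overlapping P x P patch tokens; `mean`/`std` are per-channel corpus stats."""
    x = signum_log(frame)
    x = (x - mean[:, None, None]) / std[:, None, None]  # zero mean, unit variance
    c, h, w = x.shape
    # (C, H, W) -> (num_patches, C * P * P) token matrix
    x = x.reshape(c, h // patch, patch, w // patch, patch)
    x = x.permute(1, 3, 0, 2, 4).reshape(-1, c * patch * patch)
    return x
```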

4. Zero-Shot and Downstream Evaluation Performance

Sundial demonstrates strong zero-shot adaptation, outperforming supervised LSTM and ARIMA models on LAI prediction once sufficient historical context is provided ($T_{\mathrm{in}} \gtrsim 512$) (Zhang et al., 25 Nov 2025). In solar domains, Sundial achieves an MSE of 0.2198 for one-hour forecasting, compared to 0.5940 for a persistence baseline, and delivers a 17.8% improvement at 12 hours ahead following rollout tuning (Roy et al., 18 Aug 2025). For flare prediction and segmentation tasks, Sundial with LoRA or adapter heads outperforms U-Net and ResNet baselines on intersection-over-union, Dice, and classification scores.

Example Downstream Metrics

| Task | Baseline | Sundial |
|------|----------|---------|
| AR segmentation (vs. U-Net) | IoU 0.688 / Dice 0.801 | IoU 0.768 / Dice 0.853 |
| Flare classification (vs. AlexNet) | TSS = 0.358 | TSS = 0.436 |
| EUV spectra regression (vs. FISM) | MAPE = 3.4% | MAPE = 1.48% |

Sundial’s embeddings also cluster solar phenomena (active regions, flares, quiet Sun) with silhouette scores of $\sim 0.61$ (Walsh et al., 3 Oct 2024).

5. Scalability, Limitations, and Model Variants

Sundial models scale from 32M to 444M parameters; larger variants yield monotonic improvements in loss and forecasting accuracy (Liu et al., 2 Feb 2025). Inference is highly efficient, with sub-millisecond latency per prediction. However, long input contexts are necessary for optimal zero-shot performance, especially on nonstationary or rare-event time series (Zhang et al., 25 Nov 2025). Current implementations focus on univariate or stacked-image forecasting; cross-series correlation and improved sampling for steep trends remain open research directions.

Limitations cited include:

  • High compute and VRAM requirements for training on full-resolution SDO data.
  • Conservative trend estimates under rapid change (mode-reversion artifacts).
  • Absence of explicit physics-based priors or constraints in current architectures.
  • Dependence on long, contiguous observational history for best zero-shot adaptation.

6. Fine-Tuning and Adaptation Strategies

Low-Rank Adaptation (LoRA) provides a scalable pathway for downstream adaptation, achieving task transfer with fewer than 5 million trainable parameters by adding low-rank updates to frozen transformer blocks (Roy et al., 18 Aug 2025). Common adapters include MLP regressors, CNN classifiers, and transformer-based inpainting heads; a minimal adapter pattern is sketched below. Fine-tuning protocols preserve base-model capacity while targeting domain-specific predictive signals.
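
A minimal sketch of the frozen-backbone adapter pattern with a hypothetical MLP regression head; the pooling strategy and head shape are illustrative assumptions, not a published configuration.

```python
import torch
import torch.nn as nn


class RegressionAdapter(nn.Module):
    """Frozen pretrained backbone plus a small trainable MLP head, e.g. for
    solar wind speed regression on pooled token embeddings."""

    def __init__(self, backbone: nn.Module, embed_dim: int, out_dim: int = 1):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad_(False)           # keep base capacity intact
        self.head = nn.Sequential(
            nn.LayerNorm(embed_dim),
            nn.Linear(embed_dim, embed_dim // 2),
            nn.GELU(),
            nn.Linear(embed_dim // 2, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = self.backbone(x)             # (B, N, D) token embeddings
        return self.head(tokens.mean(dim=1))  # mean-pool tokens, then regress
```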

Prominent downstream tasks include:

  • Solar flare probability forecasting (binary classification within 24 h).
  • Solar wind speed regression (e.g., RMSE=75.9 km/s, outperforming empirical MHD models).
  • Spectral irradiance reconstruction (MAPE <2%).
  • Active region segmentation (IoU >0.75).

7. Future Directions and Design Recommendations

Building on Surya and Sundial, forthcoming models are encouraged to:

  • Integrate more diverse modalities (GOES-SUVI, GONG, DKIST) and multi-scale image patches (8×8, 32×32).
  • Employ probabilistic or diffusion-based forecast heads (e.g., CRPS loss).
  • Leverage masked spatiotemporal reconstruction and hierarchical attention for global context.
  • Utilize physics-informed tokens (rotation rate, Carrington coordinates) for improved interpretability.
  • Adopt distributed, mixed-precision computing on A100-class hardware, with sharded data and compressed pipelines.

A plausible implication is the expansion to real-time operational space weather forecasting and cross-domain environmental prediction, contingent on advances in multi-modal data fusion and efficient fine-tuning protocols.

