UniCA: Unified Covariate Adaptation

Updated 23 February 2026
  • UniCA is a lightweight framework that adapts large pretrained Time Series Foundation Models to heterogeneous covariate-aware forecasting tasks.
  • It employs covariate homogenization to convert categorical, image, and text inputs into unified real-valued representations aligned with the forecasting horizon.
  • UniCA utilizes a two-stage, attention-based fusion mechanism to integrate past and future covariate data, significantly enhancing forecasting performance.

UniCA (Unified Covariate Adaptation) is a lightweight, modular framework designed for adapting large, pretrained Time Series Foundation Models (TSFMs) to general covariate-aware forecasting tasks involving diverse, heterogeneous covariates, such as categorical metadata, image series, and text features. UniCA enables TSFMs—whose architectures and pretraining routines are typically restricted to real-valued inputs—to leverage arbitrary covariate information without altering the foundation model’s parameters, significantly broadening their applicability to realistic scenarios and multimodal forecasting pipelines (Han et al., 27 Jun 2025).

1. Covariate-Aware Forecasting: Problem Formulation

Covariate-aware forecasting extends classic univariate or multivariate time series forecasting by integrating external variables that can be static or time-varying, and possibly multimodal. The model's objective is to predict a future target segment $y_{T+1:T+H} \in \mathbb{R}^{H \times 1}$, given the historical target $y_{1:T} \in \mathbb{R}^{T \times 1}$, static covariates $s \in \mathbb{R}^N$ (e.g., item/location IDs), and dynamic covariates $C_{1:T+H} \in \mathbb{R}^{(T+H) \times M}$. These covariates may be:

  • Homogeneous real-valued series (e.g., temperature)
  • Discrete categorical sequences (e.g., product IDs)
  • Multimodal data (e.g., satellite images, diagnostic text)

Formally, the forecasting function is:

$$\hat{y}_{T+1:T+H} = f(y_{1:T}, C_{1:T+H}, s)$$

Traditional TSFMs, such as Chronos-Bolt and TimesFM, accept only real-valued covariates and assume channel independence, rendering them inadequate for ingesting discrete or multimodal signals without disrupting pretrained temporal representations (Han et al., 27 Jun 2025).
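The input/output contract above can be made concrete with a minimal NumPy sketch. The sizes and the naive last-value placeholder are illustrative only, not the paper's model; any covariate-aware forecaster implements this signature.

```python
import numpy as np

# Hypothetical sizes for illustration:
# T = history length, H = horizon, M = covariate channels, N = static dims.
T, H, M, N = 96, 24, 5, 3

rng = np.random.default_rng(0)
y_hist = rng.normal(size=(T, 1))    # past target y_{1:T}
C = rng.normal(size=(T + H, M))     # dynamic covariates C_{1:T+H}, past AND future
s = rng.normal(size=N)              # static covariates s

def forecast(y_hist, C, s):
    """Placeholder for f(y_{1:T}, C_{1:T+H}, s); a real model replaces this.
    Here: naive last-value forecast that ignores the covariates."""
    return np.repeat(y_hist[-1:], H, axis=0)   # shape (H, 1)

y_hat = forecast(y_hist, C, s)
assert y_hat.shape == (H, 1)
```

Note that the dynamic covariates span $T+H$ steps: known future inputs (e.g., weather forecasts, planned promotions) are part of the conditioning set.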

2. Covariate Homogenization

UniCA addresses this limitation through covariate homogenization: a process that transforms all heterogeneous covariates into dense, real-valued series representations aligned with the forecasting horizon. The approach consists of the following mapping pipelines:

  • Categorical Sequences: Each token $c_k \in \{1, \ldots, V\}$ is mapped via an embedding matrix $E_c \in \mathbb{R}^{V \times d}$,

$$h^{(c)}_k = E_c[c_k] \in \mathbb{R}^d.$$

  • Image Series: Each image frame $i_k$ is encoded by an image encoder $f_\text{img}$ (e.g., a CNN),

$$h^{(i)}_k = f_\text{img}(i_k) \in \mathbb{R}^d.$$

  • Text Sequences: Segment $t_k$ is transformed by a text encoder (e.g., GPT-2) followed by a linear projection,

$$h^{(t)}_k = W_\text{txt} \cdot \text{gpt2}(t_k) + b_\text{txt} \in \mathbb{R}^d.$$

Each covariate’s embedding $H^{(\text{het})} \in \mathbb{R}^{(T+H) \times d}$ is input to a Covariate Homogenizer (typically a single linear projection),

$$\widetilde{C}^{(\text{het})} = \text{CH}(H^{(\text{het})}) = H^{(\text{het})} W_\text{ch} + b_\text{ch} \in \mathbb{R}^{(T+H) \times d^{\text{het}}}$$

yielding unified covariate tensors of dimensionality $d^{\text{het}}$ (typically $d^{\text{het}} = 4$).

All homogenized covariates and native real-valued covariates are concatenated channel-wise to form the final input $C_{1:T+H} \in \mathbb{R}^{(T+H) \times M}$, where $M$ is the total effective number of covariate channels.
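The categorical branch of this pipeline can be sketched in a few lines of NumPy. Sizes and weight initializations are illustrative assumptions, not the paper's code; a trained model learns $E_c$, $W_\text{ch}$, and $b_\text{ch}$.

```python
import numpy as np

rng = np.random.default_rng(0)
T, H, V, d, d_het = 96, 24, 100, 16, 4   # d_het = 4, as in the paper

# Learnable parameters (random here for illustration)
E_c = rng.normal(size=(V, d))            # embedding matrix E_c
W_ch = rng.normal(size=(d, d_het))       # Covariate Homogenizer: linear projection
b_ch = np.zeros(d_het)

# Categorical covariate tokens over past and future horizon
c = rng.integers(0, V, size=T + H)
H_het = E_c[c]                           # (T+H, d)  token embeddings
C_tilde = H_het @ W_ch + b_ch            # (T+H, d_het)  homogenized series

# Concatenate channel-wise with native real-valued covariates
C_real = rng.normal(size=(T + H, 2))     # e.g., temperature, price
C_all = np.concatenate([C_real, C_tilde], axis=1)
assert C_all.shape == (T + H, 2 + d_het)
```

Image and text covariates follow the same pattern: the modality-specific encoder replaces the embedding lookup, and the same linear homogenizer maps its output to $d^{\text{het}}$ channels.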

3. Unified Attention-Based Covariate Fusion

UniCA introduces two-stage, attention-based fusion mechanisms for conditioned covariate injection, situated before and after the frozen temporal encoder of the TSFM:

  • Pre-Fusion (Past Covariates): Past target embeddings $Y_\text{tok}$ and covariate tokens $C_\text{tok}$ are fused for each time patch via conditional attention pooling. For static features $S = \rho(s)$, attention weights

$$\alpha_{p,m} = \text{softmax}_m\left(\frac{Q_p K_{p,m}^T}{\sqrt{d}}\right)$$

are computed with $Q_p = Y_\text{tok}[p,:]$, $K_{p,m} = \text{Linear}([Y_\text{tok}[p,:]; S])$, and values $V_{p,m} = \text{GRN}(C_\text{tok}[p,m,:])$.

The attended covariate pool $C_\text{pool}[p] = \sum_{m=1}^M \alpha_{p,m} V_{p,m}$ is fused into the patch token using a Gated Linear Unit (GLU):

$$X_\text{pre}[p] = Y_\text{tok}[p] + \text{GLU}(C_\text{pool}[p]).$$

  • Post-Fusion (Future Covariates): Future covariates, tokenized as $C_\text{tok}^\text{fut}$, are fused with the temporal encoder’s output $X^1$ via the same conditional pooling. The sequence $Z = [X^1; C_\text{fut}]$ undergoes standard Transformer attention to yield updated hidden states, with forecast tokens $\widehat{X}$ passed to the predictor.

Both fusion points are modular; empirical results show negligible sensitivity to the precise fusion locations.
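The pre-fusion step above can be sketched in NumPy. This is a simplified illustration: the key and value projections stand in for the conditional Linear/GRN transforms, and all weights are random rather than learned.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
P, M, d = 12, 5, 16                 # patches, covariate channels, model dim

Y_tok = rng.normal(size=(P, d))     # target patch tokens (queries Q_p)
K_tok = rng.normal(size=(P, M, d))  # keys, stand-in for Linear([Y_tok; S])
V_tok = rng.normal(size=(P, M, d))  # values, stand-in for GRN(C_tok)

# Conditional attention pooling: one weight per covariate channel, per patch
scores = np.einsum('pd,pmd->pm', Y_tok, K_tok) / np.sqrt(d)   # (P, M)
alpha = softmax(scores, axis=1)                               # alpha_{p,m}
C_pool = np.einsum('pm,pmd->pd', alpha, V_tok)                # (P, d)

# GLU gate, then residual fusion into the patch tokens
W_a = rng.normal(size=(d, d))
W_b = rng.normal(size=(d, d))
gate = 1.0 / (1.0 + np.exp(-(C_pool @ W_b)))   # sigmoid gate
X_pre = Y_tok + (C_pool @ W_a) * gate          # X_pre[p] = Y_tok[p] + GLU(C_pool[p])
assert X_pre.shape == (P, d)
```

Because the pooled covariate signal enters only through a gated residual, the frozen backbone's representation of the target series is preserved when the gate closes, which is what lets UniCA inject covariates without disturbing pretrained dynamics.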

4. Adaptation Protocol and Training

UniCA treats the TSFM (tokenizer $\mathcal{T}$, encoder $\mathcal{E}$, predictor $\mathcal{P}$) as an immutable backbone:

$$\widehat{y} = \mathcal{P} \circ \text{PostFusion} \circ \mathcal{E} \circ \text{PreFusion} \circ \mathcal{T}(y_{1:T}),$$

training only the adaptation modules: embedding layers ($E_c$, $E_m$), the Covariate Homogenizer, attention pooling, and GLUs, which together comprise 1–5% of the total parameters. The loss is the quantile loss (CRPS) over quantiles $q \in \{0.1, \ldots, 0.9\}$:

$$L_{QL} = \sum_{q} \sum_{\tau=1}^H \left[ q \, (y_{T+\tau} - \widehat{y}_{\tau,q})_+ + (1 - q)(\widehat{y}_{\tau,q} - y_{T+\tau})_+ \right] / H.$$

Targets are normalized by Reversible Instance Normalization before training and denormalized at inference. Training uses the Adam optimizer, learning rates in $\{10^{-3}, 10^{-4}, 10^{-5}, 10^{-6}\}$, early stopping, and batch sizes in $\{8, 16, 32, 64\}$ (Han et al., 27 Jun 2025).
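The quantile (pinball) loss above translates directly to code. A minimal NumPy version, matching the formula term by term:

```python
import numpy as np

def quantile_loss(y_true, y_pred_q, quantiles):
    """Pinball loss averaged over the horizon and summed over quantiles,
    matching L_QL above.  y_true: (H,);  y_pred_q: (H, Q)."""
    H = y_true.shape[0]
    total = 0.0
    for j, q in enumerate(quantiles):
        diff = y_true - y_pred_q[:, j]                 # y_{T+tau} - yhat_{tau,q}
        total += np.sum(q * np.maximum(diff, 0.0)
                        + (1 - q) * np.maximum(-diff, 0.0)) / H
    return total

quantiles = np.arange(0.1, 1.0, 0.1)                   # q in {0.1, ..., 0.9}
y = np.array([1.0, 2.0, 3.0])

# A forecast that is exact at every quantile incurs zero loss
pred = np.tile(y[:, None], (1, len(quantiles)))
assert quantile_loss(y, pred, quantiles) == 0.0
```

Averaging this loss over many series approximates the CRPS, which is why optimizing it yields calibrated probabilistic forecasts.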

5. Empirical Evaluation

Experiments on 12 unimodal and multiple multimodal covariate-aware forecasting benchmarks establish the following:

  • Unimodal Covariate Tasks: Electricity price forecasting (EPF subsets), retail sales (M5, PDB, Spain), BDG-2 load, and GEFCom datasets.
  • Multimodal Benchmarks: MMSP (64x64x4 satellite images + load/NWP features for solar power), Time-MMD (text reports + series in multiple domains).

Key metrics include MAE, MAPE, MSE (all normalized to naive baselines), and CRPS for probabilistic performance.

Method                      MSE↓    MAE↓    MAPE↓   CRPS↓
Chronos-Bolt (0-shot)       0.418   0.526   0.514   0.460
Chronos-Bolt (UniCA)        0.383   0.509   0.506   0.429
TimesFM (SFT)               0.100   0.258   –       –
TimesFM (UniCA, MMSP)       0.098   0.229   –       –
TimesFM (UniCA, Time-MMD)   0.652   0.645   –       –
Time-LLM/TTM (Time-MMD)     0.682   0.681   –       –

Ablations demonstrate:

  • Plug-in Covariate Homogenizer to TFT/TiDE on MMSP: MAE reduced by 5–55%, MAPE by 38–60%
  • Homogenization dimension $d^{\text{het}}$: performance saturates at $d^{\text{het}} = 4$
  • Homogenizer: single linear layer is optimal for most use cases

All empirical comparisons use >95% frozen parameters, isolating gains to the UniCA adaptation.

6. Implementation and Design Principles

  • Backbones: Chronos-Bolt (T5 encoder-decoder on LOTSA), TimesFM (decoder-only pretraining)
  • Covariate Homogenizer: linear projection, $d \rightarrow d^{\text{het}} = 4$
  • Image Encoder: 4-layer CNN for satellite data
  • Text Encoder: GIST embeddings
  • Optimization: Adam, ReduceLROnPlateau, 4 × RTX 3090 GPUs
  • Parameter Efficiency: Only 1–5% of the architecture is trainable.

Performance robustness is observed with respect to homogenizer architecture and fusion position. Linear projection for homogenization suffices; more complex MLPs confer negligible additional benefits.
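The parameter-efficiency claim can be illustrated with back-of-the-envelope counts. The numbers below are hypothetical, chosen only to show how a frozen backbone plus small adapters lands in the reported 1–5% trainable range.

```python
# Hypothetical parameter counts (NOT the actual model sizes) illustrating
# UniCA's parameter efficiency: the TSFM backbone stays frozen,
# only the adaptation modules are trained.
backbone = {"tokenizer": 2_000_000, "encoder": 180_000_000, "predictor": 8_000_000}
adapters = {"embeddings": 1_500_000, "homogenizer": 20_000,
            "attn_pooling": 3_000_000, "glu": 500_000}

frozen = sum(backbone.values())
trainable = sum(adapters.values())
frac = trainable / (frozen + trainable)
print(f"trainable fraction: {frac:.1%}")   # a few percent of the total
```

In a framework such as PyTorch, the same split is enforced by setting `requires_grad = False` on all backbone parameters and passing only the adapter parameters to the optimizer.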

7. Significance and Scope

UniCA establishes a practical, plug-and-play methodology for leveraging the generalization strengths of large pretrained TSFMs in heterogeneous, covariate-rich environments without sacrificing scalability or requiring end-to-end fine-tuning. By decoupling covariate processing from time series backbone dynamics, UniCA enables flexible, modular adaptation to production pipelines, supports any combination of covariate modalities, and achieves state-of-the-art results on a spectrum of real-world forecasting tasks (Han et al., 27 Jun 2025).
