UniCA: Unified Covariate Adaptation
- UniCA is a lightweight framework that adapts large pretrained Time Series Foundation Models to heterogeneous covariate-aware forecasting tasks.
- It employs covariate homogenization to convert categorical, image, and text inputs into unified real-valued representations aligned with the forecasting horizon.
- UniCA utilizes a two-stage, attention-based fusion mechanism to integrate past and future covariate data, significantly enhancing forecasting performance.
UniCA (Unified Covariate Adaptation) is a lightweight, modular framework designed for adapting large, pretrained Time Series Foundation Models (TSFMs) to general covariate-aware forecasting tasks involving diverse, heterogeneous covariates, such as categorical metadata, image series, and text features. UniCA enables TSFMs—whose architectures and pretraining routines are typically restricted to real-valued inputs—to leverage arbitrary covariate information without altering the foundation model’s parameters, significantly broadening their applicability to realistic scenarios and multimodal forecasting pipelines (Han et al., 27 Jun 2025).
1. Covariate-Aware Forecasting: Problem Formulation
Covariate-aware forecasting extends classic univariate or multivariate time series forecasting by integrating external variables that can be static or time-varying, and possibly multimodal. The model's objective is to predict a future target segment $y_{T+1:T+H}$, given the historical target $y_{1:T}$, static covariates $s$ (e.g., item/location IDs), and dynamic covariates $x_{1:T+H}$. These covariates may be:
- Homogeneous real-valued series (e.g., temperature)
- Discrete categorical sequences (e.g., product IDs)
- Multimodal data (e.g., satellite images, diagnostic text)
Formally, the forecasting function is $\hat{y}_{T+1:T+H} = f\!\left(y_{1:T},\ s,\ x_{1:T+H}\right)$.
Traditional TSFMs, such as Chronos-Bolt and TimesFM, accept only real-valued covariates and assume channel independence, rendering them inadequate for ingesting discrete or multimodal signals without disrupting pretrained temporal representations (Han et al., 27 Jun 2025).
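The problem setup above can be fixed as a minimal interface sketch. The function name, argument layout, and the naive last-value fallback are illustrative choices, not part of the paper:

```python
from typing import Sequence


def forecast(
    y_hist: Sequence[float],         # historical target y_{1:T}
    static_cov: dict,                # static covariates s (e.g., item/location IDs)
    dyn_cov_hist: Sequence[dict],    # dynamic covariates over 1..T
    dyn_cov_future: Sequence[dict],  # known future covariates over T+1..T+H
) -> list[float]:
    """Naive placeholder: repeat the last observed target H times.

    A real covariate-aware model would condition on all four inputs;
    this stub only pins down the shape of the interface.
    """
    H = len(dyn_cov_future)
    return [y_hist[-1]] * H


# usage: 3 historical points, a 2-step horizon with known future covariates
preds = forecast([1.0, 2.0, 3.0], {"item_id": 7}, [{}] * 3, [{}] * 2)
```

The horizon length is implied by the known future covariates, which mirrors the formulation: the model is handed $x_{1:T+H}$, so $H$ is determined by the inputs rather than a separate argument.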
2. Covariate Homogenization
UniCA addresses this limitation through covariate homogenization: a process that transforms all heterogeneous covariates into dense, real-valued series representations aligned with the forecasting horizon. The approach consists of the following mapping pipelines:
- Categorical Sequences: each token $c_t$ is mapped via an embedding matrix $E$, $e_t = E[c_t]$
- Image Sequences: each image $I_t$ is encoded by a convolutional neural network (CNN) or pretrained vision encoder, $e_t = \mathrm{Enc}_{\text{img}}(I_t)$
- Text Sequences: each segment $s_t$ is transformed by a text encoder (e.g., GPT2) and a linear projection, $e_t = W\,\mathrm{Enc}_{\text{txt}}(s_t)$
Each covariate's embedding $e^{(i)}_t$ is passed through a Covariate Homogenizer (typically a single linear projection), $h^{(i)}_t = W_h e^{(i)}_t + b_h$, yielding unified covariate tensors $h^{(i)} \in \mathbb{R}^{(T+H) \times d}$ with a shared homogenization dimensionality $d$.
All homogenized covariates and native real-valued covariates are concatenated channel-wise to form the final input $X \in \mathbb{R}^{(T+H) \times C}$, where $C$ is the total effective number of covariate channels.
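The homogenization pipeline for a categorical covariate can be sketched in a few lines of numpy. All dimensions, weight matrices ($E$, $W_h$, $b_h$), and the random initialization are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
T_plus_H, d = 8, 16           # horizon-aligned length and homogenization dim (illustrative)

# -- categorical sequence: embedding-matrix lookup, then linear homogenizer --
vocab, emb_dim = 10, 32
E = rng.normal(size=(vocab, emb_dim))           # embedding matrix
tokens = rng.integers(0, vocab, size=T_plus_H)  # categorical covariate series
cat_emb = E[tokens]                             # (T+H, emb_dim)

W_h = rng.normal(size=(emb_dim, d))             # Covariate Homogenizer: one linear layer
b_h = np.zeros(d)
cat_homog = cat_emb @ W_h + b_h                 # (T+H, d) real-valued series

# -- native real-valued covariate (e.g., temperature): already homogeneous --
temp = rng.normal(size=(T_plus_H, 1))

# channel-wise concatenation into the unified covariate input X
X = np.concatenate([cat_homog, temp], axis=-1)  # (T+H, d + 1)
```

Image and text covariates follow the same pattern: swap the embedding lookup for a vision or text encoder, then apply the same kind of linear homogenizer to land in the shared $d$-dimensional space.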
3. Unified Attention-Based Covariate Fusion
UniCA introduces two-stage, attention-based fusion mechanisms for conditioned covariate injection, situated before and after the frozen temporal encoder of the TSFM:
- Pre-Fusion (Past Covariates): past target embeddings and covariate tokens are fused for each time patch via conditional attention pooling. For a patch token $z$ and covariate tokens (including static features $s$), attention weights $\alpha_j = \mathrm{softmax}_j\!\big(q^\top k_j / \sqrt{d}\big)$ are computed with query $q$ derived from the patch token, and keys $k_j$ and values $v_j$ derived from the covariate tokens.
The attended covariate pool $c = \sum_j \alpha_j v_j$ is fused into the patch token using a Gated Linear Unit (GLU): $z' = z + \mathrm{GLU}([z;\, c])$.
- Post-Fusion (Future Covariates): future covariates, tokenized in the same way, are fused into the temporal encoder's output via the same conditional pooling. The sequence then undergoes standard Transformer attention to yield updated hidden states, with forecast tokens passed to the predictor.
Both fusion points are modular; empirical results show negligible sensitivity to the precise fusion locations.
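One fusion step (attention pooling over covariate tokens, then a gated residual update) can be sketched in numpy. The weight shapes and the exact GLU parameterization here are assumptions for illustration; the paper's modules are learned layers inside the adapter:

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def attention_pool_glu(patch_tok, cov_toks, Wq, Wk, Wv, Wg):
    """One fusion step: attend from a target patch token over covariate
    tokens, then gate the pooled covariate signal into the patch token."""
    d = Wk.shape[1]
    q = patch_tok @ Wq                        # query from the target patch
    K = cov_toks @ Wk                         # keys from covariate tokens
    V = cov_toks @ Wv                         # values from covariate tokens
    alpha = softmax(q @ K.T / np.sqrt(d))     # attention weights over covariates
    pooled = alpha @ V                        # attended covariate pool c
    gate_in = np.concatenate([patch_tok, pooled])
    a, b = np.split(gate_in @ Wg, 2)          # GLU: half activations, half gate
    return patch_tok + a * sigmoid(b)         # residual gated fusion


rng = np.random.default_rng(1)
dm, n_cov = 16, 5
z = rng.normal(size=dm)                       # one patch token
C = rng.normal(size=(n_cov, dm))              # covariate tokens for that patch
Wq, Wk, Wv = (rng.normal(size=(dm, dm)) for _ in range(3))
Wg = rng.normal(size=(2 * dm, 2 * dm))
z_fused = attention_pool_glu(z, C, Wq, Wk, Wv, Wg)  # fused token, shape (dm,)
```

Because the output has the same shape as the input patch token, the same routine can sit before the frozen encoder (pre-fusion) or after it (post-fusion), which is what makes the fusion points interchangeable.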
4. Adaptation Protocol and Training
UniCA treats the TSFM (tokenizer, encoder, and predictor) as an immutable backbone,
training only the adaptation modules: the embedding layers, the Covariate Homogenizer, attention pooling, and the GLUs, which collectively comprise 1–5% of the total parameters. The loss is the quantile (pinball) loss, a discretized form of the CRPS, accumulated over a set of quantile levels $Q$: $\mathcal{L} = \frac{1}{|Q|} \sum_{q \in Q} \sum_{t=T+1}^{T+H} \rho_q\big(y_t - \hat{y}^{(q)}_t\big)$, where $\rho_q(u) = \max\big(q\,u,\ (q-1)\,u\big)$.
Targets are normalized by Reversible Instance Normalization (RevIN) before training and inverse-normalized at inference. Training uses the Adam optimizer with early stopping and dataset-specific learning rates and batch sizes (Han et al., 27 Jun 2025).
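The quantile (pinball) loss used for training is straightforward to state in code. This is a standard textbook formulation, not the paper's exact implementation:

```python
import numpy as np


def pinball_loss(y_true, y_pred_q, quantiles):
    """Quantile (pinball) loss averaged over quantile levels and time steps.

    y_pred_q holds one forecast column per level in `quantiles`;
    rho_q(u) = max(q*u, (q-1)*u) penalizes under- and over-prediction
    asymmetrically according to the quantile level q.
    """
    y_true = np.asarray(y_true)[:, None]   # (T, 1)
    q = np.asarray(quantiles)[None, :]     # (1, Q)
    u = y_true - np.asarray(y_pred_q)      # forecast errors, (T, Q)
    return np.mean(np.maximum(q * u, (q - 1) * u))


# a perfect quantile forecast incurs zero loss
y = [1.0, 2.0, 3.0]
preds = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
loss = pinball_loss(y, preds, [0.1, 0.9])  # → 0.0
```

Averaging this loss over a dense grid of quantile levels approximates the CRPS, which is why the section's probabilistic metric and the training objective coincide.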
5. Empirical Evaluation
Experiments on 12 unimodal and multiple multimodal covariate-aware forecasting benchmarks establish the following:
- Unimodal Covariate Tasks: Electricity price forecasting (EPF subsets), retail sales (M5, PDB, Spain), BDG-2 load, and GEFCom datasets.
- Multimodal Benchmarks: MMSP (64x64x4 satellite images + load/NWP features for solar power), Time-MMD (text reports + series in multiple domains).
Key metrics include MAE, MAPE, MSE (all normalized to naive baselines), and CRPS for probabilistic performance.
| Method | MSE↓ | MAE↓ | MAPE↓ | CRPS↓ |
|---|---|---|---|---|
| Chronos-Bolt (0-shot) | 0.418 | 0.526 | 0.514 | 0.460 |
| Chronos-Bolt (UniCA) | 0.383 | 0.509 | 0.506 | 0.429 |
| TimesFM (SFT) | 0.100 | 0.258 | — | — |
| TimesFM (UniCA, MMSP) | 0.098 | 0.229 | — | — |
| TimesFM (UniCA, Time-MMD) | — | 0.652 | — | 0.645 |
| Time-LLM/TTM (Time-MMD) | — | 0.682 | — | 0.681 |
Ablations demonstrate:
- Plug-in Covariate Homogenizer to TFT/TiDE on MMSP: MAE reduced by 5–55%, MAPE by 38–60%
- Homogenization dimensionality $d$: performance saturates as $d$ grows
- Homogenizer: single linear layer is optimal for most use cases
All empirical comparisons use >95% frozen parameters, isolating gains to the UniCA adaptation.
6. Implementation and Design Principles
- Backbones: Chronos-Bolt (T5 encoder-decoder on LOTSA), TimesFM (decoder-only pretraining)
- Covariate Homogenizer: single linear projection
- Image Encoder: 4-layer CNN for satellite data
- Text Encoder: GIST embeddings
- Optimization: Adam, ReduceLROnPlateau, 4 × RTX 3090 GPUs
- Parameter Efficiency: Only 1–5% of the architecture is trainable.
Performance robustness is observed with respect to homogenizer architecture and fusion position. Linear projection for homogenization suffices; more complex MLPs confer negligible additional benefits.
7. Significance and Scope
UniCA establishes a practical, plug-and-play methodology for leveraging the generalization strengths of large pretrained TSFMs in heterogeneous, covariate-rich environments without sacrificing scalability or requiring end-to-end fine-tuning. By decoupling covariate processing from time series backbone dynamics, UniCA enables flexible, modular adaptation to production pipelines, supports any combination of covariate modalities, and achieves state-of-the-art results on a spectrum of real-world forecasting tasks (Han et al., 27 Jun 2025).