UniCA: Unified Covariate Adaptation
- UniCA is a lightweight framework that adapts large pretrained Time Series Foundation Models to heterogeneous covariate-aware forecasting tasks.
- It employs covariate homogenization to convert categorical, image, and text inputs into unified real-valued representations aligned with the forecasting horizon.
- UniCA utilizes a two-stage, attention-based fusion mechanism to integrate past and future covariate data, significantly enhancing forecasting performance.
UniCA (Unified Covariate Adaptation) is a lightweight, modular framework designed for adapting large, pretrained Time Series Foundation Models (TSFMs) to general covariate-aware forecasting tasks involving diverse, heterogeneous covariates, such as categorical metadata, image series, and text features. UniCA enables TSFMs—whose architectures and pretraining routines are typically restricted to real-valued inputs—to leverage arbitrary covariate information without altering the foundation model’s parameters, significantly broadening their applicability to realistic scenarios and multimodal forecasting pipelines (Han et al., 27 Jun 2025).
1. Covariate-Aware Forecasting: Problem Formulation
Covariate-aware forecasting extends classic univariate or multivariate time series forecasting by integrating external variables that can be static or time-varying, and possibly multimodal. The model's objective is to predict a future target segment $y_{T+1:T+H}$, given the historical target $y_{1:T}$, static covariates $s$ (e.g., item/location IDs), and dynamic covariates $x_{1:T+H}$. These covariates may be:
- Homogeneous real-valued series (e.g., temperature)
- Discrete categorical sequences (e.g., product IDs)
- Multimodal data (e.g., satellite images, diagnostic text)
Formally, the forecasting function is $\hat{y}_{T+1:T+H} = f\!\left(y_{1:T},\ s,\ x_{1:T+H}\right)$.
Traditional TSFMs, such as Chronos-Bolt and TimesFM, accept only real-valued covariates and assume channel independence, rendering them inadequate for ingesting discrete or multimodal signals without disrupting pretrained temporal representations (Han et al., 27 Jun 2025).
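The problem setup above can be fixed as a minimal interface sketch. The function name, argument layout, and the naive last-value fallback are illustrative choices, not part of the paper:

```python
from typing import Sequence


def forecast(
    y_hist: Sequence[float],         # historical target y_{1:T}
    static_cov: dict,                # static covariates s (e.g., item/location IDs)
    dyn_cov_hist: Sequence[dict],    # dynamic covariates over 1..T
    dyn_cov_future: Sequence[dict],  # known future covariates over T+1..T+H
) -> list[float]:
    """Naive placeholder: repeat the last observed target H times.

    A real covariate-aware model would condition on all four inputs;
    this stub only pins down the shape of the interface.
    """
    H = len(dyn_cov_future)
    return [y_hist[-1]] * H


# usage: 3 historical points, a 2-step horizon with known future covariates
preds = forecast([1.0, 2.0, 3.0], {"item_id": 7}, [{}] * 3, [{}] * 2)
```

The horizon length is implied by the known future covariates, which mirrors the formulation: the model is handed $x_{1:T+H}$, so $H$ is determined by the inputs rather than a separate argument.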
2. Covariate Homogenization
UniCA addresses this limitation through covariate homogenization: a process that transforms all heterogeneous covariates into dense, real-valued series representations aligned with the forecasting horizon. The approach consists of the following mapping pipelines:
- Categorical Sequences: each token $c_t$ is mapped via an embedding matrix $E$, $e_t = E[c_t]$
- Image Sequences: each image $I_t$ is encoded by a convolutional neural network (CNN) or pretrained vision encoder, $e_t = \mathrm{Enc}_{\text{img}}(I_t)$
- Text Sequences: each segment $s_t$ is transformed by a text encoder (e.g., GPT2) and a linear projection, $e_t = W\,\mathrm{Enc}_{\text{txt}}(s_t)$
Each covariate's embedding $e^{(i)}_t$ is passed through a Covariate Homogenizer (typically a single linear projection), $h^{(i)}_t = W_h e^{(i)}_t + b_h$, yielding unified covariate tensors $h^{(i)} \in \mathbb{R}^{(T+H) \times d}$ with a shared homogenization dimensionality $d$.
All homogenized covariates and native real-valued covariates are concatenated channel-wise to form the final input $X \in \mathbb{R}^{(T+H) \times C}$, where $C$ is the total effective number of covariate channels.
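The homogenization pipeline for a categorical covariate can be sketched in a few lines of numpy. All dimensions, weight matrices ($E$, $W_h$, $b_h$), and the random initialization are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
T_plus_H, d = 8, 16           # horizon-aligned length and homogenization dim (illustrative)

# -- categorical sequence: embedding-matrix lookup, then linear homogenizer --
vocab, emb_dim = 10, 32
E = rng.normal(size=(vocab, emb_dim))           # embedding matrix
tokens = rng.integers(0, vocab, size=T_plus_H)  # categorical covariate series
cat_emb = E[tokens]                             # (T+H, emb_dim)

W_h = rng.normal(size=(emb_dim, d))             # Covariate Homogenizer: one linear layer
b_h = np.zeros(d)
cat_homog = cat_emb @ W_h + b_h                 # (T+H, d) real-valued series

# -- native real-valued covariate (e.g., temperature): already homogeneous --
temp = rng.normal(size=(T_plus_H, 1))

# channel-wise concatenation into the unified covariate input X
X = np.concatenate([cat_homog, temp], axis=-1)  # (T+H, d + 1)
```

Image and text covariates follow the same pattern: swap the embedding lookup for a vision or text encoder, then apply the same kind of linear homogenizer to land in the shared $d$-dimensional space.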
3. Unified Attention-Based Covariate Fusion
UniCA introduces two-stage, attention-based fusion mechanisms for conditioned covariate injection, situated before and after the frozen temporal encoder of the TSFM:
- Pre-Fusion (Past Covariates): past target embeddings and covariate tokens are fused for each time patch via conditional attention pooling. For a patch token $z$ and covariate tokens (including static features $s$), attention weights $\alpha_j = \mathrm{softmax}_j\!\big(q^\top k_j / \sqrt{d}\big)$ are computed with query $q$ derived from the patch token, and keys $k_j$ and values $v_j$ derived from the covariate tokens.
The attended covariate pool $c = \sum_j \alpha_j v_j$ is fused into the patch token using a Gated Linear Unit (GLU): $z' = z + \mathrm{GLU}([z;\, c])$.
- Post-Fusion (Future Covariates): future covariates, tokenized in the same way, are fused into the temporal encoder's output via the same conditional pooling. The sequence then undergoes standard Transformer attention to yield updated hidden states, with forecast tokens passed to the predictor.
Both fusion points are modular; empirical results show negligible sensitivity to the precise fusion locations.
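One fusion step (attention pooling over covariate tokens, then a gated residual update) can be sketched in numpy. The weight shapes and the exact GLU parameterization here are assumptions for illustration; the paper's modules are learned layers inside the adapter:

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def attention_pool_glu(patch_tok, cov_toks, Wq, Wk, Wv, Wg):
    """One fusion step: attend from a target patch token over covariate
    tokens, then gate the pooled covariate signal into the patch token."""
    d = Wk.shape[1]
    q = patch_tok @ Wq                        # query from the target patch
    K = cov_toks @ Wk                         # keys from covariate tokens
    V = cov_toks @ Wv                         # values from covariate tokens
    alpha = softmax(q @ K.T / np.sqrt(d))     # attention weights over covariates
    pooled = alpha @ V                        # attended covariate pool c
    gate_in = np.concatenate([patch_tok, pooled])
    a, b = np.split(gate_in @ Wg, 2)          # GLU: half activations, half gate
    return patch_tok + a * sigmoid(b)         # residual gated fusion


rng = np.random.default_rng(1)
dm, n_cov = 16, 5
z = rng.normal(size=dm)                       # one patch token
C = rng.normal(size=(n_cov, dm))              # covariate tokens for that patch
Wq, Wk, Wv = (rng.normal(size=(dm, dm)) for _ in range(3))
Wg = rng.normal(size=(2 * dm, 2 * dm))
z_fused = attention_pool_glu(z, C, Wq, Wk, Wv, Wg)  # fused token, shape (dm,)
```

Because the output has the same shape as the input patch token, the same routine can sit before the frozen encoder (pre-fusion) or after it (post-fusion), which is what makes the fusion points interchangeable.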
4. Adaptation Protocol and Training
UniCA treats the TSFM (tokenizer, encoder, and predictor) as an immutable backbone,
training only the adaptation modules: the embedding layers, the Covariate Homogenizer, attention pooling, and the GLUs, which collectively comprise 1–5% of the total parameters. The loss is the quantile (pinball) loss, a discretized form of the CRPS, accumulated over a set of quantile levels $Q$: $\mathcal{L} = \frac{1}{|Q|} \sum_{q \in Q} \sum_{t=T+1}^{T+H} \rho_q\big(y_t - \hat{y}^{(q)}_t\big)$, where $\rho_q(u) = \max\big(q\,u,\ (q-1)\,u\big)$.
Targets are normalized by Reversible Instance Normalization (RevIN) before training and inverse-normalized at inference. Training uses the Adam optimizer with early stopping and dataset-specific learning rates and batch sizes (Han et al., 27 Jun 2025).
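The quantile (pinball) loss used for training is straightforward to state in code. This is a standard textbook formulation, not the paper's exact implementation:

```python
import numpy as np


def pinball_loss(y_true, y_pred_q, quantiles):
    """Quantile (pinball) loss averaged over quantile levels and time steps.

    y_pred_q holds one forecast column per level in `quantiles`;
    rho_q(u) = max(q*u, (q-1)*u) penalizes under- and over-prediction
    asymmetrically according to the quantile level q.
    """
    y_true = np.asarray(y_true)[:, None]   # (T, 1)
    q = np.asarray(quantiles)[None, :]     # (1, Q)
    u = y_true - np.asarray(y_pred_q)      # forecast errors, (T, Q)
    return np.mean(np.maximum(q * u, (q - 1) * u))


# a perfect quantile forecast incurs zero loss
y = [1.0, 2.0, 3.0]
preds = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
loss = pinball_loss(y, preds, [0.1, 0.9])  # → 0.0
```

Averaging this loss over a dense grid of quantile levels approximates the CRPS, which is why the section's probabilistic metric and the training objective coincide.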
5. Empirical Evaluation
Experiments on 12 unimodal and multiple multimodal covariate-aware forecasting benchmarks establish the following:
- Unimodal Covariate Tasks: Electricity price forecasting (EPF subsets), retail sales (M5, PDB, Spain), BDG-2 load, and GEFCom datasets.
- Multimodal Benchmarks: MMSP (64x64x4 satellite images + load/NWP features for solar power), Time-MMD (text reports + series in multiple domains).
Key metrics include MAE, MAPE, MSE (all normalized to naive baselines), and CRPS for probabilistic performance.
| Method | MSE↓ | MAE↓ | MAPE↓ | CRPS↓ |
|---|---|---|---|---|
| Chronos-Bolt (0-shot) | 0.418 | 0.526 | 0.514 | 0.460 |
| Chronos-Bolt (UniCA) | 0.383 | 0.509 | 0.506 | 0.429 |
| TimesFM (SFT) | 0.100 | 0.258 | — | — |
| TimesFM (UniCA, MMSP) | 0.098 | 0.229 | — | — |
| TimesFM (UniCA, Time-MMD) | — | 0.652 | — | 0.645 |
| Time-LLM/TTM (Time-MMD) | — | 0.682 | — | 0.681 |
Ablations demonstrate:
- Plug-in Covariate Homogenizer to TFT/TiDE on MMSP: MAE reduced by 5–55%, MAPE by 38–60%
- Homogenization dimensionality $d$: performance saturates as $d$ grows
- Homogenizer: single linear layer is optimal for most use cases
All empirical comparisons use >95% frozen parameters, isolating gains to the UniCA adaptation.
6. Implementation and Design Principles
- Backbones: Chronos-Bolt (T5 encoder-decoder on LOTSA), TimesFM (decoder-only pretraining)
- Covariate Homogenizer: single linear projection
- Image Encoder: 4-layer CNN for satellite data
- Text Encoder: GIST embeddings
- Optimization: Adam, ReduceLROnPlateau, 4 × RTX 3090 GPUs
- Parameter Efficiency: Only 1–5% of the architecture is trainable.
Performance robustness is observed with respect to homogenizer architecture and fusion position. Linear projection for homogenization suffices; more complex MLPs confer negligible additional benefits.
7. Significance and Scope
UniCA establishes a practical, plug-and-play methodology for leveraging the generalization strengths of large pretrained TSFMs in heterogeneous, covariate-rich environments without sacrificing scalability or requiring end-to-end fine-tuning. By decoupling covariate processing from time series backbone dynamics, UniCA enables flexible, modular adaptation to production pipelines, supports any combination of covariate modalities, and achieves state-of-the-art results on a spectrum of real-world forecasting tasks (Han et al., 27 Jun 2025).