ReIMTS: Recursive Multi-Scale IMTS Forecasting

Updated 4 July 2026

ReIMTS is a recursive multi-scale framework for irregular multivariate time series forecasting that preserves original timestamps by splitting samples based on real-world time periods.
It employs an irregularity-aware fusion mechanism to adaptively aggregate global-to-local dependencies, yielding an average performance improvement of 27.1% across various backbones and datasets.
Compatible with diverse architectures like GNNs, RNNs, and Transformers, ReIMTS maintains sampling-pattern integrity while outperforming traditional resampling methods in both accuracy and efficiency.

ReIMTS is a recursive multi-scale modeling approach for Irregular Multivariate Time Series forecasting that keeps timestamps unchanged and recursively splits each sample into subsamples with progressively shorter time periods, rather than resampling to construct coarse series (Li et al., 25 Feb 2026). It was introduced to address two coupled properties of IMTS: uneven intervals between timestamps, which encode sampling-pattern information, and dependencies that evolve across multiple time scales. In the reported experiments, ReIMTS is presented as a plug-and-play framework for diverse IMTS backbones, with an irregularity-aware representation fusion mechanism that aggregates global-to-local dependencies and yields an average performance improvement of $27.1\%$ across different models and real-world datasets (Li et al., 25 Feb 2026).

1. Problem setting and formal task definition

Irregular Multivariate Time Series are represented as sets of observation tuples rather than uniformly sampled grids. With a total of $T$ timestamps and $V$ variables, one sample is written as

$S := \{ (t_i, z_i, v_i) \mid i = 1, ..., Y \},$

where $t_i \in \{ 0, ..., T \}$, $z_i \in \mathbb{R}$, and $v_i \in \{ 1, ..., V \}$ denote the timestamp, observed value, and variable indicator, respectively, and $Y$ is the number of observations in the sample. Forecasting queries are defined by

$Q := \{ q_j \mid j = 1, ..., Y_Q \},$

and the forecasting objective is

$\mathcal{F}(\mathbf{S},\mathbf{Q})\rightarrow \mathbf{Z}.$
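
As an illustration of the set-of-tuples view, the following minimal sketch stores one sample as $(t, z, v)$ tuples and converts it into zero-padded per-variable arrays with a mask. The names `Obs` and `to_padded` and the per-variable padded layout are assumptions made for this example; they are not taken from the paper or from PyOmniTS.

```python
from typing import NamedTuple

class Obs(NamedTuple):
    t: float  # timestamp
    z: float  # observed value
    v: int    # variable indicator, 1..V

def to_padded(sample, num_vars):
    """Group (t, z, v) tuples per variable, then zero-pad every series to equal length."""
    per_var = [[] for _ in range(num_vars)]
    for obs in sorted(sample, key=lambda o: o.t):
        per_var[obs.v - 1].append(obs)
    max_len = max((len(series) for series in per_var), default=0)
    times, values, mask = [], [], []
    for series in per_var:
        pad = max_len - len(series)
        times.append([o.t for o in series] + [0.0] * pad)
        values.append([o.z for o in series] + [0.0] * pad)
        mask.append([1] * len(series) + [0] * pad)  # 1 = real observation, 0 = padding
    return times, values, mask

# Hypothetical sample: variable 1 is observed three times, variable 2 once.
sample = [Obs(0.0, 1.2, 1), Obs(0.5, 1.4, 1), Obs(7.0, 1.1, 1), Obs(3.0, 80.0, 2)]
times, values, mask = to_padded(sample, num_vars=2)
```

The original timestamps are carried through unchanged; only the mask distinguishes real observations from padding.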

The method is motivated by the claim that many existing multi-scale IMTS approaches obtain coarse series through resampling, which can alter the original timestamps and disrupt sampling-pattern information. The paper emphasizes that this issue is not merely representational. In the PhysioNet’12 example discussed by the authors, Bilirubin is densely observed in the first 12 hours and sparsely thereafter; this dense-to-sparse pattern is itself informative. ReIMTS therefore operates directly on the original timestamps and treats time-period structure, rather than observation-count structure, as the organizing principle of its hierarchy (Li et al., 25 Feb 2026).

Training uses mean squared error over forecast queries only: $\mathcal{L}=\frac{1}{Y_Q}\sum_{j=1}^{Y_Q}(\hat{z}_j-z_j)^2.$ The implementation uses a binary mask so that the loss is computed only at queried positions.
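
A minimal sketch of a masked mean squared error of this kind is shown below. The function name `masked_mse` and the flat-list layout are assumptions for illustration, not the released implementation.

```python
def masked_mse(pred, target, mask):
    """Mean squared error over positions where mask == 1 (the forecast queries)."""
    total = sum(m * (p - y) ** 2 for p, y, m in zip(pred, target, mask))
    count = sum(mask)
    return total / count if count else 0.0

# Hypothetical values: the third position is padding and must not affect the loss.
pred = [0.5, 2.0, 9.9, 1.0]
target = [1.0, 2.0, 0.0, 3.0]
query_mask = [1, 1, 0, 1]
loss = masked_mse(pred, target, query_mask)
```

Dividing by the number of queried positions rather than by the padded length corresponds to the $\frac{1}{Y_Q}$ factor in the loss.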

2. Recursive multi-scale decomposition

The core construction in ReIMTS is a hierarchy of long-to-short time-period subsamples. Given an aligned and zero-padded input sample together with its mask, the method builds a sequence of scale levels. Each scale level is characterized by three quantities: the real-world time-period length, the number of subsamples, and the maximum number of observations per univariate series after splitting and zero-padding.

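Each subsample at a given scale level covers one contiguous real-world time period whose length is the time-period length of that level, and the set of all subsamples at that level together covers the original sample.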

This construction is explicitly based on real-world time periods, not on the number of observations. The paper argues that splitting IMTS by observation count can generate subsamples with different actual durations, thereby obscuring sampling density information. By contrast, time-period-based splitting preserves sampling rate and dense-to-sparse patterns at every scale.
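
The difference between the two splitting rules can be sketched on a toy series. The helpers `split_by_period` and `split_by_count`, the half-open window convention, and the example timestamps are assumptions for illustration only.

```python
import math

def split_by_period(timestamps, period, total):
    """Assign each timestamp to the half-open window [k*period, (k+1)*period)."""
    num_windows = math.ceil(total / period)
    windows = [[] for _ in range(num_windows)]
    for t in timestamps:
        k = min(int(t // period), num_windows - 1)  # clamp t == total into the last window
        windows[k].append(t)
    return windows

def split_by_count(timestamps, num_chunks):
    """Cut the sorted timestamps into chunks holding (almost) equal numbers of observations."""
    size = math.ceil(len(timestamps) / num_chunks)
    return [timestamps[i:i + size] for i in range(0, len(timestamps), size)]

# Hypothetical 48-hour series: dense in the first 12 hours, sparse afterwards.
ts = [0, 1, 2, 3, 4, 5, 6, 8, 10, 11, 20, 30, 40, 47]

by_period = split_by_period(ts, period=24, total=48)
by_count = split_by_count(ts, num_chunks=2)

period_counts = [len(w) for w in by_period]     # observations per fixed 24-hour window
count_spans = [c[-1] - c[0] for c in by_count]  # real duration covered by each chunk
```

With time-period splitting, the two 24-hour windows hold 11 and 3 observations, so the dense-to-sparse pattern remains visible. With count-based splitting, both chunks hold 7 observations but span 6 and 39 hours, which hides the change in sampling density.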

At each scale, a backbone encoder is applied to every subsample. The framework is described as compatible with GNN-based, RNN-based, Transformer-based, set-based, and ODE-based IMTS backbones. Depending on the encoder, latent representations may take one of three forms:

temporal representations;
variable representations;
observation representations.

This typed representation scheme is central to the framework’s plug-and-play character, because recursive fusion and shape alignment are defined for each of these encoder outputs (Li et al., 25 Feb 2026).

3. Irregularity-aware fusion and recursive control flow

ReIMTS performs global-to-local fusion: representations from an upper scale are aligned with the next finer scale and injected into that scale’s local encoder output. To prevent zero-padding artifacts from contaminating the transfer, the framework first defines an IMTS-aware global representation from the upper-scale output. A point-wise feed-forward layer then produces fusion weights, and these weights control how the global representation is combined with the local encoder output to form the fused representation.

The masking rule is the method’s explicit irregularity-aware component. Temporal and observation representations are masked by the padding mask so that padded positions do not receive transferred information. Variable representations are treated differently because they are already irregularity-aware at the encoder level. The authors interpret the fusion weights as adaptive fusion intensity conditioned on both content and irregular sampling.
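
The idea of masked, gated fusion can be sketched as follows. This is not the paper's formula: the sigmoid gate over the concatenated local and global vectors, the additive combination, and the names `fuse`, `weight`, and `bias` are all assumptions chosen to illustrate how a mask can stop information from reaching padded positions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fuse(local, global_, mask, weight, bias):
    """Gated global-to-local fusion per position; padded positions receive nothing."""
    fused = []
    for h_local, h_global, m in zip(local, global_, mask):
        h_global = [m * g for g in h_global]  # zero the global vector at padded positions
        features = h_local + h_global         # list concatenation of the two vectors
        # Assumed point-wise gate: one linear unit followed by a sigmoid.
        gate = sigmoid(sum(w * x for w, x in zip(weight, features)) + bias)
        fused.append([l + gate * g for l, g in zip(h_local, h_global)])
    return fused

# Hypothetical two-position example: position 0 is observed, position 1 is padding.
local = [[1.0, 0.0], [0.0, 0.0]]
global_ = [[0.5, 0.5], [0.3, 0.3]]
pad_mask = [1, 0]
fused = fuse(local, global_, pad_mask, weight=[0.0, 0.0, 0.0, 0.0], bias=0.0)
```

With zero weights the gate equals 0.5 everywhere, so the observed position receives half of the global vector while the padded position keeps its local value unchanged.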

The recursive algorithm proceeds top-down over scales. At each scale, the encoder produces a local representation. If a global representation from the previous scale exists, fusion yields the fused representation; otherwise, at the first level, the encoder output is used directly. If the current scale is the last one, decoding occurs. Otherwise, the sample and mask are split again into finer subsamples and their masks, and the current representation is transformed into a finer-scale carrier by splitting or duplication.

The paper gives separate formulas for the temporal, observation, and variable cases. Temporal and observation representations are propagated by splitting them to match the finer subsamples, while variable representations are duplicated across the finer subsamples.
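
The top-down control flow can be sketched with toy components. Everything here is a stand-in: `encode` replaces a real backbone with a mean, the fusion is a fixed-weight placeholder, the top level is taken to be the whole sample, and the carrier is passed to children by duplication (the variable-representation case). The function names and the half-open window convention are assumptions.

```python
def encode(values):
    """Toy stand-in for a backbone encoder: mean of the observed values."""
    return sum(values) / len(values) if values else 0.0

def recurse(obs, start, periods, level=0, carrier=None, path=()):
    """obs: list of (t, z) pairs; periods: long-to-short window lengths, one per scale."""
    local = encode([z for _, z in obs])
    # Placeholder fusion with a fixed weight; the first level has no carrier.
    fused = local if carrier is None else local + 0.5 * carrier
    path = path + (fused,)
    if level == len(periods) - 1:
        return [path]  # lowest scale: the multi-scale stack would go to a decoder
    child_period = periods[level + 1]
    num_children = round(periods[level] / child_period)
    out = []
    for k in range(num_children):
        lo, hi = start + k * child_period, start + (k + 1) * child_period
        child = [(t, z) for t, z in obs if lo <= t < hi]  # timestamps stay unchanged
        out += recurse(child, lo, periods, level + 1, fused, path)  # duplicate carrier
    return out

# Hypothetical 48-hour sample split into two 24-hour subsamples.
obs = [(0, 1.0), (1, 3.0), (30, 5.0)]
stacks = recurse(obs, start=0, periods=[48, 24])
```

Each returned tuple holds the representations collected from the coarsest to the finest scale for one lowest-level subsample, mirroring the concatenated multi-scale input that a decoder would consume.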

4. Decoding, optimization, and comparison to resampling-based methods

At the lowest scale, the decoder consumes the concatenated multi-scale representations. No auxiliary cross-scale loss is introduced; supervision is applied through the forecast-query MSE. Gradients therefore propagate through the recursive stack, including all encoder, fusion, and split-or-duplicate transformations.

A central claim of the method is that it differs from both regular-time-series multi-scale models and prior IMTS multi-scale models by avoiding resampling entirely. The paper positions ReIMTS against regular MTS approaches such as Pathformer, MOIRAI, Scaleformer, and TimeMixer, as well as IMTS methods such as Warpformer, Hi-Patch, and HD-TTS. Patch-based IMTS approaches such as tPatchGNN and PrimeNet also split by time periods, but they operate at a single scale rather than through a recursive hierarchy. ReIMTS is therefore presented as a multi-scale generalization that preserves original timestamps at every level and aggregates dependencies from global to local scales without modifying the sampling lattice (Li et al., 25 Feb 2026).

The framework is trained for up to 300 epochs with early stopping at a patience of 10, over five seeds (2024 through 2028). The main experiments use PyTorch 2.7.0 on RTX 3090 GPUs, while efficiency analyses use PyTorch 2.2.2+cu118 on an RTX 2080Ti. The unified pipeline is released in PyOmniTS, and the abstract reports code availability at https://github.com/Ladbaby/PyOmniTS.
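
The stopping rule described above can be sketched as follows. The function `run_early_stopping` is hypothetical and consumes a precomputed list of validation losses instead of training a model; it is not part of PyOmniTS.

```python
SEEDS = list(range(2024, 2029))  # the five seeds 2024..2028 mentioned above

def run_early_stopping(val_losses, max_epochs=300, patience=10):
    """Return (best_epoch, best_loss, epochs_run) for a stream of validation losses."""
    best_loss, best_epoch, wait, epochs_run = float("inf"), -1, 0, 0
    for epoch, loss in enumerate(val_losses[:max_epochs]):
        epochs_run = epoch + 1
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                break  # no improvement for `patience` consecutive epochs
    return best_epoch, best_loss, epochs_run

# Hypothetical loss curve: the best value is reached at epoch 1, then it plateaus.
curve = [1.0, 0.8, 0.9] + [0.95] * 20
result = run_early_stopping(curve)
```

In this toy curve the best epoch is 1 and training stops after 12 epochs, ten epochs after the last improvement.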

5. Experimental results, ablations, and efficiency

The evaluation covers five IMTS datasets spanning healthcare, biomechanics, and climate: MIMIC-III, MIMIC-IV, PhysioNet’12, Human Activity, and USHCN. Their reported dimensions range from 5 variables and 1,114 samples in USHCN to 100 variables and 17,874 samples in MIMIC-IV. Preprocessing follows PyOmniTS v2.0.0 with a train/validation/test split.

The comparison includes 26 baselines. Among IMTS multi-scale baselines are HD-TTS, Hi-Patch, and Warpformer; among IMTS baselines are TimeCHEAT, GNeuralFlow, tPatchGNN, GraFITi, PrimeNet, CRU, Raindrop, NeuralFlows, mTAN, SeFT, and GRU-D; among regular MTS baselines are Ada-MSHyper, MOIRAI, TimeMixer, Pathformer, Scaleformer, Leddam, PatchTST, TimesNet, Crossformer, and Autoformer. The main claim is that ReIMTS improves a range of backbones rather than only one architecture (Li et al., 25 Feb 2026).

The reported average improvement across models and datasets is $27.1\%$. Per-backbone average improvements are reported for PrimeNet, mTAN, TimeCHEAT, GRU-D, Raindrop, and GraFITi. With GraFITi as backbone, the ReIMTS-enhanced model reports MSE values on MIMIC-III, MIMIC-IV, PhysioNet’12, Human Activity, and USHCN. These values outperform the cited IMTS multi-scale baselines on most settings and match the best reported USHCN value, that of HD-TTS.

The ablation study isolates four components. Replacing split subsamples with the original sample (“rp sample”), splitting by number of observations rather than time period (“rp split”), replacing irregularity-aware fusion with simple addition (“rp IARF”), and removing fusion entirely (“w/o IARF”) all degrade performance. The paper interprets this as evidence that time-period splitting and irregularity-aware fusion are both necessary. Sensitivity studies further report that two scale levels are best for most datasets, while PhysioNet’12 and MIMIC-III benefit from deeper hierarchies. The best time periods often correspond to half of the total sequence duration at scale level 2, and the authors note that 24-hour periods align well with daily cycles in medical datasets.

Efficiency is treated as a first-class result. On MIMIC-III with GraFITi backbone, ReIMTS is reported to achieve the lowest MSE while running fastest and using the least GPU memory compared with Warpformer, HD-TTS, and Hi-Patch. Additional efficiency analyses on MIMIC-IV, PhysioNet’12, Human Activity, and USHCN show that it typically uses significantly less memory and trains at similar or faster speeds while improving accuracy (Li et al., 25 Feb 2026).

Although ReIMTS has a direct and explicit use in irregular multivariate time-series forecasting, the acronym is not stable across arXiv. Several papers explicitly state that “ReIMTS” does not appear in their terminology and is likely a mistaken reference to another method. In neural reasoning, the relevant framework is “Recursive Inference Machines (RIMs),” not ReIMTS, and the paper states that “ReIMTS” is most likely a mistaken reference to RIMs (Komisarczyk et al., 5 Mar 2026). In dynamic computed tomography, the correct acronym is “rMIRT,” and “ReIMTS” is described as a typographical variant or misremembered acronym for the region-based motion-compensated iterative reconstruction technique (Nguyen et al., 2023).

A different usage appears in wireless communications, where the term is aligned with reconfigurable intelligent metasurface transceivers. One paper explicitly defines ReIMTS as “Reconfigurable Intelligent Metasurface Transceivers with Index Modulation,” referring to RIS apertures that directly perform modulation, beamforming, spatial multiplexing, and frequency conversion at the aperture (Hodge et al., 2023). Closely related work uses the acronym “RI-MTS” for “Intelligent Time-Varying Metasurface Transceiver for Index Modulation in 6G Wireless Networks” (Hodge et al., 2020), and an RIS-aided integrated imaging and communication framework is described as closely aligning with the notion of ReIMTS because the RIS acts both as a large-aperture passive imager and as a passive beamformer (Luo et al., 2024).

This terminological heterogeneity suggests that, in contemporary arXiv usage, “ReIMTS” should be disambiguated by domain. In time-series forecasting it denotes the recursive multi-scale IMTS framework of Li et al. (25 Feb 2026); in several other contexts it is either absent, a mistaken label for another acronym, or a broader shorthand associated with reconfigurable intelligent metasurface transceivers.