TSI-Bench: Dual Benchmark for Imputation & TDA

Updated 12 February 2026
  • TSI-Bench is a dual-purpose benchmarking suite that standardizes evaluation protocols for deep time series imputation and topological stability in TDA.
  • It features modular workflows, reproducible pipelines, and comprehensive datasets with varying missingness structures and topological characteristics.
  • Empirical insights reveal trade-offs between model complexity and performance, guiding future enhancements in imputation and topological data analysis methodologies.

TSI-Bench is the name of two distinct but influential benchmarking platforms in contemporary research: one addresses large-scale standardized evaluation of deep time series imputation algorithms, and the other formalizes a rigorous benchmarking protocol for the Topological Stability Index (TSI) in topological data analysis (TDA). Both serve as reproducible, community-oriented suites that synthesize theory, systematic pipelines, and empirical evaluation for specialized data modeling paradigms. This entry details both TSI-Bench frameworks, highlighting their architectures, methodological foundations, benchmarking protocols, and research implications.

1. TSI-Bench for Time Series Imputation

TSI-Bench is the first comprehensive benchmark suite for time-series imputation with deep learning, enabling fair, reproducible evaluation and methodological transfer from forecasting to imputation domains (Du et al., 2024).

1.1. Pipeline Architecture

The suite is organized as a modular workflow:

  • Data Warehouse (TSDB): Centralized storage for raw time series from diverse domains.
  • PyGrinder: Automated generation of partially-observed time series (POTS) by simulating missingness via user-specified mechanisms (MCAR—random points, structured subsequences, spatio-temporal blocks).
  • BenchPOTS: Standardizes all experiments with time-based train/val/test splits, sliding-window segmentation, and per-series normalization.
  • PyPOTS: Unified implementation interface for all imputation algorithms; outputs seamlessly integrate into downstream analytical tasks.
  • Hyperparameter Optimization: Uses NNI with 100-trial parameter searches to ensure uniform performance tuning across models.
  • Experiment Reproducibility: Fixes seeds, data splits, zero-padding, and missingness masks across all runs.
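
The windowing and normalization conventions in the pipeline above can be sketched in a few lines of numpy; this is an illustrative sketch of the ideas, with made-up function names, not the actual BenchPOTS API:

```python
import numpy as np

def segment_and_normalize(series, window, stride):
    """Sliding-window segmentation with per-series z-score normalization.

    series: (T, D) array; returns (n_windows, window, D).
    Illustrative of the BenchPOTS conventions, not its real interface.
    """
    mean = np.nanmean(series, axis=0)           # per-variable statistics,
    std = np.nanstd(series, axis=0) + 1e-8      # ignoring missing values
    normed = (series - mean) / std
    starts = range(0, series.shape[0] - window + 1, stride)
    return np.stack([normed[s:s + window] for s in starts])

T, D = 100, 3
rng = np.random.default_rng(0)                  # fixed seed for reproducibility
windows = segment_and_normalize(rng.normal(size=(T, D)), window=24, stride=24)
print(windows.shape)  # (4, 24, 3)
```

Fixing the seed, split boundaries, and window parameters, as the reproducibility step prescribes, makes every run of this segmentation bit-identical.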

1.2. Benchmark Datasets

TSI-Bench comprises eight real-world multivariate datasets spanning air quality, traffic, electricity, and healthcare. They vary in variable count, temporal resolution, windowing strategy, and original missingness level.

| Domain | Dataset | Vars | Window | Frequency | Orig. Missingness |
|---|---|---|---|---|---|
| Air Quality | BeijingAir | 132 | 24 | hourly | low/moderate |
| Air Quality | ItalyAir | 13 | 12 | hourly | low/moderate |
| Traffic | PeMS | 862 | 24 | hourly | low/moderate |
| Traffic | Pedestrian | 1 | 24 | hourly | low/moderate |
| Electricity | Electricity Load | 370 | 96 | 15-min | low/moderate |
| Electricity | ETT_h1 | 7 | 48 | hourly | low/moderate |
| Healthcare | PhysioNet2012 | 35 | 48 | hourly | 80% |
| Healthcare | PhysioNet2019 | 33 | 48 | hourly | 74% |

This coverage spans univariate through high-dimensional series and captures settings from environmental sensing to critical-care monitoring.

1.3. Missingness Simulation Protocols

TSI-Bench systematically explores the effect of MCAR missing data using three mechanisms applied at three rates ($\rho \in \{0.1, 0.5, 0.9\}$):

  • Point Missing: Drops individual points randomly with probability $\rho$.
  • Subsequence Missing: Removes contiguous segments per variable summing to $\rho \cdot L$ points.
  • Block Missing: Simultaneously masks entire windows across variables (spatio-temporal blocks) totaling $\rho \cdot L$.
  • The missingness mask $M \in \{0,1\}^{D \times L}$ strictly complies with $\|1 - M\|_1 / (D \cdot L) = \rho$.
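
The three mechanisms can be sketched as mask generators over a $D \times L$ grid; this is a minimal numpy sketch of the idea (function names are illustrative, not the PyGrinder API), drawing exactly $\rho \cdot D \cdot L$ missing entries so the rate constraint holds exactly:

```python
import numpy as np

rng = np.random.default_rng(42)

def point_mask(D, L, rho):
    """MCAR point missing: individual entries dropped uniformly at random."""
    M = np.ones(D * L, dtype=int)
    n_missing = round(rho * D * L)
    M[rng.choice(D * L, size=n_missing, replace=False)] = 0
    return M.reshape(D, L)

def subsequence_mask(D, L, rho, seg=4):
    """Remove non-overlapping contiguous segments per variable, rho*L points each."""
    M = np.ones((D, L), dtype=int)
    per_var = round(rho * L)
    for d in range(D):
        removed = 0
        while removed < per_var:
            length = min(seg, per_var - removed)
            start = rng.integers(0, L - length + 1)
            if M[d, start:start + length].all():    # retry on overlap
                M[d, start:start + length] = 0
                removed += length
    return M

def block_mask(D, L, rho):
    """Spatio-temporal block: mask whole time steps across all variables."""
    M = np.ones((D, L), dtype=int)
    cols = rng.choice(L, size=round(rho * L), replace=False)
    M[:, cols] = 0
    return M

D, L, rho = 5, 40, 0.5
for M in (point_mask(D, L, rho), subsequence_mask(D, L, rho), block_mask(D, L, rho)):
    # ||1 - M||_1 / (D * L) = rho, as the benchmark requires
    assert np.abs(1 - M).sum() / (D * L) == rho
```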

1.4. Algorithm Portfolio

TSI-Bench includes 28 unique algorithms:

  • Traditional: Mean, median, last observation carried forward, linear interpolation
  • Imputation-specific Deep Models: MRNN, GRU-D, BRITS (RNNs); SAITS, iTransformer, AttnImputation (Attention); TimesNet, MICN, SCINet, StemGNN (CNN/GNN); GP-VAE (VAE), US-GAN (GAN), CSDI (diffusion)
  • Forecasting Backbones Adapted for Imputation: Informer, Autoformer, Pyraformer, Crossformer, PatchTST, ETSformer, Nonstationary Transformer, DLinear, FiLM, Koopa, FreTS

Adaptation leverages SAITS's embedding with diagonally-masked self-attention, mask augmentation, and joint reconstruction loss, while each backbone's original encoder/decoder is preserved.
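
The diagonal-mask idea can be illustrated with a single-head self-attention sketch: masking the diagonal forces each time step to be reconstructed only from the other steps. This is a toy illustration of the principle, not the actual SAITS implementation (which uses learned projections and multiple heads):

```python
import numpy as np

def diag_masked_attention(X):
    """Single-head self-attention with the diagonal masked out, so each
    position cannot attend to itself and must be reconstructed from context.
    Identity Q/K/V projections for brevity; illustrative only."""
    L, d = X.shape
    scores = X @ X.T / np.sqrt(d)
    scores[np.eye(L, dtype=bool)] = -1e9        # forbid attending to oneself
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    assert np.allclose(np.diag(weights), 0.0)   # self-attention weight is zero
    return weights @ X
```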

1.5. Evaluation Metrics

Model performance is assessed only on the imputed (formerly missing) positions:

  • Mean Absolute Error (MAE):

$$\mathrm{MAE} = \frac{\sum_{t,d} (1 - m_t^d)\, |\hat{x}_t^d - x_t^d|}{\sum_{t,d} (1 - m_t^d)}$$

  • Mean Squared Error (MSE):

$$\mathrm{MSE} = \frac{\sum_{t,d} (1 - m_t^d)\, (\hat{x}_t^d - x_t^d)^2}{\sum_{t,d} (1 - m_t^d)}$$

  • Mean Relative Error (MRE):

$$\mathrm{MRE} = \frac{\sum_{t,d} (1 - m_t^d)\, |\hat{x}_t^d - x_t^d|}{\sum_{t,d} (1 - m_t^d)\, |x_t^d|}$$

  • Efficiency: Each model’s trainable parameter count and inference time per batch are logged.
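
The three error metrics above restrict every sum to positions where the mask indicates a missing (imputed) value; a direct numpy transcription:

```python
import numpy as np

def masked_metrics(x_hat, x, m):
    """MAE, MSE, MRE evaluated only where m == 0 (imputed positions),
    following the formulas above."""
    w = 1 - m                                   # indicator of missing entries
    denom = w.sum()
    mae = (w * np.abs(x_hat - x)).sum() / denom
    mse = (w * (x_hat - x) ** 2).sum() / denom
    mre = (w * np.abs(x_hat - x)).sum() / (w * np.abs(x)).sum()
    return mae, mse, mre

x = np.array([[1.0, 2.0], [3.0, 4.0]])
x_hat = np.array([[1.0, 2.5], [2.0, 4.0]])
m = np.array([[1, 0], [0, 1]])                  # two imputed positions
mae, mse, mre = masked_metrics(x_hat, x, m)
print(mae, mse, mre)  # 0.75 0.625 0.3
```

Observed positions contribute nothing: errors of 0.5 and 1.0 at the two masked cells give MAE 0.75, MSE 0.625, and MRE 1.5 / 5 = 0.3.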

1.6. Empirical Insights

Analysis across 34,804 experiments yields the following:

  • No algorithm is consistently best across settings.
  • Simple point missingness is significantly less challenging; subsequence/block patterns increase error by 20–50%.
  • High missing rates harm diffusion-based/generative models (e.g., CSDI) and RNNs, yet linear MLPs (DLinear) remain robust.
  • Adapted forecasting models (e.g., Crossformer, DLinear) can outperform canonical imputation models in some regimes.
  • Model complexity trade-off: Transformers (Crossformer) provide top accuracy but require $>50$M parameters and $\sim 6$ s inference per batch, while MLPs (DLinear) use $<1$M parameters and $<1$ s inference at modestly higher error.
  • Downstream modeling (classification, regression, forecasting) consistently benefits from imputation over raw partial data. For instance, XGBoost ROC-AUC on PhysioNet2012 rises from 0.771 (raw, $\rho = 0.1$) to 0.852 (after SAITS imputation); similar improvements are observed for regression on ETT_h1.

1.7. Practical Guidance and Prospects

TSI-Bench is fully open-source and includes all data splits, configurations, code, and logs (https://github.com/WenjieDu/AwesomeImputation). Implementing new models or datasets follows a prescribed pattern: integration with PyPOTS, configuration addition, missingness simulation with PyGrinder, and workflow execution in BenchPOTS.

Planned research directions include integrating MAR/MNAR mechanisms, more domain-specific block structures, uncertainty-aware metrics (e.g., CRPS), pre-training/self-supervision, and imputation for irregularly-sampled series. The platform establishes a reproducible, standardized ecosystem for imputation, revealing nuanced cross-effects between missingness structure, model design, and downstream application (Du et al., 2024).

2. TSI-Bench for the Topological Stability Index in TDA

A separate, rigorous TSI-Bench framework has been outlined for benchmarking the Topological Stability Index, a concise, interpretable indicator of structural variability derived from persistence lifetimes in TDA (Diamantis, 17 Nov 2025).

2.1. Definition and Mathematical Foundations

  • Topological Stability Index (TSI): Given the set of persistence lifetimes $\mathcal{L} = \{\ell_i^{(k)}\}$ in homology dimensions $k = 0, 1$ from persistent homology applied to a dataset $X$ with a specified distance function:

    • Raw TSI:

    $$\mathrm{TSI}_{\mathrm{raw}} = \operatorname{Var}(\mathcal{L})$$

    • Normalized TSI:

    $$\mathrm{TSI}_{\mathrm{norm}} = \frac{\operatorname{Var}(\mathcal{L})}{\sum_{\ell \in \mathcal{L}} \ell + \epsilon}$$

    where $\epsilon > 0$ is a small constant for numerical stability.

    • Weighted TSI: lifetime weights can emphasize chosen topological features (e.g., $H_1$ loops).

2.2. Computational Workflow

The TSI computation pipeline spans six steps:

  1. Preprocessing: Feature normalization, optional detrending, symbolic transformation.
  2. Distance Matrix Construction: Selection of Euclidean, Mahalanobis, correlation, cosine, DTW, or symbolic distances per domain.
  3. Simplicial Complex Construction: Vietoris–Rips complex across a range of scales and resolutions.
  4. Persistent Homology Computation: Extraction of birth–death pairs in $H_0$ and $H_1$ using libraries such as Ripser, GUDHI, or giotto-tda.
  5. Lifetime Extraction: Compile all lifetimes $\ell = d - b$ into $\mathcal{L}$.
  6. TSI Calculation: Derive raw, normalized, or weighted TSI as per the defined metrics; rolling windows produce a TSI time series.

TSI-Bench provides explicit pseudocode for these steps, controlling for hyperparameters such as $\varepsilon_{\max}$, $m$ (filtration resolution), and window size.
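
For $H_0$ the workflow simplifies considerably: in a Vietoris–Rips filtration, connected components are all born at scale 0 and die when they merge, which happens exactly at the edge lengths of the minimum spanning tree. The following self-contained sketch covers steps 2–6 for $H_0$ only (a real pipeline would delegate steps 3–4 to Ripser, GUDHI, or giotto-tda); the clustered test data is illustrative:

```python
import numpy as np
from itertools import combinations

def h0_lifetimes(X):
    """Finite H0 lifetimes of the Vietoris-Rips filtration via Kruskal's
    MST with union-find: each merge edge weight is a death value, and
    lifetime = death - 0. Returns n - 1 finite bars for n points."""
    n = len(X)
    edges = sorted((float(np.linalg.norm(X[i] - X[j])), i, j)
                   for i, j in combinations(range(n), 2))
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]       # path compression
            a = parent[a]
        return a
    deaths = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(w)
    return np.array(deaths)

# Two well-separated clusters: one large merge lifetime, many small ones,
# so the raw TSI (variance of lifetimes) is large.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (10, 2)), rng.normal(5, 0.1, (10, 2))])
lt = h0_lifetimes(X)
tsi_raw = lt.var()
print(len(lt), tsi_raw > 0.1)  # 19 True
```

Sliding this computation over rolling windows of a series, as step 6 describes, yields the TSI time series.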

2.3. Practical Guidelines for Benchmarking

Optimal parameterization is dataset/goal-specific:

  • Distance Metric: Correlation for financial/behavioral data, Euclidean for static features, DTW for sequential data (bearing in mind triangle inequality limitations).
  • Filtration Resolution ($m$): Trade-off between detection granularity and computational cost; $m = 50$–$200$ is typical.
  • Scale Range: $\varepsilon_{\max}$ near the 90th–99th percentile of pairwise distances.
  • Simplicial Complex: Vietoris–Rips for general datasets, Witness or Alpha complexes when sample size or global structure allows; landmarking for large $n$.
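
The scale-range guideline above translates to a one-line percentile computation over the pairwise distance matrix; a minimal numpy sketch (the function name is illustrative):

```python
import numpy as np

def eps_max(X, q=95):
    """Filtration cutoff as the q-th percentile of pairwise Euclidean
    distances, per the 90th-99th percentile guideline."""
    diffs = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diffs ** 2).sum(axis=-1))
    iu = np.triu_indices(len(X), k=1)           # upper triangle, no diagonal
    return np.percentile(d[iu], q)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
print(round(eps_max(X), 2))
```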

2.4. Benchmarking Suite Design

The TSI-Bench protocol encompasses:

  • Dataset Inclusion: Synthetic point clouds with known topology, UCR time-series subsets, financial datasets (equities, FX, crypto), behavioral datasets (Google Trends), and network models (Erdős–Rényi, scale-free).
  • Performance Metrics: Runtime, memory peak, TSI reproducibility under bootstrap, sensitivity to metric/scale parameters, and TSI correlation with known volatility or ground truth regime change labels.
  • Variability Analysis: Bootstrap confidence intervals, parameter sweeps, and corresponding effect on TSI.
  • Visualization: TSI time series (with error bands), heatmaps across parameter grids, boxplots of raw/norm TSI, and cross-metric correlation matrices.
  • Comparative Protocols: Standardized comparison—establish ground-truth topological labels, baseline TSI, parameter perturbation analysis, and method ranking based on TSI coefficient of variation (CV) and discriminative power.
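
The variability-analysis and ranking steps above combine bootstrap confidence intervals with the coefficient of variation; a minimal sketch, assuming raw TSI (variance of lifetimes) as the resampled statistic:

```python
import numpy as np

def bootstrap_tsi(lifetimes, n_boot=1000, seed=0):
    """Bootstrap 95% CI and coefficient of variation for the raw TSI.
    A lower CV indicates a more stable, hence more trustworthy, index."""
    rng = np.random.default_rng(seed)
    L = np.asarray(lifetimes, dtype=float)
    stats = np.array([rng.choice(L, size=len(L)).var()
                      for _ in range(n_boot)])  # resample with replacement
    lo, hi = np.percentile(stats, [2.5, 97.5])
    cv = stats.std() / stats.mean()
    return (lo, hi), cv

lifetimes = np.random.default_rng(3).exponential(size=100)
(lo, hi), cv = bootstrap_tsi(lifetimes)
print(lo <= hi, cv > 0)
```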

2.5. Illustrative Applications

Case studies demonstrate TSI-Bench’s versatility:

  • Consumer Behavior: TSI time series on sliding windows of Google Trends data reflect episodic pattern shifts and viral surges.
  • Equity Markets: TSI surges during periods of heterogeneous clustering in high-volatility market regimes (e.g., March 2024).
  • Foreign Exchange: TSI peaks during crisis periods (e.g., 2008) and subsides during calm.

TSI-Bench enables statistical analysis of TSI’s stability and correlation with domain-relevant ground truth such as volatility and regime labels (Diamantis, 17 Nov 2025).

3. Research Impact and Methodological Insights

TSI-Bench suites provide open infrastructure that promotes methodological transparency, reproducibility, and cross-model comparison in time series imputation and topological analytics. Both platforms encourage holistic evaluation, probing sensitivity to missingness structures, data modalities, tuning choices, and downstream application relevance.

Key cross-cutting themes include:

  • Systematization: Codified workflows and unified metrics resolve prior inconsistencies in experimental protocol and enable robust imputer evaluation or TDA interpretation.
  • Transferability: Forecasting-to-imputation adaptation and TSI’s applicability to nontraditional features (behavioral, financial) drive cross-pollination of methodological advances.
  • Reproducibility: Public release of code, logs, and configuration protocols lowers barriers to extension and independent validation.

4. Limitations and Prospective Directions

Notable present limitations include:

  • MAR/MNAR Omission: Current imputation benchmark focuses on MCAR; integration of more realistic missingness mechanisms (MAR, MNAR) is under consideration.
  • Metric-Specificity: TSI-Bench for TDA currently privileges specific topological summaries; extension to higher homology and uncertainty-aware metrics is planned.
  • Scalability: High computational burden for large, high-resolution complexes in both domains makes landmarking or down-sampling important for future scaling.
  • Irregularity Handling: Future pipelines aim to address asynchronous/irregular time series and domain-specific missingness patterns.

5. Conclusion

TSI-Bench, in both time series imputation and topological analytics instantiations, delivers reproducible, large-scale pipelines for standardized algorithm evaluation. These frameworks uncover nuanced dependencies among missingness models, algorithmic backbones, data regime, and downstream performance, providing critical infrastructure for model selection, practical deployment, and methodological innovation (Du et al., 2024, Diamantis, 17 Nov 2025).
