TSI-Bench: Dual Benchmark for Imputation & TDA

Updated 12 February 2026
  • TSI-Bench is a dual-purpose benchmarking suite that standardizes evaluation protocols for deep time series imputation and topological stability in TDA.
  • It features modular workflows, reproducible pipelines, and comprehensive datasets with varying missingness structures and topological characteristics.
  • Empirical insights reveal trade-offs between model complexity and performance, guiding future enhancements in imputation and topological data analysis methodologies.

TSI-Bench is the name of two distinct but influential benchmarking platforms in contemporary research: one addresses large-scale standardized evaluation of deep time series imputation algorithms, and the other formalizes a rigorous benchmarking protocol for the Topological Stability Index (TSI) in topological data analysis (TDA). Both serve as reproducible, community-oriented suites that synthesize theory, systematic pipelines, and empirical evaluation for specialized data modeling paradigms. This entry details both TSI-Bench frameworks, highlighting their architectures, methodological foundations, benchmarking protocols, and research implications.

1. TSI-Bench for Time Series Imputation

TSI-Bench is the first comprehensive benchmark suite for time-series imputation with deep learning, enabling fair, reproducible evaluation and methodological transfer from forecasting to imputation domains (Du et al., 2024).

1.1. Pipeline Architecture

The suite is organized as a modular workflow:

  • Data Warehouse (TSDB): Centralized storage for raw time series from diverse domains.
  • PyGrinder: Automated generation of partially-observed time series (POTS) by simulating missingness via user-specified mechanisms (MCAR—random points, structured subsequences, spatio-temporal blocks).
  • BenchPOTS: Standardizes all experiments with time-based train/val/test splits, sliding-window segmentation, and per-series normalization.
  • PyPOTS: Unified implementation interface for all imputation algorithms; outputs seamlessly integrate into downstream analytical tasks.
  • Hyperparameter Optimization: Uses NNI with 100-trial parameter searches to ensure uniform performance tuning across models.
  • Experiment Reproducibility: Fixes seeds, data splits, zero-padding, and missingness masks across all runs.
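
The windowing and normalization conventions in the pipeline above can be sketched in a few lines of numpy; this is an illustrative sketch of the ideas, with made-up function names, not the actual BenchPOTS API:

```python
import numpy as np

def segment_and_normalize(series, window, stride):
    """Sliding-window segmentation with per-series z-score normalization.

    series: (T, D) array; returns (n_windows, window, D).
    Illustrative of the BenchPOTS conventions, not its real interface.
    """
    mean = np.nanmean(series, axis=0)           # per-variable statistics,
    std = np.nanstd(series, axis=0) + 1e-8      # ignoring missing values
    normed = (series - mean) / std
    starts = range(0, series.shape[0] - window + 1, stride)
    return np.stack([normed[s:s + window] for s in starts])

T, D = 100, 3
rng = np.random.default_rng(0)                  # fixed seed for reproducibility
windows = segment_and_normalize(rng.normal(size=(T, D)), window=24, stride=24)
print(windows.shape)  # (4, 24, 3)
```

Fixing the seed, split boundaries, and window parameters, as the reproducibility step prescribes, makes every run of this segmentation bit-identical.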

1.2. Benchmark Datasets

TSI-Bench comprises eight real-world multivariate datasets spanning air quality, traffic, electricity, and healthcare. They vary in variable count, temporal resolution, windowing strategy, and original missingness level.

| Domain | Dataset | Vars | Window | Frequency | Orig. Missingness |
|---|---|---|---|---|---|
| Air Quality | BeijingAir | 132 | 24 | hourly | low/moderate |
| Air Quality | ItalyAir | 13 | 12 | hourly | low/moderate |
| Traffic | PeMS | 862 | 24 | hourly | low/moderate |
| Traffic | Pedestrian | 1 | 24 | hourly | low/moderate |
| Electricity | Electricity Load | 370 | 96 | 15-min | low/moderate |
| Electricity | ETT_h1 | 7 | 48 | hourly | low/moderate |
| Healthcare | PhysioNet2012 | 35 | 48 | hourly | 80% |
| Healthcare | PhysioNet2019 | 33 | 48 | hourly | 74% |

This coverage spans univariate through high-dimensional series and captures settings from environmental sensing to critical-care monitoring.

1.3. Missingness Simulation Protocols

TSI-Bench systematically explores the effect of MCAR missing data using three mechanisms applied at three rates ($\rho \in \{0.1, 0.5, 0.9\}$):

  • Point Missing: Drops individual points randomly with probability $\rho$.
  • Subsequence Missing: Removes contiguous segments per variable summing to $\rho \cdot L$ points.
  • Block Missing: Simultaneously masks entire windows across variables (spatio-temporal blocks) totaling $\rho \cdot L$.
  • The missingness mask $M \in \{0,1\}^{D \times L}$ strictly complies with $\|1 - M\|_1 / (D \cdot L) = \rho$.
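
The three mechanisms can be sketched as mask generators over a $D \times L$ grid; this is a minimal numpy sketch of the idea (function names are illustrative, not the PyGrinder API), drawing exactly $\rho \cdot D \cdot L$ missing entries so the rate constraint holds exactly:

```python
import numpy as np

rng = np.random.default_rng(42)

def point_mask(D, L, rho):
    """MCAR point missing: individual entries dropped uniformly at random."""
    M = np.ones(D * L, dtype=int)
    n_missing = round(rho * D * L)
    M[rng.choice(D * L, size=n_missing, replace=False)] = 0
    return M.reshape(D, L)

def subsequence_mask(D, L, rho, seg=4):
    """Remove non-overlapping contiguous segments per variable, rho*L points each."""
    M = np.ones((D, L), dtype=int)
    per_var = round(rho * L)
    for d in range(D):
        removed = 0
        while removed < per_var:
            length = min(seg, per_var - removed)
            start = rng.integers(0, L - length + 1)
            if M[d, start:start + length].all():    # retry on overlap
                M[d, start:start + length] = 0
                removed += length
    return M

def block_mask(D, L, rho):
    """Spatio-temporal block: mask whole time steps across all variables."""
    M = np.ones((D, L), dtype=int)
    cols = rng.choice(L, size=round(rho * L), replace=False)
    M[:, cols] = 0
    return M

D, L, rho = 5, 40, 0.5
for M in (point_mask(D, L, rho), subsequence_mask(D, L, rho), block_mask(D, L, rho)):
    # ||1 - M||_1 / (D * L) = rho, as the benchmark requires
    assert np.abs(1 - M).sum() / (D * L) == rho
```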

1.4. Algorithm Portfolio

TSI-Bench includes 28 unique algorithms:

  • Traditional: Mean, median, last observation carried forward, linear interpolation
  • Imputation-specific Deep Models: MRNN, GRU-D, BRITS (RNNs); SAITS, iTransformer, AttnImputation (Attention); TimesNet, MICN, SCINet, StemGNN (CNN/GNN); GP-VAE (VAE), US-GAN (GAN), CSDI (diffusion)
  • Forecasting Backbones Adapted for Imputation: Informer, Autoformer, Pyraformer, Crossformer, PatchTST, ETSformer, Nonstationary Transformer, DLinear, FiLM, Koopa, FreTS

Adaptation leverages SAITS's embedding with diagonally-masked self-attention, mask augmentation, and joint reconstruction loss, while each backbone's original encoder/decoder is preserved.
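
The diagonal-mask idea can be illustrated with a single-head self-attention sketch: masking the diagonal forces each time step to be reconstructed only from the other steps. This is a toy illustration of the principle, not the actual SAITS implementation (which uses learned projections and multiple heads):

```python
import numpy as np

def diag_masked_attention(X):
    """Single-head self-attention with the diagonal masked out, so each
    position cannot attend to itself and must be reconstructed from context.
    Identity Q/K/V projections for brevity; illustrative only."""
    L, d = X.shape
    scores = X @ X.T / np.sqrt(d)
    scores[np.eye(L, dtype=bool)] = -1e9        # forbid attending to oneself
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    assert np.allclose(np.diag(weights), 0.0)   # self-attention weight is zero
    return weights @ X
```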

1.5. Evaluation Metrics

Model performance is assessed only on the imputed (formerly missing) positions:

  • Mean Absolute Error (MAE):

$$\mathrm{MAE} = \frac{\sum_{t,d} (1 - m_t^d)\, |\hat{x}_t^d - x_t^d|}{\sum_{t,d} (1 - m_t^d)}$$

  • Mean Squared Error (MSE):

$$\mathrm{MSE} = \frac{\sum_{t,d} (1 - m_t^d)\, (\hat{x}_t^d - x_t^d)^2}{\sum_{t,d} (1 - m_t^d)}$$

  • Mean Relative Error (MRE):

$$\mathrm{MRE} = \frac{\sum_{t,d} (1 - m_t^d)\, |\hat{x}_t^d - x_t^d|}{\sum_{t,d} (1 - m_t^d)\, |x_t^d|}$$

  • Efficiency: Each model’s trainable parameter count and inference time per batch are logged.
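
The three error metrics above restrict every sum to positions where the mask indicates a missing (imputed) value; a direct numpy transcription:

```python
import numpy as np

def masked_metrics(x_hat, x, m):
    """MAE, MSE, MRE evaluated only where m == 0 (imputed positions),
    following the formulas above."""
    w = 1 - m                                   # indicator of missing entries
    denom = w.sum()
    mae = (w * np.abs(x_hat - x)).sum() / denom
    mse = (w * (x_hat - x) ** 2).sum() / denom
    mre = (w * np.abs(x_hat - x)).sum() / (w * np.abs(x)).sum()
    return mae, mse, mre

x = np.array([[1.0, 2.0], [3.0, 4.0]])
x_hat = np.array([[1.0, 2.5], [2.0, 4.0]])
m = np.array([[1, 0], [0, 1]])                  # two imputed positions
mae, mse, mre = masked_metrics(x_hat, x, m)
print(mae, mse, mre)  # 0.75 0.625 0.3
```

Observed positions contribute nothing: errors of 0.5 and 1.0 at the two masked cells give MAE 0.75, MSE 0.625, and MRE 1.5 / 5 = 0.3.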

1.6. Empirical Insights

Analysis across 34,804 experiments yields the following:

  • No algorithm is consistently best across settings.
  • Simple point missingness is significantly less challenging; subsequence/block patterns increase error by 20–50%.
  • High missing rates harm diffusion-based/generative models (e.g., CSDI) and RNNs, yet linear MLPs (DLinear) remain robust.
  • Adapted forecasting models (e.g., Crossformer, DLinear) can outperform canonical imputation models in some regimes.
  • Model complexity trade-off: Transformers (Crossformer) provide top accuracy but require $>50$M parameters and $\sim 6$ s inference per batch, while MLPs (DLinear) use $<1$M parameters and $<1$ s inference at modestly higher error.
  • Downstream modeling (classification, regression, forecasting) consistently benefits from imputation over raw partial data. For instance, XGBoost ROC-AUC on PhysioNet2012 rises from 0.771 (raw, $\rho = 0.1$) to 0.852 (after SAITS imputation); similar improvements are observed for regression on ETT_h1.

1.7. Practical Guidance and Prospects

TSI-Bench is fully open-source and includes all data splits, configurations, code, and logs (https://github.com/WenjieDu/AwesomeImputation). Implementing new models or datasets follows a prescribed pattern: integration with PyPOTS, configuration addition, missingness simulation with PyGrinder, and workflow execution in BenchPOTS.

Planned research directions include integrating MAR/MNAR mechanisms, more domain-specific block structures, uncertainty-aware metrics (e.g., CRPS), pre-training/self-supervision, and imputation for irregularly-sampled series. The platform establishes a reproducible, standardized ecosystem for imputation, revealing nuanced cross-effects between missingness structure, model design, and downstream application (Du et al., 2024).

2. TSI-Bench for the Topological Stability Index in TDA

A separate, rigorous TSI-Bench framework has been outlined for benchmarking the Topological Stability Index, a concise, interpretable indicator of structural variability derived from persistence lifetimes in TDA (Diamantis, 17 Nov 2025).

2.1. Definition and Mathematical Foundations

  • Topological Stability Index (TSI): Given the set of persistence lifetimes $\mathcal{L} = \{\ell_i^{(k)}\}$ in homology dimensions $k = 0, 1$ from persistent homology applied to a dataset $X$ with a specified distance function:

    • Raw TSI:

    $$\mathrm{TSI}_{\mathrm{raw}} = \operatorname{Var}(\mathcal{L})$$

    • Normalized TSI:

    $$\mathrm{TSI}_{\mathrm{norm}} = \frac{\operatorname{Var}(\mathcal{L})}{\sum_{\ell \in \mathcal{L}} \ell + \epsilon}$$

    where $\epsilon > 0$ is a small constant for numerical stability.

    • Weighted TSI: lifetime weights can emphasize chosen topological features (e.g., $H_1$ loops).

2.2. Computational Workflow

The TSI computation pipeline spans six steps:

  1. Preprocessing: Feature normalization, optional detrending, symbolic transformation.
  2. Distance Matrix Construction: Selection of Euclidean, Mahalanobis, correlation, cosine, DTW, or symbolic distances per domain.
  3. Simplicial Complex Construction: Vietoris–Rips complex across a range of scales and resolutions.
  4. Persistent Homology Computation: Extraction of birth–death pairs in $H_0$ and $H_1$ using libraries such as Ripser, GUDHI, or giotto-tda.
  5. Lifetime Extraction: Compile all lifetimes $\ell = d - b$ into $\mathcal{L}$.
  6. TSI Calculation: Derive raw, normalized, or weighted TSI as per the defined metrics; rolling windows produce a TSI time series.

TSI-Bench provides explicit pseudocode for these steps, controlling for hyperparameters such as $\varepsilon_{\max}$, $m$ (filtration resolution), and window size.
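
For $H_0$ the workflow simplifies considerably: in a Vietoris–Rips filtration, connected components are all born at scale 0 and die when they merge, which happens exactly at the edge lengths of the minimum spanning tree. The following self-contained sketch covers steps 2–6 for $H_0$ only (a real pipeline would delegate steps 3–4 to Ripser, GUDHI, or giotto-tda); the clustered test data is illustrative:

```python
import numpy as np
from itertools import combinations

def h0_lifetimes(X):
    """Finite H0 lifetimes of the Vietoris-Rips filtration via Kruskal's
    MST with union-find: each merge edge weight is a death value, and
    lifetime = death - 0. Returns n - 1 finite bars for n points."""
    n = len(X)
    edges = sorted((float(np.linalg.norm(X[i] - X[j])), i, j)
                   for i, j in combinations(range(n), 2))
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]       # path compression
            a = parent[a]
        return a
    deaths = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(w)
    return np.array(deaths)

# Two well-separated clusters: one large merge lifetime, many small ones,
# so the raw TSI (variance of lifetimes) is large.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (10, 2)), rng.normal(5, 0.1, (10, 2))])
lt = h0_lifetimes(X)
tsi_raw = lt.var()
print(len(lt), tsi_raw > 0.1)  # 19 True
```

Sliding this computation over rolling windows of a series, as step 6 describes, yields the TSI time series.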

2.3. Practical Guidelines for Benchmarking

Optimal parameterization is dataset/goal-specific:

  • Distance Metric: Correlation for financial/behavioral data, Euclidean for static features, DTW for sequential data (bearing in mind triangle inequality limitations).
  • Filtration Resolution ($m$): Trade-off between detection granularity and computational cost; $m = 50$–$200$ is typical.
  • Scale Range: $\varepsilon_{\max}$ near the 90th–99th percentile of pairwise distances.
  • Simplicial Complex: Vietoris–Rips for general datasets, Witness or Alpha complexes when sample size or global structure allows; landmarking for large $n$.
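
The scale-range guideline above translates to a one-line percentile computation over the pairwise distance matrix; a minimal numpy sketch (the function name is illustrative):

```python
import numpy as np

def eps_max(X, q=95):
    """Filtration cutoff as the q-th percentile of pairwise Euclidean
    distances, per the 90th-99th percentile guideline."""
    diffs = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diffs ** 2).sum(axis=-1))
    iu = np.triu_indices(len(X), k=1)           # upper triangle, no diagonal
    return np.percentile(d[iu], q)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
print(round(eps_max(X), 2))
```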

2.4. Benchmarking Suite Design

The TSI-Bench protocol encompasses:

  • Dataset Inclusion: Synthetic point clouds with known topology, UCR time-series subsets, financial datasets (equities, FX, crypto), behavioral datasets (Google Trends), and network models (Erdős–Rényi, scale-free).
  • Performance Metrics: Runtime, memory peak, TSI reproducibility under bootstrap, sensitivity to metric/scale parameters, and TSI correlation with known volatility or ground truth regime change labels.
  • Variability Analysis: Bootstrap confidence intervals, parameter sweeps, and corresponding effect on TSI.
  • Visualization: TSI time series (with error bands), heatmaps across parameter grids, boxplots of raw/norm TSI, and cross-metric correlation matrices.
  • Comparative Protocols: Standardized comparison—establish ground-truth topological labels, baseline TSI, parameter perturbation analysis, and method ranking based on TSI coefficient of variation (CV) and discriminative power.
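
The variability-analysis and ranking steps above combine bootstrap confidence intervals with the coefficient of variation; a minimal sketch, assuming raw TSI (variance of lifetimes) as the resampled statistic:

```python
import numpy as np

def bootstrap_tsi(lifetimes, n_boot=1000, seed=0):
    """Bootstrap 95% CI and coefficient of variation for the raw TSI.
    A lower CV indicates a more stable, hence more trustworthy, index."""
    rng = np.random.default_rng(seed)
    L = np.asarray(lifetimes, dtype=float)
    stats = np.array([rng.choice(L, size=len(L)).var()
                      for _ in range(n_boot)])  # resample with replacement
    lo, hi = np.percentile(stats, [2.5, 97.5])
    cv = stats.std() / stats.mean()
    return (lo, hi), cv

lifetimes = np.random.default_rng(3).exponential(size=100)
(lo, hi), cv = bootstrap_tsi(lifetimes)
print(lo <= hi, cv > 0)
```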

2.5. Illustrative Applications

Case studies demonstrate TSI-Bench’s versatility:

  • Consumer Behavior: TSI time series on sliding windows of Google Trends data reflect episodic pattern shifts and viral surges.
  • Equity Markets: TSI surges during periods of heterogeneous clustering in high-volatility market regimes (e.g., March 2024).
  • Foreign Exchange: TSI peaks during crisis periods (e.g., 2008) and subsides during calm.

TSI-Bench enables statistical analysis of TSI’s stability and correlation with domain-relevant ground truth such as volatility and regime labels (Diamantis, 17 Nov 2025).

3. Research Impact and Methodological Insights

TSI-Bench suites provide open infrastructure that promotes methodological transparency, reproducibility, and cross-model comparison in time series imputation and topological analytics. Both platforms encourage holistic evaluation, probing sensitivity to missingness structures, data modalities, tuning choices, and downstream application relevance.

Key cross-cutting themes include:

  • Systematization: Codified workflows and unified metrics resolve prior inconsistencies in experimental protocol and enable robust imputer evaluation or TDA interpretation.
  • Transferability: Forecasting-to-imputation adaptation and TSI’s applicability to nontraditional features (behavioral, financial) drive cross-pollination of methodological advances.
  • Reproducibility: Public release of code, logs, and configuration protocols lowers barriers to extension and independent validation.

4. Limitations and Prospective Directions

Notable present limitations include:

  • MAR/MNAR Omission: Current imputation benchmark focuses on MCAR; integration of more realistic missingness mechanisms (MAR, MNAR) is under consideration.
  • Metric-Specificity: TSI-Bench for TDA currently privileges specific topological summaries; extension to higher homology and uncertainty-aware metrics is planned.
  • Scalability: High computational burden for large, high-resolution complexes in both domains makes landmarking or down-sampling important for future scaling.
  • Irregularity Handling: Future pipelines aim to address asynchronous/irregular time series and domain-specific missingness patterns.

5. Conclusion

TSI-Bench, in both time series imputation and topological analytics instantiations, delivers reproducible, large-scale pipelines for standardized algorithm evaluation. These frameworks uncover nuanced dependencies among missingness models, algorithmic backbones, data regime, and downstream performance, providing critical infrastructure for model selection, practical deployment, and methodological innovation (Du et al., 2024, Diamantis, 17 Nov 2025).
