TSMixup: Time-Series Mixup Augmentation
- The paper introduces TSMixup variants that combine real and synthetic samples using multiple mixup operations in raw and latent spaces.
- TSMixup is a methodology that linearly interpolates time-series data to create augmented samples, improving generalization and robustness in classification tasks.
- The approach leverages semi-supervised pseudo-labeling, enhancing performance metrics like macro F1 and Cohen's Kappa, especially under limited labeled data conditions.
TSMixup is a term that encompasses methodologies extending the mixup augmentation paradigm to time-series data, with an emphasis on augmenting data for improved generalization, robustness, and data efficiency in time-series classification settings. These approaches adapt the core principles of mixup—linear interpolation between data samples and labels—to the specific statistical and structural properties of temporal sequences. The most prominent instantiations include MixUp++ and LatentMixUp++, which are designed for both supervised and semi-supervised classification tasks involving limited labeled time-series data (Aggarwal et al., 2023). Further, recent advances such as "multi-mix" provide a theoretical and empirical bridge between manifold- or latent-space mixup and rich data interpolation for time-series and similar modalities (Shen et al., 3 Jun 2024).
1. Background: Mixup and its Challenges in Time-Series
Conventional mixup, as originally introduced for vision tasks, constructs virtual examples by convexly combining two random input pairs and their associated labels:

$$\tilde{x} = \lambda x_i + (1 - \lambda)\, x_j, \qquad \tilde{y} = \lambda y_i + (1 - \lambda)\, y_j,$$

where $\lambda \sim \mathrm{Beta}(\alpha, \alpha)$. While this is effective for many high-dimensional, highly redundant data sources, directly applying such schemes to time-series data can yield unrealistic or semantically meaningless interpolation results—such as destructive interference of periodic signals or loss of class-distinguishing temporal dynamics (Aggarwal et al., 2023).
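For concreteness, a minimal sketch of this standard mixup operation on a mini-batch of time series is shown below. The PyTorch framing, tensor shapes, and the value α = 0.2 are illustrative assumptions rather than settings from the cited papers.

```python
import torch

def mixup(x, y, alpha=0.2):
    """Vanilla mixup: convexly combine a batch with a shuffled copy of itself.

    x: (B, C, T) batch of time series; y: (B, num_classes) one-hot or soft labels.
    alpha is an assumed Beta-distribution parameter, not a value from the cited papers.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample()  # lambda ~ Beta(alpha, alpha)
    perm = torch.randperm(x.size(0))                        # random pairing within the batch
    x_mix = lam * x + (1.0 - lam) * x[perm]                 # interpolate inputs
    y_mix = lam * y + (1.0 - lam) * y[perm]                 # interpolate labels with the same lambda
    return x_mix, y_mix
```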
2. MixUp++ and LatentMixUp++ Methodologies
To address these challenges, MixUp++ and LatentMixUp++ introduce two principal modifications:
- Retention of Original and Synthetic Data: Training proceeds on both the observed (real) data and the synthetically generated (mixed) samples, keeping model learning anchored to the true data manifold. This anchoring is critical for time-series problems, where overly aggressive augmentation can produce implausible temporal features.
- Multiple Mixups per Batch (Augmentation Depth): For every training mini-batch, the mixup operation is applied multiple times with distinct random draws, greatly increasing the diversity of synthetic samples seen per batch. This diversified augmentation further regularizes the model (Aggarwal et al., 2023).
Two concrete implementations are defined:
MixUp++ (Raw Space):
- Performs mixup interpolation in the raw input (time-domain) space, combining the advantages of standard mixup with both modifications above.
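As a rough illustration of these two modifications in raw space, the sketch below builds an augmented batch from the real samples plus several independently mixed copies. It reuses the `mixup` helper from the earlier sketch, and the number of draws `k` is an assumed hyperparameter, not a value from the paper.

```python
import torch

def mixup_plus_plus_batch(x, y, k=3, alpha=0.2):
    """Raw-space MixUp++-style batch: keep the originals and append k mixed copies.

    Assumes the `mixup` helper defined in the earlier sketch.
    """
    xs, ys = [x], [y]                       # retain the real samples (keeps training anchored)
    for _ in range(k):                      # k distinct random mixup draws per mini-batch
        x_mix, y_mix = mixup(x, y, alpha)   # each draw uses a fresh lambda and pairing
        xs.append(x_mix)
        ys.append(y_mix)
    return torch.cat(xs, dim=0), torch.cat(ys, dim=0)
```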
LatentMixUp++ (Latent Space):
- Applies mixup in the latent feature space of the neural architecture. Denoting the network as $f = g \circ h$, with feature extractor $h$ and classifier $g$:

$$\tilde{z} = \lambda\, h(x_i) + (1 - \lambda)\, h(x_j), \qquad \tilde{y} = \lambda y_i + (1 - \lambda)\, y_j,$$

with prediction $\hat{y} = g(\tilde{z})$.
- The latent space often affords a more linear and semantically consistent embedding, making interpolated synthetic examples more plausible for downstream discrimination.
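A minimal sketch of latent-space mixup with an explicit feature-extractor/classifier split is given below. The small 1-D convolutional encoder, layer sizes, and soft-label cross-entropy loss are placeholder assumptions, not the architecture or loss of the cited work.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentMixupNet(nn.Module):
    def __init__(self, in_channels, num_classes, hidden=64):
        super().__init__()
        # h: feature extractor mapping a raw series (B, C, T) to a latent vector (B, hidden)
        self.h = nn.Sequential(
            nn.Conv1d(in_channels, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
        )
        # g: classifier operating on the latent representation
        self.g = nn.Linear(hidden, num_classes)

    def latent_mixup_loss(self, x, y, alpha=0.2):
        """Mix h(x) rather than x, then classify the interpolated latent vectors."""
        z = self.h(x)                                            # latent features h(x)
        lam = torch.distributions.Beta(alpha, alpha).sample()
        perm = torch.randperm(z.size(0))
        z_mix = lam * z + (1.0 - lam) * z[perm]                  # interpolate in latent space
        y_mix = lam * y + (1.0 - lam) * y[perm]                  # interpolate (soft) labels identically
        logits = self.g(z_mix)                                   # prediction g(z_mix)
        return F.cross_entropy(logits, y_mix)                    # cross-entropy with soft targets
```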
3. Regularization Effects and Theoretical Justification
MixUp++ and LatentMixUp++ act as powerful regularizers, particularly in low-data regimes. By inserting numerous "mixed" samples along the path joining real examples in input or latent space—and by not removing any real samples—they encourage the classifier's decision function to be smooth and to vary approximately linearly between real data points. This is supported by observed improvements in accuracy, macro F1, and Cohen's Kappa scores across both human activity recognition (HAR) and sleep staging domains, with gains ranging from 1–15% depending on data regime and task (Aggarwal et al., 2023).
Further, recent theoretical work on "multi-mix" demonstrates that generating an ordered set of $m$ interpolations per sample pair statistically reduces the variance of the stochastic gradients used in training, with the variance decreasing as $m$ increases, implying faster convergence and improved generalization (Shen et al., 3 Jun 2024). This suggests that multi-interpolation, as in MixUp++, is fundamentally advantageous for stable training.
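A minimal sketch of this multi-interpolation idea appears below. The equally spaced mixing coefficients and the symbol $m$ are illustrative assumptions that simplify the ordered-interpolation scheme of Shen et al.

```python
import torch

def multi_mix(x, y, m=4):
    """Generate m ordered interpolants along the segment joining each sample with a shuffled partner."""
    perm = torch.randperm(x.size(0))
    lams = torch.linspace(0.0, 1.0, steps=m + 2)[1:-1]   # m interior mixing coefficients, ordered
    xs, ys = [], []
    for lam in lams:
        xs.append(lam * x + (1.0 - lam) * x[perm])        # same pairing, several points on the segment
        ys.append(lam * y + (1.0 - lam) * y[perm])
    return torch.cat(xs, dim=0), torch.cat(ys, dim=0)     # averaging the loss over these reduces gradient variance
```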
4. Semi-Supervised Extension via Pseudo-Labeling
MixUp++ and LatentMixUp++ naturally extend to semi-supervised learning via pseudo-labeling, which is essential given the prevalence of unlabeled time-series datasets:
- After pre-training on the (small) labeled set, the model predicts pseudo-labels for unlabeled instances.
- Pseudo-labels whose predicted confidence exceeds a chosen threshold are included as new labeled examples.
- Subsequent mixup is performed over both the original labeled and the pseudo-labeled data (a minimal sketch of this step follows the list).
- Experiments indicate that this integration yields large gains when label sparsity is high. For example, with only 1% of labels available, F1 scores increase by 6–7% when using the MixUp++ + pseudo-labeling combination, compared to using pseudo-labeling in isolation (Aggarwal et al., 2023).
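A schematic of the pseudo-labeling step is sketched below. The function and variable names, the confidence threshold `tau`, and the use of hard one-hot pseudo-labels are assumptions made for illustration and are not taken from the cited paper.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def pseudo_label(model, x_unlabeled, num_classes, tau=0.9):
    """Assign pseudo-labels to unlabeled series, keeping only high-confidence predictions.

    tau is an assumed confidence threshold; the paper's exact value is not reproduced here.
    """
    probs = F.softmax(model(x_unlabeled), dim=-1)        # class probabilities from the pre-trained model
    conf, preds = probs.max(dim=-1)
    keep = conf > tau                                    # confidence filter
    x_pl = x_unlabeled[keep]
    y_pl = F.one_hot(preds[keep], num_classes).float()   # hard pseudo-labels as one-hot targets
    return x_pl, y_pl

# The pooled data (labeled + confident pseudo-labeled) is then augmented as before, e.g.:
#   x_all = torch.cat([x_labeled, x_pl]); y_all = torch.cat([y_labeled, y_pl])
#   x_aug, y_aug = mixup_plus_plus_batch(x_all, y_all, k=3)
```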
5. Empirical Evaluation and Regime Dependence
Empirical validation is performed on HAR and Sleep-EDF datasets, reflecting both "regular" and "sparse" labeling conditions:
- Low labeled data regime: The advantage of MixUp++ and especially LatentMixUp++ is most pronounced, indicating that model regularization via synthetic examples is crucial to preventing overfitting to limited supervision.
- High labeled data regime: Performance gains persist (typically 1–2% absolute improvement in macro-F1) but are smaller, reflecting reduced overfitting risk.
Ablation studies confirm that (a) training on both originals and multiple diverse interpolations per batch, and (b) mixup in latent space, are both necessary for optimal regularization.
6. Generalization to Other Modalities and Methods
Theoretical and empirical findings for multi-interpolation in mixup ("multi-mix") generalize beyond time-series to any modality with complex or structured data manifolds (Shen et al., 3 Jun 2024). A plausible implication is that TSMixup can benefit from multi-mix strategies by populating the temporal path between two time-series with several interpolants rather than a single one, leading to improved generalization, greater robustness against covariate shift (e.g., time warping or additive noise), and better-calibrated predictions.
Moreover, performing mixup in intermediate or latent spaces—as in LatentMixUp++—extends the approach to scenarios where the raw data manifold is poorly conditioned or highly non-linear.
7. Summary Table: Comparison of TSMixup Variants
| Method | Mixup Location | Retain Originals | Multiple Mixups per Batch | Semi-Supervised | Key Gains |
|---|---|---|---|---|---|
| MixUp++ | Input (raw signal) | Yes | Yes | Yes | Robustness, regularization (esp. low data) |
| LatentMixUp++ | Latent space | Yes | Yes | Yes | More plausible interpolants, higher gains |
| Multi-mix | Input/latent/feature space | Optional | Yes (ordered set of interpolants) | Not specified | Reduced gradient variance, calibration, robustness |
8. Implications, Limitations, and Future Directions
TSMixup and its contemporary variants provide an effective augmentation and regularization toolkit for time-series classification under limited labeled data. Empirical results show that these techniques outperform standard supervised learning, vanilla mixup, and permutation-based augmentations, both in high-data and particularly in low-data settings (Aggarwal et al., 2023). The ability to leverage unlabeled data through pseudo-labeling expands the applicability of TSMixup in real-world, label-constrained scenarios.
Current limitations reported include the potential for synthetic mixup samples to distort the decision boundary if not combined with sufficient real data, especially as the number of interpolants increases. Parameter tuning for the number of per-batch mixups and mixup location (input vs. latent) remains task-dependent. Future research is warranted for optimizing augmentation schedules and extending multi-mix paradigms to other structured time-series learning tasks such as forecasting and anomaly detection.
Overall, TSMixup—encompassing MixUp++, LatentMixUp++, and multi-mix strategies—demonstrates significant performance improvements in time-series classification via a principled fusion of data augmentation, interpolation, and semi-supervised learning. The theoretical justification for variance reduction and improved generalization, together with robust experimental validation, marks these approaches as important advances in the augmentation and regularization of temporal models (Aggarwal et al., 2023; Shen et al., 3 Jun 2024).