Temporal and Contextual Contrasting (TS-TCC)
- TS-TCC leverages weak and strong augmentations with cross-view future prediction to boost temporal representation performance by up to 20 percentage points.
- Its encoder combines stacked 1D-CNNs with a Transformer-style autoregressive module to extract both local features and global context.
- The CA-TCC extension uses true and pseudo-labels for supervised contrastive learning, enhancing representations in low-label and cross-domain scenarios.
Temporal and Contextual Contrasting (TS-TCC) is an unsupervised framework for time-series representation learning, characterized by the coordinated use of time-series specific weak and strong augmentations, a temporal contrasting module that leverages cross-view future prediction, and a contextual contrasting module to enforce discriminative global instance-level features. The architecture is further extensible to semi-supervised settings via the Class-Aware Temporal and Contextual Contrasting (CA-TCC) variant, integrating true and pseudo-labels for class-aware contrastive learning. TS-TCC is designed to maximize informative signal extraction from unlabeled or sparsely labeled time-series data, demonstrating robust performance in linear evaluation, few-shot, and transfer learning scenarios (Eldele et al., 2021, Eldele et al., 2022).
1. Augmentation Pipelines for Time-Series Contrastive Learning
TS-TCC defines two explicit augmentation pipelines, diverging from the stochastic paired augmentations prevalent in vision-based contrastive learning. For each time-series sample $x$, the framework generates two views:
- Weak Augmentation $x^w$: composition of small-magnitude additive Gaussian noise (“jitter”) and amplitude “scaling”:
- Jitter: $x \leftarrow x + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma_{\mathrm{jitter}}^2)$
- Scaling: $x \leftarrow \alpha \cdot x$, with $\alpha$ a random per-sample scale factor
- Hyperparameters: jitter standard deviation $\sigma_{\mathrm{jitter}}$ (low, e.g., $0.01$), scale ratio (e.g., $0.1$–$2.0$)
- Strong Augmentation $x^s$: segment permutation and higher-magnitude jitter:
- Partition $x$ into $M$ random-length segments ($M$ dataset-dependent)
- Randomly permute the segments
- Apply additive Gaussian noise $\epsilon \sim \mathcal{N}(0, \sigma_{\mathrm{strong}}^2)$, with $\sigma_{\mathrm{strong}} > \sigma_{\mathrm{jitter}}$
Both augmentations are tailored to the time-series modality, preventing collapse by enforcing invariant and equivariant features through augmentation strength asymmetry. This two-view paradigm outperforms schemes relying on two weak or two strong augmentations by up to 20 percentage points in ablation studies (Eldele et al., 2022).
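The two pipelines above can be sketched in a few lines of NumPy. The noise scales, the uniform scaling distribution, and the segment count here are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def weak_augment(x, sigma=0.01, scale_lo=0.1, scale_hi=2.0):
    """Jitter-and-scale: small Gaussian noise plus one random amplitude factor."""
    jittered = x + rng.normal(0.0, sigma, size=x.shape)
    alpha = rng.uniform(scale_lo, scale_hi)  # one scale factor per sample
    return alpha * jittered

def strong_augment(x, n_segments=5, sigma=0.05):
    """Permutation-and-jitter: shuffle random-length segments, add stronger noise."""
    T = x.shape[-1]
    # random split points -> n_segments contiguous pieces of random length
    cuts = np.sort(rng.choice(np.arange(1, T), size=n_segments - 1, replace=False))
    segments = np.split(x, cuts, axis=-1)
    permuted = np.concatenate(
        [segments[i] for i in rng.permutation(len(segments))], axis=-1)
    return permuted + rng.normal(0.0, sigma, size=permuted.shape)

x = np.sin(np.linspace(0, 8 * np.pi, 128))[None, :]  # toy 1-channel series
xw, xs = weak_augment(x), strong_augment(x)
```

Both views keep the original shape, so the downstream encoder can process them identically.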
2. Encoder Architecture and Contextualization
Each augmented view, $x^w$ and $x^s$, is processed by a shared encoder consisting of three stacked 1D-CNN blocks, following the precedent in [wang2017time]. This architecture maps each view to a sequence of latent vectors $\mathbf{z} = (z_1, z_2, \dots, z_T)$, $z_t \in \mathbb{R}^d$.
Temporal context extraction leverages a Transformer-style autoregressive module $f_{ar}$, mapping the latent prefix $z_{\le t}$ to a contextual summary token $c_t$ that acts as a global descriptor for future prediction. The Transformer comprises a stack of multi-head self-attention layers (depth and width dataset-dependent), with a linear input projection, a prepended learnable context token, and layer normalization.
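A shape-level NumPy sketch of the convolutional front end that produces the latent sequence; kernel sizes, strides, and channel widths are illustrative assumptions, and further details of the full blocks (e.g., normalization, pooling) are omitted:

```python
import numpy as np

def conv1d_relu(x, w, stride):
    """Valid 1D convolution + ReLU. x: (C_in, T), w: (C_out, C_in, K)."""
    C_out, C_in, K = w.shape
    T_out = (x.shape[1] - K) // stride + 1
    out = np.empty((C_out, T_out))
    for t in range(T_out):
        patch = x[:, t * stride : t * stride + K]          # (C_in, K)
        out[:, t] = np.tensordot(w, patch, axes=([1, 2], [0, 1]))
    return np.maximum(out, 0.0)

rng = np.random.default_rng(0)
x = rng.normal(size=(9, 128))          # e.g. HAR: 9 channels, 128 time steps
# three stacked blocks; widths/kernels/strides here are illustrative
widths, kernels, strides = [32, 64, 128], [8, 8, 8], [2, 1, 1]
z, c_in = x, 9
for c_out, k, s in zip(widths, kernels, strides):
    w = rng.normal(scale=0.1, size=(c_out, c_in, k))
    z = conv1d_relu(z, w, stride=s)
    c_in = c_out
# z is the latent sequence (d, T'): one d-dim vector z_t per remaining step
```

The resulting sequence `z` is what the Transformer module summarizes into context tokens $c_t$.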
3. Temporal Contrasting via Cross-View Future Prediction
TS-TCC introduces a temporal contrasting module designed to enforce temporal and view invariance. The core mechanism is a cross-view InfoNCE objective which, for each time step $t$ and prediction horizon $k = 1, \dots, K$, tasks the context of one view with predicting future latent vectors of the other view. Specifically:
- Scoring Function: for each horizon step $k$, a linear head $W_k$ projects the context into latent space, scoring a candidate latent $z$ as $(W_k c_t)^\top z$.
- Loss Definition: for a mini-batch of $N$ samples, the negatives $\mathcal{N}_{t,k}$ are all batch elements' latents at step $t+k$ (excluding the positive pair):
$$\mathcal{L}_{TC}^{s} = -\frac{1}{K} \sum_{k=1}^{K} \log \frac{\exp\!\big((W_k c_t^s)^\top z_{t+k}^w\big)}{\sum_{n \in \mathcal{N}_{t,k}} \exp\!\big((W_k c_t^s)^\top z_n^w\big)}$$
The temporal contrastive loss is the sum of the forward and reverse cross-view tasks: $\mathcal{L}_{TC} = \mathcal{L}_{TC}^{s} + \mathcal{L}_{TC}^{w}$.
This module compels the model to learn features robust to both temporal shifts and augmentation artifacts (Eldele et al., 2021, Eldele et al., 2022).
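One direction of the cross-view prediction task (strong-view context predicting weak-view futures) can be sketched as follows; the prediction heads `W` are random stand-ins for the learned linear layers:

```python
import numpy as np

def log_softmax(scores):
    scores = scores - scores.max()
    return scores - np.log(np.exp(scores).sum())

def temporal_contrast(c_t_s, z_w_future, z_w_negatives, W):
    """
    One-direction cross-view InfoNCE at a single time step t.
    c_t_s:         (d,)       context of the strong view at step t
    z_w_future:    (K, d)     weak-view latents z_{t+1..t+K} (positives)
    z_w_negatives: (K, N, d)  weak-view latents of other batch items at t+k
    W:             (K, d, d)  one linear prediction head per horizon step k
    """
    K = z_w_future.shape[0]
    loss = 0.0
    for k in range(K):
        pred = W[k] @ c_t_s                  # W_k(c_t)
        pos = pred @ z_w_future[k]           # positive logit
        negs = z_w_negatives[k] @ pred       # (N,) negative logits
        logits = np.concatenate([[pos], negs])
        loss += -log_softmax(logits)[0]      # InfoNCE: positive at index 0
    return loss / K

rng = np.random.default_rng(0)
d, K, N = 16, 4, 7
loss_s2w = temporal_contrast(rng.normal(size=d),
                             rng.normal(size=(K, d)),
                             rng.normal(size=(K, N, d)),
                             rng.normal(size=(K, d, d)))
```

The full $\mathcal{L}_{TC}$ adds the mirrored term in which the weak-view context predicts strong-view futures.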
4. Contextual Contrasting for Instance Discrimination
The contextual contrasting module operates on global contexts aggregated by the autoregressive model:
- Context Projection: each context $c_t$ (from both augmentations of all $N$ batch samples) is passed through an MLP projection head and $\ell_2$-normalized to yield $2N$ normalized representations.
- Contrastive Objective: for each sample $i$, the two augmented views form a positive pair $(c_i, c_{j(i)})$; all other batch contexts are negatives. The loss is:
$$\mathcal{L}_{CC} = -\frac{1}{2N} \sum_{i=1}^{2N} \log \frac{\exp\!\big(\mathrm{sim}(c_i, c_{j(i)}) / \tau\big)}{\sum_{m=1}^{2N} \mathbb{1}_{[m \neq i]} \exp\!\big(\mathrm{sim}(c_i, c_m) / \tau\big)}$$
with $\mathrm{sim}(u, v) = u^\top v / (\lVert u \rVert \lVert v \rVert)$ (cosine similarity), $j(i)$ the index of the other view of sample $i$, and temperature $\tau$ (e.g., $0.2$).
The contextual contrast encourages invariance beyond temporal structure, promoting instance-level discrimination and mitigating mode collapse.
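A minimal NumPy sketch of this NT-Xent-style objective over the $2N$ projected contexts (the MLP projection is assumed to have already been applied):

```python
import numpy as np

def contextual_contrast(cw, cs, tau=0.2):
    """
    NT-Xent over 2N l2-normalized contexts.
    cw, cs: (N, d) projected contexts of the weak / strong views.
    The positive pair for sample i is (cw[i], cs[i]).
    """
    c = np.concatenate([cw, cs], axis=0)                 # (2N, d)
    c = c / np.linalg.norm(c, axis=1, keepdims=True)     # cosine sim via dot product
    N = cw.shape[0]
    sim = c @ c.T / tau                                  # (2N, 2N) scaled similarities
    np.fill_diagonal(sim, -np.inf)                       # exclude self-similarity
    pos = np.concatenate([np.arange(N, 2 * N), np.arange(N)])  # i <-> i+N
    m = sim.max(axis=1, keepdims=True)                   # row-wise log-softmax
    logp = sim - (m + np.log(np.exp(sim - m).sum(axis=1, keepdims=True)))
    return -logp[np.arange(2 * N), pos].mean()

rng = np.random.default_rng(0)
cw, cs = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))
loss = contextual_contrast(cw, cs)
```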
5. Overall Objective and Algorithmic Workflow
The final unsupervised objective is a weighted combination of the temporal and contextual contrastive losses: $\mathcal{L} = \lambda_1 \mathcal{L}_{TC} + \lambda_2 \mathcal{L}_{CC}$, where typically $\lambda_1 = 1$ and $\lambda_2 = 0.7$.
A high-level update step proceeds as follows:
- For each batch sample, generate weak and strong views.
- Encode both, extract temporal contexts via Transformer.
- Compute cross-view temporal losses.
- Project contexts, compute contextual contrastive loss.
- Combine, backpropagate, and update parameters.
Key hyperparameters include batch size (128 standard, 32 for the few-label regime), 40 pretraining epochs, the Adam optimizer, a prediction horizon $K$ (dataset-dependent), and dataset-specific augmentation strengths and Transformer widths/heads.
6. Extension: Class-Aware TS-TCC (CA-TCC) for Semi-Supervised Learning
CA-TCC extends TS-TCC to leverage scarce labels with a four-phase protocol (Eldele et al., 2022):
- Unsupervised Pretraining: Standard TS-TCC loss on all unlabeled data.
- Fine-Tuning: Linear head and encoder refined with true labels.
- Pseudo-Label Generation: Predict class assignments for unlabeled sequences.
- Class-Aware Contrastive Training: replace the contextual contrast $\mathcal{L}_{CC}$ by the supervised contrastive loss $\mathcal{L}_{SC}$ [SupCon]:
$$\mathcal{L}_{SC} = \sum_{i} \frac{-1}{\lvert P(i) \rvert} \sum_{p \in P(i)} \log \frac{\exp\!\big(\mathrm{sim}(c_i, c_p) / \tau\big)}{\sum_{m \neq i} \exp\!\big(\mathrm{sim}(c_i, c_m) / \tau\big)}$$
where $P(i)$ indexes the positives of anchor $i$ (samples sharing its label or pseudo-label). The final loss is $\mathcal{L} = \lambda_1 \mathcal{L}_{TC} + \lambda_2 \mathcal{L}_{SC}$, weighted analogously to the unsupervised objective. This semi-supervised extension further improves representation quality in low-label and domain adaptation settings.
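A NumPy sketch of the class-aware (SupCon-style) loss over context features, where positives are defined by shared true or pseudo-labels:

```python
import numpy as np

def supcon_loss(feats, labels, tau=0.2):
    """
    Supervised contrastive loss: for each anchor i, every other sample
    with the same (pseudo-)label counts as a positive.
    feats:  (M, d) context features; labels: (M,) true or pseudo-labels.
    """
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = f @ f.T / tau
    np.fill_diagonal(sim, -np.inf)                       # exclude self-pairs
    m = sim.max(axis=1, keepdims=True)                   # row-wise log-softmax
    logp = sim - (m + np.log(np.exp(sim - m).sum(axis=1, keepdims=True)))
    same = labels[:, None] == labels[None, :]            # positive-set mask P(i)
    np.fill_diagonal(same, False)
    # average log-prob over each anchor's positive set, skipping anchors with none
    per_anchor = np.array([-logp[i, same[i]].mean()
                           for i in range(len(labels)) if same[i].any()])
    return per_anchor.mean()

rng = np.random.default_rng(0)
feats = rng.normal(size=(12, 16))
labels = np.array([0, 1, 2] * 4)                         # every class has positives
loss = supcon_loss(feats, labels)
```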
7. Empirical Evaluation: Datasets, Protocols, and Performance
TS-TCC and CA-TCC have been evaluated on diverse time-series domains, including:
- UCI-HAR (6 human activities, 9 channels, 128 steps)
- Sleep-EDF (5 sleep stages, 1 channel, 3000 steps)
- Epileptic Seizure (2 classes, 1 channel, 178 steps)
- Fault Diagnosis (multi-domain, cross-condition transfer)
Protocol overview:
- Linear probing: Freeze TS-TCC encoder, train one-layer classifier.
- Few-label fine-tuning: Fine-tune all layers using 1%, 5%, 10%, 50%, 75% labeled samples.
- Transfer learning: Pretrain on one domain, fine-tune with minimal labels on another.
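The linear-probing protocol amounts to training a single softmax layer on frozen encoder outputs. The sketch below uses toy Gaussian blobs in place of real features:

```python
import numpy as np

def linear_probe(feats, labels, n_classes, lr=0.5, epochs=200):
    """Train one softmax layer on frozen features via gradient descent."""
    W = np.zeros((feats.shape[1], n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[labels]                        # one-hot targets
    for _ in range(epochs):
        logits = feats @ W + b
        logits -= logits.max(axis=1, keepdims=True)      # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        g = (p - Y) / len(Y)                             # softmax cross-entropy grad
        W -= lr * feats.T @ g
        b -= lr * g.sum(axis=0)
    return W, b

# toy frozen "features": two well-separated Gaussian blobs
rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(-2, 1, size=(50, 8)),
                    rng.normal(2, 1, size=(50, 8))])
y = np.array([0] * 50 + [1] * 50)
W, b = linear_probe(X, y, n_classes=2)
acc = ((X @ W + b).argmax(axis=1) == y).mean()
```

If the frozen features are linearly separable by class, a probe like this recovers high accuracy, which is the premise behind the linear-evaluation results reported below.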
Key results (mean over 5 runs) (Eldele et al., 2021, Eldele et al., 2022):
| Dataset | TS-TCC Linear Probe (Acc./MF1) | Supervised (Acc./MF1) |
|---|---|---|
| HAR | 90.37 % / 90.38 % | 90.14 % / 90.31 % |
| Sleep-EDF | 83.00 % / 73.57 % | 83.41 % / 74.78 % |
| Epilepsy | 97.23 % / 95.54 % | 96.66 % / 94.52 % |
Further findings:
- With 1% labels, TS-TCC fine-tuning outperforms from-scratch supervised training (e.g., MF1 gains of 15–29 pp).
- On fault transfer tasks, TS-TCC improves accuracy by 4–7 pp over supervised pretraining; CA-TCC adds 6 pp more on cross-domain splits.
- Ablations confirm the necessity and complementarity of both modules; cross-view temporal contrast and contextual contrast each contribute substantial gains.
8. Significance and Insights
TS-TCC demonstrates that carefully crafted two-view augmentations, coupled with a challenging cross-view temporal prediction task and global contextual contrasting, yield highly transferable time-series representations with minimal or no labels, rivaling or surpassing fully supervised approaches in downstream performance. Combining temporal structure learning with global context discrimination produces representations robust to both local permutations and global distortions.
CA-TCC further confirms the utility of integrating class semantics through semi-supervised extensions, substantiating the value of pseudo-labeling and supervised contrastive learning for time-series modalities with limited ground truth.
TS-TCC and its CA-TCC extension set benchmarks across standard and low-label regimes for time-series representation learning and provide generalizable architectural and methodological templates for future advances in self-supervised temporal modeling (Eldele et al., 2021, Eldele et al., 2022).