TS-TCC: Temporal & Contextual Contrast in Time-Series
- The paper demonstrates that combining cross-view temporal prediction with global contextual discrimination yields robust and transferable time-series features, outperforming prior self-supervised baselines such as CPC and SimCLR.
- It details a dual-branch architecture using weak and strong augmentations to capture fine-grained temporal dynamics and preserve global context.
- Empirical evaluations on HAR, Sleep-EDF, and Epilepsy datasets show improvements in accuracy and transferability, even in few-shot scenarios.
Time-Series Representation Learning via Temporal and Contextual Contrasting (TS-TCC) is a self-supervised framework for learning expressive features from unlabeled time-series data by combining cross-view temporal prediction and global contextual discrimination. TS-TCC is specifically tailored to temporal signals, addressing the limitations of prior contrastive paradigms developed for spatially-structured domains, such as images, by integrating both robust temporal modeling and context-aware contrastive objectives (Eldele et al., 2021, Eldele et al., 2022).
1. Conceptual Motivation and Framework Overview
The core motivation for TS-TCC is the difficulty of extracting discriminative representations from unlabeled time-series, where complex temporal dependencies coexist with high labeling costs. Established contrastive learning methods (e.g., SimCLR, MoCo, CPC) succeed in spatial domains. However, they fail to adequately preserve temporal structure when naively applied to sequence data, especially under standard image-style augmentations.
TS-TCC introduces a dual-branch paradigm: each raw sequence is transformed into two correlated views via distinct augmentation pipelines (a “weak” and a “strong” view), designed to perturb amplitude, order, and fine-grained temporal relations. The self-supervised signal is established by two modules:
- A temporal contrasting (TC) module that enforces cross-view future prediction to capture invariant, temporally aware features;
- A contextual contrasting (CC) module that aligns global context vectors from different views of the same sequence while discriminating across sequences.
This structure yields a representation that is both temporally robust and contextually discriminative.
2. Time-Series-Specific Augmentation Strategies
TS-TCC employs augmentation techniques specifically devised for temporal signals, in contrast to the image-centric operations typical in prior contrastive SSL.
Weak augmentations (producing the view $x^w$) maintain global shape and primary dynamics:
- Jitter: Additive small-variance Gaussian noise $\epsilon \sim \mathcal{N}(0, \sigma^2)$ per channel, applied after min–max normalization.
- Scaling: Multiply the sequence by a global random factor whose spread is controlled by a scale-ratio hyperparameter.
- Time-shift (an extension in Eldele et al., 2022): Circularly shift the sequence by up to a maximum offset.
Strong augmentations (producing the view $x^s$) introduce substantial distortion:
- Permutation: Split $x$ into up to $M$ contiguous segments, then randomly reorder them. $M$ is dataset-dependent; for example, HAR and Sleep-EDF use different values.
- High-variance jitter: Additive Gaussian noise with a larger standard deviation than the weak jitter.
- Compositional perturbations: e.g., simultaneous permutation and strong jitter.
Each input $x$ is thus transformed into a weak view $x^w$ and a strong view $x^s$. These augmentations push TS-TCC toward invariance to amplitude changes and local time-order distortions, while the weak view preserves the overall temporal coherence of the signal.
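The two augmentation families can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The helper names (`jitter`, `scale`, `permute`, `weak_view`, `strong_view`), the Gaussian scaling factor centered at 1.0, and all default noise levels are hypothetical choices. A signal is represented as a list of channels, each a list of samples.

```python
import random

def jitter(x, sigma):
    """Add independent Gaussian noise (std `sigma`) to every sample of every channel."""
    return [[v + random.gauss(0.0, sigma) for v in ch] for ch in x]

def scale(x, sigma):
    """Multiply the whole multichannel sequence by one random factor drawn around 1.0."""
    factor = random.gauss(1.0, sigma)  # assumption: Gaussian centered at 1.0
    return [[v * factor for v in ch] for ch in x]

def permute(x, max_segments):
    """Cut the time axis into 1..max_segments contiguous pieces and shuffle their order.

    The same reordering is applied to every channel so channels stay aligned.
    """
    length = len(x[0])
    n_seg = random.randint(1, min(max_segments, length))
    cuts = sorted(random.sample(range(1, length), n_seg - 1))  # distinct interior cut points
    bounds = [0] + cuts + [length]
    segments = [(bounds[i], bounds[i + 1]) for i in range(n_seg)]
    random.shuffle(segments)
    order = [t for start, end in segments for t in range(start, end)]
    return [[ch[t] for t in order] for ch in x]

def weak_view(x, scale_sigma=0.1, jitter_sigma=0.05):
    """Weak view: scaling followed by small jitter (hypothetical defaults)."""
    return jitter(scale(x, scale_sigma), jitter_sigma)

def strong_view(x, max_segments=5, jitter_sigma=0.5):
    """Strong view: permutation followed by larger jitter (hypothetical defaults)."""
    return jitter(permute(x, max_segments), jitter_sigma)
```

Calling both view functions on the same input yields the correlated pair $(x^w, x^s)$ consumed by the two branches.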
3. Temporal Contrasting Module
The temporal contrasting module is designed to encourage representations that encode temporal dependencies robust to augmentation. The flow is as follows:
- Both views, $x^w$ and $x^s$, are processed by a shared encoder $f_{\text{enc}}$ (a 3-block 1D CNN yielding per-timestep embeddings $z = [z_1, \dots, z_T]$), optionally followed by an MLP.
- Each embedding sequence is passed through an autoregressive Transformer (4 layers, hidden dimension $h$, 4 heads, pre-norm, dropout 0.1), with a learned context token prepended to its input.
- At each time $t$, the output at the context-token position, $c_t$, serves as a summary of $z_{\le t}$.
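A stripped-down example shows how a prepended token can summarize a causal prefix. The sketch below is a deliberate simplification. It uses a single attention head with identity query/key/value projections, no feed-forward sublayers, and no layer stacking, so it only illustrates the idea of querying the prefix with a context token. `context_vector` is a hypothetical name.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def context_vector(token, z, t):
    """Toy c_t: attention-pool the token plus z_1..z_t, queried by the token.

    `z` is a list of per-timestep embeddings; only the first `t` are visible,
    so later timesteps cannot influence c_t (a causal summary of z_{<=t}).
    """
    seq = [token] + z[:t]
    d = len(token)
    weights = softmax([dot(token, s) / math.sqrt(d) for s in seq])
    return [sum(w * s[j] for w, s in zip(weights, seq)) for j in range(d)]
```

In the full model, the token is learned and several pre-norm Transformer layers refine it. The causal slicing `z[:t]` is the property this toy preserves.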
Cross-View Future Prediction:
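At a given time $t$ and offset $k \in \{1, \dots, K\}$, with $K$ the prediction horizon, the context vector from one view is used to predict the future latent of the other view. The prediction is parameterized by step-specific linear projections $W_k$:

$$\hat{z}^w_{t+k} = W_k(c^s_t), \qquad \hat{z}^s_{t+k} = W_k(c^w_t).$$

The prediction is optimized with an InfoNCE-style loss that contrasts the true pair against negatives from other sequences in the batch:

$$\mathcal{L}_{TC}^{s} = -\frac{1}{K}\sum_{k=1}^{K} \log \frac{\exp\big((W_k(c_t^s))^\top z_{t+k}^w\big)}{\sum_{n \in \mathcal{N}_{t,k}} \exp\big((W_k(c_t^s))^\top z_{n}^w\big)}$$

Here $\mathcal{N}_{t,k}$ contains the positive $z^w_{t+k}$ and the corresponding latents of the other sequences. $\mathcal{L}_{TC}^{w}$ is defined symmetrically by swapping $s$ and $w$. The total temporal contrasting loss is $\mathcal{L}_{TC} = \mathcal{L}_{TC}^{s} + \mathcal{L}_{TC}^{w}$.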
This not only enforces invariance to augmentation but also encourages the model to capture how the sequence evolves across perturbations.
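A plain-Python sketch of the cross-view InfoNCE term follows. It assumes three inputs: a batch of context vectors from one view, the other view's future latents for steps 1..K, and one linear predictor per step stored as a list of rows. Negatives for sequence i are the same-step latents of the other sequences in the batch, and the loss is averaged over the batch and the K steps. The names are hypothetical. Calling the function once with strong contexts and weak futures, and once with the roles swapped, gives the two temporal contrasting terms.

```python
import math

def matvec(W, v):
    """Apply a linear map stored as a list of rows to vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def log_softmax(scores):
    m = max(scores)  # log-sum-exp trick for numerical stability
    log_total = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_total for s in scores]

def temporal_contrast_loss(contexts, futures, W):
    """Cross-view InfoNCE averaged over the batch and the K prediction steps.

    contexts[i]   : c_t of sequence i from one view (e.g. strong)
    futures[i][k] : z_{t+k+1} of sequence i from the other view (e.g. weak)
    W[k]          : linear predictor for step k+1, as a list of rows
    """
    batch, horizon = len(contexts), len(W)
    total = 0.0
    for k in range(horizon):
        preds = [matvec(W[k], c) for c in contexts]
        for i in range(batch):
            # Score prediction i against step-k latents of every sequence;
            # index i is the positive, all other indices act as negatives.
            scores = [dot(preds[i], futures[j][k]) for j in range(batch)]
            total += log_softmax(scores)[i]
    return -total / (batch * horizon)
```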
4. Contextual Contrasting Module
After temporal contrasting, TS-TCC extracts a global context vector for each view, usually the output of the context (“CLS”) token of the Transformer at the final position. These vectors, $c_t^w$ and $c_t^s$, are then processed by an MLP projection head $g(\cdot)$ to produce the context embeddings $g(c_t^w)$ and $g(c_t^s)$.
Given a batch of $N$ sequences, the set of $2N$ context vectors is used in an instance-wise InfoNCE contrastive task. For each sequence, the two views $(c^i, c^{i^+})$ form a positive pair ($i$ indexes one view and $i^+$ the other view of the same sequence). The remaining $2N - 2$ vectors in the batch serve as negatives.
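The contextual contrasting loss is:

$$\ell(i, i^+) = -\log \frac{\exp\big(\mathrm{sim}(c^i, c^{i^+})/\tau\big)}{\sum_{m=1}^{2N} \mathbb{1}_{[m \neq i]} \exp\big(\mathrm{sim}(c^i, c^m)/\tau\big)}, \qquad \mathcal{L}_{CC} = \frac{1}{2N}\sum_{k=1}^{N}\big[\ell(2k-1, 2k) + \ell(2k, 2k-1)\big]$$

Here $c^i$ and $c^{i^+}$ are the projected context vectors of the two views of one sequence in a batch of $N$ sequences, $\mathrm{sim}(\cdot,\cdot)$ is cosine similarity, and $\tau$ is a temperature.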
This loss maximizes agreement between global summaries of the two views of each sample, thereby enhancing sample-level discrimination.
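The contextual contrasting loss can be sketched as an NT-Xent computation over interleaved views. Consecutive 0-based positions 2i and 2i+1 hold the two views of sequence i. The inputs are assumed to be already projected context embeddings, and `contextual_contrast_loss` is a hypothetical name. `tau` has no default because the temperature is a configuration choice.

```python
import math

def cosine(a, b):
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

def contextual_contrast_loss(ctx_weak, ctx_strong, tau):
    """NT-Xent over the 2N projected context vectors of a batch.

    Views are interleaved so that 0-based positions 2i and 2i+1 hold the weak
    and strong views of sequence i; each is the other's positive, and the
    remaining 2N - 2 vectors are negatives.
    """
    vecs = [v for pair in zip(ctx_weak, ctx_strong) for v in pair]
    n2 = len(vecs)
    total = 0.0
    for i in range(n2):
        pos = i + 1 if i % 2 == 0 else i - 1
        denom = sum(math.exp(cosine(vecs[i], vecs[m]) / tau) for m in range(n2) if m != i)
        total += -math.log(math.exp(cosine(vecs[i], vecs[pos]) / tau) / denom)
    return total / n2  # mean over all 2N anchors
```

Averaging over all $2N$ anchors counts each positive pair in both directions, matching the symmetric sum over $\ell(2k-1, 2k)$ and $\ell(2k, 2k-1)$.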
5. Joint Objective and Network Architecture
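The complete TS-TCC loss is a weighted sum:

$$\mathcal{L} = \lambda_1\big(\mathcal{L}_{TC}^{s} + \mathcal{L}_{TC}^{w}\big) + \lambda_2\,\mathcal{L}_{CC}$$

Here $\mathcal{L}_{TC}^{s}$ and $\mathcal{L}_{TC}^{w}$ are the two cross-view temporal contrasting terms and $\mathcal{L}_{CC}$ is the contextual contrasting loss. Empirically, a fixed choice of $\lambda_1$ and $\lambda_2$ yields stable results.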
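The joint objective is a simple weighted combination. The snippet below only fixes the bookkeeping, namely that both temporal terms share $\lambda_1$. The weights in the usage line are illustrative, not the tuned values, and `ts_tcc_loss` is a hypothetical name.

```python
def ts_tcc_loss(tc_strong, tc_weak, cc, lambda1, lambda2):
    """L = lambda1 * (L_TC^s + L_TC^w) + lambda2 * L_CC."""
    return lambda1 * (tc_strong + tc_weak) + lambda2 * cc

# Illustrative numbers only: 1.0 * (1.5 + 0.5) + 0.5 * 2.0 = 3.0
total = ts_tcc_loss(1.5, 0.5, 2.0, lambda1=1.0, lambda2=0.5)
```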
Network architecture:
- Encoder: 3-block 1D CNN (Conv–BatchNorm–ReLU–Dropout–MaxPool), per-timestep features.
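- Autoregressive head: 4-layer Transformer (the same hidden dimension $h$ is used for most datasets).
- Projection head: two-layer MLP for contextual contrasting (CC) and for its supervised counterpart (SCC) in CA-TCC.
- Optimization: Adam, with learning rate, weight decay, $\beta_1$, and $\beta_2$ fixed as in the original configuration. Batch size: 128.
- Augmentation settings: maximum number of permutation segments $M$ set per dataset (UCI HAR, Epilepsy, Sleep-EDF); scale ratio 2; jitter as above; temperature $\tau$; $K$ for the future-step range.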
This modular design is broadly compatible with univariate and multivariate time-series and is dataset-agnostic aside from augmentation tuning.
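Collecting the reported settings in one place makes the dataset-dependent knobs explicit. The dataclass below is a hypothetical sketch. It fills in only the values stated above and leaves $h$, $M$, and $\tau$ as required arguments. The divisibility check reflects how standard multi-head attention splits the hidden dimension, not a documented TS-TCC rule.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TSTCCConfig:
    """Hypothetical container for the settings listed above."""
    # Dataset-dependent values: no defaults, supply them per dataset.
    hidden_dim: int        # Transformer hidden dimension h
    max_segments: int      # maximum permutation segments M
    temperature: float     # contrastive temperature tau
    # Values stated in the text.
    encoder_blocks: int = 3
    transformer_layers: int = 4
    attention_heads: int = 4
    transformer_dropout: float = 0.1
    batch_size: int = 128
    scale_ratio: float = 2.0

    def __post_init__(self):
        # Standard multi-head attention splits h evenly across heads.
        if self.hidden_dim % self.attention_heads != 0:
            raise ValueError("hidden_dim must be divisible by attention_heads")
        if self.max_segments < 1 or self.temperature <= 0:
            raise ValueError("max_segments must be >= 1 and temperature > 0")
```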
6. Empirical Evaluation and Performance
TS-TCC was evaluated on multiple real-world datasets: UCI HAR (9-axis motion, 6 classes), Sleep-EDF EEG (single channel, 5 sleep stages), Epileptic Seizure Recognition (single channel, binary), as well as a fault diagnosis transfer setting and UCR benchmark datasets (Eldele et al., 2021, Eldele et al., 2022).
Linear evaluation (encoder frozen):
| Dataset (accuracy, %) | Random init | SSL-ECG | CPC | SimCLR | TS-TCC | Supervised |
|---|---|---|---|---|---|---|
| HAR | 57.9 | 65.3 | 83.8 | 81.0 | 90.4 | 90.1 |
| Sleep-EDF | 35.6 | 74.6 | 82.8 | 78.9 | 83.0 | 83.4 |
| Epilepsy | 90.3 | 93.7 | 96.6 | 96.1 | 97.2 | 96.7 |
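A quick arithmetic check of the table, using only the numbers reported above, shows two things. TS-TCC sits close to the supervised model, and it gains over the strongest self-supervised baseline, which is CPC in all three rows. The helper name `gaps` is hypothetical.

```python
# Linear-evaluation accuracies (%) transcribed from the table above.
RESULTS = {
    "HAR": {"SSL-ECG": 65.3, "CPC": 83.8, "SimCLR": 81.0, "TS-TCC": 90.4, "Supervised": 90.1},
    "Sleep-EDF": {"SSL-ECG": 74.6, "CPC": 82.8, "SimCLR": 78.9, "TS-TCC": 83.0, "Supervised": 83.4},
    "Epilepsy": {"SSL-ECG": 93.7, "CPC": 96.6, "SimCLR": 96.1, "TS-TCC": 97.2, "Supervised": 96.7},
}

def gaps(results):
    """Percentage-point gap of TS-TCC to the supervised model and to the best SSL baseline."""
    out = {}
    for dataset, acc in results.items():
        best_ssl = max(acc["SSL-ECG"], acc["CPC"], acc["SimCLR"])
        out[dataset] = {
            "vs_supervised": round(acc["TS-TCC"] - acc["Supervised"], 1),
            "vs_best_ssl": round(acc["TS-TCC"] - best_ssl, 1),
        }
    return out
```

Against supervised training, the gaps are +0.3, -0.4, and +0.5 points (HAR, Sleep-EDF, Epilepsy). Against CPC, they are +6.6, +0.2, and +0.6 points.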
Few-shot or semi-supervised fine-tuning:
- With 1% labels, TS-TCC achieves 70% (HAR) and 90% (Epilepsy) MF1, significantly exceeding supervised training on the same labels, whose MF1 drops below 50%.
- With 10% labeled data, TS-TCC comes within 2% of fully supervised performance on all datasets.
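Few-shot experiments of this kind need a small labeled subset. One common way to draw it, assumed here rather than taken from the paper, is class-stratified sampling that keeps at least one example per class. `stratified_subset` is a hypothetical helper.

```python
import random
from collections import defaultdict

def stratified_subset(labels, fraction, seed=0):
    """Pick about `fraction` of the indices of each class, keeping at least one per class.

    Returns the chosen indices in ascending order.
    """
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    rng = random.Random(seed)  # local RNG so the split is reproducible
    chosen = []
    for y in sorted(by_class):
        members = by_class[y]
        k = max(1, round(fraction * len(members)))
        chosen.extend(rng.sample(members, k))
    return sorted(chosen)
```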
Transfer learning:
On the four-domain fault-diagnosis dataset (12 domain pairs):
- Supervised pretrain + fine-tune: 63.8% accuracy
- TS-TCC pretrain + fine-tune: 67.8% (+4.0% absolute gain)
This suggests that TS-TCC representations possess strong domain transferability even with minimal downstream labels.
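The count of 12 domain pairs follows from taking every ordered (source, target) combination of the four domains, 4 × 3 = 12. The domain names below are placeholders. The gain is the difference between the two accuracies listed above.

```python
from itertools import permutations

DOMAINS = ["A", "B", "C", "D"]  # placeholder names for the four domains

# Ordered (source, target) scenarios with source != target.
PAIRS = list(permutations(DOMAINS, 2))

SUPERVISED_ACC = 63.8  # supervised pretrain + fine-tune (%)
TS_TCC_ACC = 67.8      # TS-TCC pretrain + fine-tune (%)
GAIN = round(TS_TCC_ACC - SUPERVISED_ACC, 1)  # absolute gain in percentage points
```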
7. Methodological Variants, Ablations, and Extensions
Ablation studies demonstrate:
- TC only: Same-view prediction yields significantly lower accuracy (e.g., HAR ACC 82.8%).
- Adding cross-augmentation prediction: predicting across views (strong→weak and weak→strong) improves accuracy (HAR 87.9%).
- Full TS-TCC (TC + CC): further improvement (HAR 90.4%). Single-augmentation variants (weak-only or strong-only) decline sharply, especially on HAR and Sleep-EDF.
Sensitivity Analysis:
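Setting the number of predicted future steps $K$ to 40% of the feature sequence length balances context diversity against prediction difficulty. Performance is robust to moderate perturbations of the loss weights $\lambda_1$ and $\lambda_2$.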
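Turning the 40% rule into a concrete number of prediction steps $K$ requires a rounding rule and a valid range for the anchor time. The sketch below assumes flooring and requires $t + K \le T$, so every predicted latent exists. Both choices and the function names are assumptions, and the feature length is assumed to be at least 2.

```python
import math
import random

def prediction_horizon(feature_len, fraction=0.4):
    """K = floor(fraction * T), clipped to [1, T - 1] so at least one anchor exists."""
    k = math.floor(fraction * feature_len)
    return max(1, min(k, feature_len - 1))

def sample_anchor(feature_len, k, rng):
    """Draw a 1-based anchor time t with t + k <= feature_len."""
    return rng.randint(1, feature_len - k)
```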
Semi-supervised extension (CA-TCC) (Eldele et al., 2022):
With limited labels, CA-TCC leverages pseudo-labels after self-supervised pretraining. In place of the unsupervised contextual contrastive loss, it introduces a class-aware (supervised) contrastive loss:
- For batch index $i$ with pseudo-label $\tilde{y}_i$, positives are all other batch members $j$ with $\tilde{y}_j = \tilde{y}_i$; negatives are the rest.
- Empirically, with 1% labels, CA-TCC attains 77.8% accuracy and 72.6% MF1 (10 datasets), outperforming baselines such as Mean Teacher and FixMatch.
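The class-aware loss can be sketched as a supervised-contrastive (SupCon-style) objective driven by pseudo-labels. This is an assumption-based illustration rather than the CA-TCC code, and `class_aware_contrast_loss` is hypothetical. It expects already projected context vectors (for example, both views of every sample, each carrying its sample's pseudo-label) and skips anchors that have no positive.

```python
import math

def cosine(a, b):
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

def class_aware_contrast_loss(vectors, pseudo_labels, tau):
    """SupCon-style loss: positives of anchor i are all j != i with the same pseudo-label.

    Every j != i appears in the denominator; anchors without any positive are skipped.
    """
    n = len(vectors)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n)
                     if j != i and pseudo_labels[j] == pseudo_labels[i]]
        if not positives:
            continue
        log_denom = math.log(sum(math.exp(cosine(vectors[i], vectors[a]) / tau)
                                 for a in range(n) if a != i))
        # Negated mean log-probability of the positives for this anchor.
        log_probs = [cosine(vectors[i], vectors[p]) / tau - log_denom for p in positives]
        total += -sum(log_probs) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

When every sample has its own label, the only positives are the two views of the same sample, and the loss reduces to the instance-wise contextual contrasting objective.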
Limitations:
Current TS-TCC evaluations are restricted to single-modality (univariate or multivariate) time-series. Augmentation strategies are fixed rather than learned or adaptive, and the autoregressive modeling capacity is limited to moderate-sized Transformers.
8. Conclusion and Significance
TS-TCC defines a general approach to self-supervised sequence representation learning tailored to time-series, exploiting both cross-view temporal prediction and global contextual contrasting. In benchmarks, it approaches or exceeds fully supervised performance under linear evaluation, excels in few-shot and transfer settings, and serves as a foundation for extensible semi-supervised variants (CA-TCC). The empirical results underscore the efficacy of combining temporal and contextual contrastive signals for robust, transferable time-series representations (Eldele et al., 2021, Eldele et al., 2022).