
Temporal and Contextual Contrasting (TS-TCC)

Updated 18 January 2026
  • TS-TCC leverages weak and strong augmentations with cross-view future prediction to boost temporal representation performance by up to 20 percentage points.
  • Its encoder combines stacked 1D-CNNs with a Transformer-style autoregressive module to extract both local features and global context.
  • The CA-TCC extension uses true and pseudo-labels for supervised contrastive learning, enhancing representations in low-label and cross-domain scenarios.

Temporal and Contextual Contrasting (TS-TCC) is an unsupervised framework for time-series representation learning, characterized by the coordinated use of time-series specific weak and strong augmentations, a temporal contrasting module that leverages cross-view future prediction, and a contextual contrasting module to enforce discriminative global instance-level features. The architecture is further extensible to semi-supervised settings via the Class-Aware Temporal and Contextual Contrasting (CA-TCC) variant, integrating true and pseudo-labels for class-aware contrastive learning. TS-TCC is designed to maximize informative signal extraction from unlabeled or sparsely labeled time-series data, demonstrating robust performance in linear evaluation, few-shot, and transfer learning scenarios (Eldele et al., 2021, Eldele et al., 2022).

1. Augmentation Pipelines for Time-Series Contrastive Learning

TS-TCC defines two explicit augmentation pipelines, diverging from the stochastic paired augmentations prevalent in vision-based contrastive learning. For each time-series sample $x \in \mathbb{R}^{T \times C}$, the framework generates two views:

  • Weak Augmentation $\mathcal{T}_w(x)$: composition of small-magnitude additive Gaussian noise ("jitter") and amplitude "scaling":
    • Jitter: $\epsilon \sim \mathcal{N}(0, \sigma^2)$, $\tilde x = x + \epsilon$
    • Scaling: $s \sim \text{Uniform}(1-\alpha, 1+\alpha)$, $x^w = s \cdot \tilde x$
    • Hyperparameters: jitter $\sigma$ (low, e.g., $0.01$), scale ratio $\alpha$ (e.g., $0.1$–$2.0$)
  • Strong Augmentation $\mathcal{T}_s(x)$: segment permutation followed by higher-magnitude jitter:
    • Partition $x$ into $M$ random-length segments ($M$ dataset-dependent)
    • Randomly permute the segments $\rightarrow x_{\text{perm}}$
    • Apply additive Gaussian noise $\epsilon \sim \mathcal{N}(0, \sigma^2)$, $x^s = x_{\text{perm}} + \epsilon$

Both augmentations are tailored to the time-series modality, preventing collapse by enforcing invariant and equivariant features through augmentation strength asymmetry. This two-view paradigm outperforms schemes relying on two weak or two strong augmentations by up to 20 percentage points in ablation studies (Eldele et al., 2022).
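The two pipelines above can be sketched in a few lines of NumPy. The defaults below (weak jitter $\sigma = 0.01$, scale ratio $\alpha = 0.1$, up to 5 segments and a larger $\sigma$ for the strong view) are illustrative choices consistent with the ranges quoted above, not the exact per-dataset settings:

```python
import numpy as np

def weak_augment(x, sigma=0.01, alpha=0.1, rng=None):
    """Jitter-and-scale: small Gaussian noise, then one random amplitude
    factor for the whole sample (sigma/alpha values are illustrative)."""
    if rng is None:
        rng = np.random.default_rng()
    jittered = x + rng.normal(0.0, sigma, size=x.shape)
    scale = rng.uniform(1 - alpha, 1 + alpha)
    return scale * jittered

def strong_augment(x, max_segments=5, sigma=0.8, rng=None):
    """Permutation-and-jitter: split the time axis into M random-length
    segments, shuffle them, then add higher-magnitude noise."""
    if rng is None:
        rng = np.random.default_rng()
    T = x.shape[0]                                    # x: (T, C)
    n_seg = int(rng.integers(2, max_segments + 1))
    # n_seg - 1 random cut points yield M = n_seg random-length segments
    cuts = np.sort(rng.choice(np.arange(1, T), size=n_seg - 1, replace=False))
    segments = np.split(x, cuts, axis=0)
    order = rng.permutation(len(segments))            # random segment order
    permuted = np.concatenate([segments[i] for i in order], axis=0)
    return permuted + rng.normal(0.0, sigma, size=x.shape)
```

Both functions preserve the $(T, C)$ shape, so the two views can be fed to the same encoder.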

2. Encoder Architecture and Contextualization

Each augmented view, $x^w$ and $x^s$, is processed by a shared encoder $f_{\mathrm{enc}}$ consisting of three stacked 1D-CNN blocks, following the precedent in [wang2017time]. This architecture produces a sequence of latent vectors:

$\mathbf{z}^w = f_{\mathrm{enc}}(x^w) = [z_1^w, \ldots, z_T^w] \in \mathbb{R}^{T \times d}$

$\mathbf{z}^s = f_{\mathrm{enc}}(x^s) = [z_1^s, \ldots, z_T^s] \in \mathbb{R}^{T \times d}$

Temporal context extraction leverages a Transformer-style autoregressive module $f_{\mathrm{ar}}$, mapping latent sequences to contextual summary tokens $c_t \in \mathbb{R}^h$ that act as global descriptors for future prediction. The Transformer comprises $L$ layers and $H$ heads, with an input projection $\mathcal{W}_{\text{Tran}}$, a prepended learnable context token $c_0$, and layer normalization.
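A minimal PyTorch sketch of such a convolutional encoder; the channel widths (32, 64, then $d$), kernel size, and pooling are illustrative assumptions, since the exact configuration is dataset-specific:

```python
import torch
import torch.nn as nn

class TSTCCEncoder(nn.Module):
    """Three stacked 1D-CNN blocks (conv -> batch norm -> ReLU -> max-pool)
    mapping a raw series (N, C, T) to a latent sequence (N, T', d).
    Widths and kernel sizes are illustrative, not the paper's exact values."""
    def __init__(self, in_channels=9, d_model=128):
        super().__init__()
        def block(c_in, c_out):
            return nn.Sequential(
                nn.Conv1d(c_in, c_out, kernel_size=8, stride=1, padding=4),
                nn.BatchNorm1d(c_out),
                nn.ReLU(),
                nn.MaxPool1d(2),        # halves the temporal resolution
            )
        self.blocks = nn.Sequential(
            block(in_channels, 32),
            block(32, 64),
            block(64, d_model),
        )

    def forward(self, x):               # x: (N, C, T)
        z = self.blocks(x)              # (N, d, T') after three pools
        return z.transpose(1, 2)        # (N, T', d) latent sequence for f_ar
```

The returned latent sequence is what the autoregressive module $f_{\mathrm{ar}}$ would consume to produce the context tokens $c_t$.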

3. Temporal Contrasting via Cross-View Future Prediction

TS-TCC introduces a temporal contrasting module designed to enforce invariance across time and across views. The core mechanism is a cross-view InfoNCE objective which, for each time step $t$ and prediction horizon $k = 1, \ldots, K$, tasks the context $c_t$ of one view with predicting future latent vectors of the other view. Specifically:

  • Scoring Function: For each $k$, a head $\mathcal{W}_k : \mathbb{R}^h \rightarrow \mathbb{R}^d$ projects the context into latent space:

$s_{t,k}(c_t, z_{t+k}) := \exp\!\big([\mathcal{W}_k(c_t)]^\top z_{t+k}\big)$

  • Loss Definition: For a mini-batch of $N$ samples, the set $\mathcal{N}_{t,k}$ collects the positive $z_{t+k}$ together with the other batch elements' latents at step $t+k$, which serve as negatives, so the softmax denominator follows the standard InfoNCE convention.

The temporal contrastive loss is the sum of the forward and reverse cross-view prediction tasks:

$L_{\text{TC}}^{s \to w} = -\frac{1}{K} \sum_{k=1}^K \log \frac{s_{t,k}(c_t^s,\, z_{t+k}^w)}{\sum_{n \in \mathcal{N}_{t,k}} s_{t,k}(c_t^s,\, z_n^w)}$

$L_{\text{TC}}^{w \to s} = -\frac{1}{K} \sum_{k=1}^K \log \frac{s_{t,k}(c_t^w,\, z_{t+k}^s)}{\sum_{n \in \mathcal{N}_{t,k}} s_{t,k}(c_t^w,\, z_n^s)}$

$L_{\text{temp}} = L_{\text{TC}}^{s \to w} + L_{\text{TC}}^{w \to s}$

This module compels the model to learn features robust to both temporal shifts and augmentation artifacts (Eldele et al., 2021, Eldele et al., 2022).
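One direction of this loss (e.g., $L_{\text{TC}}^{s \to w}$) can be sketched in NumPy as follows. The batch-level layout (positives on the diagonal, per-horizon heads stacked in `W`, softmax denominator including the positive) is an illustrative assumption consistent with the InfoNCE formulation above:

```python
import numpy as np

def temporal_contrast_loss(c, z_other, W):
    """Cross-view temporal InfoNCE for one direction (e.g. strong contexts
    predicting weak latents). c: contexts (N, h); z_other: the other view's
    future latents stacked per horizon (K, N, d); W: per-horizon linear
    heads (K, d, h). Positives sit on the diagonal of each logit matrix."""
    K, N, d = z_other.shape
    total = 0.0
    for k in range(K):
        pred = c @ W[k].T                   # (N, d): W_k(c_t) per sample
        logits = pred @ z_other[k].T        # (N, N): scores vs. all latents
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        total += -np.mean(np.diag(log_probs))  # -log softmax of positives
    return total / K
```

The reverse direction swaps the roles of the two views; summing both gives $L_{\text{temp}}$.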

4. Contextual Contrasting for Instance Discrimination

The contextual contrasting module operates on global contexts aggregated by the autoregressive model:

  • Context Projection: Each context $c_t$ (from both augmentations, for all samples) is passed through an MLP $g(\cdot)$ and $\ell_2$-normalized, yielding $2N$ normalized representations.
  • Contrastive Objective: For each sample $i$, $(h_t^{i,w}, h_t^{i,s})$ forms a positive pair; all other batch contexts serve as negatives. The loss is:

$\mathcal{L}_{CC} = -\sum_{i=1}^N \log \frac{\exp(\operatorname{sim}(h_t^i, h_t^{i+})/\tau)}{\sum_{m=1}^{2N} \mathbf{1}_{[m \neq i]} \exp(\operatorname{sim}(h_t^i, h_t^{m})/\tau)}$

with $\operatorname{sim}(u,v) = \frac{u^\top v}{\|u\|\,\|v\|}$ and temperature $\tau$ (e.g., $0.2$).

The contextual contrast encourages invariance beyond temporal structure, promoting instance-level discrimination and mitigating mode collapse.
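This NT-Xent-style objective can be sketched in NumPy, assuming the $2N$ projected contexts are stacked so that each sample's partner sits at offset $N$:

```python
import numpy as np

def contextual_contrast_loss(h_w, h_s, tau=0.2):
    """NT-Xent over 2N projected contexts: (h_w[i], h_s[i]) is the positive
    pair for sample i; every other batch context is a negative.
    Inputs are (N, d) projections before l2-normalization."""
    h = np.concatenate([h_w, h_s], axis=0)             # (2N, d)
    h = h / np.linalg.norm(h, axis=1, keepdims=True)   # l2-normalize
    N = h_w.shape[0]
    sim = (h @ h.T) / tau                              # cosine sims / tau
    np.fill_diagonal(sim, -np.inf)                     # enforce m != i
    pos = np.concatenate([np.arange(N, 2 * N), np.arange(N)])  # partner index
    log_probs = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(log_probs[np.arange(2 * N), pos])
```

Because the vectors are normalized and divided by $\tau$, the logits stay bounded and the softmax is numerically stable without an extra max-subtraction.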

5. Overall Objective and Algorithmic Workflow

The final unsupervised objective is a weighted combination of the temporal and contextual contrastive losses:

$\mathcal{L}_{\text{TS-TCC}} = \lambda_1 L_{\text{temp}} + \lambda_2 \mathcal{L}_{CC}$

where typically $\lambda_1 = 1$ and $\lambda_2 = 0.7$.

A high-level update step proceeds as follows:

  • For each batch sample, generate weak and strong views.
  • Encode both, extract temporal contexts via Transformer.
  • Compute cross-view temporal losses.
  • Project contexts, compute contextual contrastive loss.
  • Combine, backpropagate, and update parameters.

Key hyperparameters include batch size (128 standard; 32 in the few-label regime), 40 pretraining epochs, the Adam optimizer, prediction horizon $K = \lfloor 0.4\,T \rfloor$, and dataset-specific augmentation strengths and Transformer widths/heads.

6. Extension: Class-Aware TS-TCC (CA-TCC) for Semi-Supervised Learning

CA-TCC extends TS-TCC to leverage scarce labels with a four-phase protocol (Eldele et al., 2022):

  1. Unsupervised Pretraining: Standard TS-TCC loss on all unlabeled data.
  2. Fine-Tuning: Linear head and encoder refined with true labels.
  3. Pseudo-Label Generation: Predict class assignments for unlabeled sequences.
  4. Class-Aware Contrastive Training: Replace $\mathcal{L}_{CC}$ with the supervised contrastive loss $\mathcal{L}_{\text{SCC}}$ [SupCon]:

$\ell_{\mathrm{SCC}}(i) = -\frac{1}{|P(i)|} \sum_{p \in P(i)} \log \frac{\exp(\operatorname{sim}(\tilde c_i, \tilde c_p)/\tau)}{\sum_{m \ne i} \exp(\operatorname{sim}(\tilde c_i, \tilde c_m)/\tau)}$

where $P(i)$ indexes the positives for anchor $i$ (samples sharing its label or pseudo-label). The final loss is $L_{\text{semi}} = \lambda_3 L_{\text{temp}} + \lambda_4 \mathcal{L}_{\mathrm{SCC}}$ with $\lambda_3 \approx 0.01$ and $\lambda_4 \approx 0.7$. This semi-supervised extension further improves representation quality in low-label and domain-adaptation settings.
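The SupCon-style loss of phase 4 can be sketched in NumPy, treating `labels` as the mix of true and pseudo-labels; skipping anchors with no in-batch positive is a common convention assumed here, not stated in the text:

```python
import numpy as np

def supcon_loss(h, labels, tau=0.2):
    """Supervised contrastive loss: positives P(i) are all other batch
    elements sharing anchor i's (true or pseudo-) label; the denominator
    runs over every m != i. h: (n, d) context projections."""
    h = h / np.linalg.norm(h, axis=1, keepdims=True)  # l2-normalize
    n = h.shape[0]
    sim = (h @ h.T) / tau
    np.fill_diagonal(sim, -np.inf)                    # enforce m != i
    log_probs = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    labels = np.asarray(labels)
    losses = []
    for i in range(n):
        pos = np.where((labels == labels[i]) & (np.arange(n) != i))[0]
        if len(pos) > 0:                              # skip anchors w/o positives
            losses.append(-log_probs[i, pos].mean())  # average over P(i)
    return float(np.mean(losses))
```

With pseudo-labels standing in for true labels on the unlabeled portion, the same function covers both label sources in phase 4.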

7. Empirical Evaluation: Datasets, Protocols, and Performance

TS-TCC and CA-TCC have been evaluated on diverse time-series domains, including:

  • UCI-HAR (6 human activities, 9 channels, 128 steps)
  • Sleep-EDF (5 sleep stages, 1 channel, 3000 steps)
  • Epileptic Seizure (2 classes, 1 channel, 178 steps)
  • Fault Diagnosis (multi-domain, cross-condition transfer)

Protocol overview:

  • Linear probing: Freeze TS-TCC encoder, train one-layer classifier.
  • Few-label fine-tuning: Fine-tune all layers using 1%, 5%, 10%, 50%, 75% labeled samples.
  • Transfer learning: Pretrain on one domain, fine-tune with minimal labels on another.

Key results (mean over 5 runs) (Eldele et al., 2021, Eldele et al., 2022):

| Dataset | TS-TCC Linear Probe (Acc. / MF1) | Supervised (Acc. / MF1) |
| --- | --- | --- |
| HAR | 90.37 % / 90.38 % | 90.14 % / 90.31 % |
| Sleep-EDF | 83.00 % / 73.57 % | 83.41 % / 74.78 % |
| Epilepsy | 97.23 % / 95.54 % | 96.66 % / 94.52 % |

Further findings:

  • With 1% labels, TS-TCC fine-tuning outperforms from-scratch supervised training (e.g., MF1 gains of 15–29 pp).
  • On fault transfer tasks, TS-TCC improves accuracy by 4–7 pp over supervised pretraining; CA-TCC adds 6 pp more on cross-domain splits.
  • Ablations confirm the necessity and complementarity of both modules; cross-view temporal contrast and contextual contrast each contribute substantial gains.

8. Significance and Insights

TS-TCC demonstrates that carefully crafted two-view augmentations, coupled with a challenging cross-view temporal prediction task and global contextual contrasting, facilitate highly transferable time-series representations with minimal or no labels, rivaling or surpassing fully supervised approaches in downstream performance. The strategy of combining temporal structure learning with global context discrimination yields representations robust to both local permutations and global distortions.

CA-TCC further confirms the utility of integrating class semantics through semi-supervised extensions, substantiating the value of pseudo-labeling and supervised contrastive learning for time-series modalities with limited ground truth.

TS-TCC and its CA-TCC extension set benchmarks across standard and low-label regimes for time-series representation learning and provide generalizable architectural and methodological templates for future advances in self-supervised temporal modeling (Eldele et al., 2021, Eldele et al., 2022).
