
Dynamic Multi-Views Contrastive Framework (DMCF)

Updated 19 January 2026
  • The paper presents DMCF, which dynamically weights multiple augmented views of time-series data to focus on diagnostically important segments.
  • DMCF employs contrastive learning guided by anomaly scores from pretrained estimators, improving diagnostic accuracy under low-label conditions.
  • It integrates hybrid convolution-attention architectures in a staged training protocol, achieving state-of-the-art results in medical time-series analysis.

The Dynamic Multi-views Contrastive Framework (DMCF) is a class of algorithms designed to advance self-supervised learning for medical time-series analysis. DMCF instantiates a methodology in which multiple augmented representations ("views") of time-series data are dynamically weighted according to anomaly scores, typically produced by a pretrained discrepancy estimator or generative model. The central principle is to direct contrastive learning—using InfoNCE-style objectives—toward segments of data most likely to harbor diagnostically informative or pathological patterns. DMCF has notably appeared as key modules in frameworks such as CoDAC (Tanaka et al., 12 Jan 2026) and LMCF (Wang et al., 30 Jan 2025), achieving state-of-the-art results for disease diagnosis under data-scarce and low-label conditions.

1. Motivation and Conceptual Framework

Medical time-series (e.g., EEG, ECG) diagnosis faces challenges related to annotation scarcity and the inability of conventional contrastive schemes to highlight complex temporal pathologies. DMCF is motivated by the need to focus representation learning on regions with high contextual discrepancy—those suspected to be most relevant for diagnosis.

In CoDAC (Tanaka et al., 12 Jan 2026), a Transformer-based Contextual Discrepancy Estimator (CDE) produces stepwise anomaly scores for each input sample. DMCF consumes these scores to dynamically weight multiple stochastic views generated by augmentations (cropping, jitter, scaling), thereby adapting the learning signal toward discrepant regions. The role of DMCF is thus both robustness-promoting (via multi-view augmentation) and pathology-aware (via weighted emphasis).

Conversely, LMCF (Wang et al., 30 Jan 2025) leverages an AE-GAN extractor trained on external healthy data to yield reconstruction-based abnormality scores, which are concatenated to the raw features and inform multi-head-attention-driven view extraction. Here, DMCF automates the learning of contrastive views, bypassing manual pair curation and tailoring views to disease-specific features.

2. Mathematical Formulation

Let $x \in \mathbb{R}^{T\times D}$ represent a time-series instance. Augmented views are defined as

$$\{x^{(k)}\}_{k=1}^K, \quad x^{(k)} = \mathsf{Aug}^{(k)}(x)$$

where each $\mathsf{Aug}^{(k)}$ denotes a distinct temporal augmentation.
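As a concrete illustration, the view-generation step can be sketched with toy versions of the augmentations named above (cropping, jitter, scaling). The function names and parameters here are illustrative stand-ins, not the papers' implementations:

```python
import numpy as np

rng = np.random.default_rng(0)

def jitter(x, sigma=0.05):
    """Add Gaussian noise to every time step."""
    return x + rng.normal(0.0, sigma, size=x.shape)

def scale(x, low=0.8, high=1.2):
    """Multiply each channel by a random factor."""
    return x * rng.uniform(low, high, size=(1, x.shape[1]))

def crop(x, ratio=0.8):
    """Keep a random contiguous window, zero out the rest."""
    T = x.shape[0]
    w = int(T * ratio)
    start = rng.integers(0, T - w + 1)
    out = np.zeros_like(x)
    out[start:start + w] = x[start:start + w]
    return out

def make_views(x, augs):
    """Return K stochastic views x^(k) = Aug^(k)(x)."""
    return [aug(x) for aug in augs]

x = rng.standard_normal((128, 4))          # T=128 steps, D=4 channels
views = make_views(x, [jitter, scale, crop])
```

Each view keeps the original shape so that the shared encoder can process all of them identically.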

The CDE outputs anomaly scores $S_{\mathrm{CDE}}(x) = [s_1, \dots, s_T]$; each score

$$s_t = \mathcal{F}\big(\Vert x_t - \hat{x}_t \Vert_2,\; A_t\big)$$

combines reconstruction error and an attention-derived indicator.

The view-$k$ discrepancy is aggregated as

$$\delta^{(k)} = \frac{1}{|\mathcal{T}^{(k)}|} \sum_{t \in \mathcal{T}^{(k)}} s_t$$

where $\mathcal{T}^{(k)}$ indexes the time steps active in view $k$.
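Aggregating the stepwise scores into a per-view discrepancy is a masked mean over the view's active time steps. A minimal sketch, with toy scores and a hypothetical cropped window:

```python
import numpy as np

def view_discrepancy(scores, active_steps):
    """delta^(k): mean anomaly score over the time steps active in view k."""
    idx = np.asarray(list(active_steps))
    return scores[idx].mean()

T = 10
scores = np.linspace(0.0, 1.0, T)          # toy stepwise anomaly scores s_t
# e.g. a cropped view that kept only the second half of the series
delta = view_discrepancy(scores, range(T // 2, T))
```

A view covering high-score regions receives a larger $\delta^{(k)}$ and, after softmax, a larger weight.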

Discrepancy scores are normalized by softmax to yield view weights

$$w^{(k)} = \frac{\exp(\delta^{(k)})}{\sum_{j=1}^K \exp(\delta^{(j)})}$$

Given encoder $E$, projection head $P$, and pooled outputs $z_i^{(k)} = P(\mathrm{Pool}(h_i^{(k)}))$, the dynamic InfoNCE loss is

$$L_{\mathrm{DMCF}} = -\frac{1}{N}\sum_{i=1}^N \sum_{k=1}^K w_i^{(k)} \log \frac{\exp\left(\mathrm{sim}(z_i, z_i^{(k)})/\tau\right)}{\sum_{j=1}^N \exp\left(\mathrm{sim}(z_i, z_j)/\tau\right)}$$

where $\mathrm{sim}(u,v) = u^\top v / (\|u\|\,\|v\|)$.
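The softmax weighting and the weighted InfoNCE objective follow directly from these formulas. A plain-NumPy toy (forward pass only, no gradients; shapes and names chosen for illustration):

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def dmcf_loss(z, z_views, deltas, tau=0.1):
    """Weighted InfoNCE: z is (N, d) anchor embeddings, z_views is
    (K, N, d) per-view embeddings, deltas is (N, K) view discrepancies."""
    N, K = deltas.shape
    loss = 0.0
    for i in range(N):
        w = softmax(deltas[i])                       # view weights w_i^(k)
        denom = sum(np.exp(cosine(z[i], z[j]) / tau) for j in range(N))
        for k in range(K):
            num = np.exp(cosine(z[i], z_views[k, i]) / tau)
            loss -= w[k] * np.log(num / denom)
    return loss / N

rng = np.random.default_rng(0)
z = rng.standard_normal((4, 8))
z_views = rng.standard_normal((3, 4, 8))
deltas = rng.random((4, 3))
loss = dmcf_loss(z, z_views, deltas)
```

Views with larger discrepancy dominate the sum, so gradients (in a real autodiff implementation) concentrate on the suspected pathological segments.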

In LMCF, a pretrained AE-GAN computes a reconstruction discrepancy for each sample,

$$\mathcal{E}_i = \mathrm{MSE}(G_{\mathrm{gen}}(\bar{x}_i), \bar{x}_i)$$

which is treated as an abnormality score and concatenated to the raw features, $x_i = [\bar{x}_i; \mathcal{E}_i]$.
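The feature augmentation reduces to one MSE and one concatenation. A minimal sketch, using an identity-plus-bias function as an assumed stand-in for the pretrained generator:

```python
import numpy as np

def abnormality_score(x, generator):
    """E_i: MSE between the generator's reconstruction and the input."""
    return np.mean((generator(x) - x) ** 2)

def augment_features(x, generator):
    """x_i = [x_bar_i ; E_i]: append the scalar abnormality score
    as a constant extra channel."""
    e = abnormality_score(x, generator)
    return np.concatenate([x, np.full((x.shape[0], 1), e)], axis=1)

# stand-in for a pretrained AE-GAN generator: identity plus a small bias
fake_gen = lambda x: x + 0.1
x = np.zeros((16, 3))                      # T=16 steps, D=3 channels
x_aug = augment_features(x, fake_gen)      # shape (16, 4)
```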

The encoder $E_{\mathrm{DMCF}}$ employs both a dilated convolutional backbone and a multi-head attention (MHA) view generator. For each head $v$,

$$\begin{aligned} Q_v &= X W_v^Q, \quad K_v = X W_v^K, \quad V_v = X W_v^V \\ \mathrm{head}_v(X) &= \mathrm{softmax}\left(Q_v K_v^\top/\sqrt{d}\right) V_v \end{aligned}$$

producing $G_i \in \mathbb{R}^{T \times V \times d}$.
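A NumPy sketch of the per-head view generator for a single sample; the weight matrices are random stand-ins for the learned $W_v^Q$, $W_v^K$, $W_v^V$:

```python
import numpy as np

def multihead_views(X, WQ, WK, WV):
    """Compute head_v(X) for each head and stack them over heads,
    yielding the (T, V, d) view tensor G_i."""
    heads = []
    for Wq, Wk, Wv in zip(WQ, WK, WV):
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        logits = Q @ K.T / np.sqrt(Q.shape[1])
        A = np.exp(logits - logits.max(axis=1, keepdims=True))
        A = A / A.sum(axis=1, keepdims=True)       # row-wise softmax
        heads.append(A @ V)
    return np.stack(heads, axis=1)                 # (T, V, d)

rng = np.random.default_rng(0)
T, D, V, d = 32, 8, 4, 8
X = rng.standard_normal((T, D))
make = lambda: [rng.standard_normal((D, d)) for _ in range(V)]
G = multihead_views(X, make(), make(), make())     # G_i in R^{T x V x d}
```

Each head attends to a different temporal subspace, so the stacked outputs act as learned, data-driven views.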

Contrastive losses are defined for inter-view ($L_{IRV}$) and intra-view ($L_{IAV}$) pairings:

$$L_{IRV} = -\frac{1}{V} \sum_{v=1}^V \log \frac{\exp\left(\mathrm{sim}(G^1_{i,v}, G^2_{i,v})/\tau\right)}{\sum_{u=1}^V \exp\left(\mathrm{sim}(G^1_{i,v}, G^2_{i,u})/\tau\right)}$$

$$L_{IAV} = -\frac{1}{MV} \sum_{v=1}^V \sum_{i=1}^M \log \frac{\sum_{j \in S_i^+} \exp\left(\mathrm{sim}(g^1_{i,v}, g^2_{j,v})/\tau\right)}{\sum_{k \in S_i^-} \exp\left(\mathrm{sim}(g^1_{i,v}, g^2_{k,v})/\tau\right)}$$

These are further enriched with hierarchical contrastive terms (subject, trial, epoch, temporal).

3. Encoder Design and Feature Pipeline

Both CoDAC and LMCF DMCFs employ hybrid convolution-attention architectures to extract temporal features.

  • Dilated Convolutional Layers: Expanding receptive fields with exponentially increasing dilation rates (kernel size 3, rates 1,2,4,…), these layers efficiently model long-range dependencies without downsampling, crucial for temporal pattern extraction in medical signals (Tanaka et al., 12 Jan 2026, Wang et al., 30 Jan 2025).
  • Multi-Head Self-Attention Blocks: Following convolution, multiple Transformer-style attention blocks (parameterized by number of heads $H$, depth $L$, model dimension $d$) provide the capacity for fine-grained temporal subspace partitioning and dynamic view generation.
  • Projection Heads: A two-layer MLP (hidden dimension $d/2$, output dimension $d$) is employed after temporal pooling to map features into a compact contrastive embedding space.

Multi-view generation proceeds by applying augmentations and encoding each view with the same backbone. In CoDAC, anomaly weights may be applied at the feature level for fine granularity; aggregated view-level weights are preferred in practice.
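The backbone pipeline can be sketched as a stack of causal dilated convolutions (kernel size 3, rates 1, 2, 4) followed by temporal mean-pooling and a two-layer projection head. This NumPy toy uses random weights and omits normalization and residual connections for brevity:

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """Causal 1-D convolution, kernel size 3; x is (T, C), w is (3, C, C_out).
    out[t] = sum_k x[t - k*dilation] @ w[k]."""
    T, C = x.shape
    pad = 2 * dilation
    xp = np.vstack([np.zeros((pad, C)), x])
    out = np.zeros((T, w.shape[2]))
    for k in range(3):
        out += xp[pad - k * dilation : pad - k * dilation + T] @ w[k]
    return out

def encode(x, convs, proj):
    """Dilated-conv backbone -> temporal pooling -> 2-layer MLP head."""
    h = x
    for w, rate in convs:
        h = np.maximum(dilated_conv1d(h, w, rate), 0.0)   # ReLU
    z = h.mean(axis=0)                                    # temporal pooling
    W1, W2 = proj                                         # MLP: d -> d/2 -> d
    return W2 @ np.maximum(W1 @ z, 0.0)

rng = np.random.default_rng(0)
T, d = 64, 8
x = rng.standard_normal((T, d))
convs = [(rng.standard_normal((3, d, d)) * 0.1, r) for r in (1, 2, 4)]
proj = (rng.standard_normal((d // 2, d)) * 0.1,
        rng.standard_normal((d, d // 2)) * 0.1)
z = encode(x, convs, proj)          # contrastive embedding, shape (d,)
```

With rates 1, 2, 4 the stacked receptive field spans 15 time steps without any downsampling, which is the property the dilated design exploits.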

4. Training Protocols and Algorithmic Details

Training occurs in structured stages to exploit prior knowledge and maximize generalization in under-annotated contexts.

  • Pre-training Estimator/Extractor: CoDAC trains the CDE on external healthy time-series data, optimizing

$$L_{\mathrm{CDE}} = \mathbb{E}_{x \sim \mathrm{Healthy}} \left\| x - f_{\mathrm{CDE}}(x) \right\|_2^2$$

LMCF analogously pre-trains the AE-GAN ($G_{\mathrm{gen}}$, $D_{\mathrm{dis}}$) using adversarial and reconstruction losses.

  • Feature Augmentation: Each target sample receives discrepancy/abnormality scores, providing diagnostic context.
  • Self-Supervised Contrastive Training: DMCF is optimized for 100–200 epochs (CoDAC) or $N_2$ epochs (LMCF), leveraging unlabeled mixes of healthy and target data. Optimization employs Adam or similar, with temperature hyperparameters ($\tau = 0.07$–$0.1$), batch size 64, and learning rates of $1 \times 10^{-4}$–$1 \times 10^{-3}$.
  • Supervised Fine-Tuning: A linear classifier is attached; options include partial fine-tuning (encoder frozen) or full fine-tuning (joint optimization), using cross-entropy loss on the limited label set.
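A minimal sketch of the fine-tuning stage under the partial (frozen-encoder) option: a random linear map stands in for the pretrained encoder, and a logistic-regression linear probe is trained with cross-entropy via plain gradient descent. All names, sizes, and the synthetic labels are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the frozen pretrained encoder (assumed, for illustration).
W_enc = rng.standard_normal((16, 8))
encode = lambda X: np.maximum(X @ W_enc, 0.0)

# Tiny labeled set for the supervised stage.
X = rng.standard_normal((64, 16))
y = (X[:, 0] > 0).astype(float)

Z = encode(X)                      # encoder frozen: features computed once
w, b = np.zeros(Z.shape[1]), 0.0
for _ in range(500):               # linear probe, cross-entropy, plain GD
    p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
    grad = p - y                   # d(cross-entropy)/d(logit)
    w -= 0.1 * Z.T @ grad / len(y)
    b -= 0.1 * grad.mean()

p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
acc = ((p > 0.5) == (y > 0.5)).mean()
```

Full fine-tuning would instead backpropagate the cross-entropy loss through the encoder weights as well.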

Training dynamically recalculates view weights based on fixed estimator scores but variable feature representations, ensuring that gradient updates disproportionately affect representations of views sampling higher-pathology segments.

5. Empirical Performance and Ablation Studies

DMCF yields superior diagnostic performance across datasets, particularly in small-sample or low-label regimes.

| Model | Accuracy (%) | AUROC (%) | AUPRC (%) |
|---|---|---|---|
| CoDAC w/o DMCF (single view) | 93.50 ± 2.00 | 97.20 ± 1.35 | 97.25 ± 1.30 |
| CoDAC w/o dynamic weighting | 94.00 ± 1.80 | 97.80 ± 1.15 | 97.85 ± 1.10 |
| CoDAC (full DMCF) | 94.90 ± 1.70 | 98.35 ± 1.10 | 98.40 ± 1.05 |
| LMCF (AD, 100% labeled) | 93.23 ± 5.25 | 98.03 ± 1.71 | — |
| COMET (AD, 100% labeled) | 84.50 ± 4.46 | 94.44 ± 2.37 | — |

Dynamic weighting boosts AUROC/AUPRC by up to +0.55 percentage points compared to uniform weighting (Tanaka et al., 12 Jan 2026). In LMCF, the framework outperforms seven baselines on the AD, TDBrain, and PTB datasets under both full-label and 10%-label regimes (Wang et al., 30 Jan 2025). Metric definitions follow standard conventions.

6. Extensions, Interpretation, and Applications

DMCF generalizes beyond pretext contrastive learning and manual view design. Multi-head attention enables data-driven partitioning of the feature space, facilitating the extraction of views that best represent the underlying temporal and pathological structure. It eliminates manual pair engineering and is robust to patient-specific variations.

Applications extend beyond medical signals to sensor-based activity recognition, speech, and financial time-series, wherever hierarchical or multi-channel dynamics are present. There is potential for integrating multi-modality signals and for augmenting view-weighting with Bayesian uncertainty quantification.

A plausible implication is that dynamic, context-aware contrastive frameworks, as instantiated in DMCF, could become baseline methodology for contrastive representation learning in diagnosis-centric domains with limited labeled data. The ability to adaptively highlight regions of interest, guided by pretrained anomaly or discrepancy estimators, distinguishes DMCF from earlier static or handcrafted multi-view contrastive approaches.
