Dynamic Multi-Views Contrastive Framework (DMCF)
- The paper presents DMCF, which dynamically weights multiple augmented views of time-series data to focus on diagnostically important segments.
- DMCF employs contrastive learning with anomaly scores from pretrained estimators to guide representation learning, improving diagnostic accuracy under low-label conditions.
- It integrates hybrid convolution-attention architectures in a staged training protocol, achieving state-of-the-art results in medical time-series analysis.
The Dynamic Multi-views Contrastive Framework (DMCF) is a class of algorithms designed to advance self-supervised learning for medical time-series analysis. DMCF instantiates a methodology in which multiple augmented representations ("views") of time-series data are dynamically weighted according to anomaly scores, typically produced by a pretrained discrepancy estimator or generative model. The central principle is to direct contrastive learning—using InfoNCE-style objectives—toward segments of data most likely to harbor diagnostically informative or pathological patterns. DMCF has notably appeared as a key module in frameworks such as CoDAC (Tanaka et al., 12 Jan 2026) and LMCF (Wang et al., 30 Jan 2025), achieving state-of-the-art results for disease diagnosis under data-scarce and low-label conditions.
1. Motivation and Conceptual Framework
Medical time-series (e.g., EEG, ECG) diagnosis faces challenges related to annotation scarcity and the inability of conventional contrastive schemes to highlight complex temporal pathologies. DMCF is motivated by the need to focus representation learning on regions with high contextual discrepancy—those suspected to be most relevant for diagnosis.
In CoDAC (Tanaka et al., 12 Jan 2026), a Transformer-based Contextual Discrepancy Estimator (CDE) produces stepwise anomaly scores for each input sample. DMCF consumes these scores to dynamically weight multiple stochastic views generated by augmentations (cropping, jitter, scaling), thereby adapting the learning signal toward discrepant regions. The role of DMCF is thus both robustness-promoting (via multi-view augmentation) and pathology-aware (via weighted emphasis).
By contrast, LMCF (Wang et al., 30 Jan 2025) leverages an AE-GAN extractor trained on external healthy data to yield reconstruction-based abnormality scores, which are concatenated to the raw features and inform multi-head attention driven view extraction. Here DMCF automates the learning of contrastive views, bypassing manual pair curation and tailoring views to disease-specific features.
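The reconstruction-based scoring step just described can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the function names and the use of a plain mean-squared error as the abnormality score are assumptions standing in for the AE-GAN's actual discrepancy measure.

```python
import numpy as np

def abnormality_score(x, x_rec):
    """Per-sample reconstruction discrepancy, used as an abnormality score
    (illustrative MSE stand-in for the AE-GAN's measure)."""
    return float(np.mean((x - x_rec) ** 2))

def augment_with_score(x, x_rec):
    """Append the scalar abnormality score to the raw feature vector."""
    s = abnormality_score(x, x_rec)
    return np.concatenate([x, [s]])

# toy sample and a stand-in for its AE-GAN reconstruction
x = np.array([0.0, 1.0, 0.5, -0.5])
x_rec = np.array([0.1, 0.9, 0.5, -0.4])
xa = augment_with_score(x, x_rec)   # raw features plus one score dimension
```

A sample drawn from the healthy distribution reconstructs well and receives a near-zero score; a pathological sample reconstructs poorly and carries a large score into the downstream view extractor.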
2. Mathematical Formulation
CoDAC-style DMCF (Tanaka et al., 12 Jan 2026)
Let $x \in \mathbb{R}^{T}$ represent a time-series instance. Augmented views are defined as

$$v_k = \mathcal{A}_k(x), \qquad k = 1, \dots, K,$$

where each $\mathcal{A}_k$ signifies a distinct temporal augmentation.

The CDE outputs anomaly scores $\{s_t\}_{t=1}^{T}$; each score

$$s_t = \alpha\, e_t + (1 - \alpha)\, a_t$$

combines reconstruction error $e_t$ and an attention-derived indicator $a_t$.

View-$k$ discrepancy is aggregated as

$$d_k = \frac{1}{|\mathcal{T}_k|} \sum_{t \in \mathcal{T}_k} s_t,$$

with $\mathcal{T}_k$ indexing the time steps active in view $k$.

Discrepancy scores are normalized by a softmax to yield view weights

$$w_k = \frac{\exp(d_k)}{\sum_{j=1}^{K} \exp(d_j)}.$$

Given encoder $f_\theta$, projection head $g_\phi$, and pooled outputs $z_k = g_\phi(f_\theta(v_k))$, the dynamic InfoNCE loss is

$$\mathcal{L}_{\mathrm{DMCF}} = -\sum_{k=1}^{K} w_k \log \frac{\exp\!\left(\mathrm{sim}(z_k, z_k^{+}) / \tau\right)}{\sum_{j} \exp\!\left(\mathrm{sim}(z_k, z_j) / \tau\right)},$$

where $\mathrm{sim}(u, v) = \dfrac{u^\top v}{\|u\|\,\|v\|}$ and $z_k^{+}$ denotes the positive counterpart of view $k$.
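The discrepancy-weighted loss above can be sketched in a few lines of NumPy. This is a minimal, illustrative sketch under the formulation given here (softmax view weights scaling per-view InfoNCE terms); variable names and the toy embeddings are assumptions, not CoDAC's actual code.

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())           # shift for numerical stability
    return e / e.sum()

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def dynamic_infonce(views_z, pos_z, disc_scores, tau=0.07):
    """Discrepancy-weighted InfoNCE: each view's contrastive term is
    scaled by a softmax weight over its aggregated discrepancy score."""
    w = softmax(disc_scores)          # view weights w_k
    loss = 0.0
    for k, z in enumerate(views_z):
        logits = [cosine(z, pos_z[k]) / tau]                    # positive pair first
        logits += [cosine(z, zj) / tau
                   for j, zj in enumerate(views_z) if j != k]   # negatives
        logits = np.array(logits)
        # log softmax of the positive logit (stable log-sum-exp)
        log_prob = logits[0] - np.log(np.exp(logits - logits.max()).sum()) - logits.max()
        loss -= w[k] * log_prob
    return loss

# toy example: two views, view 0 carrying the higher discrepancy score
views_z = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
pos_z = [np.array([0.9, 0.1]), np.array([0.1, 0.9])]
loss = dynamic_infonce(views_z, pos_z, disc_scores=[2.0, 0.5])
```

Because the weights $w_k$ sum to one, a high-discrepancy view dominates the gradient signal while low-discrepancy views are softly down-weighted rather than discarded.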
LMCF-style DMCF (Wang et al., 30 Jan 2025)
A pre-trained AE-GAN computes for each sample $x$ the reconstruction discrepancy

$$r(x) = \|x - \hat{x}\|_2^2,$$

treated as an abnormality score and concatenated to the raw features, $\tilde{x} = [x \,;\, r(x)]$.

The encoder employs both a dilated convolutional backbone and a multi-head attention (MHA) view generator. For each head $h = 1, \dots, H$,

$$v_h = \mathrm{softmax}\!\left(\frac{Q_h K_h^\top}{\sqrt{d_h}}\right) V_h,$$

producing the view set $\{v_1, \dots, v_H\}$.

Contrastive losses are defined for inter-view ($\mathcal{L}_{\mathrm{inter}}$) and intra-view ($\mathcal{L}_{\mathrm{intra}}$) pairings, combined as $\mathcal{L} = \mathcal{L}_{\mathrm{inter}} + \mathcal{L}_{\mathrm{intra}}$.
These are further enriched with hierarchical contrastive terms (subject, trial, epoch, temporal).
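Per-head view generation, as formulated above, can be sketched directly from the attention equation. This is an illustrative NumPy version assuming per-head projection matrices are given; the shapes and names are hypothetical, not LMCF's implementation.

```python
import numpy as np

def mha_views(X, Wq, Wk, Wv):
    """Generate one contrastive 'view' per attention head.
    X: (T, d) feature sequence; Wq/Wk/Wv: lists of per-head (d, d_h) projections."""
    views = []
    for q_proj, k_proj, v_proj in zip(Wq, Wk, Wv):
        Q, K, V = X @ q_proj, X @ k_proj, X @ v_proj
        d_h = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_h)                     # scaled dot-product
        attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
        attn /= attn.sum(axis=-1, keepdims=True)            # row-wise softmax
        views.append(attn @ V)                              # (T, d_h) head view
    return views

rng = np.random.default_rng(0)
T, d, d_h, H = 16, 8, 4, 2
X = rng.standard_normal((T, d))
Wq = [rng.standard_normal((d, d_h)) for _ in range(H)]
Wk = [rng.standard_normal((d, d_h)) for _ in range(H)]
Wv = [rng.standard_normal((d, d_h)) for _ in range(H)]
views = mha_views(X, Wq, Wk, Wv)   # H head-specific views of the same sequence
```

Each head attends to a different temporal subspace of the same input, which is what lets the framework replace hand-designed augmentation pairs with learned views.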
3. Encoder Design and Feature Pipeline
Both CoDAC and LMCF DMCFs employ hybrid convolution-attention architectures to extract temporal features.
- Dilated Convolutional Layers: Expanding receptive fields with exponentially increasing dilation rates (kernel size 3, rates 1,2,4,…), these layers efficiently model long-range dependencies without downsampling, crucial for temporal pattern extraction in medical signals (Tanaka et al., 12 Jan 2026, Wang et al., 30 Jan 2025).
- Multi-Head Self-Attention Blocks: Following convolution, multiple Transformer-style attention blocks (parameterized by the number of heads $H$, depth $L$, and model dimension $d_{\mathrm{model}}$) provide the capacity for fine-grained temporal subspace partitioning and dynamic view generation.
- Projection Heads: A two-layer MLP (hidden dimension $d_h$, output dimension $d_z$) is employed after temporal pooling to map features into a compact contrastive embedding space.
Multi-view generation proceeds by applying augmentations and encoding each view with the same backbone. In CoDAC, anomaly weights may be applied at the feature level for finer granularity, though aggregated view-level weights are preferred in practice.
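The dilated-convolution bullet above can be made concrete with a small sketch. This is a minimal 1-D causal dilated convolution plus the standard receptive-field formula for a stack with kernel size 3 and exponentially increasing rates; it illustrates the mechanism, not either paper's backbone.

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """Causal 1-D dilated convolution with zero left-padding.
    x: 1-D signal; w: kernel taps (oldest sample first)."""
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([sum(w[j] * xp[i + j * dilation] for j in range(k))
                     for i in range(len(x))])

def receptive_field(kernel=3, dilations=(1, 2, 4, 8)):
    """Receptive field of a stack of dilated conv layers: each layer adds
    (kernel - 1) * dilation time steps on top of a single-sample base."""
    return 1 + (kernel - 1) * sum(dilations)
```

With kernel size 3 and rates 1, 2, 4, 8 the stack already covers 31 time steps without any downsampling, which is why a few such layers suffice for long-range temporal context in medical signals.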
4. Training Protocols and Algorithmic Details
Training occurs in structured stages to exploit prior knowledge and maximize generalization in under-annotated contexts.
- Pre-training Estimator/Extractor: CoDAC trains the CDE on external healthy time-series data, optimizing a reconstruction objective
  $$\mathcal{L}_{\mathrm{CDE}} = \frac{1}{T} \sum_{t=1}^{T} \|x_t - \hat{x}_t\|_2^2.$$
  LMCF analogously pre-trains the AE-GAN (generator $G$, discriminator $D$) using adversarial and reconstruction losses.
- Feature Augmentation: Each target sample receives discrepancy/abnormality scores, providing diagnostic context.
- Self-Supervised Contrastive Training: DMCF is optimized for 100–200 epochs (CoDAC; LMCF uses a comparable schedule), leveraging unlabeled mixes of healthy and target data. Optimization employs Adam or a similar optimizer, with a contrastive temperature $\tau$ on the order of $0.07$, batch size 64, and standard learning-rate settings.
- Supervised Fine-Tuning: A linear classifier is attached; options include partial fine-tuning (encoder frozen) or full fine-tuning (joint optimization), using cross-entropy loss on the limited label set.
During training, view weights are recomputed from the fixed estimator scores while the feature representations evolve, ensuring that gradient updates disproportionately affect the representations of views sampling higher-pathology segments.
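The staged protocol above can be summarized as a training skeleton. This is a structural sketch only: the component functions are hypothetical stand-ins (here exercised with toy callables), not the papers' training code.

```python
def staged_training(pretrain_estimator, score_fn, contrastive_step, finetune_step,
                    healthy_data, unlabeled_batches, labeled_data, epochs=2):
    """Staged protocol: (1) pretrain the estimator on external healthy data,
    (2) self-supervised contrastive training with dynamic view weighting,
    (3) supervised fine-tuning on the small labeled set."""
    estimator = pretrain_estimator(healthy_data)        # stage 1: frozen afterwards
    state = {"loss_log": []}
    for _ in range(epochs):                             # stage 2
        for batch in unlabeled_batches:
            scores = score_fn(estimator, batch)         # fixed estimator, evolving encoder
            state["loss_log"].append(contrastive_step(batch, scores))
    classifier = finetune_step(state, labeled_data)     # stage 3
    return estimator, state, classifier

# toy stand-ins exercising the control flow
est, state, clf = staged_training(
    pretrain_estimator=lambda healthy: sum(healthy),
    score_fn=lambda estimator, batch: estimator + batch,
    contrastive_step=lambda batch, scores: scores,
    finetune_step=lambda st, labeled: len(st["loss_log"]),
    healthy_data=[1, 2],
    unlabeled_batches=[1, 2, 3],
    labeled_data=None,
)
```

The key design point the skeleton captures is the separation of concerns: the estimator is trained once and then only queried, so stage 2 gradients flow exclusively through the encoder and projection head.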
5. Empirical Performance and Ablation Studies
DMCF yields superior diagnostic performance across datasets, particularly in small-sample or low-label regimes.
| Model | Accuracy (%) | AUROC (%) | AUPRC (%) |
|---|---|---|---|
| CoDAC w/o DMCF (single view) | 93.50 ± 2.00 | 97.20 ± 1.35 | 97.25 ± 1.30 |
| CoDAC w/o dynamic weighting | 94.00 ± 1.80 | 97.80 ± 1.15 | 97.85 ± 1.10 |
| CoDAC (full DMCF) | 94.90 ± 1.70 | 98.35 ± 1.10 | 98.40 ± 1.05 |
| LMCF (AD, 100% labeled) | 93.23 ± 5.25 | 98.03 ± 1.71 | — |
| COMET (AD, 100% labeled) | 84.50 ± 4.46 | 94.44 ± 2.37 | — |
Dynamic weighting boosts AUROC/AUPRC by up to 0.55 percentage points compared to uniform weighting (Tanaka et al., 12 Jan 2026). In LMCF, the framework outperforms seven baselines on the AD, TDBrain, and PTB datasets under both full-label and 10%-label regimes (Wang et al., 30 Jan 2025). Metric definitions follow standard conventions.
6. Extensions, Interpretation, and Applications
DMCF generalizes beyond pretext contrastive learning and manual view design. Multi-head attention enables data-driven partitioning of the feature space, facilitating the extraction of views that best represent the underlying temporal and pathological structure. It eliminates manual pair engineering and is robust to patient-specific variations.
Applications extend beyond medical signals to sensor-based activity recognition, speech, and financial time-series, wherever hierarchical or multi-channel dynamics are present. There is potential for integrating multi-modality signals and for augmenting view-weighting with Bayesian uncertainty quantification.
A plausible implication is that dynamic, context-aware contrastive frameworks, as instantiated in DMCF, could become baseline methodology for contrastive representation learning in diagnosis-centric domains with limited labeled data. The ability to adaptively highlight regions of interest, guided by pretrained anomaly or discrepancy estimators, distinguishes DMCF from earlier static or handcrafted multi-view contrastive approaches.