SECL: Self-Evolution Contrastive Loss
- SECL is a self-referential contrastive loss that leverages past network states (latency models) to form anchor–positive–negative triplets, improving depth estimation in challenging weather conditions.
- It integrates interval-based depth distribution encoding with dynamic margin thresholding and Jensen–Shannon divergence to stabilize predictions under environmental degradations.
- Empirical results show significant reductions in AbsRel metrics, attesting to SECL's effectiveness in enhancing model robustness and prediction sharpness in rain, fog, and snow.
Self-Evolution Contrastive Loss (SECL) is a self-referential regularization technique introduced for robust self-supervised monocular depth estimation, particularly under adverse weather conditions that impair visibility and degrade standard photometric supervision. SECL operates within the SEC-Depth framework, which leverages temporally evolving “latency” models—snapshots of the network from earlier training stages—to construct contrastive losses without the need for external teachers or handcrafted curricula. SECL combines interval-based depth distribution encoding, dynamic margin thresholding, and adaptive integration to stabilize and enhance depth prediction performance in challenging scenarios such as rain, fog, and snow (Cao et al., 19 Nov 2025).
1. SEC-Depth Framework and Training Objective
The core task addressed is self-supervised monocular depth estimation, traditionally optimized with a photometric reconstruction loss:

$$\mathcal{L}_{ph} = \frac{\alpha}{2}\left(1 - \mathrm{SSIM}(I, \hat{I})\right) + (1 - \alpha)\,\lVert I - \hat{I}\rVert_1,$$

where $\hat{I}$ is the warped counterpart of $I$ using predicted disparities, and $\mathrm{SSIM}$ denotes the structural similarity index.
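A minimal PyTorch sketch of this term follows; the 3×3 average-pooling SSIM and the weight α = 0.85 reflect common practice in self-supervised depth pipelines and are assumptions rather than values stated here.

```python
import torch
import torch.nn.functional as F

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified SSIM map using 3x3 average pooling (common in depth pipelines)."""
    mu_x, mu_y = F.avg_pool2d(x, 3, 1, 1), F.avg_pool2d(y, 3, 1, 1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    return torch.clamp(num / den, 0, 1)

def photometric_loss(target, warped, alpha=0.85):
    """alpha/2 * (1 - SSIM) + (1 - alpha) * L1, averaged over pixels; alpha is assumed."""
    ssim_term = (1 - ssim(target, warped)) / 2
    l1_term = (target - warped).abs()
    return (alpha * ssim_term + (1 - alpha) * l1_term).mean()
```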
SEC-Depth extends this by periodically introducing weather-corrupted inputs (e.g., rain/fog/snow) at a fixed injection interval of training iterations. The full objective is:

$$\mathcal{L} = \mathcal{L}_{ph} + \lambda\,\mathcal{L}_{SECL},$$

where $\lambda$ is a dynamic scalar controlling the contrastive loss term $\mathcal{L}_{SECL}$. The contrastive component exploits the model’s parameter history, forming anchor–positive–negative triplets from current and prior states. Notably, this design removes dependence on external negative sampling or specialized synthetic weather curricula.
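The interaction between the two terms can be sketched as a single training step. Below, `photometric`, `degrade`, and `secl` are caller-supplied placeholders standing in for the reconstruction loss above, the synthetic weather corruption, and the contrastive loss defined in Section 2; the injection interval and weight are likewise assumptions.

```python
def sec_depth_step(model, latency_queue, image, step,
                   photometric, degrade, secl, inject_every, lambda_t):
    """One illustrative SEC-Depth step: the photometric term is always applied,
    while the weather-corrupted branch and SECL are added only at the injection
    interval. All callables and constants here are assumptions, not the paper's code."""
    loss = photometric(model, image)              # standard self-supervised term
    if step % inject_every == 0:                  # periodic weather corruption
        degraded = degrade(image)                 # e.g. synthetic rain / fog / snow
        loss = loss + lambda_t * secl(model, latency_queue, degraded, image)
    return loss
```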
2. Mathematical Definition and Construction of SECL
Sample Construction and Depth Distribution
For each training input, SECL defines:
- Anchor disparity: predicted by the current model on the weather-degraded input.
- Positive disparity: predicted by the current model on the corresponding clean image.
- Negative disparities: predicted by historical “latency” models.
Normalized disparities in $[0,1]$ are partitioned into $K$ fixed-width bins with centers $c_k$. For each pixel disparity $d_p$, the soft assignment to bin $k$ uses a Gaussian kernel:

$$w_{p,k} = \frac{\exp\!\left(-(d_p - c_k)^2 / 2\sigma^2\right)}{\sum_{k'} \exp\!\left(-(d_p - c_{k'})^2 / 2\sigma^2\right)}.$$

Averaging $w_{p,k}$ across all pixels yields a discrete distribution over the $K$ bins for each disparity map.
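A small sketch of this interval-based encoding is given below; the bin count and kernel width are illustrative, not the paper's defaults. The softmax over bins plays the role of the per-pixel normalization in the formula above.

```python
import torch

def depth_distribution(disp, num_bins=32, sigma=0.05):
    """Soft-assign normalized disparities to fixed-width bins via a Gaussian kernel,
    then average over pixels to obtain one discrete distribution per image.
    disp: (B, 1, H, W) tensor in [0, 1]; num_bins and sigma are illustrative."""
    centers = torch.linspace(0.0, 1.0, num_bins, device=disp.device)  # bin centers c_k
    d = disp.flatten(1).unsqueeze(-1)                                 # (B, H*W, 1)
    logits = -((d - centers) ** 2) / (2 * sigma ** 2)                 # Gaussian kernel
    weights = torch.softmax(logits, dim=-1)                           # per-pixel soft assignment
    return weights.mean(dim=1)                                        # (B, num_bins) distribution
```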
SECL Formula
Let $\mathrm{JS}(\cdot\,\Vert\,\cdot)$ denote the Jensen–Shannon divergence between binned disparity distributions. SECL is a triplet-style objective over these distributions: it minimizes the anchor–positive divergence while enforcing a margin-based separation between the anchor and each negative from the latency queue, with the negative contributions weighted by a hardness factor $\beta$. The margin $m_t$ is dynamic, annealed as a function of the current step $t$, the total number of steps $T$, and decay parameters; $\epsilon$ is a fixed margin (default 0.005). The factor $\beta$ modulates the “hardness” penalty for negatives.
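Because the exact loss expression is not reproduced above, the following is only a minimal sketch of a triplet-style SECL over the binned distributions: it assumes the anchor–positive JS divergence is minimized directly, each negative contributes a hinge penalty when its JS divergence to the anchor falls below the dynamic margin, and the margin decays exponentially toward the fixed value. The hinge form, the decay rate, and the default for β are assumptions; only the fixed margin of 0.005 is stated above.

```python
import math
import torch

def js_divergence(p, q, eps=1e-8):
    """Jensen-Shannon divergence between discrete distributions of shape (B, K)."""
    m = 0.5 * (p + q)
    kl = lambda a, b: (a * (a.add(eps).log() - b.add(eps).log())).sum(dim=-1)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def secl(anchor, positive, negatives, step, total_steps,
         fixed_margin=0.005, beta=1.0, decay=5.0):
    """Illustrative SECL: pull the anchor toward the positive, keep each negative at
    least a (decaying) margin away in JS divergence. beta, decay, and the hinge form
    are assumptions; fixed_margin follows the stated default of 0.005."""
    m_t = fixed_margin + math.exp(-decay * step / max(total_steps, 1))  # dynamic margin
    pos_term = js_divergence(anchor, positive)
    neg_term = sum(torch.clamp(m_t - js_divergence(anchor, n), min=0.0)
                   for n in negatives) / max(len(negatives), 1)
    return (pos_term + beta * neg_term).mean()
```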
3. Latency Model Queue and Dynamic Updating
Historical models (“latency models”) are maintained in a circular queue of fixed length. The queue is refreshed at a fixed update interval, or conditionally whenever the variability of the current anchor–negative pairs drops below that of the anchor–positive pair. Updates use exponential moving average (EMA) smoothing:
- A pointer indexes the queue slot to be refreshed next.
- At each training step, the update condition is checked (the interval has elapsed, or the variability criterion is met).
- When an update is triggered, the slot at the pointer is replaced by an EMA-smoothed copy of the current model weights (with momentum $\mu$), and the pointer advances modulo the queue length.

Negative examples for SECL are generated by forwarding the input through all models in the queue. This mechanism ensures controlled diversity and progressive difficulty for contrastive learning.
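A sketch of such a queue is shown below; the default length and momentum are illustrative placeholders, and the trigger condition is assumed to be computed by the caller as described above.

```python
import copy
import torch

class LatencyQueue:
    """Circular queue of historical ("latency") model snapshots used as negatives.
    The default length and momentum are illustrative placeholders."""

    def __init__(self, model, length=3, momentum=0.99):
        self.snapshots = [copy.deepcopy(model).eval() for _ in range(length)]
        self.pointer = 0
        self.momentum = momentum

    @torch.no_grad()
    def maybe_update(self, model, triggered):
        """If the interval/variability condition fired, EMA-smooth the slot at the
        pointer toward the current weights and advance the pointer."""
        if not triggered:
            return
        slot = self.snapshots[self.pointer]
        for p_old, p_new in zip(slot.parameters(), model.parameters()):
            p_old.mul_(self.momentum).add_(p_new.detach(), alpha=1 - self.momentum)
        self.pointer = (self.pointer + 1) % len(self.snapshots)

    @torch.no_grad()
    def negatives(self, degraded_input):
        """Forward the degraded input through every latency model to obtain negatives."""
        return [snapshot(degraded_input) for snapshot in self.snapshots]
```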
4. Adaptive Weighting and Integration into the Learning Process
To avoid destabilization at early optimization stages, the weight on $\mathcal{L}_{SECL}$ is held at its initial value and then increased linearly after a warm-up epoch until it reaches its final value (see the sketch below):
- Before the warm-up epoch, the weight stays at its initial value.
- Afterwards, it grows linearly with the epoch index up to its maximum.
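A minimal sketch of this schedule; the warm-up epoch, ramp length, maximum weight, and zero initial value are placeholders rather than the paper's defaults.

```python
def secl_weight(epoch, warmup_epoch=5, ramp_epochs=10, max_weight=1.0):
    """Hold the SECL weight at zero (assumed initial value) until warmup_epoch,
    then ramp linearly to max_weight; all constants here are illustrative."""
    if epoch < warmup_epoch:
        return 0.0
    progress = min((epoch - warmup_epoch) / ramp_epochs, 1.0)
    return progress * max_weight
```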
SECL adaptively senses the severity of weather-induced degradation through the observed anchor–negative JS divergences: larger values indicate that current predictions diverge strongly from historical negatives, typically signifying more severe degradation (e.g., heavy rain/fog), and thus trigger stronger contrastive gradients. This lets the learning objective shift in response to the evolving difficulty of adverse conditions, reducing the need for manual curriculum design.
5. Implementation Details and Key Hyperparameters
Critical training hyperparameters include:
| Parameter | Default Value | Purpose |
|---|---|---|
| Degradation injection interval | — | Frequency of weather corruption (in steps) |
| Number of negatives | — | Size of the latency queue |
| Depth bins | — | Binning for interval-based distributions |
| Kernel parameter | — | Width of the Gaussian binning kernel |
| Margin decay | — | Dynamic margin for contrastive separation |
| Fixed negative margin | 0.005 | Threshold for negative diversity |
| Hardness penalty | — | Weight of the negative component |
| Contrastive schedule | — | Initial weight for the SECL term |
| EMA momentum | — | Latency model smoothing |
| Queue update interval | — | Refresh interval for the queue (in steps) |
Other training settings (learning rate, batch size, image size) align with the self-supervised backbone, such as MonoViT or PlaneDepth (Cao et al., 19 Nov 2025).
6. Empirical Evaluation: Ablation and Zero-Shot Robustness
Ablation studies underscore the incremental impact of SECL components under the MonoViT backbone on WeatherKITTI and diverse zero-shot scenarios:
- Adding contrastive learning reduces AbsRel from 0.120 to 0.106 (zero-shot: 0.169→0.146)
- Interval-based depth distribution further reduces AbsRel to 0.105 (zero-shot: 0.144)
- Full SECL (with the dynamic and fixed margin terms) yields AbsRel 0.104 (zero-shot: 0.142)
SECL-specific hyperparameter ablations identified:
- Exponential decay of the dynamic margin outperforms linear decay (AbsRel 0.142 vs. 0.147).
- The fixed negative margin and the hardness factor were selected by sweeping over candidate values.
- The chosen number of depth bins achieves a favorable balance between computational cost and accuracy.
Zero-shot robustness is demonstrated on six unseen adverse weather datasets (e.g., DrivingStereo rain/fog, Cityscapes snow/rain/fog), achieving the following results:
- MonoViT + SEC-Depth: AbsRel reduced from 0.169 to 0.142, outperforming WeatherDepth and Robust-Depth baselines.
- PlaneDepth + SEC-Depth: AbsRel reduced from 0.215 to 0.168, compared against WeatherDepth. Qualitative evidence shows sharper object boundaries and fewer “collapse failures” under heavy weather.
7. Significance and Relation to Prior Work
SECL defines a plug-and-play, model-agnostic loss informed solely by the network’s own evolving representations. By exploiting model “past selves” as negatives, SECL circumvents reliance on external data, handcrafted curricula, or auxiliary teacher models for curriculum scheduling. This approach generalizes across architectures and task domains characterized by variation in degradation type or severity, providing a principled mechanism for robust depth estimation under real-world adverse conditions (Cao et al., 19 Nov 2025).
A plausible implication is that such latency-driven contrastive objectives could extend beyond depth regression to other domains where robustness to evolving data distributions and degradation is critical.