
SECL: Self-Evolution Contrastive Loss

Updated 22 November 2025
  • SECL is a self-referential contrastive loss that leverages past network states (latency models) to form anchor–positive–negative triplets, improving depth estimation in challenging weather conditions.
  • It integrates interval-based depth distribution encoding with dynamic margin thresholding and Jensen–Shannon divergence to stabilize predictions under environmental degradations.
  • Empirical results show significant reductions in AbsRel metrics, attesting to SECL's effectiveness in enhancing model robustness and prediction sharpness in rain, fog, and snow.

Self-Evolution Contrastive Loss (SECL) is a self-referential regularization technique introduced for robust self-supervised monocular depth estimation, particularly under adverse weather conditions that impair visibility and degrade standard photometric supervision. SECL operates within the SEC-Depth framework, which leverages temporally evolving “latency” models (snapshots of the network from earlier training stages) to construct contrastive losses without the need for external teachers or handcrafted curricula. SECL combines interval-based depth distribution encoding, dynamic margin thresholding, and adaptive integration to stabilize and enhance depth prediction performance in challenging scenarios such as rain, fog, and snow (Cao et al., 19 Nov 2025).

1. SEC-Depth Framework and Training Objective

The core task addressed is self-supervised monocular depth estimation, traditionally optimized with a photometric reconstruction loss:

$$L_{\text{ph}} = \beta_1 \cdot \bigl(1 - \mathrm{SSIM}(I, \tilde{I}')\bigr) + \beta_2 \cdot |I - \tilde{I}'|$$

where $\tilde{I}'$ is the warped counterpart of $I'$ obtained using the predicted disparities, and $\mathrm{SSIM}$ denotes the structural similarity index.

SEC-Depth extends this by periodically introducing weather-corrupted inputs $I_{\text{aug}}$ (e.g., rain, fog, snow) every $S$ iterations. The full objective is:

$$L = L_{\text{ph}}(\text{clean}) + L_{\text{ph}}(\text{degraded}) + w \cdot L_c$$

where $w$ is a dynamic scalar controlling the contrastive loss term $L_c$. The contrastive component exploits the model’s parameter history, forming anchor–positive–negative triplets from current and prior states. Notably, this design removes dependence on external negative sampling or specialized synthetic weather curricula.
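A minimal PyTorch-style sketch of how these terms could be assembled is given below; the SSIM helper, the balancing values $\beta_1 = 0.85$ and $\beta_2 = 0.15$, and the function names are illustrative assumptions rather than details from the paper.

```python
def photometric_loss(I, I_warp, ssim_fn, beta1=0.85, beta2=0.15):
    """Photometric term: beta1 * (1 - SSIM(I, I_warp)) + beta2 * |I - I_warp|.
    `ssim_fn` is an assumed SSIM helper returning a per-pixel map in [0, 1];
    beta1/beta2 are illustrative values, not taken from the paper."""
    ssim_term = (1.0 - ssim_fn(I, I_warp)).mean()
    l1_term = (I - I_warp).abs().mean()
    return beta1 * ssim_term + beta2 * l1_term

def sec_depth_objective(loss_ph_clean, loss_ph_degraded, loss_c, w):
    """Full objective: L = L_ph(clean) + L_ph(degraded) + w * L_c."""
    return loss_ph_clean + loss_ph_degraded + w * loss_c
```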

2. Mathematical Definition and Construction of SECL

Sample Construction and Depth Distribution

For each training input, SECL defines:

  • Anchor disparity: $D_A = F_t(I_{\text{aug}})$ from the current model.
  • Positive disparity: $D_P = F_t(I)$ from the current model applied to the clean image.
  • Negative disparities: $\{ D_N^k = F_{N_k}(I_{\text{aug}}) \}_{k=1}^{M}$ from $M$ historical “latency” models.

Normalized disparities in $[0,1]$ are partitioned into $N$ fixed-width bins centered at $c_n = (n+0.5)/N$. For each pixel disparity $d$, the assignment to bin $n$ uses a Gaussian kernel:

$$w_n(d) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(d-c_n)^2}{2\sigma^2}\right), \qquad \sigma = \frac{1}{2N}$$

Averaging $w_n(d)$ across all pixels yields a discrete distribution $P_X = [p_X^1, \ldots, p_X^N]$ for $X \in \{A, P, N^k\}$.
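A compact sketch of this interval-based encoding is shown below, assuming disparities are already normalized to $[0,1]$ with shape (B, 1, H, W); the Gaussian normalization constant is dropped and each histogram is renormalized to sum to one, which is an implementation choice rather than something specified in the paper.

```python
import torch

def disparity_distribution(disp, num_bins=32):
    """Encode a disparity map normalized to [0, 1] as a discrete distribution
    over `num_bins` fixed-width bins via a Gaussian kernel with sigma = 1/(2N).
    Input shape assumed to be (B, 1, H, W); returns (B, num_bins)."""
    sigma = 1.0 / (2 * num_bins)
    centers = (torch.arange(num_bins, dtype=disp.dtype, device=disp.device) + 0.5) / num_bins
    d = disp.flatten(1).unsqueeze(-1)                      # (B, H*W, 1)
    w = torch.exp(-(d - centers) ** 2 / (2 * sigma ** 2))  # (B, H*W, N); constant factor omitted
    p = w.mean(dim=1)                                      # average bin weights over pixels
    return p / p.sum(dim=-1, keepdim=True)                 # renormalize each row to sum to 1
```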

SECL Formula

Let $\mathrm{JS}(P \| Q)$ denote the Jensen–Shannon divergence. SECL is given by:

$$L_c = \mathrm{JS}(P_A \| P_P) + \frac{1}{M}\sum_{k=1}^{M} \left[ \delta \cdot \Delta_1^k + \mathrm{JS}(P_A \| P_{N^k}) \cdot \Delta_2^k \right]$$

where, for $i = 1, 2$,

$$\Delta_i^k = \max\!\left(\alpha_i - \mathrm{JS}(P_A \| P_{N^k}),\ 0\right)$$

$\alpha_1$ is a dynamic margin:

$$\alpha_1(t) = a \cdot \exp(-15\, t / T) + c$$

with $t$ the current step, $T$ the total number of steps, and $a, c$ decay parameters; $\alpha_2$ is a fixed margin (default 0.005). The factor $\delta$ (default $1 \times 10^{-4}$) modulates the “hardness” penalty for negatives.
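The following sketch implements $L_c$ and the dynamic margin directly from the definitions above; distribution tensors are assumed to have shape (B, N), and `clamp_min` realizes the $\max(\cdot,\ 0)$ hinge.

```python
import math
import torch

def js_divergence(p, q, eps=1e-8):
    """Jensen-Shannon divergence between discrete distributions along the last dim."""
    p, q = p.clamp_min(eps), q.clamp_min(eps)
    m = 0.5 * (p + q)
    kl = lambda a, b: (a * (a / b).log()).sum(dim=-1)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def dynamic_margin(t, T, a=0.05, c=0.001):
    """alpha_1(t) = a * exp(-15 * t / T) + c, with the defaults of Sec. 5."""
    return a * math.exp(-15.0 * t / T) + c

def secl(P_A, P_P, P_negs, alpha1, alpha2=0.005, delta=1e-4):
    """L_c = JS(P_A||P_P) + (1/M) * sum_k [delta * Delta_1^k + JS(P_A||P_N^k) * Delta_2^k],
    with Delta_i^k = max(alpha_i - JS(P_A||P_N^k), 0).
    P_A, P_P: (B, N) distributions; P_negs: list of M such tensors."""
    loss = js_divergence(P_A, P_P)
    for P_N in P_negs:
        js_an = js_divergence(P_A, P_N)
        delta1 = (alpha1 - js_an).clamp_min(0.0)
        delta2 = (alpha2 - js_an).clamp_min(0.0)
        loss = loss + (delta * delta1 + js_an * delta2) / len(P_negs)
    return loss.mean()
```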

3. Latency Model Queue and Dynamic Updating

Historical models (“latency models”) are maintained in a circular queue $Q$ of length $M$ (typically $M = 3$). This queue is updated every $T_v$ steps (default $T_v = 200$), or conditionally if the variability of current anchor–negative pairs drops below that of anchor–positive pairs. Updates use exponential moving average (EMA) smoothing:

  • Initialize the pointer $n \leftarrow 0$.
  • At each step $t$, if an update is triggered: compute $\theta^* = \omega \cdot Q[n] + (1-\omega) \cdot \theta_{\text{current}}$ (with momentum $\omega = 0.01$), set $Q[n] \leftarrow \theta^*$, and advance $n \leftarrow (n+1) \bmod M$.

Negative examples for SECL are generated by forwarding $I_{\text{aug}}$ through all models in $Q$. This mechanism ensures controlled diversity and progressive difficulty for contrastive learning.
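A minimal sketch of the queue is given below. Storing snapshots as `state_dict` copies, the `model_builder` factory used to re-instantiate snapshots, and the restriction of EMA blending to floating-point tensors are implementation assumptions, not details from the paper.

```python
import copy
import torch

class LatencyQueue:
    """Circular queue of M historical ("latency") model snapshots, refreshed
    with EMA smoothing: theta* = omega * Q[n] + (1 - omega) * theta_current."""

    def __init__(self, model, M=3, omega=0.01):
        self.snapshots = [copy.deepcopy(model.state_dict()) for _ in range(M)]
        self.omega = omega
        self.n = 0  # circular pointer

    @torch.no_grad()
    def update(self, model):
        """EMA-blend the current weights into the snapshot at the pointer, then advance."""
        old = self.snapshots[self.n]
        self.snapshots[self.n] = {
            k: self.omega * old[k] + (1.0 - self.omega) * v if v.is_floating_point() else v.clone()
            for k, v in model.state_dict().items()
        }
        self.n = (self.n + 1) % len(self.snapshots)

    @torch.no_grad()
    def negatives(self, model_builder, I_aug):
        """Forward the degraded input through every stored snapshot to obtain D_N^k."""
        outs = []
        for state in self.snapshots:
            m = model_builder()          # fresh instance with the same architecture
            m.load_state_dict(state)
            m.eval()
            outs.append(m(I_aug))
        return outs
```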

4. Adaptive Weighting and Integration into the Learning Process

To avoid destabilization at early optimization stages, the weight $w$ on $L_c$ is initialized at $w_s = 0.01$ and increased linearly after epoch $e_a = 5$ until epoch $e_b = 15$ (a short sketch of the schedule follows the list):

  • For $e \leq e_b$: $w = w_s \cdot (1 + \max(0,\, e - e_a))$
  • For $e > e_b$: $w = w_s \cdot (e_b - e_a)$
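A direct transcription of this schedule (assuming an integer epoch counter):

```python
def contrastive_weight(epoch, w_s=0.01, e_a=5, e_b=15):
    """Warm-up weight w applied to the contrastive term L_c."""
    if epoch <= e_b:
        return w_s * (1 + max(0, epoch - e_a))
    return w_s * (e_b - e_a)
```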

SECL adaptively senses the severity of weather-induced degradation based on the observed $\mathrm{JS}(P_A \| P_{N^k})$: larger values indicate that current predictions diverge strongly from historical negatives, typically signifying more severe degradation (e.g., heavy rain or fog), and thus trigger stronger contrastive gradients. This enables the learning objectives to shift in response to the evolving difficulty of adverse conditions, reducing the need for manual intervention in curriculum design.

5. Implementation Details and Key Hyperparameters

Critical training hyperparameters include:

| Parameter | Default value | Purpose |
|---|---|---|
| Degradation injection interval | $S = 5$ steps | Frequency of weather corruption |
| Number of negatives | $M = 3$ | Size of latency queue |
| Depth bins | $N = 32$ | Binning for interval-based distributions |
| Kernel parameter | $\sigma = 1/(2N)$ | Width of the Gaussian binning kernel |
| Margin decay | $a = 0.05$, $c = 0.001$ | Dynamic margin for contrastive separation |
| Fixed negative margin | $\alpha_2 = 0.005$ | Threshold for negative diversity |
| Hardness penalty | $\delta = 10^{-4}$ | Weight of the negative hardness term |
| Contrastive schedule | $w_s = 0.01$ | Initial weight for $L_c$ |
| EMA momentum | $\omega = 0.01$ | Latency model smoothing |
| Queue update interval | $T_v = 200$ steps | Refresh interval for the latency queue |

Other training settings (learning rate, batch size, image size) align with the self-supervised backbone, such as MonoViT or PlaneDepth (Cao et al., 19 Nov 2025).
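For reference, the defaults from the table above can be gathered into a single configuration object; the field names below are illustrative and do not reflect the authors' code.

```python
from dataclasses import dataclass

@dataclass
class SECLConfig:
    """Default hyperparameters listed in Sec. 5 (illustrative field names)."""
    S: int = 5              # degradation injection interval (steps)
    M: int = 3              # number of negatives / latency queue length
    N: int = 32             # depth bins (sigma = 1 / (2 * N))
    a: float = 0.05         # dynamic-margin decay amplitude
    c: float = 0.001        # dynamic-margin offset
    alpha2: float = 0.005   # fixed negative margin
    delta: float = 1e-4     # hardness penalty weight
    w_s: float = 0.01       # initial contrastive weight
    omega: float = 0.01     # EMA momentum for latency models
    T_v: int = 200          # queue update interval (steps)
```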

6. Empirical Evaluation: Ablation and Zero-Shot Robustness

Ablation studies underscore the incremental impact of SECL components under the MonoViT backbone on WeatherKITTI and diverse zero-shot scenarios:

  • Adding contrastive learning reduces AbsRel from 0.120 to 0.106 (zero-shot: 0.169→0.146)
  • Interval-based depth distribution further reduces AbsRel to 0.105 (zero-shot: 0.144)
  • Full SECL (with margin terms $\Delta_1, \Delta_2$) yields AbsRel 0.104 (zero-shot: 0.142)

SECL-specific hyperparameter ablations identified:

  • Exponential decay for $\alpha_1$ outperforms linear decay (AbsRel 0.142 vs. 0.147).
  • Best fixed margin: $\alpha_2 = 0.005$.
  • Best hardness penalty: $\delta = 10^{-4}$.
  • $N = 32$ bins achieves a favorable balance between computational cost and accuracy.

Zero-shot robustness is demonstrated on six unseen adverse weather datasets (e.g., DrivingStereo rain/fog, Cityscapes snow/rain/fog), achieving the following results:

  • MonoViT + SEC-Depth: AbsRel reduced from 0.169 to 0.142, outperforming WeatherDepth and Robust-Depth baselines.
  • PlaneDepth + SEC-Depth: AbsRel improves from 0.215 to 0.168 relative to WeatherDepth. Qualitative evidence shows sharper object boundaries and fewer “collapse failures” under heavy weather.

7. Significance and Relation to Prior Work

SECL defines a plug-and-play, model-agnostic loss informed solely by the network’s own evolving representations. By exploiting model “past selves” as negatives, SECL circumvents reliance on external data, handcrafted curricula, or auxiliary teacher models for curriculum scheduling. This approach generalizes across architectures and task domains characterized by variation in degradation type or severity, providing a principled mechanism for robust depth estimation under real-world adverse conditions (Cao et al., 19 Nov 2025).

A plausible implication is that such latency-driven contrastive objectives could extend beyond depth regression to other domains where robustness to evolving data distributions and degradation is critical.
