
SECL: Self-Evolution Contrastive Loss

Updated 22 November 2025
  • SECL is a self-referential contrastive loss that leverages past network states (latency models) to form anchor–positive–negative triplets, improving depth estimation in challenging weather conditions.
  • It integrates interval-based depth distribution encoding with dynamic margin thresholding and Jensen–Shannon divergence to stabilize predictions under environmental degradations.
  • Empirical results show significant reductions in AbsRel metrics, attesting to SECL's effectiveness in enhancing model robustness and prediction sharpness in rain, fog, and snow.

Self-Evolution Contrastive Loss (SECL) is a self-referential regularization technique introduced for robust self-supervised monocular depth estimation, particularly under adverse weather conditions that impair visibility and degrade standard photometric supervision. SECL operates within the SEC-Depth framework, which leverages temporally evolving “latency” models (snapshots of the network from earlier training stages) to construct contrastive losses without the need for external teachers or handcrafted curricula. SECL combines interval-based depth distribution encoding, dynamic margin thresholding, and adaptive integration to stabilize and enhance depth prediction performance in challenging scenarios such as rain, fog, and snow (Cao et al., 19 Nov 2025).

1. SEC-Depth Framework and Training Objective

The core task addressed is self-supervised monocular depth estimation, traditionally optimized with a photometric reconstruction loss:

$$L_{\text{ph}} = \beta_1 \cdot \bigl(1 - \mathrm{SSIM}(I, \tilde{I}')\bigr) + \beta_2 \cdot |I - \tilde{I}'|$$

where $\tilde{I}'$ is the warped counterpart of $I'$ obtained using the predicted disparities, and $\mathrm{SSIM}$ denotes the structural similarity index.

SEC-Depth extends this by periodically introducing weather-corrupted inputs $I_{\text{aug}}$ (e.g., rain, fog, snow) every $S$ iterations. The full objective is:

$$L = L_{\text{ph}}(\text{clean}) + L_{\text{ph}}(\text{degraded}) + w \cdot L_c$$

where $w$ is a dynamic scalar controlling the contrastive loss term $L_c$. The contrastive component exploits the model’s parameter history, forming anchor–positive–negative triplets from current and prior states. Notably, this design removes dependence on external negative sampling or specialized synthetic weather curricula.
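A minimal PyTorch-style sketch of how these terms could be assembled is given below; the SSIM helper, the balancing values $\beta_1 = 0.85$ and $\beta_2 = 0.15$, and the function names are illustrative assumptions rather than details from the paper.

```python
def photometric_loss(I, I_warp, ssim_fn, beta1=0.85, beta2=0.15):
    """Photometric term: beta1 * (1 - SSIM(I, I_warp)) + beta2 * |I - I_warp|.
    `ssim_fn` is an assumed SSIM helper returning a per-pixel map in [0, 1];
    beta1/beta2 are illustrative values, not taken from the paper."""
    ssim_term = (1.0 - ssim_fn(I, I_warp)).mean()
    l1_term = (I - I_warp).abs().mean()
    return beta1 * ssim_term + beta2 * l1_term

def sec_depth_objective(loss_ph_clean, loss_ph_degraded, loss_c, w):
    """Full objective: L = L_ph(clean) + L_ph(degraded) + w * L_c."""
    return loss_ph_clean + loss_ph_degraded + w * loss_c
```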

2. Mathematical Definition and Construction of SECL

Sample Construction and Depth Distribution

For each training input, SECL defines:

  • Anchor disparity: $D_A = F_t(I_{\text{aug}})$ from the current model.
  • Positive disparity: $D_P = F_t(I)$ from the current model applied to the clean image.
  • Negative disparities: $\{ D_N^k = F_{N_k}(I_{\text{aug}}) \}_{k=1}^{M}$ from $M$ historical “latency” models.

Normalized disparities in $[0,1]$ are partitioned into $N$ fixed-width bins centered at $c_n = (n+0.5)/N$. For each pixel disparity $d$, the assignment to bin $n$ uses a Gaussian kernel:

$$w_n(d) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(d-c_n)^2}{2\sigma^2}\right), \qquad \sigma = \frac{1}{2N}$$

Averaging $w_n(d)$ across all pixels yields a discrete distribution $P_X = [p_X^1, \ldots, p_X^N]$ for $X \in \{A, P, N^k\}$.
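A compact sketch of this interval-based encoding is shown below, assuming disparities are already normalized to $[0,1]$ with shape (B, 1, H, W); the Gaussian normalization constant is dropped and each histogram is renormalized to sum to one, which is an implementation choice rather than something specified in the paper.

```python
import torch

def disparity_distribution(disp, num_bins=32):
    """Encode a disparity map normalized to [0, 1] as a discrete distribution
    over `num_bins` fixed-width bins via a Gaussian kernel with sigma = 1/(2N).
    Input shape assumed to be (B, 1, H, W); returns (B, num_bins)."""
    sigma = 1.0 / (2 * num_bins)
    centers = (torch.arange(num_bins, dtype=disp.dtype, device=disp.device) + 0.5) / num_bins
    d = disp.flatten(1).unsqueeze(-1)                      # (B, H*W, 1)
    w = torch.exp(-(d - centers) ** 2 / (2 * sigma ** 2))  # (B, H*W, N); constant factor omitted
    p = w.mean(dim=1)                                      # average bin weights over pixels
    return p / p.sum(dim=-1, keepdim=True)                 # renormalize each row to sum to 1
```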

SECL Formula

Let $\mathrm{JS}(P \| Q)$ denote the Jensen–Shannon divergence. SECL is given by:

$$L_c = \mathrm{JS}(P_A \| P_P) + \frac{1}{M}\sum_{k=1}^{M} \left[ \delta \cdot \Delta_1^k + \mathrm{JS}(P_A \| P_{N^k}) \cdot \Delta_2^k \right]$$

where, for $i = 1, 2$,

$$\Delta_i^k = \max\!\left(\alpha_i - \mathrm{JS}(P_A \| P_{N^k}),\ 0\right)$$

$\alpha_1$ is a dynamic margin:

$$\alpha_1(t) = a \cdot \exp(-15\, t / T) + c$$

with $t$ the current step, $T$ the total number of steps, and $a, c$ decay parameters; $\alpha_2$ is a fixed margin (default 0.005). The factor $\delta$ (default $1 \times 10^{-4}$) modulates the “hardness” penalty for negatives.
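The following sketch implements $L_c$ and the dynamic margin directly from the definitions above; distribution tensors are assumed to have shape (B, N), and `clamp_min` realizes the $\max(\cdot,\ 0)$ hinge.

```python
import math
import torch

def js_divergence(p, q, eps=1e-8):
    """Jensen-Shannon divergence between discrete distributions along the last dim."""
    p, q = p.clamp_min(eps), q.clamp_min(eps)
    m = 0.5 * (p + q)
    kl = lambda a, b: (a * (a / b).log()).sum(dim=-1)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def dynamic_margin(t, T, a=0.05, c=0.001):
    """alpha_1(t) = a * exp(-15 * t / T) + c, with the defaults of Sec. 5."""
    return a * math.exp(-15.0 * t / T) + c

def secl(P_A, P_P, P_negs, alpha1, alpha2=0.005, delta=1e-4):
    """L_c = JS(P_A||P_P) + (1/M) * sum_k [delta * Delta_1^k + JS(P_A||P_N^k) * Delta_2^k],
    with Delta_i^k = max(alpha_i - JS(P_A||P_N^k), 0).
    P_A, P_P: (B, N) distributions; P_negs: list of M such tensors."""
    loss = js_divergence(P_A, P_P)
    for P_N in P_negs:
        js_an = js_divergence(P_A, P_N)
        delta1 = (alpha1 - js_an).clamp_min(0.0)
        delta2 = (alpha2 - js_an).clamp_min(0.0)
        loss = loss + (delta * delta1 + js_an * delta2) / len(P_negs)
    return loss.mean()
```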

3. Latency Model Queue and Dynamic Updating

Historical models (“latency models”) are maintained in a circular queue $Q$ of length $M$ (typically $M = 3$). This queue is updated every $T_v$ steps (default $T_v = 200$), or conditionally if the variability of current anchor–negative pairs drops below that of anchor–positive pairs. Updates use exponential moving average (EMA) smoothing:

  • Initialize the pointer $n \leftarrow 0$.
  • At each step $t$, if an update is triggered: compute $\theta^* = \omega \cdot Q[n] + (1-\omega) \cdot \theta_{\text{current}}$ (with momentum $\omega = 0.01$), set $Q[n] \leftarrow \theta^*$, and advance $n \leftarrow (n+1) \bmod M$.

Negative examples for SECL are generated by forwarding $I_{\text{aug}}$ through all models in $Q$. This mechanism ensures controlled diversity and progressive difficulty for contrastive learning.
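A minimal sketch of the queue is given below. Storing snapshots as `state_dict` copies, the `model_builder` factory used to re-instantiate snapshots, and the restriction of EMA blending to floating-point tensors are implementation assumptions, not details from the paper.

```python
import copy
import torch

class LatencyQueue:
    """Circular queue of M historical ("latency") model snapshots, refreshed
    with EMA smoothing: theta* = omega * Q[n] + (1 - omega) * theta_current."""

    def __init__(self, model, M=3, omega=0.01):
        self.snapshots = [copy.deepcopy(model.state_dict()) for _ in range(M)]
        self.omega = omega
        self.n = 0  # circular pointer

    @torch.no_grad()
    def update(self, model):
        """EMA-blend the current weights into the snapshot at the pointer, then advance."""
        old = self.snapshots[self.n]
        self.snapshots[self.n] = {
            k: self.omega * old[k] + (1.0 - self.omega) * v if v.is_floating_point() else v.clone()
            for k, v in model.state_dict().items()
        }
        self.n = (self.n + 1) % len(self.snapshots)

    @torch.no_grad()
    def negatives(self, model_builder, I_aug):
        """Forward the degraded input through every stored snapshot to obtain D_N^k."""
        outs = []
        for state in self.snapshots:
            m = model_builder()          # fresh instance with the same architecture
            m.load_state_dict(state)
            m.eval()
            outs.append(m(I_aug))
        return outs
```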

4. Adaptive Weighting and Integration into the Learning Process

To avoid destabilization at early optimization stages, the weight $w$ on $L_c$ is initialized at $w_s = 0.01$ and increased linearly after epoch $e_a = 5$ until epoch $e_b = 15$ (a short sketch of the schedule follows the list):

  • For $e \leq e_b$: $w = w_s \cdot (1 + \max(0,\, e - e_a))$
  • For $e > e_b$: $w = w_s \cdot (e_b - e_a)$
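A direct transcription of this schedule (assuming an integer epoch counter):

```python
def contrastive_weight(epoch, w_s=0.01, e_a=5, e_b=15):
    """Warm-up weight w applied to the contrastive term L_c."""
    if epoch <= e_b:
        return w_s * (1 + max(0, epoch - e_a))
    return w_s * (e_b - e_a)
```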

SECL adaptively senses the severity of weather-induced degradation based on the observed $\mathrm{JS}(P_A \| P_{N^k})$: larger values indicate that current predictions diverge strongly from historical negatives, typically signifying more severe degradation (e.g., heavy rain or fog), and thus trigger stronger contrastive gradients. This enables the learning objectives to shift in response to the evolving difficulty of adverse conditions, reducing the need for manual intervention in curriculum design.

5. Implementation Details and Key Hyperparameters

Critical training hyperparameters include:

| Parameter | Default value | Purpose |
|---|---|---|
| Degradation injection interval | $S = 5$ steps | Frequency of weather corruption |
| Number of negatives | $M = 3$ | Size of latency queue |
| Depth bins | $N = 32$ | Binning for interval-based distributions |
| Kernel parameter | $\sigma = 1/(2N)$ | Width of the Gaussian binning kernel |
| Margin decay | $a = 0.05$, $c = 0.001$ | Dynamic margin for contrastive separation |
| Fixed negative margin | $\alpha_2 = 0.005$ | Threshold for negative diversity |
| Hardness penalty | $\delta = 10^{-4}$ | Weight of the negative hardness term |
| Contrastive schedule | $w_s = 0.01$ | Initial weight for $L_c$ |
| EMA momentum | $\omega = 0.01$ | Latency model smoothing |
| Queue update interval | $T_v = 200$ steps | Refresh interval for the latency queue |

Other training settings (learning rate, batch size, image size) align with the self-supervised backbone, such as MonoViT or PlaneDepth (Cao et al., 19 Nov 2025).
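For reference, the defaults from the table above can be gathered into a single configuration object; the field names below are illustrative and do not reflect the authors' code.

```python
from dataclasses import dataclass

@dataclass
class SECLConfig:
    """Default hyperparameters listed in Sec. 5 (illustrative field names)."""
    S: int = 5              # degradation injection interval (steps)
    M: int = 3              # number of negatives / latency queue length
    N: int = 32             # depth bins (sigma = 1 / (2 * N))
    a: float = 0.05         # dynamic-margin decay amplitude
    c: float = 0.001        # dynamic-margin offset
    alpha2: float = 0.005   # fixed negative margin
    delta: float = 1e-4     # hardness penalty weight
    w_s: float = 0.01       # initial contrastive weight
    omega: float = 0.01     # EMA momentum for latency models
    T_v: int = 200          # queue update interval (steps)
```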

6. Empirical Evaluation: Ablation and Zero-Shot Robustness

Ablation studies underscore the incremental impact of SECL components under the MonoViT backbone on WeatherKITTI and diverse zero-shot scenarios:

  • Adding contrastive learning reduces AbsRel from 0.120 to 0.106 (zero-shot: 0.169→0.146)
  • Interval-based depth distribution further reduces AbsRel to 0.105 (zero-shot: 0.144)
  • Full SECL (with margin terms $\Delta_1, \Delta_2$) yields AbsRel 0.104 (zero-shot: 0.142)

SECL-specific hyperparameter ablations identified:

  • Exponential decay for $\alpha_1$ outperforms linear decay (AbsRel 0.142 vs. 0.147).
  • Best fixed margin: $\alpha_2 = 0.005$.
  • Best hardness penalty: $\delta = 10^{-4}$.
  • $N = 32$ bins achieves a favorable balance between computational cost and accuracy.

Zero-shot robustness is demonstrated on six unseen adverse weather datasets (e.g., DrivingStereo rain/fog, Cityscapes snow/rain/fog), achieving the following results:

  • MonoViT + SEC-Depth: AbsRel reduced from 0.169 to 0.142, outperforming WeatherDepth and Robust-Depth baselines.
  • PlaneDepth + SEC-Depth: AbsRel improves from 0.215 to 0.168 relative to WeatherDepth. Qualitative evidence shows sharper object boundaries and fewer “collapse failures” under heavy weather.

7. Significance and Relation to Prior Work

SECL defines a plug-and-play, model-agnostic loss informed solely by the network’s own evolving representations. By exploiting model “past selves” as negatives, SECL circumvents reliance on external data, handcrafted curricula, or auxiliary teacher models for curriculum scheduling. This approach generalizes across architectures and task domains characterized by variation in degradation type or severity, providing a principled mechanism for robust depth estimation under real-world adverse conditions (Cao et al., 19 Nov 2025).

A plausible implication is that such latency-driven contrastive objectives could extend beyond depth regression to other domains where robustness to evolving data distributions and degradation is critical.
