
Adverse Weather Distillation

Updated 4 July 2026
  • Adverse weather distillation is a technique that transfers clear-weather model knowledge to systems operating under rain, fog, or night conditions.
  • It employs diverse distillation targets including disparity maps, restoration residuals, and cost-volume statistics to overcome domain shift and weak supervision.
  • The method’s efficacy is demonstrated across tasks such as image restoration, monocular depth estimation, optical flow, and LiDAR object detection with measurable performance gains.

Adverse weather distillation denotes a family of distillation procedures used to transfer clean-scene, clear-weather, or teacher-model knowledge into models that must operate under haze, rain, snow, fog, night, or related degradations. In the recent literature, the term covers both conventional knowledge distillation—such as teacher–student alignment of outputs, features, cost volumes, or detector responses—and data distillation procedures that construct supervisory pairs from degraded imagery itself. The underlying motivation is consistent across tasks: adverse weather weakens direct supervision, violates photometric assumptions, or induces severe domain shift, so training is reformulated around invariances between clean and degraded observations or around pseudo-labels generated in a more reliable domain (Wang et al., 23 Sep 2025, Jiang et al., 18 May 2025, Zhou et al., 2024, Huang et al., 2024, Tan et al., 2023, Lin et al., 2019, Cheng et al., 2024).

1. Terminological scope and task coverage

Within the literature, adverse weather distillation is not restricted to a single task or a single form of supervision. In self-supervised stereo matching, RoSe names a second training stage “adverse-weather distillation,” in which a teacher trained with scene-correspondence priors on clear/adverse pairs produces pseudo-disparities on clear stereo inputs, and a fresh student is trained on mixed clear or degraded inputs to match those predictions (Wang et al., 23 Sep 2025). In monocular depth estimation, ACDepth uses a multi-granularity knowledge distillation strategy in which a student absorbs knowledge from a clear-trained teacher model and pretrained Depth Anything V2, with feature-wise distillation, ordinal guidance, and feature consistency across degradation types (Jiang et al., 18 May 2025). In adverse weather removal, distillation appears both as continual knowledge replay on a unified network structure and as soft residual transfer from CLIP features into a restoration backbone (Cheng et al., 2024, Tan et al., 2023).

The same logic extends beyond restoration and depth. In adverse weather optical flow, synthetic-domain motion statistics are distilled into a real-domain network by aligning cost-volume correlation histograms and by using pseudo-labels from a synthetic-degraded encoder (Zhou et al., 2024). In LiDAR-based 3D object detection, Sunny-to-Rainy Knowledge Distillation aligns RoI instance features and final detector responses between a sunny teacher and rainy student while adding a noise-aware correction term (Huang et al., 2024). Earlier real-image deraining work uses “data distillation” rather than teacher–student logits: a rainy image is paired first with a coarsely derained soft label and then with a clean image onto which the extracted rain residual is re-applied, yielding a hard rainy–clean pair for shared-network training (Lin et al., 2019).

| Setting | Source of distilled knowledge | Distillation target |
|---|---|---|
| Continual all-in-one weather removal | Frozen old model and replay buffer | Predictions and principal features |
| Stereo matching | Clear-pair teacher pseudo-disparity | Student disparity on mixed inputs |
| Monocular depth estimation | Clear-trained teacher and Depth Anything V2 | Multi-scale features, ordinal relations, feature consistency |
| Optical flow | Synthetic-degraded encoder | Cost-volume histogram and pseudo-flow |
| 3D object detection | Sunny detector | RoI features and detector responses |
| Real-image deraining | Filtered rainy image and extracted rain residual | Soft and hard supervisory pairs |

A common misconception is that adverse weather distillation is merely “logit matching under bad weather.” The surveyed methods contradict that interpretation. Their distilled objects include per-pixel restorations, compressed mid-level embeddings, residual spatial features, disparity maps, cost-volume statistics, RoI features, and even re-synthesized rainy images (Cheng et al., 2024, Wang et al., 23 Sep 2025, Tan et al., 2023, Zhou et al., 2024, Huang et al., 2024, Lin et al., 2019).

2. Distillation targets and objective functions

A central axis of variation is the object being matched. In continual all-in-one adverse weather removal, the old model at stage $t-1$ and the current model at stage $t$ are both evaluated on a stored degraded sample $\bar I$, yielding

$$y^- = \phi^{(t-1)}(F^{(t-1)}(\bar I)), \qquad y^+ = \phi^t(F^t(\bar I)).$$

Prediction-level distillation combines an $\ell_1$ term with a contrastive reconstruction regularizer,

$$L_{KD}(\bar I)=\|y^- - y^+\|_1 + \beta_2 \cdot L_{CT}(y^+,y^-,\bar I),$$

while principal-feature distillation aligns compressed features,

$$L_{PKD}(\bar I)=\|\psi(F^{(t-1)}(\bar I))-\psi(F^t(\bar I))\|_1.$$

The total continual objective adds these replay losses to the standard single-weather restoration loss on current-task samples (Cheng et al., 2024).
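As a rough illustration of the two replay losses above, the following NumPy sketch evaluates them on plain arrays. It is not the authors' implementation. The source does not define the form of $L_{CT}$, so the distance ratio used here is an assumption, as are the default `beta2`, the use of a mean instead of a sum for the $\ell_1$ norm, and the `project` callable standing in for $\psi$.

```python
import numpy as np

def l1(a, b):
    """Mean absolute difference, used here as a size-normalized l1 distance."""
    return float(np.mean(np.abs(a - b)))

def prediction_kd(y_old, y_new, degraded, beta2=0.1, eps=1e-8):
    """L_KD = ||y^- - y^+||_1 + beta2 * L_CT(y^+, y^-, I_bar).

    ASSUMPTION: L_CT is modeled as a distance ratio that shrinks when the new
    output is close to the old output (positive) and far from the degraded
    input (negative). The source does not give its exact form.
    """
    contrastive = l1(y_new, y_old) / (l1(y_new, degraded) + eps)
    return l1(y_old, y_new) + beta2 * contrastive

def principal_feature_kd(feat_old, feat_new, project):
    """L_PKD = ||psi(F^{t-1}(I_bar)) - psi(F^t(I_bar))||_1 with psi = project."""
    return l1(project(feat_old), project(feat_new))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    degraded = rng.uniform(size=(3, 8, 8))           # stored degraded sample
    y_old = np.clip(degraded - 0.1, 0.0, 1.0)        # frozen old model output
    y_new = y_old + 0.01 * rng.normal(size=y_old.shape)
    channel_mean = lambda f: f.mean(axis=0)          # hypothetical stand-in for psi
    print(prediction_kd(y_old, y_new, degraded))
    print(principal_feature_kd(y_old, y_new, channel_mean))
```

Both terms vanish when the new model reproduces the old model exactly on the replayed sample, which is the behavior the replay losses are meant to encourage.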

RoSe’s stereo formulation is structurally simpler at the final stage. After Step 1 self-supervised scene-correspondence learning, the model is frozen as a teacher $f$. For a clear pair $c_i$, the teacher produces a masked pseudo-disparity $D_i^*=f(c_i)$. A re-initialized student tt0 is trained on mixed inputs tt1 with

tt2

where tt3 is obtained from a left–right consistency check. The key property is that the student is supervised by the teacher’s clear-scene prediction even when the input is foggy, rainy, or nocturnal (Wang et al., 23 Sep 2025).
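The masked supervision described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not RoSe's code: the $\ell_1$ form of the loss, the nearest-pixel warp, and the tolerance `tol` are all hypothetical choices, and the disparity maps are plain arrays rather than network outputs.

```python
import numpy as np

def left_right_mask(disp_left, disp_right, tol=1.0):
    """Keep left-view pixels whose disparity agrees with the right view.

    A left pixel at column x is assumed to match the right pixel at column
    x - d, where d is the left disparity (rounded to the nearest pixel).
    """
    h, w = disp_left.shape
    cols = np.arange(w)[None, :].repeat(h, axis=0)
    target = np.rint(cols - disp_left).astype(int)
    in_bounds = (target >= 0) & (target < w)
    target = np.clip(target, 0, w - 1)
    disp_right_warped = np.take_along_axis(disp_right, target, axis=1)
    return in_bounds & (np.abs(disp_left - disp_right_warped) <= tol)

def masked_distillation_loss(student_disp, teacher_disp, mask):
    """Mean absolute error over pixels kept by the mask (ASSUMED l1 form)."""
    if not mask.any():
        return 0.0
    return float(np.abs(student_disp - teacher_disp)[mask].mean())

if __name__ == "__main__":
    # Teacher disparities come from the clear pair; the student would see the
    # degraded version of the same scene but is scored against the teacher.
    teacher_left = np.full((4, 8), 2.0)
    teacher_right = np.full((4, 8), 2.0)
    mask = left_right_mask(teacher_left, teacher_right)
    student = teacher_left + 0.5
    print(mask.sum(), masked_distillation_loss(student, teacher_left, mask))
```

The mask removes pixels that fall outside the right image or fail the consistency check, so unreliable teacher predictions do not propagate to the student.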

ACDepth introduces a broader distillation stack. Its overall objective is

tt4

with tt5 and tt6. The feature-wise term tt7 aligns multi-scale teacher and student features on mixed clear or degraded samples. The consistency term tt8 explicitly ties student degraded features to stop-gradient teacher-clear and student-clear features. The ordinal guidance distillation term tt9 focuses the model on uncertain regions defined by the normalized inverse-depth disagreement between teacher and student, with threshold Iˉ\bar I0 and ordinal tolerance Iˉ\bar I1 (Jiang et al., 18 May 2025).
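The idea of restricting ordinal guidance to uncertain regions can be illustrated with a small NumPy sketch. It only covers the region selection step, not the ordinal loss itself. The min-max normalization and the default `threshold` are assumptions made for illustration; the source states only that the regions are defined by normalized inverse-depth disagreement between teacher and student.

```python
import numpy as np

def normalized_inverse_depth(depth, eps=1e-6):
    """Inverse depth rescaled to [0, 1] (ASSUMED min-max normalization)."""
    inv = 1.0 / np.maximum(depth, eps)
    lo, hi = inv.min(), inv.max()
    return (inv - lo) / (hi - lo + eps)

def uncertain_region_mask(teacher_depth, student_depth, threshold=0.1):
    """Flag pixels where teacher and student disagree by more than threshold.

    The threshold value is hypothetical; it is not taken from the source.
    """
    gap = np.abs(normalized_inverse_depth(teacher_depth)
                 - normalized_inverse_depth(student_depth))
    return gap > threshold

if __name__ == "__main__":
    teacher = np.array([[1.0, 2.0], [4.0, 8.0]])
    student = np.array([[1.0, 4.0], [4.0, 8.0]])   # disagrees at one pixel
    print(uncertain_region_mask(teacher, student))
```

Only the flagged pixels would then contribute to an ordinal ranking term, which concentrates supervision where the student is least reliable.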

Other domains use different alignment spaces. In CLIP-based adverse weather removal, the teacher signal is the residual feature

Iˉ\bar I2

computed from CLIP image-encoder features on clean and weathered images; the SAR encoder’s residuals are matched to this quantity via an Iˉ\bar I3 loss after channel-wise normalization (Tan et al., 2023). In CH$^2$DA-Flow, the distillation target is not the output flow alone but the distribution of sampled cost-volume correlations. Synthetic and real degraded histograms are aligned with

Iˉ\bar I5

and pseudo-flow supervision is added through Iˉ\bar I6 (Zhou et al., 2024). In SRKD for 3D object detection, instance-feature matching, response distillation, and noise-aware prediction correction are combined as

Iˉ\bar I7

with Iˉ\bar I8, Iˉ\bar I9, and y=ϕ(t1)(F(t1)(Iˉ)),y+=ϕt(Ft(Iˉ)).y^- = \phi^{(t-1)}(F^{(t-1)}(\bar I)), \qquad y^+ = \phi^t(F^t(\bar I)).0 (Huang et al., 2024).
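To make the cost-volume histogram alignment from the optical-flow setting (Zhou et al., 2024) more concrete, the sketch below compares two sets of correlation values through their normalized histograms. The KL divergence, the bin count, and the value range are assumptions chosen for illustration; the source does not give the exact alignment loss. A hard histogram is also not differentiable, so an actual training loss would presumably need a soft binning scheme.

```python
import numpy as np

def correlation_histogram(corr, bins=32, value_range=(-1.0, 1.0), eps=1e-8):
    """Normalized histogram of sampled cost-volume correlation values."""
    counts, _ = np.histogram(corr, bins=bins, range=value_range)
    probs = counts.astype(float) + eps   # eps avoids log(0) in empty bins
    return probs / probs.sum()

def histogram_kl(p, q):
    """KL(p || q) between two histograms (ASSUMED alignment measure)."""
    return float(np.sum(p * np.log(p / q)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    synthetic_corr = rng.uniform(-1.0, 1.0, size=5000)   # synthetic degraded domain
    real_corr = 0.3 * rng.normal(size=5000)              # real degraded domain
    p = correlation_histogram(synthetic_corr)
    q = correlation_histogram(real_corr)
    print(histogram_kl(p, q))
```

Matching distributions rather than per-pixel values is what lets this kind of objective work without pixel-aligned synthetic and real pairs.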

These formulations show that adverse weather distillation is often closer to structured correspondence transfer than to classical classification KD. The continual all-in-one weather removal framework explicitly distinguishes its procedure from standard classification distillation by noting that it distills per-pixel restoration outputs rather than class probabilities and adds a contrastive term to pull the new output toward the old output while pushing it away from the degraded input (Cheng et al., 2024). A plausible implication is that weather-robust distillation methods tend to encode restoration or geometry priors directly in the supervisory object, rather than relying on softened categorical outputs.

3. Teacher construction, paired data, and pseudo-supervision

Because real paired labels under adverse weather are scarce, most methods devote substantial effort to constructing teacher signals or pseudo-pairs whose geometry remains valid. RoSe begins from clear-weather stereo datasets—DrivingStereo, MS2, and KITTI—and trains three CycleGAN-Turbo translators y=ϕ(t1)(F(t1)(Iˉ)),y+=ϕt(Ft(Iˉ)).y^- = \phi^{(t-1)}(F^{(t-1)}(\bar I)), \qquad y^+ = \phi^t(F^t(\bar I)).1, y=ϕ(t1)(F(t1)(Iˉ)),y+=ϕt(Ft(Iˉ)).y^- = \phi^{(t-1)}(F^{(t-1)}(\bar I)), \qquad y^+ = \phi^t(F^t(\bar I)).2, and y=ϕ(t1)(F(t1)(Iˉ)),y+=ϕt(Ft(Iˉ)).y^- = \phi^{(t-1)}(F^{(t-1)}(\bar I)), \qquad y^+ = \phi^t(F^t(\bar I)).3 on unpaired samples. Each clear pair y=ϕ(t1)(F(t1)(Iˉ)),y+=ϕt(Ft(Iˉ)).y^- = \phi^{(t-1)}(F^{(t-1)}(\bar I)), \qquad y^+ = \phi^t(F^t(\bar I)).4 is converted into y=ϕ(t1)(F(t1)(Iˉ)),y+=ϕt(Ft(Iˉ)).y^- = \phi^{(t-1)}(F^{(t-1)}(\bar I)), \qquad y^+ = \phi^t(F^t(\bar I)).5 in the target style, and because the translator changes appearance rather than geometry, the original ground-truth disparity is preserved. This yields Adverse-DrivingStereo with y=ϕ(t1)(F(t1)(Iˉ)),y+=ϕt(Ft(Iˉ)).y^- = \phi^{(t-1)}(F^{(t-1)}(\bar I)), \qquad y^+ = \phi^t(F^t(\bar I)).6 images in Clear/Fog/Rain/Night, Adverse-MS2 with y=ϕ(t1)(F(t1)(Iˉ)),y+=ϕt(Ft(Iˉ)).y^- = \phi^{(t-1)}(F^{(t-1)}(\bar I)), \qquad y^+ = \phi^t(F^t(\bar I)).7, and Adverse-KITTI with y=ϕ(t1)(F(t1)(Iˉ)),y+=ϕt(Ft(Iˉ)).y^- = \phi^{(t-1)}(F^{(t-1)}(\bar I)), \qquad y^+ = \phi^t(F^t(\bar I)).8 (Wang et al., 23 Sep 2025).

ACDepth similarly treats adverse weather generation as a prerequisite for robust distillation, but uses a one-step diffusion model built on Stable Diffusion Turbo with LoRA adapters. For each weather condition y=ϕ(t1)(F(t1)(Iˉ)),y+=ϕt(Ft(Iˉ)).y^- = \phi^{(t-1)}(F^{(t-1)}(\bar I)), \qquad y^+ = \phi^t(F^t(\bar I)).9, a clear image 1\ell_10 and prompt 1\ell_11 are processed to obtain

1\ell_12

Cycle-consistency, adversarial, and identity regularization losses are jointly optimized so that the translated image preserves scene content while adopting target weather statistics. Separate LoRA sets are trained for day→night, day→rain, and related conditions, using approximately 1\ell_15 clear/rain/night examples per domain and 1\ell_16 for RobotCar night (Jiang et al., 18 May 2025).

In adverse weather optical flow and LiDAR detection, teacher generation is explicitly domain-adaptive. CH$^2$DA-Flow bridges clean, synthetic degraded, and real degraded domains. Synthetic fog and rain are used in Clean-Degraded Motion Adaptation, and the resulting synthetic-degraded encoder becomes the source model for Synthetic-Real Motion Adaptation on real degraded imagery (Zhou et al., 2024). SRKD augments sunny Waymo point clouds with DRET, a rain simulation pipeline that combines particle-based splashes in Unity3D with physically based LiDAR scattering and attenuation, producing approximately 1\ell_18 rainy scans offline (Huang et al., 2024).

The oldest of the surveyed methods, “Rain O’er Me,” uses no external teacher at all. Instead, it creates supervision from the rainy image itself. A rainy image 1\ell_19 is downsampled, super-resolved by a pretrained SRDN, combined by element-wise minimum, and refined with a guided filter:

LKD(Iˉ)=yy+1+β2LCT(y+,y,Iˉ),L_{KD}(\bar I)=\|y^- - y^+\|_1 + \beta_2 \cdot L_{CT}(y^+,y^-,\bar I),0

The result LKD(Iˉ)=yy+1+β2LCT(y+,y,Iˉ),L_{KD}(\bar I)=\|y^- - y^+\|_1 + \beta_2 \cdot L_{CT}(y^+,y^-,\bar I),1 is a blurred rain-free soft label. The network predicts a rain residual LKD(Iˉ)=yy+1+β2LCT(y+,y,Iˉ),L_{KD}(\bar I)=\|y^- - y^+\|_1 + \beta_2 \cdot L_{CT}(y^+,y^-,\bar I),2, enhances it to LKD(Iˉ)=yy+1+β2LCT(y+,y,Iˉ),L_{KD}(\bar I)=\|y^- - y^+\|_1 + \beta_2 \cdot L_{CT}(y^+,y^-,\bar I),3, and adds it to a different clean image LKD(Iˉ)=yy+1+β2LCT(y+,y,Iˉ),L_{KD}(\bar I)=\|y^- - y^+\|_1 + \beta_2 \cdot L_{CT}(y^+,y^-,\bar I),4 to form LKD(Iˉ)=yy+1+β2LCT(y+,y,Iˉ),L_{KD}(\bar I)=\|y^- - y^+\|_1 + \beta_2 \cdot L_{CT}(y^+,y^-,\bar I),5, thereby generating a hard rainy–clean pair for a second supervisory branch (Lin et al., 2019).
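The soft-label and hard-pair construction described above can be sketched on grayscale arrays as follows. This is an illustration under several assumptions, not the original pipeline: nearest-neighbor upsampling stands in for the pretrained SRDN, the element-wise minimum is taken between the rainy input and the upsampled image, the guided-filter refinement is omitted, and the residual enhancement is reduced to a hypothetical scalar `gain`.

```python
import numpy as np

def soft_label(rainy, factor=2):
    """Coarse rain-suppressed label from a rainy image with values in [0, 1]."""
    h, w = rainy.shape
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by factor")
    # Downsample by block averaging, which spreads out thin bright streaks.
    small = rainy.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    # ASSUMPTION: nearest-neighbor upsampling replaces the pretrained SRDN.
    restored = np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)
    # Rain streaks are brighter than the background, so keep the darker value.
    return np.minimum(rainy, restored)

def hard_pair(clean, rain_residual, gain=1.0):
    """Re-apply an extracted rain residual to a different clean image."""
    synthetic_rainy = np.clip(clean + gain * rain_residual, 0.0, 1.0)
    return synthetic_rainy, clean   # (network input, exact clean target)

if __name__ == "__main__":
    rainy = np.zeros((4, 4))
    rainy[0, 0] = 1.0                        # a single bright streak pixel
    label = soft_label(rainy)
    residual = rainy - label                 # stand-in for the predicted residual
    pair_input, pair_target = hard_pair(np.full((4, 4), 0.5), residual)
    print(label[0, 0], pair_input[0, 0])
```

The soft pair supplies an approximate target for real rain, while the hard pair supplies an exact target for re-synthesized rain, which is why the two branches complement each other.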

Continual adverse weather removal replaces synthetic pairing with replay. Its memory buffer LKD(Iˉ)=yy+1+β2LCT(y+,y,Iˉ),L_{KD}(\bar I)=\|y^- - y^+\|_1 + \beta_2 \cdot L_{CT}(y^+,y^-,\bar I),6 stores only degraded old images, not clean targets, and at the end of each task a uniform random subset of newly seen data is added with equal budget per task. Distillation is then performed against the frozen old model on each replayed sample (Cheng et al., 2024). This suggests that in continual settings, the pseudo-labeling problem is shifted from “how to obtain a clean target” to “how to preserve the old model’s behavior on degraded inputs.”
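A replay memory of this kind can be sketched with the Python standard library. The class and method names below are hypothetical, and the fixed per-task budget is one possible reading of "equal budget per task". The buffer holds only degraded inputs (here, string identifiers), never clean targets.

```python
import random

class ReplayBuffer:
    """Stores a uniform random subset of degraded samples for each finished task."""

    def __init__(self, per_task_budget, seed=0):
        self.per_task_budget = per_task_budget
        self.samples = {}                    # task name -> list of degraded samples
        self._rng = random.Random(seed)

    def add_task(self, task, degraded_samples):
        """Called once at the end of a task with the data seen during that task."""
        pool = list(degraded_samples)
        k = min(self.per_task_budget, len(pool))
        self.samples[task] = self._rng.sample(pool, k)

    def replay(self):
        """All stored (task, sample) pairs; each is fed to the old and new models."""
        return [(task, s) for task, stored in self.samples.items() for s in stored]

if __name__ == "__main__":
    buffer = ReplayBuffer(per_task_budget=3)
    buffer.add_task("haze", [f"haze_{i}.png" for i in range(10)])
    buffer.add_task("rain", [f"rain_{i}.png" for i in range(10)])
    for task, sample in buffer.replay():
        print(task, sample)
```

Because no clean targets are stored, the only supervision available on replayed samples is the frozen old model's output, which is exactly what the distillation losses consume.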

4. Architectural couplings between distillation and weather robustness

Adverse weather distillation is rarely a stand-alone auxiliary loss; it is usually embedded in architectures designed to expose weather-invariant or degradation-sensitive representations. The continual all-in-one restoration framework adopts FFA-Net as a unified backbone, abstracts it into a feature extractor LKD(Iˉ)=yy+1+β2LCT(y+,y,Iˉ),L_{KD}(\bar I)=\|y^- - y^+\|_1 + \beta_2 \cdot L_{CT}(y^+,y^-,\bar I),7 and image projector LKD(Iˉ)=yy+1+β2LCT(y+,y,Iˉ),L_{KD}(\bar I)=\|y^- - y^+\|_1 + \beta_2 \cdot L_{CT}(y^+,y^-,\bar I),8, and adds an auxiliary auto-encoder LKD(Iˉ)=yy+1+β2LCT(y+,y,Iˉ),L_{KD}(\bar I)=\|y^- - y^+\|_1 + \beta_2 \cdot L_{CT}(y^+,y^-,\bar I),9 pretrained on features of all stored old samples. Its encoder LPKD(Iˉ)=ψ(F(t1)(Iˉ))ψ(Ft(Iˉ))1.L_{PKD}(\bar I)=\|\psi(F^{(t-1)}(\bar I))-\psi(F^t(\bar I))\|_1.0 is then frozen and used as a PCA-style projector from LPKD(Iˉ)=ψ(F(t1)(Iˉ))ψ(Ft(Iˉ))1.L_{PKD}(\bar I)=\|\psi(F^{(t-1)}(\bar I))-\psi(F^t(\bar I))\|_1.1 to LPKD(Iˉ)=ψ(F(t1)(Iˉ))ψ(Ft(Iˉ))1.L_{PKD}(\bar I)=\|\psi(F^{(t-1)}(\bar I))-\psi(F^t(\bar I))\|_1.2, with implementation based on multi-head channel self-attention and a learnable channel selection layer so as to mimic PCA at linear instead of quadratic cost (Cheng et al., 2024).

RoSe couples distillation to a feature extractor enhanced with frozen visual foundation model priors. Outputs from pre-trained ViT blocks are fused with an FPN encoder at scales LPKD(Iˉ)=ψ(F(t1)(Iˉ))ψ(Ft(Iˉ))1.L_{PKD}(\bar I)=\|\psi(F^{(t-1)}(\bar I))-\psi(F^t(\bar I))\|_1.3, and an Anti-Adverse Feature Enhancement Module operates through instance normalization, batch normalization, channel-attention fusion, and Fourier-domain amplitude filtering:

LPKD(Iˉ)=ψ(F(t1)(Iˉ))ψ(Ft(Iˉ))1.L_{PKD}(\bar I)=\|\psi(F^{(t-1)}(\bar I))-\psi(F^t(\bar I))\|_1.4

LPKD(Iˉ)=ψ(F(t1)(Iˉ))ψ(Ft(Iˉ))1.L_{PKD}(\bar I)=\|\psi(F^{(t-1)}(\bar I))-\psi(F^t(\bar I))\|_1.5

The subsequent FFT/iFFT path filters degradation-related amplitude components while preserving phase and structure (Wang et al., 23 Sep 2025).
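The amplitude-only filtering step can be illustrated with NumPy's FFT routines. This is a generic sketch of the idea and not the AFEM implementation: `amplitude_gain` is a hypothetical non-negative mask (presumably learned in practice) that rescales the amplitude spectrum while the phase is passed through unchanged.

```python
import numpy as np

def filter_amplitude(feature, amplitude_gain):
    """Rescale the Fourier amplitude of a 2D map while keeping its phase.

    feature: real array of shape (H, W).
    amplitude_gain: non-negative scalar or (H, W) array (hypothetical filter).
    """
    spectrum = np.fft.fft2(feature)
    amplitude = np.abs(spectrum)
    phase = np.angle(spectrum)
    filtered = (amplitude * amplitude_gain) * np.exp(1j * phase)
    # The real part is taken because an asymmetric gain can leave a small
    # imaginary component after the inverse transform.
    return np.real(np.fft.ifft2(filtered))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feature = rng.normal(size=(8, 8))
    gain = np.ones((8, 8))
    gain[2:7, 2:7] = 0.2          # damp mid and high frequencies (illustrative)
    print(filter_amplitude(feature, gain).shape)
```

With a gain of all ones the function returns the input unchanged, which is a convenient sanity check that the phase path is left intact.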

In CLIP-based adverse weather removal, the Spatially-Adaptive Residual Encoder and CLIP Weather Prior module make distillation explicitly architectural. Each SAR Transformer block applies multi-head self-attention followed by a SARFFN. The SAR module predicts location-varying combinations of basis depthwise kernels,

LPKD(Iˉ)=ψ(F(t1)(Iˉ))ψ(Ft(Iˉ))1.L_{PKD}(\bar I)=\|\psi(F^{(t-1)}(\bar I))-\psi(F^t(\bar I))\|_1.6

LPKD(Iˉ)=ψ(F(t1)(Iˉ))ψ(Ft(Iˉ))1.L_{PKD}(\bar I)=\|\psi(F^{(t-1)}(\bar I))-\psi(F^t(\bar I))\|_1.7

so the residual branch concentrates on degraded regions. The CWP module then injects sample-specific CLIP priors LPKD(Iˉ)=ψ(F(t1)(Iˉ))ψ(Ft(Iˉ))1.L_{PKD}(\bar I)=\|\psi(F^{(t-1)}(\bar I))-\psi(F^t(\bar I))\|_1.8 and distribution-specific embeddings LPKD(Iˉ)=ψ(F(t1)(Iˉ))ψ(Ft(Iˉ))1.L_{PKD}(\bar I)=\|\psi(F^{(t-1)}(\bar I))-\psi(F^t(\bar I))\|_1.9 through cross-attention, with a side cross-entropy weather-classification loss to regularize the prior (Tan et al., 2023).
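A location-varying combination of basis depthwise kernels can be sketched in NumPy as below. The shapes, function names, and the way coefficients are supplied are assumptions made for illustration. In the actual SAR module the per-pixel coefficients are predicted by the network, whereas here they are passed in directly.

```python
import numpy as np

def depthwise_conv(feature, kernel):
    """Same-padded depthwise correlation.

    feature: (C, H, W); kernel: (C, k, k) with odd k, one filter per channel.
    """
    _, h, w = feature.shape
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(feature, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros(feature.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += kernel[:, dy, dx][:, None, None] * padded[:, dy:dy + h, dx:dx + w]
    return out

def spatially_adaptive_residual(feature, basis_kernels, coefficients):
    """Mix K basis responses with per-pixel weights.

    basis_kernels: (K, C, k, k); coefficients: (K, H, W).
    """
    responses = np.stack([depthwise_conv(feature, kern) for kern in basis_kernels])
    return np.sum(coefficients[:, None, :, :] * responses, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feature = rng.normal(size=(4, 6, 6))
    basis = rng.normal(size=(3, 4, 3, 3))
    logits = rng.normal(size=(3, 6, 6))
    # Softmax over the basis axis gives weights that sum to one at each pixel.
    coeff = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)
    print(spatially_adaptive_residual(feature, basis, coeff).shape)
```

Since the weights vary per pixel, the effective kernel can differ between a rain-covered region and a clean region of the same feature map.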

CH$^2$DA-Flow and SRKD exhibit analogous couplings in motion and detection systems. CH$^2$DA-Flow uses a FlowFormer backbone with cost-volume correlations, a DispNet stereo network, and PoseNet, so that synthetic→real distillation operates directly on motion statistics in the cost space (Zhou et al., 2024). SRKD uses identical teacher and student detector architectures, but ties distillation to object-level similarity measures, high-confidence response selection, and explicit noise ratios inside predicted boxes (Huang et al., 2024). A plausible implication is that adverse weather distillation is most effective when the intermediate representation chosen for transfer corresponds to the physical locus of the weather corruption: image residuals for restoration, disparities for stereo, cost volumes for flow, and point-density responses for LiDAR detection.

5. Quantitative behavior across representative systems

The strongest quantitative evidence in the surveyed literature comes from settings where weaker replay or weaker self-supervision is directly compared to stronger distillation. In continual all-in-one adverse weather removal on the three-task sequence Haze → Rain → Snow, Joint-M, which uses replay without distillation, reaches ff5 dB average PSNR; adding only prediction KD raises this to ff6 dB; adding principal-feature KD further raises it to ff7 dB, only ff8 dB below the “oracle” Individual-task upper bound of ff9 dB. On the two-task sequence Haze → Rain, the full model scores cic_i1 dB versus cic_i2 dB for AFC and cic_i3 dB for Individual-task training. The same study reports stable results down to cic_i4–cic_i5 exemplars, and that after a second task the first-task performance drops by only approximately cic_i6 dB, while after the third task each earlier task retains approximately cic_i7 of its original PSNR (Cheng et al., 2024).

RoSe reports clear gains from its Step 2 adverse-weather distillation. On DrivingStereo weather validation, the mixed-training Step 2 model achieves Clear cic_i8 EPE and cic_i9 Bad-3.0, Fog Di=f(ci)D_i^*=f(c_i)0 and Di=f(ci)D_i^*=f(c_i)1, Rain Di=f(ci)D_i^*=f(c_i)2 and Di=f(ci)D_i^*=f(c_i)3, and Night Di=f(ci)D_i^*=f(c_i)4 and Di=f(ci)D_i^*=f(c_i)5. In zero-shot generalization, it attains Bad-3.0 of Di=f(ci)D_i^*=f(c_i)6 on DrivingStereo Clear/Fog/Rain and Di=f(ci)D_i^*=f(c_i)7 on MS2 Night. Ablations attribute approximately Di=f(ci)D_i^*=f(c_i)8–Di=f(ci)D_i^*=f(c_i)9 relative improvement to Step 2 distillation after prior gains from VFM fusion, scene-correspondence losses, and AFEM (Wang et al., 23 Sep 2025).

ACDepth reports improvements in monocular depth under rain and night on nuScenes. Against md4all-DD, rainy absRel decreases from tt00 to tt01 and night absRel from tt02 to tt03, corresponding to tt04 and tt05. Its ablations show that on nuScenes night, a baseline without distillation has absRel tt06, adding tt07 reduces this to tt08, adding tt09 gives tt10, and adding tt11 gives tt12; on rain, the sequence is tt13 (Jiang et al., 18 May 2025).

For weather removal, CLIP-based residual distillation also shows measurable gains. On Snow100K-L, Test1, and RainDrop, the full method achieves mean PSNR tt14 dB and SSIM tt15, compared with TransWeather’s tt16. In ablations, a baseline at tt17 PSNR improves to tt18 with CutMix, tt19 with CWP, and tt20 with SAR plus CLIP-SRD. On SPA-rain without retraining, it reports approximately tt21 versus TransWeather’s tt22 (Tan et al., 2023).

Motion and detection studies report similarly consistent trends. CH$^2$DA-Flow achieves EPE approximately tt24 on Weather-GOF rain versus tt25 for the best prior method, and tt26 on DenseFog versus tt27; on Real-Weather World it obtains EPE approximately tt28–tt29 and F1-all approximately tt30–tt31, compared with tt32–tt33 EPE and tt34–tt35 F1 for direct baselines (Zhou et al., 2024). In LiDAR 3D detection on WOD-DA, DSVT improves from All(L2) tt36 mAP/mAPH to tt37 with DRET-Aug plus SRKD; Voxel-RCNN improves from tt38 to tt39; PV-RCNN++ improves from tt40 to tt41. The same framework also slightly improves sunny All(L2-mAP), for example DSVT tt42 (Huang et al., 2024).

“Rain O’er Me” does not report paired PSNR/SSIM on real scenes because no paired real ground truth exists, but emphasizes model compactness and speed: tt43 parameters, CPU inference time tt44 s, and GPU inference time tt45 s on a tt46 image (Lin et al., 2019).

6. Conceptual distinctions, limitations, and open issues

Several conceptual distinctions recur across the literature. First, adverse weather distillation is not equivalent to domain adaptation, although the two are frequently combined. CH$^2$DA-Flow explicitly frames its method as cumulative homogeneous-heterogeneous adaptation, yet the synthetic→real transfer stage is implemented through knowledge distillation on cost-volume histograms and pseudo-flow labels (Zhou et al., 2024). SRKD likewise relies on rainy augmentation and detector supervision, but its key cross-weather transfer mechanism is teacher–student alignment between sunny and rainy scans (Huang et al., 2024).

Second, the “teacher” need not be an externally supervised large model. It may be a frozen previous checkpoint in continual learning, a clear-weather model in cross-weather stereo or depth, a synthetic-domain encoder in optical flow, a pretrained multimodal encoder such as CLIP, or a soft label obtained by filtering the rainy image itself (Cheng et al., 2024, Wang et al., 23 Sep 2025, Jiang et al., 18 May 2025, Tan et al., 2023, Lin et al., 2019). This suggests that adverse weather distillation is better understood as a transfer principle—moving supervision into a representation or domain where it is more stable—than as a fixed architecture.

Third, the surveyed results also delimit the method’s current boundaries. RoSe’s night results remain worse than its clear-weather results, with Bad-3.0 increasing from tt49 on Clear to tt50 on Night even after Step 2 distillation (Wang et al., 23 Sep 2025). ACDepth’s ablation on nuScenes rain shows that adding the ordinal guidance term after tt51 changes absRel from tt52 to tt53, so the benefit of individual submodules can be non-monotonic until the full objective is assembled (Jiang et al., 18 May 2025). The CLIP-based restoration paper identifies a key challenge as reconciling CLIP’s global alignment objective with pixel-level restoration, and therefore delays the distillation loss until epoch tt54 (Tan et al., 2023). “Rain O’er Me” notes that extremely heavy rain or rain plus mist may still leave residual fog, and that the soft-label path can over-smooth textures (Lin et al., 2019). SRKD reports no inference-time penalty, but does incur approximately tt55–tt56 extra GPU training time and tt57–tt58 more VRAM, while DRET is not end-to-end and requires offline preprocessing (Huang et al., 2024).

A final misconception is that distillation under adverse weather always requires task IDs or specialized branches. The continual all-in-one weather removal framework explicitly states that no task ID or specialized branch is needed at test time, because haze, rain, snow, or mixtures are processed by the same tt59–tt60 pipeline (Cheng et al., 2024). Comparable unification is visible in RoSe’s mixed-input student and in ACDepth’s mixed clear plus synthetic degradation training (Wang et al., 23 Sep 2025, Jiang et al., 18 May 2025). A plausible implication is that, as the field matures, the most durable use of adverse weather distillation may be not as a bolt-on regularizer but as a way to train unified backbones whose intermediate representations are anchored by clean-scene priors even when the observable input is severely degraded.
