How Hard Is Snow? A Paired Domain Adaptation Dataset for Clear and Snowy Weather: CADC+

Published 19 Jun 2025 in cs.CV | (2506.16531v1)

Abstract: The impact of snowfall on 3D object detection performance remains underexplored. Conducting such an evaluation requires a dataset with sufficient labelled data from both weather conditions, ideally captured in the same driving environment. Current driving datasets with LiDAR point clouds either do not provide enough labelled data in both snowy and clear weather conditions, or rely on de-snowing methods to generate synthetic clear weather. Synthetic data often lacks realism and introduces an additional domain shift that confounds accurate evaluations. To address these challenges, we present CADC+, the first paired weather domain adaptation dataset for autonomous driving in winter conditions. CADC+ extends the Canadian Adverse Driving Conditions dataset (CADC) using clear weather data that was recorded on the same roads and in the same period as CADC. To create CADC+, we pair each CADC sequence with a clear weather sequence that matches the snowy sequence as closely as possible. CADC+ thus minimizes the domain shift resulting from factors unrelated to the presence of snow. We also present some preliminary results using CADC+ to evaluate the effect of snow on 3D object detection performance. We observe that snow introduces a combination of aleatoric and epistemic uncertainties, acting as both noise and a distinct data domain.


Authors (4)

Summary

The paper presents a novel paired dataset (CADC+) that isolates snow as a domain shift in LiDAR-based 3D object detection.
It employs a multi-objective pairing strategy and sparse labeling with semi-supervised learning to achieve near parity with fully annotated models.
Empirical results confirm that snow induces both aleatoric and epistemic uncertainties, underscoring the need for robust, weather-adaptive detection models.

Overview of "How Hard Is Snow? A Paired Domain Adaptation Dataset for Clear and Snowy Weather: CADC+" (2506.16531)

This paper introduces CADC+, an extension of the Canadian Adverse Driving Conditions (CADC) dataset, designed explicitly for robust, empirical evaluation of the impact of snow on LiDAR-based 3D object detection in autonomous driving. The distinctive contribution of CADC+ is its provision of paired sequences: for each snowy weather sequence, a closely matched clear weather sequence is provided, minimizing confounding domain shifts unrelated to snow. This design counters the limitations of previous datasets and synthetic approaches, which fail to isolate the effect of snow from other environmental and traffic variations or introduce additional, hard-to-quantify domain gaps.

Motivation and Dataset Construction

The challenge of evaluating object detectors under adverse snow conditions is compounded by the inadequacy of available datasets. Existing real-world datasets either lack sufficient labels in both domains or provide only unpaired data, introducing irrelevant domain shifts (e.g., scene, traffic, or seasonality differences) that confound the analysis. Synthetic approaches, such as de-snowing or snow simulation, have known shortcomings, including poor reconstruction of occluded structure and nonphysical approximations, especially regarding accumulated snow and attenuation effects.

To address this, the authors leverage a large reserve of previously unused clear-weather data, recorded in temporal proximity (sometimes a day apart) and on the same vehicle platform as CADC. The pairing methodology combines spatial interpolation of trajectory data and a multi-objective matching strategy, considering factors like road and lane alignment, trajectory overlap, and surrounding traffic dynamics. When optimal automated pairing is infeasible, manual selection ensures environmental similarity.
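The pairing idea can be illustrated with a minimal sketch. It is not the authors' procedure: the summary does not give their objectives or weights. The sketch assumes each sequence is reduced to a 2D ego trajectory in a shared map frame plus a mean object count, and it collapses the objectives into one weighted sum. The names `resample`, `pairing_cost`, `best_match`, and the weights `w_traj` and `w_traffic` are hypothetical.

```python
import math

def resample(traj, n=20):
    """Linearly interpolate a polyline of (x, y) points to n points
    evenly spaced by arc length, so two trajectories can be compared
    point by point even if they were recorded at different rates."""
    cum = [0.0]  # cumulative arc length at each vertex
    for (x0, y0), (x1, y1) in zip(traj, traj[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    if total == 0.0:
        return [traj[0]] * n
    out, seg = [], 0
    for i in range(n):
        target = total * i / (n - 1)
        while seg < len(traj) - 2 and cum[seg + 1] < target:
            seg += 1
        span = cum[seg + 1] - cum[seg]
        t = 0.0 if span == 0.0 else (target - cum[seg]) / span
        (x0, y0), (x1, y1) = traj[seg], traj[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out

def pairing_cost(snow, clear, w_traj=1.0, w_traffic=0.5):
    """Weighted sum of two illustrative objectives (assumed, not from the paper):
    mean distance between resampled trajectories and gap in mean object count."""
    a, b = resample(snow["traj"]), resample(clear["traj"])
    traj_gap = sum(math.hypot(ax - bx, ay - by)
                   for (ax, ay), (bx, by) in zip(a, b)) / len(a)
    traffic_gap = abs(snow["mean_objects"] - clear["mean_objects"])
    return w_traj * traj_gap + w_traffic * traffic_gap

def best_match(snow, candidates):
    """Pick the clear-weather candidate with the lowest pairing cost."""
    return min(candidates, key=lambda c: pairing_cost(snow, c))
```

A weighted sum is only one way to combine objectives. A real pipeline might instead rank candidates per objective or fall back to manual review when no candidate scores well, as the summary describes.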

The dataset labeling strategy maximizes both annotation efficacy and diversity under budget constraints. Full-label annotation is reserved for validation splits, while a sparse frame-level labeling strategy is adopted for the training set, augmented with semi-supervised learning (SSL) leveraging pseudo-labels. Empirical results indicate that this approach achieves near-parity, and occasionally superior, performance compared to fully human-annotated models.
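A rough sketch of how sparse labelling and pseudo-labelling fit together, under stated assumptions: human labels are placed on every `stride`-th frame, and a teacher model's detections on the remaining frames are kept only above a confidence threshold. The stride, the threshold, and the function names are hypothetical and are not taken from the paper.

```python
def split_frames(num_frames, stride=10):
    """Return (labelled, unlabelled) frame indices when only every
    stride-th frame receives human annotation (stride is an assumption)."""
    labelled = list(range(0, num_frames, stride))
    chosen = set(labelled)
    unlabelled = [i for i in range(num_frames) if i not in chosen]
    return labelled, unlabelled

def pseudo_labels(detections, min_score=0.7):
    """Keep teacher detections confident enough to serve as training
    targets on unlabelled frames (min_score is an assumption)."""
    return [d for d in detections if d["score"] >= min_score]
```

With 100 frames and a stride of 10, ten frames carry human labels and the other ninety rely on filtered teacher predictions.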

Dataset Statistics and Characteristics

CADC+ comprises 74 snowy and 74 clear sequences (with rare exceptions for partial matches), each consisting of at least 100 LiDAR frames and 3D bounding box annotations. Analysis of object category distributions, object counts, and kinematic properties demonstrates strong alignment between the clear and snowy domains, with only minor exceptions in pedestrian prevalence attributed to unmatched campus sequences.

Quantitative evaluation confirms that the distributions of object count, motion, and LiDAR point count per object are statistically similar, supporting the use of CADC+ as a controlled domain adaptation testbed. Consequently, changes in model performance can be ascribed primarily to domain shift induced by snow, rather than environmental or distributional confounders.
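The summary does not say which comparison the authors used to judge similarity. One common option, shown here purely as an illustration, is the two-sample Kolmogorov-Smirnov statistic: the largest gap between the empirical cumulative distributions of two samples, for example per-frame object counts in the clear and snowy domains. The function name `ks_statistic` is hypothetical.

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
    difference between the empirical CDFs of samples a and b.
    0.0 means identical empirical distributions, 1.0 means no overlap."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # advance both samples past every value <= x
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d
```

A small statistic on object counts, speeds, or points per object would be consistent with the claim that the two domains are closely matched. A formal test would also need a significance threshold, which is omitted here.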

Experimental Evaluation: Snow as Aleatoric and Epistemic Uncertainty

The paper presents systematic experiments assessing the effect of snow on 3D object detection performance. Models (using VoxelNeXt with 5-frame aggregation) are trained with varying proportions of clear and snowy data and evaluated under both domains, using both supervised and semi-supervised settings.
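Multi-frame aggregation is commonly done by moving past sweeps into the current sensor frame using ego poses and tagging each point with its time offset. The sketch below assumes this convention. The summary does not describe the authors' exact preprocessing, and the planar `(x, y, yaw)` pose is a simplification of a full 3D pose. All function names are hypothetical.

```python
import math

def to_world(point, pose):
    """Transform a sensor-frame point (x, y, z) to the world frame,
    given a planar ego pose (px, py, yaw in radians)."""
    x, y, z = point
    px, py, yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    return (px + c * x - s * y, py + s * x + c * y, z)

def to_local(point, pose):
    """Inverse of to_world: world-frame point to the sensor frame at pose."""
    x, y, z = point
    px, py, yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    dx, dy = x - px, y - py
    return (c * dx + s * dy, -s * dx + c * dy, z)

def aggregate(sweeps, poses, times, num_frames=5):
    """Merge the last num_frames sweeps into the frame of the newest sweep.
    Each output point is (x, y, z, dt), where dt is how much older the
    point is than the newest sweep."""
    ref_pose, ref_time = poses[-1], times[-1]
    merged = []
    for pts, pose, t in list(zip(sweeps, poses, times))[-num_frames:]:
        for p in pts:
            x, y, z = to_local(to_world(p, pose), ref_pose)
            merged.append((x, y, z, ref_time - t))
    return merged
```

Static structure lines up across sweeps after this transform, while moving objects leave a trail whose time channel the detector can exploit.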

Key empirical findings are:

Models trained solely on clear weather data exhibit a significant decrease in AP (∼5–7%) when evaluated on snowy weather, and vice versa, substantiating snow as a source of both aleatoric (measurement-level) and epistemic (domain-level) uncertainty.
Increasing the fraction of snowy data in the training set monotonically improves snowy-weather detection up to a saturation point, while degrading clear-weather performance, especially for supervised models.
Mixed-domain models (∼50:50) provide the most balanced performance across both domains, especially under SSL, suggesting that domain adaptation strategies or robust multi-domain learning are necessary for weather-agnostic perception.
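The mixing-ratio experiments summarized above can be pictured as drawing a fixed-size training set with a chosen share of snowy frames. This is a minimal sketch under that assumption. Whether the authors hold the total size fixed is not stated here, and `mixed_training_set` and its parameters are hypothetical.

```python
import random

def mixed_training_set(clear, snow, snow_fraction, size, seed=0):
    """Sample `size` training items with the requested share of snowy
    items, without replacement, then shuffle. A fixed seed keeps the
    split reproducible across runs."""
    rng = random.Random(seed)
    n_snow = round(size * snow_fraction)
    n_clear = size - n_snow
    picked = rng.sample(snow, n_snow) + rng.sample(clear, n_clear)
    rng.shuffle(picked)
    return picked
```

Sweeping `snow_fraction` from 0 to 1 while keeping `size` constant separates the effect of the domain mix from the effect of simply having more data.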

De-snowing Methods: Limitations and Performance

The study further assesses two paradigmatic de-snowing techniques: DROR (statistical filtering) and LiSnowNet (learning-based). Models trained on de-snowed (synthetic clear) data are benchmarked against both real clear and snowy models.
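For intuition about the statistical filter, the following brute-force sketch follows the usual description of Dynamic Radius Outlier Removal: a point is kept only if it has enough neighbours within a search radius that grows with its range, because LiDAR returns become sparser with distance while airborne snow stays isolated. The parameter values are illustrative defaults, not the settings used in the paper, and the quadratic neighbour search stands in for the k-d tree a real implementation would use.

```python
import math

def dror(points, alpha_deg=0.16, beta=3.0, k_min=3, sr_min=0.04):
    """Filter (x, y, z) points with a range-dependent radius outlier test.
    alpha_deg: assumed horizontal angular resolution of the sensor.
    beta: multiplier on the expected point spacing at a given range.
    k_min: minimum number of other points required inside the radius.
    sr_min: lower bound on the search radius, in metres."""
    alpha = math.radians(alpha_deg)
    kept = []
    for i, p in enumerate(points):
        horizontal_range = math.hypot(p[0], p[1])
        radius = max(sr_min, beta * alpha * horizontal_range)
        neighbours = sum(
            1 for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= radius
        )
        if neighbours >= k_min:
            kept.append(p)
    return kept
```

Points removed this way are simply deleted. Nothing reconstructs the surfaces that snow occluded, which is one reason de-snowed data can differ from real clear-weather scans.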

Notably:

De-snowed models only marginally approach the performance of real clear models on snowy data, but underperform on real clear evaluation, indicating a lack of true domain transfer. Even advanced learning-based de-snowing cannot generate data that sufficiently bridges the domain gap.
Results highlight the inadequacy of de-snowing for generating realistic, annotation-preserving clear-weather samples and confirm the necessity of real, paired, and diverse datasets like CADC+ for robust evaluation and domain adaptation research.

Implications and Future Directions

The introduction of CADC+ fills a critical gap in empirical research on adverse weather domain adaptation. Its paired, closely matched sequences allow controlled studies of sensor and model robustness, domain adaptation, and multi-modal fusion techniques under snow-induced uncertainty. The dataset design, sparse labeling protocol, and SSL pipeline provide benchmarks for annotation-efficient learning and data curation strategies.

The results suggest multiple research directions:

Developing weather-invariant 3D object detection architectures that can explicitly account for both aleatoric and epistemic uncertainty.
Integrating domain adaptation techniques—such as adversarial training or domain-invariant feature learning—grounded in the empirical findings enabled by CADC+.
Improving the realism and transferability of synthetic weather perturbation methods, either by higher-fidelity simulation, multimodal augmentation, or generative models.
Investigating more flexible annotation protocols, robust SSL methods, and leveraging unlabelled or weakly labelled data to minimize annotation cost.

On the dataset side, addressing residual limitations—such as frame-rate disparities and nonuniform scene diversity in unmatched sequences—will further increase research utility. The provision of high-frequency, unlabelled raw sensor data supports future advances in sequence-based SSL, self-supervised pretraining, and multimodal perception research.

Conclusion

CADC+ establishes a new empirical foundation for evaluating and advancing 3D object detection under adverse weather. The dataset's design and the accompanying preliminary analyses indicate that snow introduces both measurement noise and a hard domain shift that the evaluated de-snowing approaches do not bridge. This work will enable systematic investigation of domain adaptation, robustness, and learning efficiency in perception pipelines for autonomous vehicles operating in challenging environments.