CarDD: Car Damage Detection Dataset
- CarDD is a large-scale, high-resolution dataset featuring 4,000 images and 9,163 annotated instances across six detailed damage categories.
- It supports multiple computer vision tasks including classification, object detection, instance segmentation, and salient object detection with precise COCO-style evaluations.
- The dataset’s rigorous annotation and validation protocols, along with comprehensive evaluation metrics, drive advancements in detecting fine-grained, real-world car damages.
CarDD refers both to a state-of-the-art benchmark dataset for vision-based car damage detection and segmentation, and to a chunk-context aware resemblance detection framework for deduplication delta compression. This article focuses on the Car Damage Detection (CarDD) dataset, a large-scale, high-resolution benchmark for automotive damage analysis in computer vision and machine learning, while also noting the distinct usage of CarDD in the deduplication context.
1. Definition and Scope
CarDD is the first publicly released, large-scale dataset specifically designed for vision-based car damage detection and segmentation. It comprises 4,000 high-resolution images annotated with over 9,000 instances covering six fine-grained damage categories: dent, scratch, crack, glass shatter, lamp broken, and tire flat. The images capture diverse viewpoints, lighting conditions, vehicle colors, and real-world damage scenarios. The dataset supports multiple tasks, including classification, object detection, instance segmentation, and salient object detection (SOD) (Wang et al., 2022). By providing high-quality annotations and category granularity, CarDD enables rigorous evaluation and advancement of machine learning models for fine-grained segmentation and robust detection under challenging conditions.
Separately, CarDD has also been used to denote "Chunk-Context Aware Resemblance Detection," a neural-based method for deduplication delta compression in large-scale storage systems (Ye et al., 2021).
2. Dataset Construction and Annotation Protocol
CarDD’s collection prioritizes diversity, resolution, and annotation accuracy. Raw images were sourced from online repositories (Flickr, Shutterstock) and filtered for quality and damage relevance. After aggressive deduplication, candidate images underwent a VGG-16-based binary classification to isolate likely damage cases from over 10,000 photos, yielding 4,000 for annotation.
The annotation pipeline utilized the CVAT platform with a team of 20 human annotators, five of whom were insurance domain experts. A multi-stage validation process, including expertise-driven review and iterative correction, ensured precise mask quality, consistent bounding boxes, and accurate class labeling. Annotation rules were strictly enforced: overlapping damages were hierarchically prioritized (crack > dent > scratch), boundaries were split at component edges, and adjacent same-class damages on a single component were merged as one instance. All data was exported in COCO JSON format, with additional binary masks for SOD tasks (Wang et al., 2022).
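The hierarchical priority rule for overlapping damages (crack > dent > scratch) can be sketched as a tiny resolver; the function and priority map below are illustrative assumptions, not part of any official CarDD toolkit.

```python
# Illustrative sketch of CarDD's annotation priority rule for overlapping
# damage regions: crack > dent > scratch. Names are hypothetical.
PRIORITY = {"crack": 3, "dent": 2, "scratch": 1}

def resolve_overlap(label_a: str, label_b: str) -> str:
    """Return the class that should label an overlapping region."""
    if PRIORITY.get(label_a, 0) >= PRIORITY.get(label_b, 0):
        return label_a
    return label_b

print(resolve_overlap("scratch", "crack"))  # crack takes precedence
```

In the actual protocol this decision is made by human annotators during review; the sketch only encodes the stated precedence order.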
3. Dataset Statistics and Analytical Overview
CarDD comprises 4,000 images and 9,163 annotated damage instances. Image resolutions vary from approximately 1,000×413 pixels to multi-megapixel, with an average area of 684,231 pixels and a mean JPEG size of 739 KB, substantially larger than in prior datasets. The dataset also offers significantly greater diversity in viewpoint and vehicle color than previous public or private corpora.
Instance scale analysis shows that 38.6% of damages are “small” (area < 32² pixels), 32.6% are “medium” (32² ≤ area < 96²), and 28.8% are “large” (area ≥ 96²) according to COCO conventions. Categories such as scratch and crack are heavily skewed toward small, fine-scale annotations: over 45% of scratches and over 90% of cracks are small instances, reflecting real-world detection challenges. The dataset’s 70.4%/20.25%/9.35% train/val/test split ensures balanced representation of all six damage types. Near-duplicates were removed across splits to prevent evaluation leakage (Wang et al., 2022).
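The COCO size conventions used above can be made concrete with a small bucketing helper; the thresholds (32² and 96² pixels of mask area) are the standard COCO values.

```python
# Bucket an instance by mask area using COCO's size conventions:
# small: area < 32**2, medium: 32**2 <= area < 96**2, large: area >= 96**2.
def coco_size_bucket(area: float) -> str:
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"

# A thin scratch covering ~800 px of mask area counts as small, which is
# why scratch and crack dominate the hardest size bucket.
print(coco_size_bucket(800))    # small
print(coco_size_bucket(5000))   # medium
print(coco_size_bucket(20000))  # large
```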
4. Supported Tasks and Evaluation Methodology
CarDD is designed for four principal tasks: (a) classification, (b) object detection, (c) instance segmentation, and (d) salient object detection. For detection and segmentation, a model must predict the bounding box, category, and pixel mask per damaged region.
The evaluation protocol follows COCO-style AP (Average Precision) metrics, computed over IoU (Intersection-over-Union) thresholds from 0.50 to 0.95 in increments of 0.05:

AP = (1/10) Σ_{t ∈ {0.50, 0.55, …, 0.95}} AP_t,  with  AP_t = ∫₀¹ p_t(r) dr,

where p_t(r) is the precision at recall r under IoU threshold t. Metrics reported include overall mask AP, bounding box AP, AP₅₀, AP₇₅, and scale-specific AP (AP_S, AP_M, AP_L). Testing on a set augmented with undamaged images assesses robustness in fraud-detection and real-world insurance scenarios (Wang et al., 2022).
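The IoU-thresholded matching behind these metrics can be sketched in a few lines. This is a minimal illustration, not the full greedy matching of pycocotools: it computes box IoU and checks which of the ten COCO thresholds a single match clears.

```python
# Minimal sketch of the IoU test underlying COCO-style AP. Real COCO
# evaluation (pycocotools) performs greedy per-image matching over
# score-sorted detections; here we only score one prediction/GT pair.
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

thresholds = [0.50 + 0.05 * i for i in range(10)]  # 0.50, 0.55, ..., 0.95

pred, gt = (0, 0, 10, 10), (0, 0, 10, 12)
score = iou(pred, gt)                         # 100/120 ≈ 0.833
hits = [t for t in thresholds if score >= t]  # thresholds this match clears
print(score, hits)
```

A prediction with IoU ≈ 0.83 counts as a true positive at thresholds up to 0.80 but as a miss at 0.85 and above, which is how the averaged metric rewards tight localization.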
5. Baseline Methods, Training Protocols, and Results
Models benchmarked on CarDD include Mask R-CNN, Cascade Mask R-CNN, GCNet, HTC, Deformable Convolution Networks (DCN), and DCN+, a DCN variant featuring multi-scale data augmentation and a focal classification loss:
FL(p_t) = −α_t (1 − p_t)^γ log(p_t),  where p_t = p for the positive class and p_t = 1 − p otherwise; the focusing parameter γ down-weights well-classified examples and α_t balances class frequencies.
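The focal classification loss can be written out as a small reference function. The defaults γ = 2 and α = 0.25 are the common choices from the original focal-loss formulation, assumed here rather than confirmed for DCN+ on CarDD.

```python
import math

# Binary focal loss, as used for DCN+'s classification head. gamma and
# alpha follow common defaults (assumed values, not confirmed for CarDD).
def focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# An easy positive (p = 0.9) is strongly down-weighted relative to plain
# cross-entropy; a hard positive (p = 0.1) keeps most of its loss, which
# is what counteracts the extreme class imbalance in damage instances.
easy, hard = focal_loss(0.9, 1), focal_loss(0.1, 1)
print(easy, hard)
```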
All baselines were implemented in MMDetection, pretrained on COCO, and fine-tuned on CarDD with ResNet-50/101 backbones, the SGD optimizer, and multi-scale input augmentation. Models were trained for 24 epochs with a step-decayed learning rate and batch size 8 on Tesla P100/RTX 3090 GPUs.
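A training setup of this shape is typically expressed as an MMDetection (2.x-style) config fragment. The sketch below mirrors only the schedule stated in the text (SGD, 24 epochs, step-decayed LR, batch size 8); every other value, and the checkpoint path, is an illustrative assumption.

```python
# Illustrative MMDetection 2.x-style config fragment for the schedule
# described above. lr, momentum, weight decay, and decay steps are
# assumed values; the checkpoint path is hypothetical.
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
lr_config = dict(policy='step', step=[16, 22])         # step-decayed LR
runner = dict(type='EpochBasedRunner', max_epochs=24)  # 24 training epochs
data = dict(samples_per_gpu=8)                         # batch size 8
load_from = 'checkpoints/coco_pretrained.pth'          # COCO-pretrained weights
```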
DCN+ with a ResNet-101 backbone achieves the best overall instance segmentation mask AP at 57.0%, with notable gains on small regions. Category-specific AP improvements were observed for dent (32.0% → 40.5%), scratch (24.0% → 34.3%), and crack (9.8% → 16.6%). CarDD exposes the limitations of standard detection architectures on slender, low-contrast, or overlapping damages. On tests augmented with undamaged images, AP fell by only 1.2%, with just 10 false positives, underscoring practical robustness (Wang et al., 2022).
6. Challenges, Recommendations, and Research Impact
CarDD presents unique difficulties absent in general-purpose datasets: (i) the prevalence of small, slender, or irregular damage shapes; (ii) ambiguous inter-class boundaries, notably crack versus scratch; and (iii) the necessity for error tolerance to undamaged cases or intertwined damage types.
Best practices identified include leveraging high-resolution and multi-scale input, class-balanced/focal losses for extreme imbalance, and hybrid approaches that integrate category-agnostic SOD methods with instance segmentation to refine boundaries. CarDD is positioned as a realistic testbed for insurance deployment, emphasizing fraud/edge cases and challenging scene compositions.
Open research directions cited include (a) anomaly detection for fine-grained subclasses, (b) hybrid SOD-category segmentation architectures, (c) temporal/multi-view consistency exploitation, and (d) advanced metrics attuned to the evaluation of tiny or irregular objects (Wang et al., 2022).
7. Alternative Usage: CarDD in Data Deduplication
In deduplication, CarDD denotes "Chunk-Context Aware Resemblance Detection" (Ye et al., 2021), an approach that enhances delta compression by combining N-sub-chunk shingle feature extraction with neural-context embedding. The method integrates a chunk’s internal hash-based structure and its neighbors via a BP (backpropagation) neural network, increasing redundancy detection by up to 75.03% and accelerating operations by 5.6×–17.8× over prior methods such as N-transform or Finesse. A plausible implication is that in the vision domain, leveraging contextual relationships (scene or instance-wise) may analogously improve robustness to minor, local perturbations.
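The N-sub-chunk shingle feature extraction can be sketched as follows: split a chunk into N sub-chunks, hash the sliding shingles inside each, and keep the minimum hash per sub-chunk as its feature, so small local edits leave most features unchanged. This is a hedged illustration under assumed parameters (N = 4, 8-byte shingles, BLAKE2b hashing); the real method additionally feeds the features of neighboring chunks into a BP neural network, which is omitted here.

```python
import hashlib

# Hedged sketch of N-sub-chunk shingle features for resemblance detection.
# Parameters (n=4 sub-chunks, 8-byte shingles, blake2b) are assumptions,
# not the published configuration; the neural-context step is omitted.
def subchunk_features(chunk: bytes, n: int = 4, shingle: int = 8) -> list:
    size = max(1, len(chunk) // n)
    feats = []
    for i in range(n):
        sub = chunk[i * size:(i + 1) * size] or b"\x00"
        hashes = [
            int.from_bytes(
                hashlib.blake2b(sub[j:j + shingle], digest_size=8).digest(), "big"
            )
            for j in range(max(1, len(sub) - shingle + 1))
        ]
        feats.append(min(hashes))  # min-hash: stable under small local edits
    return feats

a = subchunk_features(b"the quick brown fox jumps over the lazy dog" * 4)
b = subchunk_features(b"the quick brown fox jumps over the lazy dog!" * 4)
# Resembling chunks tend to share several sub-chunk features.
print(sum(x == y for x, y in zip(a, b)))
```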
Table: CarDD Dataset Statistics and Breakdown
| Attribute | Value | Notes |
|---|---|---|
| Total images | 4,000 | High-resolution |
| Annotated instances | 9,163 | Six damage categories |
| Train/Val/Test split | 70.4% / 20.25% / 9.35% | Balanced per class |
| Small instances (area < 32²) | 38.6% | Dominant in crack/scratch |
| Avg. image area (pixels) | 684,231 | Mean JPEG size 739 KB |
| Best Mask AP (DCN+) | 57.0% | On ResNet-101 backbone |
CarDD is a pivotal resource for advancing fine-grained car damage detection models, addressing core challenges in insurance automation and robust, real-world object segmentation. It also denotes a distinct neural deduplication methodology in data storage, reflecting the importance of context-aware machine learning in diverse domains.