DrivAerML Dataset Overview
- DrivAerML is an open-access, high-fidelity dataset for automotive aerodynamics featuring 500 parametric geometry variants and validated CFD results.
- It combines a hybrid RANS–LES CFD approach for simulation fidelity with a modified lattice-sequence design of experiments (DoE) that drives industrial-scale geometry morphing.
- The dataset supports surrogate modeling, physics-informed machine learning, and design optimization through benchmark-quality data and standardized evaluation protocols.
The DrivAerML dataset is an open-access, high-fidelity resource for automotive aerodynamics, containing parametric geometry variants and validated computational fluid dynamics (CFD) solutions for realistic road vehicles. It is described as the first large-scale, public-domain dataset generated with state-of-the-art, scale-resolving CFD specifically for complex automotive configurations. It is designed to fill the gap in open-source, benchmark-quality data for surrogate modeling, design optimization, and physics-informed machine learning in external car aerodynamics (Ashton et al., 21 Aug 2024).
1. Dataset Structure and Parametric Coverage
DrivAerML is based on the notchback variant of the Ford OCDA DrivAer research model. It consists of 500 distinct geometry configurations generated through parametric morphing, where the morphing procedure employs “morphing boxes” placed around the baseline geometry using ANSA software. Each design instance is specified by 16 geometric parameters, which include dimensional adjustments (length, width, height, ride height, pitch angle), body panel angles (hood, windscreen, backlight), decklid height, rear tapering, and overhangs. Range constraints are imposed to avoid unrealistic morphologies.
All simulation outputs and data are organized in one folder per variant, consistent with earlier related datasets (e.g., AhmedML, WindsorML). The data provided for each variant includes:
- STL geometry file.
- Full-domain volume solutions in VTK (.vtu, ~160 million cells per case).
- Body surface solutions in VTK (.vtp).
- 2D slices in the x, y, and z directions for flow-field visualization.
- CSV files containing time-averaged aerodynamic force and moment coefficients with both geometry-specific and baseline reference values.
- PNG images depicting surface contours (pressure, friction, representative slices).
- Auxiliary files: parametric specification, normalized reference geometry values.
This design ensures comprehensive coverage of relevant industrial car features and provides rich input for both full-field analysis and reduced-order modeling.
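Because each coefficient file carries both geometry-specific and baseline reference values, it can help to see how a force coefficient changes when the reference area changes. The sketch below assumes the standard definition C = F / (0.5 ρ U² A), so the same force referenced to a different area gives C_b = C_a · A_a / A_b. The function name and all numbers are hypothetical and are not taken from the dataset.

```python
def rescale_coefficient(c_a: float, area_a: float, area_b: float) -> float:
    """Re-reference a force coefficient from reference area A_a to A_b.

    With C = F / (0.5 * rho * U^2 * A), the force F is unchanged,
    so C_b = C_a * A_a / A_b.
    """
    if area_a <= 0 or area_b <= 0:
        raise ValueError("reference areas must be positive")
    return c_a * area_a / area_b


# Hypothetical values for illustration only (not dataset values).
cd_baseline_ref = 0.300  # drag coefficient referenced to the baseline frontal area
a_baseline = 2.00        # baseline frontal area in m^2
a_variant = 2.10         # frontal area of the morphed variant in m^2

# Same drag force, now referenced to the variant's own frontal area.
cd_variant_ref = rescale_coefficient(cd_baseline_ref, a_baseline, a_variant)

# Converting back recovers the original coefficient.
cd_round_trip = rescale_coefficient(cd_variant_ref, a_variant, a_baseline)
```

Using the baseline reference keeps coefficients proportional to the actual force across variants, while the geometry-specific reference follows the conventional per-vehicle definition.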
2. CFD Methodology and Simulation Fidelity
Simulations are performed with OpenFOAM (v2212) using the finite volume method and a hybrid RANS–LES (HRLES) approach, specifically the SA-σ-DDES variant with “enhanced protection” shielding adapted from the ZDES methodology. This protocol maintains RANS modeling in attached boundary layers (Spalart–Allmaras) and transitions to LES (Nicoud σ) in separated flow regions, and it is validated for automotive Reynolds numbers (Re ≈ 7.19×10⁶). Mesh generation is automated with ANSA HeXtreme, producing hexahedral-dominant/polyhedral meshes of approximately 160 million cells per simulation, with refined boundary layers and local refinement near wakes, the underbody, and mirrors.
Time-averaging uses the Meancalc tool, a dynamically triggered workflow that monitors the transient solution for statistical convergence, targeting ±1.5 drag counts. Simulations are executed in parallel on large HPC clusters (e.g., Amazon EC2, typically 1536 cores), with each run taking about 40 hours in double precision. The CFD protocol is explicitly validated against wind tunnel measurements, including pressure coefficients and velocity profiles.
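To make the ±1.5 drag count target concrete, the following is a minimal sketch of a convergence check on a drag time series. It is not the Meancalc algorithm, whose internals are not described here; it is a simplified stand-in that discards an initial transient and tests whether the cumulative mean has stayed inside the tolerance band over a trailing window. One drag count is 0.001 in Cd. The function names, window sizes, and the synthetic signal are all assumptions.

```python
import math

DRAG_COUNT = 1e-3  # one drag count corresponds to 0.001 in Cd


def running_mean(samples):
    """Cumulative mean of the samples, one value per sample."""
    means, total = [], 0.0
    for count, value in enumerate(samples, start=1):
        total += value
        means.append(total / count)
    return means


def is_converged(cd_samples, start, window, tol_counts=1.5):
    """Check whether the mean Cd has settled.

    Samples before index `start` are discarded as transient. The result is
    True when every cumulative mean in the last `window` samples lies within
    +/- tol_counts drag counts of the final cumulative mean.
    """
    means = running_mean(cd_samples[start:])
    if len(means) < window:
        return False
    tail = means[-window:]
    final = tail[-1]
    return all(abs(m - final) <= tol_counts * DRAG_COUNT for m in tail)


# Synthetic, hypothetical signal: decaying start-up transient plus an oscillation.
signal = [0.28 + 0.05 * math.exp(-t / 50.0) + 0.005 * math.sin(0.3 * t)
          for t in range(2000)]

settled = is_converged(signal, start=500, window=500)
too_early = is_converged(signal[:600], start=0, window=500)
```

In this toy signal the check passes once the start-up transient is excluded and fails when averaging begins at the first sample, which is the behavior a dynamically triggered averaging workflow is meant to guard against.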
3. Morphing Algorithm and Experimental Design
Parametric sampling uses a Modified Extensible Lattice Sequence-based design of experiments (DoE), ensuring a uniform and mathematically guaranteed spread over the feasible parameter domain. Each geometry is automatically recorded with its morphing parameters, computed reference values (frontal area, wheelbase), and all derived aerodynamic coefficients. This process enables reproducible, systematic coverage of realistic industrial variations and provides a robust training set, suitable for surrogate modeling and sensitivity analysis.
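As an illustration of lattice-sequence sampling, the sketch below generates 500 points in 16 dimensions with a plain extensible rank-1 lattice sequence, where point i is frac(φ₂(i) · z) and φ₂ is the base-2 radical inverse. This is the textbook construction, not the modified variant used for DrivAerML. The generating vector `GEN_VECTOR` and the normalized parameter bounds are hypothetical.

```python
def radical_inverse_base2(i: int) -> float:
    """Van der Corput radical inverse of i in base 2, a value in [0, 1)."""
    result, fraction = 0.0, 0.5
    while i > 0:
        if i & 1:
            result += fraction
        i >>= 1
        fraction *= 0.5
    return result


def lattice_sequence(n_points, gen_vector):
    """First n_points of an extensible rank-1 lattice sequence in [0, 1)^d."""
    points = []
    for i in range(n_points):
        phi = radical_inverse_base2(i)
        points.append([(phi * z) % 1.0 for z in gen_vector])
    return points


def scale_to_bounds(unit_point, bounds):
    """Map a point from the unit cube to per-parameter (low, high) ranges."""
    return [low + u * (high - low) for u, (low, high) in zip(unit_point, bounds)]


N_PARAMS = 16     # number of morphing parameters per design
N_VARIANTS = 500  # number of geometry variants

# Hypothetical Korobov-style generating vector (illustrative, not the dataset's).
MULTIPLIER, MODULUS = 17797, 2 ** 20
GEN_VECTOR = [pow(MULTIPLIER, k, MODULUS) for k in range(N_PARAMS)]

# Hypothetical normalized parameter ranges; the real ranges differ per parameter.
BOUNDS = [(-1.0, 1.0)] * N_PARAMS

unit_points = lattice_sequence(N_VARIANTS, GEN_VECTOR)
designs = [scale_to_bounds(p, BOUNDS) for p in unit_points]
```

A sequence of this kind can be extended with further points without discarding the existing ones, which is what makes it convenient when a dataset may grow later.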
4. Data Accessibility and Licensing
DrivAerML is distributed under the Creative Commons CC-BY-SA v4.0 license, which permits academic and commercial use subject to attribution and share-alike requirements for derivatives. Data is stored in widely supported open formats: STL for geometry, VTK (.vtu/.vtp) for CFD fields, CSV for coefficients, and PNG for visualizations. It is hosted on Amazon S3 (us-east-1) and can be downloaded directly with the AWS CLI; no AWS account is required when the “--no-sign-request” option is used. Accompanying documentation includes download instructions, sample scripts, and license details, facilitating broad adoption across research and industrial domains.
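The anonymous download described above can be sketched as a small helper that assembles an `aws s3 cp` command with `--recursive` and `--no-sign-request`. The bucket URI below is a placeholder, and the `run_<id>` folder naming is an assumption; both should be taken from the dataset's own download instructions. The helper only builds the command and does not run it.

```python
import shlex


def build_download_command(bucket_uri: str, run_id: int, dest_dir: str) -> list:
    """Assemble an anonymous AWS CLI command that copies one variant folder.

    The run_<id> folder naming is hypothetical; adjust it to the layout
    given in the dataset documentation.
    """
    source = f"{bucket_uri.rstrip('/')}/run_{run_id}/"
    target = f"{dest_dir.rstrip('/')}/run_{run_id}/"
    return ["aws", "s3", "cp", "--recursive", "--no-sign-request", source, target]


# Placeholder URI: replace with the bucket and prefix from the official instructions.
PLACEHOLDER_URI = "s3://EXAMPLE-BUCKET/EXAMPLE-PREFIX"

command = build_download_command(PLACEHOLDER_URI, run_id=1, dest_dir="./drivaerml")
command_text = shlex.join(command)  # shell-quoted string for logging or copy-paste
```

Once the placeholder is replaced, the list can be passed to `subprocess.run(command, check=True)`. Given the size of the volume files, downloading one variant at a time is a cautious default.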
5. Machine Learning and Engineering Applications
DrivAerML enables a wide range of applications:
- Surrogate modeling: Full-field CFD data and aerodynamic integrals offer robust ground truth for fast, data-driven predictions of drag, lift, pressure distributions, and complex flow phenomena.
- Physics-informed machine learning: The dataset supports hybrid models (e.g., neural operators, physics-informed nets) by providing high-fidelity, statistically converged training targets covering realistic design space.
- Design optimization: With systematic parametric coverage and dense flow solutions, it supports iterative engineering workflows where rapid aerodynamic feedback is required.
- Benchmarking: DrivAerML is designed as a reference dataset for validating ML model generalization and comparing against prior datasets, including transfer learning between CFD fidelity levels, and is formally utilized in frameworks for AI benchmarking in automotive aerodynamics (Tangsali et al., 14 Jul 2025).
- Flow physics analysis: Detailed time-averaged 3D velocity, pressure, and friction data, along with slices and visualizations, facilitate conventional CFD studies of turbulence, separation, and wake characterization.
A plausible implication is that DrivAerML’s breadth and fidelity position it as the modern standard for ML-based aerodynamic prediction and benchmarking in the automotive sector.
6. Benchmarking, Validation, and Impact
DrivAerML serves as a foundational evaluation set for physical and machine learning models. Its validation against wind tunnel data supports confidence in the accuracy and industrial relevance of predictions. In benchmarking initiatives (Tangsali et al., 14 Jul 2025), the dataset is used to assess a variety of AI models, including point cloud-based neural operators (DoMINO), mesh graph networks (X-MeshGraphNet), and implicit convolutional nets (FIGConvNet), across metrics such as L₂ and area-weighted errors, aerodynamic force integration, trend conformity (Spearman coefficient), and out-of-distribution generalization. Specifically:
- The benchmarking framework provides standardized train-validation splits, post-processing protocols (e.g., .vtp/.vtu output alignment), and metric computation procedures, promoting transparent cross-model comparisons.
- DoMINO is reported to achieve R² ≈ 0.96 for drag force regression, indicating robust integral prediction accuracy (Ranade et al., 23 Jan 2025).
- Surface and volume field predictions, centerline plots, and integral error metrics are supplied for rigorous scrutiny of ML model fidelity.
- Guidelines for integrating further models and datasets, together with physically consistent metric definitions, enhance reproducibility and future extensibility.
This comprehensive protocol and reference dataset establish best practices and standards for deploying, validating, and advancing AI models in engineering simulation cycles.
7. Key Findings and Contributions
The DrivAerML project contributes:
- The first open, high-fidelity CFD dataset for realistic, complex road-car shapes, confirmed against experimental data.
- Systematic, parametric coverage via design-of-experiments morphing—enhancing ML model generalization.
- Rich, physically relevant data content—enabling full-field and integral predictions, and bridging CFD and ML domains.
- Practical impacts—accelerating surrogate modeling, enabling rapid design evaluation, and reducing reliance on costly simulation runs.
- Standardized benchmarking infrastructure—driving transparent, reproducible AI development and evaluation for automotive aerodynamics.
This initiative marks a substantive advance in the intersection of CFD, data-driven modeling, and automotive engineering, establishing a platform for continued research and innovation in physics-consistent ML for aerodynamic simulation and optimization.