Drivora: A Unified and Extensible Infrastructure for Search-based Autonomous Driving Testing

Published 9 Jan 2026 in cs.SE | (2601.05685v1)

Abstract: Search-based testing is critical for evaluating the safety and reliability of autonomous driving systems (ADSs). However, existing approaches are often built on heterogeneous frameworks (e.g., distinct scenario spaces, simulators, and ADSs), which require considerable effort to reuse and adapt across different settings. To address these challenges, we present Drivora, a unified and extensible infrastructure for search-based ADS testing built on the widely used CARLA simulator. Drivora introduces a unified scenario definition, OpenScenario, that specifies scenarios using low-level, actionable parameters to ensure compatibility with existing methods while supporting extensibility to new testing designs (e.g., multi-autonomous-vehicle testing). On top of this, Drivora decouples the testing engine, scenario execution, and ADS integration. The testing engine leverages evolutionary computation to explore new scenarios and supports flexible customization of core components. The scenario execution can run arbitrary scenarios using a parallel execution mechanism that maximizes hardware utilization for large-scale batch simulation. For ADS integration, Drivora provides access to 12 ADSs through a unified interface, streamlining configuration and simplifying the incorporation of new ADSs. Our tools are publicly available at https://github.com/MingfeiCheng/Drivora.


Summary

The paper introduces Drivora, a unified infrastructure that integrates scenario definition, testing engine, and ADS integration for search-based autonomous driving testing.
It employs evolutionary search techniques and parallel scenario execution to search for safety-critical violations, and it integrates 12 ADSs behind a unified interface.
The framework’s modular design and unified scenario definition (OpenScenario) support reproducibility, extensibility, and large-scale validation.

Motivation and Context

Autonomous driving systems (ADSs) demand rigorous validation, especially for safety and reliability, as their deployment carries substantial practical risk. Search-based simulation testing has emerged as a standard methodology to generate and identify safety-critical corner cases and violations in a controlled, reproducible manner. Existing research platforms for search-based ADS testing are fragmented, mainly because of heterogeneous scenario definitions, simulator bindings, single-system architectures, and limited extensibility. As a result, migrating a method between platforms or evaluating it across multiple ADSs remains cumbersome and error-prone.

Drivora addresses these deficiencies by introducing an extensible infrastructure that unifies scenario definition, decouples testing components, and supports multi-system integration, all built on the well-established CARLA simulation backend. The framework explicitly targets compatibility, extensibility to support new methods, and maximal execution efficiency required for large-scale search-based validation.

Architectural Overview

Drivora's architecture is constructed on distinct modular layers that cleanly separate key concerns: scenario definition, testing engine, scenario execution, and ADS integration. This separation not only accommodates direct integration of heterogeneous ADSs but also enables rapid prototyping and replacement of search algorithms. Figure 1 encapsulates the architecture, delineating the flow from resource ingestion to parallelized scenario execution and downstream analysis.

Figure 1: Drivora framework architecture, depicting the modular separation of ADS integration, testing configuration, evolutionary testing engine, and parallelized scenario execution.

The core contributions of Drivora are:

Unified Scenario Definition (OpenScenario): Scenarios are encoded using low-level, actionable parameters (trajectories, start times, weather, traffic lights, actor types), which serve as a lingua franca between disparate scenario-generation techniques.
Testing Engine: Implements classical evolutionary search, including scenario mutation, test oracle evaluation, feedback-driven selection, and flexible integration for arbitrary search algorithms.
Parallel Scenario Execution Layer: Supports $K$-way parallel execution, maximizing hardware throughput. Each worker executes arbitrary multi-AV scenarios in isolation, critical for efficient exploration of high-dimensional input spaces.
ADS Integration Interface: Provides a standardized API (setup_env and run_step) for any ADS binding, facilitating evaluation and method transfer across 12 integrated ADSs, including both module-based and end-to-end architectures.
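
The ADS integration interface above can be pictured with a minimal Python sketch. Only the method names setup_env and run_step come from the text; the class names, argument types, observation keys, and control dictionary are hypothetical and do not reflect Drivora's actual signatures.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class ADSAdapter(ABC):
    """Hypothetical base class; only the two method names are from the text."""

    @abstractmethod
    def setup_env(self, config: Dict[str, Any]) -> None:
        """Prepare the ADS (load models, attach sensors) before a run."""

    @abstractmethod
    def run_step(self, observation: Dict[str, Any]) -> Dict[str, float]:
        """Map one simulation-step observation to a control command."""


class ConstantThrottleADS(ADSAdapter):
    """Toy ADS: fixed throttle, full brake when an obstacle is close."""

    def setup_env(self, config: Dict[str, Any]) -> None:
        self.throttle = float(config.get("throttle", 0.3))

    def run_step(self, observation: Dict[str, Any]) -> Dict[str, float]:
        # "obstacle_distance" is an assumed observation key (metres).
        if observation.get("obstacle_distance", float("inf")) < 5.0:
            return {"throttle": 0.0, "steer": 0.0, "brake": 1.0}
        return {"throttle": self.throttle, "steer": 0.0, "brake": 0.0}


def drive(ads: ADSAdapter, config: Dict[str, Any],
          observations: List[Dict[str, Any]]) -> List[Dict[str, float]]:
    """Harness code depends only on the two-method interface."""
    ads.setup_env(config)
    return [ads.run_step(obs) for obs in observations]
```

Because the harness calls nothing beyond these two methods, a module-based and an end-to-end ADS could in principle be swapped without touching the testing engine.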

Scenario Definition and Execution

OpenScenario defines scenario entities by low-level attributes, enabling a canonical internal representation for mutation, compatibility, and downstream analysis. The main entities are ego vehicles (with full route and ADS config), dynamic/static NPCs, map regions, weather, and traffic lights. This low-level description unifies the scenario spaces of a diverse set of existing search-based testing tools, laying the basis for broad compatibility and extensibility.
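
As an illustration of such a low-level representation, the sketch below models the listed entities as Python dataclasses. All class names, field names, and the waypoint layout are assumptions made for this example; they are not Drivora's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Tuple

# Assumed waypoint layout: (x, y, target speed).
Waypoint = Tuple[float, float, float]


@dataclass
class Ego:
    ego_id: str
    route: List[Waypoint]   # full route for the ego vehicle
    ads_config: str         # hypothetical name of an ADS configuration


@dataclass
class Npc:
    npc_id: str
    actor_type: str         # e.g. "vehicle" or "walker" (assumed values)
    trajectory: List[Waypoint]
    start_time: float = 0.0
    dynamic: bool = True    # False for a static NPC


@dataclass
class Scenario:
    map_region: str
    weather: Dict[str, float]
    traffic_lights: Dict[str, str]
    egos: List[Ego] = field(default_factory=list)   # more than one ego = multi-AV
    npcs: List[Npc] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize to plain JSON so any search method can read or mutate it."""
        return json.dumps(asdict(self), sort_keys=True)
```

Keeping every field a plain number, string, or list is what makes such a representation easy to mutate and to translate from other tools' scenario formats.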

Scenario mutation explores this parameter space using evolutionary operators, supporting arbitrary search objectives defined via custom feedback functions and test oracles. Drivora’s parallelism in scenario execution supports both single- and multi-AV cases, capturing emergent risks inherent in interactions among multiple autonomous agents.
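
The mutate, execute, evaluate, and select cycle described above can be sketched as a small elitist evolutionary loop. This is a toy under stated assumptions: the scenario is a flat dictionary of two made-up parameters, toy_execute stands in for a simulation run, and the feedback is an invented minimum-distance score. None of these names come from Drivora.

```python
import random
from typing import Callable, Dict, List, Tuple

Scenario = Dict[str, float]


def mutate(s: Scenario, rng: random.Random, sigma: float = 1.0) -> Scenario:
    """Perturb one randomly chosen parameter with Gaussian noise."""
    child = dict(s)
    key = rng.choice(sorted(child))
    child[key] += rng.gauss(0.0, sigma)
    return child


def toy_execute(s: Scenario) -> float:
    """Stand-in for a simulation: returns a fake minimum ego-NPC distance."""
    return abs(s["npc_start_time"] - 4.0) + abs(s["npc_speed"] - 8.0)


def oracle(min_distance: float) -> bool:
    """Test oracle: flag a violation when the distance is nearly zero."""
    return min_distance < 0.5


def search(seeds: List[Scenario], generations: int, rng: random.Random,
           execute: Callable[[Scenario], float] = toy_execute,
           ) -> Tuple[List[Tuple[float, Scenario]], List[Scenario]]:
    population = [(execute(s), s) for s in seeds]
    violations: List[Scenario] = []
    for _ in range(generations):
        children = [mutate(s, rng) for _, s in population]
        scored = [(execute(c), c) for c in children]
        violations += [c for d, c in scored if oracle(d)]
        # Feedback-driven selection: keep the scenarios closest to a collision.
        merged = sorted(population + scored, key=lambda p: p[0])
        population = merged[:len(seeds)]
    return population, violations
```

Swapping in a different feedback function or oracle changes the search objective without touching the loop, which is the kind of customization the text describes.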

Empirical Results and Practical Capabilities

Demo results highlight Drivora's capacity to discover diverse categories of safety violations:

Inter-vehicle Collision: Drivora identifies competitive acceleration behaviors resulting in side collisions at junctions.
Behavioral Failures: The ADS fails to decelerate appropriately in the presence of NPCs, leading to front collisions.
Perception/Decision-Making Faults: Vision–language-based ADS misinterpreting traffic signs, resulting in prolonged stops (see Figure 2).

Figure 2: Representative types of violation cases uncovered by Drivora, including collisions and misbehavior in complex scenarios.

Execution throughput scales linearly as the degree of parallelization $K$ increases, confirming that the scenario execution layer is not bottlenecked by synchronization or resource contention. The architecture natively supports multi-AV scenarios, elucidating complex, higher-order emergent behaviors—such as deadlocks among several AVs—which are fundamentally unobservable in single-vehicle paradigms.
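
A minimal sketch of $K$-way batch execution, using Python's standard concurrent.futures module. The run_scenario function is a hypothetical stand-in that sleeps instead of simulating, and threads are used only to keep the example portable. A real setup would more plausibly give each worker its own process and simulator instance, which is an assumption here rather than something the source describes.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def run_scenario(scenario_id: int, sim_seconds: float = 0.05) -> Dict[str, object]:
    """Hypothetical worker: blocks like a simulation, returns a fake result."""
    time.sleep(sim_seconds)
    return {"scenario": scenario_id, "collided": scenario_id % 3 == 0}


def run_batch(scenario_ids: List[int], k: int) -> List[Dict[str, object]]:
    """Run a batch with at most k scenarios in flight at once."""
    with ThreadPoolExecutor(max_workers=k) as pool:
        # map() returns results in input order, whichever worker finishes first.
        return list(pool.map(run_scenario, scenario_ids))
```

With independent workers and no shared state, the wall-clock time for a batch of $N$ equally long scenarios is roughly $\lceil N/K \rceil$ times the per-scenario time, which is the idealized form of linear scaling.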

Implications for Autonomous Driving Validation

The unification and modularization offered in Drivora allow for reproducibility, extensibility, and efficient large-scale testing critical for modern evolutionary, reinforcement learning–based, and diversity-guided techniques. The integration with 12 ADSs underscores Drivora's effectiveness as a benchmarking and validation platform, facilitating head-to-head comparison using identical scenarios and test harnesses.

The extensibility of OpenScenario may facilitate future experimentation with LLM–driven or behavior-distribution–guided scenario generators. The ability to plug in advanced test search strategies (e.g., RL, many-objective) and new oracles at minimal engineering cost provides a runtime foundation suitable for both software engineering researchers and industrial validation pipelines.

Future Directions

Planned future work includes a comprehensive empirical evaluation across the integrated ADSs, quantifying the coverage and diversity that different search strategies actually achieve within the unified infrastructure. The current design can be extended to incorporate new scenario mutation pipelines (e.g., RL-based generative adversaries), to support formal requirements-driven test case generation, and to integrate with incident databases or natural-language scenario descriptions.

Conclusion

Drivora represents a concrete step toward infrastructural convergence in the evaluation of autonomous driving systems. By offering a robust, extensible, and parallelized framework grounded on actionable scenario definitions and modular architecture, Drivora addresses critical gaps in the state-of-the-art for ADS testing. The framework’s ability to surface diverse categories of violations and emergent behaviors demonstrates its value for both researchers and practitioners, and positions Drivora as a central validation platform for ongoing ADS safety research and engineering (2601.05685).