
SafeBench: A Benchmarking Platform for Safety Evaluation of Autonomous Vehicles (2206.09682v4)

Published 20 Jun 2022 in cs.RO

Abstract: As shown by recent studies, machine intelligence-enabled systems are vulnerable to test cases resulting from either adversarial manipulation or natural distribution shifts. This has raised great concerns about deploying machine learning algorithms for real-world applications, especially in safety-critical domains such as autonomous driving (AD). On the other hand, traditional AD testing on naturalistic scenarios requires hundreds of millions of driving miles due to the high dimensionality and rareness of the safety-critical scenarios in the real world. As a result, several approaches for autonomous driving evaluation have been explored, which are usually, however, based on different simulation platforms, types of safety-critical scenarios, scenario generation algorithms, and driving route variations. Thus, despite a large amount of effort in autonomous driving testing, it is still challenging to compare and understand the effectiveness and efficiency of different testing scenario generation algorithms and testing mechanisms under similar conditions. In this paper, we aim to provide the first unified platform SafeBench to integrate different types of safety-critical testing scenarios, scenario generation algorithms, and other variations such as driving routes and environments. Meanwhile, we implement 4 deep reinforcement learning-based AD algorithms with 4 types of input (e.g., bird's-eye view, camera) to perform fair comparisons on SafeBench. We find our generated testing scenarios are indeed more challenging and observe the trade-off between the performance of AD agents under benign and safety-critical testing scenarios. We believe our unified platform SafeBench for large-scale and effective autonomous driving testing will motivate the development of new testing scenario generation and safe AD algorithms. SafeBench is available at https://safebench.github.io.

Authors (9)
  1. Chejian Xu (18 papers)
  2. Wenhao Ding (43 papers)
  3. Weijie Lyu (10 papers)
  4. Zuxin Liu (43 papers)
  5. Shuai Wang (466 papers)
  6. Yihan He (19 papers)
  7. Hanjiang Hu (23 papers)
  8. Ding Zhao (172 papers)
  9. Bo Li (1107 papers)
Citations (38)

Summary

Essay on "SafeBench: A Benchmarking Platform for Safety Evaluation of Autonomous Vehicles"

The paper "SafeBench: A Benchmarking Platform for Safety Evaluation of Autonomous Vehicles" presents a systematic approach to addressing the pressing need for robust and efficient safety evaluations in the field of autonomous driving. As the deployment of machine learning algorithms in safety-critical applications like autonomous driving becomes more prevalent, ensuring their robustness against adversarial manipulations and natural distribution shifts is imperative. The challenge is heightened by the scarcity of safety-critical scenarios in real-world conditions, necessitating millions of miles of vehicle testing to encounter rare but potentially catastrophic situations.

The central contribution of this work is SafeBench, the first unified platform that integrates a wide array of safety-critical testing scenarios and scenario generation algorithms to evaluate autonomous vehicle (AV) algorithms under diverse, controlled conditions. SafeBench is built on the scalable and flexible CARLA simulator and comprises four modular components: Agent Node, Ego Vehicle, Scenario Node, and Evaluation Node. This modular architecture supports varied testing and evaluation of AV algorithms and leaves room for extension as requirements in autonomous vehicle testing evolve.
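
The paper describes this layout at the level of components rather than code; the following minimal Python sketch, written against stub classes rather than SafeBench's actual API, illustrates how the four nodes might interact over one episode. All class and method names are assumptions for exposition, and the real platform connects these nodes to CARLA.

```python
# Minimal sketch of the four-node layout described above. Names are
# illustrative assumptions, not SafeBench's actual API; the real platform
# wires these nodes to the CARLA simulator instead of stubs.
from dataclasses import dataclass, field


@dataclass
class Observation:
    """Sensor data returned by the simulator (hypothetical container)."""
    bev: list = field(default_factory=list)      # bird's-eye-view raster
    camera: list = field(default_factory=list)   # front RGB frame


class AgentNode:
    """Wraps the AD policy under test."""
    def act(self, obs: Observation) -> dict:
        return {"throttle": 0.5, "steer": 0.0}   # stub in place of a policy


class ScenarioNode:
    """Drives the safety-critical scenario, e.g. an adversarial cut-in."""
    def step(self, tick: int) -> None:
        pass  # update adversarial actors each simulation tick


class EgoVehicle:
    """Applies the agent's control and returns fresh sensor readings."""
    def apply(self, control: dict) -> Observation:
        return Observation()


class EvaluationNode:
    """Accumulates safety metrics such as collisions and route completion."""
    def record(self, obs: Observation, control: dict) -> None:
        pass  # check collision sensors, lane invasions, progress, etc.


def run_episode(steps: int = 100) -> None:
    agent, scenario = AgentNode(), ScenarioNode()
    ego, evaluator = EgoVehicle(), EvaluationNode()
    obs = Observation()
    for t in range(steps):
        scenario.step(t)            # advance the adversarial scenario
        control = agent.act(obs)    # policy decision from current sensors
        obs = ego.apply(control)    # actuate the ego vehicle, re-sense
        evaluator.record(obs, control)


if __name__ == "__main__":
    run_episode()
```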

The paper delineates eight pre-crash safety-critical scenario types defined by the National Highway Traffic Safety Administration (NHTSA), including Lane Changing, Vehicle Passing, and Red-light Running. On top of these, SafeBench employs four scenario generation algorithms spanning both adversary-based and knowledge-based methods. These algorithms produce scenarios that pose substantial challenges to AV systems, enabling a comprehensive evaluation of their safety and robustness.
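
To make this concrete, a test suite over such scenario types might be declared as a small configuration and expanded into individual test cases, roughly as follows. Only the three scenario names above come from the paper; the configuration keys, route identifiers, and method labels are hypothetical.

```python
# Illustrative test-suite configuration; structure and identifiers are
# assumptions, not SafeBench's actual config format.
TEST_SUITE = {
    "scenarios": [
        "lane_changing",
        "vehicle_passing",
        "red_light_running",
        # ...the five remaining NHTSA pre-crash scenario types
    ],
    "generation": "adversary_based",     # or "knowledge_based"
    "routes": ["route_01", "route_02"],  # hypothetical driving routes
}


def expand_suite(suite: dict) -> list:
    """Expand the declaration into (scenario, route, method) test cases."""
    return [
        (scenario, route, suite["generation"])
        for scenario in suite["scenarios"]
        for route in suite["routes"]
    ]


if __name__ == "__main__":
    for case in expand_suite(TEST_SUITE):
        print(case)  # six cases: 3 scenarios x 2 routes
```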

An additional focus of the paper is benchmarking AV algorithms based on deep reinforcement learning (DRL). SafeBench tests four DRL-based AV algorithms with varied perceptual capabilities, derived from different input modalities such as bird's-eye view (BEV) and camera images. The authors highlight the trade-off between performance under benign and under safety-critical scenarios, demonstrating the platform's ability to reveal vulnerabilities in AV algorithms that conventional benign-scenario testing might overlook.
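
One way to make the input-modality comparison concrete is an adapter that normalizes raw simulator output into whichever observation a given agent consumes. The modality names (BEV, camera) come from the paper; the adapter interface and array shapes below are assumptions for illustration.

```python
# Hypothetical observation adapter; the interface and shapes are assumptions.
import numpy as np


class ObservationAdapter:
    """Converts raw simulator output into the input a given agent expects."""

    def __init__(self, modality: str):
        if modality not in {"bev", "camera"}:
            raise ValueError(f"unknown modality: {modality}")
        self.modality = modality

    def __call__(self, raw_state: dict) -> np.ndarray:
        # Both modalities are normalized from uint8 pixels to [0, 1] floats.
        frame = raw_state[self.modality]
        return frame.astype(np.float32) / 255.0


if __name__ == "__main__":
    raw = {
        "bev": np.zeros((128, 128, 3), dtype=np.uint8),     # top-down raster
        "camera": np.zeros((256, 256, 3), dtype=np.uint8),  # front RGB frame
    }
    for modality in ("bev", "camera"):
        obs = ObservationAdapter(modality)(raw)
        print(modality, obs.shape, obs.dtype)
```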

The results demonstrate a substantial performance drop from benign to safety-critical testing, underscoring that adversarial assessment is integral to comprehensive AV evaluation. SafeBench also reveals that scenario generation methods transfer inconsistently across AV algorithms, indicating that robustness varies considerably from model to model.
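
For readers who want the benign-versus-critical gap as a single number, a relative-drop metric of the following form is one natural summary; the sample scores below are invented for illustration and are not results from the paper.

```python
# Relative performance drop from benign to safety-critical testing.
# The example scores are made up for demonstration only.
def relative_drop(benign_score: float, critical_score: float) -> float:
    """Fraction of benign performance lost under safety-critical scenarios."""
    return (benign_score - critical_score) / benign_score


if __name__ == "__main__":
    print(f"{relative_drop(0.90, 0.55):.1%}")  # prints 38.9%
```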

In terms of implications, SafeBench serves as a pivotal tool for advancing the understanding and development of safe AV systems. It makes it possible to systematically compare and interpret the effectiveness of diverse testing mechanisms, enabling researchers to propose improved algorithmic strategies and better testing paradigms. This lays a foundation for developing safer AV systems and extends SafeBench's utility beyond research toward real-world application.

Looking ahead, further integration of multi-sensor fusion models and enhancement of simulation fidelity could lead to even more realistic and challenging testing conditions. As more advanced and diverse scenarios are developed and integrated into SafeBench, the platform promises to continue providing valuable insights into AV safety evaluations, fostering the development of more robust and reliable AV systems.

Overall, SafeBench not only addresses existing limitations in autonomous vehicle evaluation but also sets a benchmark for future studies aiming to enhance the safety and reliability of autonomous driving systems.