
ADS-Edit: A Multimodal Knowledge Editing Dataset for Autonomous Driving Systems (2503.20756v3)

Published 26 Mar 2025 in cs.CL, cs.AI, cs.CV, cs.LG, and cs.MM

Abstract: Recent advancements in Large Multimodal Models (LMMs) have shown promise in Autonomous Driving Systems (ADS). However, their direct application to ADS is hindered by challenges such as misunderstanding of traffic knowledge, complex road conditions, and diverse states of vehicle. To address these challenges, we propose the use of Knowledge Editing, which enables targeted modifications to a model's behavior without the need for full retraining. Meanwhile, we introduce ADS-Edit, a multimodal knowledge editing dataset specifically designed for ADS, which includes various real-world scenarios, multiple data types, and comprehensive evaluation metrics. We conduct comprehensive experiments and derive several interesting conclusions. We hope that our work will contribute to the further advancement of knowledge editing applications in the field of autonomous driving. Code and data are available in https://github.com/zjunlp/EasyEdit/blob/main/examples/ADSEdit.md.


Summary

  • The paper introduces a multimodal benchmark (ADS-Edit) that enables efficient knowledge editing in ADS models to tackle traffic knowledge misunderstandings.
  • It employs a tri-axis framework covering perception, understanding, and decision-making with diverse modalities such as video and multi-view images.
  • Experimental results reveal that methods like GRACE and WISE show trade-offs in reliability, generality, and locality, highlighting areas for improvement.

ADS-Edit: A Multimodal Knowledge Editing Dataset for Autonomous Driving Systems

Introduction

Large Multimodal Models (LMMs) have demonstrated potential in various domains, including Autonomous Driving Systems (ADS). However, directly applying LMMs to ADS poses significant challenges due to traffic knowledge misunderstandings, complex road conditions, and diverse vehicular motion states. The paper proposes a knowledge editing approach to address these challenges by enabling model updates without extensive retraining. The authors introduce ADS-Edit, a multimodal knowledge editing dataset specifically designed for ADS, which includes various real-world scenarios, data types, and evaluation metrics.

Figure 1: Direct application of LMMs in Autonomous Driving Systems faces several challenges, including the misunderstanding of traffic knowledge, complex and varied road conditions, and the diversity of vehicle motion states. Knowledge editing, which enables efficient, continuous, and precise updates to knowledge, can effectively address these challenges.

Benchmark Construction

ADS-Edit is built around three scenario types and three data types. The scenario types (perception, understanding, and decision-making) assess LMM capabilities from basic recognition to complex reasoning. The input modalities are video, multi-view images, and single images. Three metrics (reliability, generality, and locality) evaluate knowledge editing methods within this framework.

Figure 2: The overview of ADS-Edit construction pipeline.

Scenario Type and Data Collection

The authors designed a tri-axis framework to comprehensively evaluate LMMs. The scenario types are perception, which examines basic visual capabilities; understanding, which involves recognition of traffic rules; and decision-making, which requires the model to synthesize information into driving decisions. The data types are video, multi-view images, and single images. Data were sourced from two established autonomous driving datasets, LingoQA and DriveLM. The scenario classification ensures that the selected data cover a spectrum of real-world challenges faced by ADS.

Figure 3: The statistics of scenario types for ADS-Edit.
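
To make the scenario and data-type categories concrete, the following is a minimal sketch of how one benchmark record could be represented. The class and field names (EditSample, media_paths, target_answer, and so on) are hypothetical and chosen for illustration; they are not the schema of the released dataset.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Scenario(Enum):
    # The three scenario types named in the summary.
    PERCEPTION = "perception"
    UNDERSTANDING = "understanding"
    DECISION_MAKING = "decision-making"


class Modality(Enum):
    # The three data types named in the summary.
    VIDEO = "video"
    MULTI_VIEW = "multi-view images"
    SINGLE_IMAGE = "single image"


@dataclass(frozen=True)
class EditSample:
    # Hypothetical record layout; the real dataset may use different fields.
    scenario: Scenario
    modality: Modality
    media_paths: tuple[str, ...]  # frames or images attached to the question
    question: str
    target_answer: str            # the answer the edited model should produce


def count_by_scenario(samples: list[EditSample]) -> Counter:
    """Count samples per scenario type (the kind of statistic Figure 3 reports)."""
    return Counter(sample.scenario for sample in samples)


samples = [
    EditSample(Scenario.PERCEPTION, Modality.SINGLE_IMAGE,
               ("frame_000.jpg",), "Is there a pedestrian ahead?", "Yes"),
    EditSample(Scenario.DECISION_MAKING, Modality.VIDEO,
               ("clip_000.mp4",), "Should the car stop?", "Yes"),
]
counts = count_by_scenario(samples)
```

Using enums for the two categorical axes keeps the scenario and modality labels closed sets, so a typo in a label fails loudly instead of silently creating a new category.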

Evaluation Metrics

The metrics include reliability, which assesses the success of modifying a model's behavior; generality, which evaluates performance across new but related scenarios; and locality, which checks whether unintended knowledge alterations occur after editing. Experiments were conducted on modern LMMs such as LLaVA-OneVision and Qwen2-VL. Performance was measured using editing evaluation (token-level comparison) and real-world evaluation (free-form generation).

Figure 4: The average generality metric of single editing across different scenarios.
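
The three metrics can be illustrated with a small token-level sketch. This is an assumption-laden illustration, not the paper's evaluation code: it treats reliability and generality as mean token accuracy against the edit target (on the edited input and on related inputs, respectively) and locality as token-level agreement between pre-edit and post-edit outputs on unrelated inputs. All function names are hypothetical.

```python
from typing import Iterable, Sequence, Tuple

TokenIds = Sequence[int]


def token_accuracy(predicted: TokenIds, target: TokenIds) -> float:
    """Fraction of target positions where the predicted token matches."""
    if len(target) == 0:
        raise ValueError("target must be non-empty")
    matches = sum(1 for p, t in zip(predicted, target) if p == t)
    return matches / len(target)


def mean_token_accuracy(pairs: Iterable[Tuple[TokenIds, TokenIds]]) -> float:
    """Average token accuracy over (predicted, target) pairs.

    Assumed usage: reliability when the pairs come from the edited inputs,
    generality when they come from new but related inputs.
    """
    scores = [token_accuracy(pred, tgt) for pred, tgt in pairs]
    if not scores:
        raise ValueError("pairs must be non-empty")
    return sum(scores) / len(scores)


def locality(pre_edit: Sequence[TokenIds], post_edit: Sequence[TokenIds]) -> float:
    """Assumed definition: agreement of post-edit outputs with pre-edit
    outputs on inputs unrelated to the edit (1.0 means nothing changed)."""
    if len(pre_edit) != len(post_edit):
        raise ValueError("pre_edit and post_edit must have the same length")
    return mean_token_accuracy(zip(post_edit, pre_edit))
```

For example, `mean_token_accuracy([([1, 2], [1, 2]), ([1, 2], [3, 4])])` averages one fully correct and one fully incorrect prediction. The real-world evaluation described above uses free-form generation and is not covered by this sketch.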

Experimental Results

In experiments, the memory-based editing methods GRACE and WISE were highly effective: GRACE achieved a perfect modification rate but generalized poorly, while WISE offered more balanced performance. The Prompt and AdaLoRA methods exhibited suboptimal reliability and generality, and AdaLoRA severely degraded locality because it modifies model parameters. The analysis shows that, although the methods differ in their strengths, none yet fully meets ADS requirements.

Figure 5: Case analysis of editing LLaVA-OneVision with WISE.

Analysis

Experiments revealed that decision-making scenarios posed the greatest challenge because of their complexity, while the simpler perception and understanding scenarios were less demanding. The analysis of model adaptation across scenarios indicated that LMMs sometimes retained their original outputs, showing resistance to edits and raising consistency concerns. Reducing the number of input video frames showed potential for balancing performance and processing speed, which is crucial for real-time autonomous driving applications.

Figure 6: The average generality metric of single editing across different data types.
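
The frame-reduction idea can be sketched as uniform subsampling of a clip before it is passed to the model. The summary does not say how the paper selects frames, so the strategy below (taking the midpoint of each equal-length segment) and the function name are assumptions for illustration only.

```python
def uniform_frame_indices(num_frames: int, num_samples: int) -> list[int]:
    """Pick num_samples frame indices spread evenly over a clip.

    The clip is split into num_samples equal segments and the frame at the
    midpoint of each segment is selected. If the clip has no more frames
    than requested, every frame is kept.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    if num_samples >= num_frames:
        return list(range(num_frames))
    step = num_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]


# Reduce a 10-frame clip to 5 frames.
indices = uniform_frame_indices(10, 5)
```

Fewer frames mean fewer visual tokens per query, which is where the speed gain comes from; how far the count can be reduced before editing quality drops is an empirical question that the benchmark is designed to measure.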

Conclusion

The work provides a structured methodology for applying knowledge editing techniques in ADS, culminating in the construction of the ADS-Edit benchmark. It serves as a foundational effort to bridge LMMs with practical challenges in autonomous driving and advocates continued exploration and improvement of knowledge editing methods tailored to domain-specific applications such as ADS. The paper concludes that current techniques require refinement to fully meet the operational demands of autonomous systems. Future work could focus on making editing methods more robust and adaptable so that they better handle the varied data modalities and scenarios encountered in real-world autonomous driving.
