Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases (2404.10595v5)
Abstract: Large Vision-Language Models (LVLMs) have received widespread attention for advancing interpretable self-driving. Existing evaluations of LVLMs primarily focus on multi-faceted capabilities in natural circumstances and lack automated, quantifiable assessment for self-driving, let alone for severe road corner cases. In this work, we propose CODA-LM, the first benchmark for the automatic evaluation of LVLMs on self-driving corner cases. We adopt a hierarchical data structure and prompt powerful LVLMs to analyze complex driving scenes and generate high-quality pre-annotations for human annotators; for LVLM evaluation, we show that using text-only LLMs as judges yields even better alignment with human preferences than LVLM judges. Moreover, with CODA-LM, we build CODA-VLM, a new driving LVLM that surpasses all open-source counterparts on CODA-LM. CODA-VLM performs comparably to GPT-4V and even surpasses it by +21.42% on the regional perception task. We hope CODA-LM can serve as a catalyst for interpretable self-driving empowered by LVLMs.
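The abstract describes a text-only LLM-as-judge evaluation protocol. The sketch below is only a minimal illustration of that general idea, not the authors' actual implementation: a reference (human-annotated) answer and a candidate LVLM answer are given to a text-only judge model, which returns a numeric score. The model name, prompt wording, and scoring scale here are assumptions for illustration.

```python
# Minimal LLM-as-judge sketch (illustrative only, not the CODA-LM pipeline).
# Assumes the OpenAI Python client and an OPENAI_API_KEY in the environment;
# the judge model name and prompt are hypothetical choices.
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are evaluating descriptions of a driving corner case.
Reference answer:
{reference}

Candidate answer:
{candidate}

Rate how well the candidate matches the reference in accuracy and
completeness on a scale of 1-10. Reply with the number only."""


def judge_score(reference: str, candidate: str, model: str = "gpt-4") -> int:
    """Ask a text-only LLM judge to score a candidate answer against a reference."""
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(reference=reference, candidate=candidate),
        }],
        temperature=0,
    )
    return int(response.choices[0].message.content.strip())


if __name__ == "__main__":
    ref = "A cyclist is crossing from the right; the ego car should slow down and yield."
    cand = "A pedestrian stands on the sidewalk; no action is needed."
    print(judge_score(ref, cand))
```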
- Yanze Li
- Wenhua Zhang
- Kai Chen
- Yanxin Liu
- Pengxiang Li
- Ruiyuan Gao
- Lanqing Hong
- Meng Tian
- Xinhai Zhao
- Zhenguo Li
- Dit-Yan Yeung
- Huchuan Lu
- Xu Jia