A Comprehensive Evaluation of Four End-to-End AI Autopilots Using CCTest and the Carla Leaderboard

Published 21 Jan 2025 in cs.SE | (2501.12090v3)

Abstract: End-to-end AI autopilots for autonomous driving systems have emerged as a promising alternative to traditional modular autopilots, offering the potential to reduce development costs and mitigate defects arising from module composition. However, they suffer from the well-known problems of AI systems, such as non-determinism, non-explainability, and anomalies. This naturally raises the question of how to evaluate them and, in particular, how to compare them with existing modular solutions. This work extends an earlier study in which the critical configuration testing (CCTest) approach was applied to four open modular autopilots. CCTest differs from other approaches in that it generates only test cases for which safe control policies exist for the tested autopilots. This enables an accurate assessment of the ability to drive safely in critical situations, since any incident observed in the simulation is attributable to a failure of the tested autopilot. The contribution of this paper is twofold. Firstly, we apply the CCTest approach to four open end-to-end autopilots, InterFuser, MILE, Transfuser, and LMDrive, and compare their test results with those of the four open modular autopilots previously tested with the same approach, implemented in the Carla simulation environment. This comparison identifies both differences and similarities in the failures of the two autopilot types in critical configurations. Secondly, we compare the Carla Leaderboard evaluations of the four end-to-end autopilots with their CCTest results. This comparison reveals significant discrepancies, reflecting differences in test case generation criteria and risk assessment methods. It underlines the need to develop objective assessment methods that combine qualitative and quantitative criteria.