Conditional Affordance Learning for Driving in Urban Environments (1806.06498v3)

Published 18 Jun 2018 in cs.RO, cs.LG, and cs.SY

Abstract: Most existing approaches to autonomous driving fall into one of two categories: modular pipelines, that build an extensive model of the environment, and imitation learning approaches, that map images directly to control outputs. A recently proposed third paradigm, direct perception, aims to combine the advantages of both by using a neural network to learn appropriate low-dimensional intermediate representations. However, existing direct perception approaches are restricted to simple highway situations, lacking the ability to navigate intersections, stop at traffic lights or respect speed limits. In this work, we propose a direct perception approach which maps video input to intermediate representations suitable for autonomous navigation in complex urban environments given high-level directional inputs. Compared to state-of-the-art reinforcement and conditional imitation learning approaches, we achieve an improvement of up to 68 % in goal-directed navigation on the challenging CARLA simulation benchmark. In addition, our approach is the first to handle traffic lights and speed signs by using image-level labels only, as well as smooth car-following, resulting in a significant reduction of traffic accidents in simulation.

Authors: Axel Sauer, Nikolay Savinov, Andreas Geiger

Citations (183)

Summary

The paper introduces Conditional Affordance Learning (CAL) as a novel direct perception framework that enhances urban autonomous navigation.
It computes low-dimensional affordances from video inputs, integrating high-level directional commands for efficient decision-making.
Empirical results on the CARLA benchmark show an improvement of up to 68% in goal-directed navigation and a significant reduction in traffic infractions compared to other methods.

Conditional Affordance Learning for Autonomous Driving in Urban Environments

The paper introduces Conditional Affordance Learning (CAL), a direct perception approach to autonomous driving in complex urban settings. Of the three established paradigms (modular pipelines, imitation learning, and direct perception), CAL belongs to the third, and it extends direct perception to the demands of urban driving: navigating intersections, stopping at traffic lights, and respecting speed limits.

Methodology and Contributions

CAL uses direct perception to build intermediate representations from video input, conditioned on high-level directional commands of the kind a consumer-grade navigation system provides. This addresses a limitation of earlier direct perception approaches, which were restricted to simple highway scenarios. The paper highlights the following contributions:

Intermediate Representations: The authors define low-dimensional intermediate representations, termed affordances, suited to urban driving. These affordances capture the scene attributes the controller needs, such as the distance to other vehicles and the state of traffic signals.
Conditional Model: The affordance predictions are conditioned on high-level directional commands, so the same network can serve different navigational goals, for example turning left or right at an intersection.
Enhanced Control Algorithms: The proposed controller reduces jerky driving, follows other cars smoothly, and obeys traffic rules. Traffic lights and speed signs are handled using image-level labels only, rather than pixel-level annotations, which simplifies the requirements on the training data.
Video-based Learning: The model learns from video rather than from single frames, and the paper reports that this temporal context improves navigation performance over frame-based variants.
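The split between affordance prediction and control described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the affordance fields, the command set, the `select_head` helper, the gains, and the sign conventions are all assumptions chosen for illustration, and the perception network is left out entirely.

```python
from dataclasses import dataclass
from enum import Enum


class Command(Enum):
    """High-level directional input (hypothetical set of values)."""
    FOLLOW_LANE = 0
    LEFT = 1
    RIGHT = 2
    STRAIGHT = 3


@dataclass
class Affordances:
    """Illustrative low-dimensional scene description; field names are assumed."""
    red_light: bool             # a red light applies to the ego lane
    speed_limit_kmh: float      # currently active speed limit
    vehicle_distance_m: float   # gap to the lead vehicle, inf if none
    centerline_offset_m: float  # lateral offset, positive = right of centre
    heading_error_rad: float    # angle to lane direction, positive = pointing right


@dataclass
class Control:
    throttle: float  # in [0, 1]
    brake: float     # in [0, 1]
    steer: float     # in [-1, 1], negative = steer left


def clamp(x, lo, hi):
    return max(lo, min(hi, x))


def select_head(per_command_outputs, command):
    """Conditional step: keep only the prediction made for the active command."""
    return per_command_outputs[command]


def control_from_affordances(a, speed_kmh, safe_gap_m=10.0,
                             k_speed=0.05, k_offset=0.3, k_heading=1.0):
    """Map affordances to controls with simple proportional rules (assumed gains)."""
    # Longitudinal: stop at red lights; otherwise track the speed limit,
    # scaled down linearly once the gap to the lead vehicle is below safe_gap_m.
    if a.red_light:
        target_kmh = 0.0
    else:
        target_kmh = a.speed_limit_kmh * clamp(a.vehicle_distance_m / safe_gap_m, 0.0, 1.0)
    accel = k_speed * (target_kmh - speed_kmh)
    # Lateral: steer against the lateral offset and the heading error.
    steer = -(k_offset * a.centerline_offset_m + k_heading * a.heading_error_rad)
    return Control(throttle=clamp(accel, 0.0, 1.0),
                   brake=clamp(-accel, 0.0, 1.0),
                   steer=clamp(steer, -1.0, 1.0))
```

In use, a perception model would produce one `Affordances` instance per command, `select_head` would pick the one matching the navigation command, and `control_from_affordances` would turn it into throttle, brake, and steering. Keeping the controller as a separate function of a handful of named quantities is what makes this kind of pipeline easy to tune, whatever the actual control law is.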

Results and Evaluation

The evaluation uses the CARLA simulation benchmark, with tasks ranging from basic road following to goal-directed urban navigation with dynamic obstacles. CAL improves on state-of-the-art reinforcement learning and conditional imitation learning approaches by up to 68% in goal-directed navigation. The results also indicate that CAL generalizes better to unseen urban environments and weather conditions. In addition, the paper reports significant reductions in traffic infractions, such as red light violations and vehicle collisions.

Implications and Future Directions

CAL is a step toward reducing the complexity of urban autonomous driving. By focusing on affordance-based learning, it simplifies the step from perception to action: the controller works from a small set of predicted quantities rather than from raw images, conditioned on the navigation command.

The paper also opens avenues for further work on affordance learning, such as training on more diverse environments and refining the network architecture to improve generalization. Integrating stereo vision or other sensor modalities might also improve distance estimation and object detection accuracy.

On the control side, the paper points to model predictive control (MPC) as a possible improvement, which could benefit real-time decision-making and vehicle handling in complex traffic scenarios. Overall, CAL is a promising approach to urban autonomous navigation, but its results come from simulation; further refinement and evaluation in real-world settings remain necessary.