Direct Perception for Autonomous Driving: An Intermediate Representation Approach
The paper "DeepDriving: Learning Affordance for Direct Perception in Autonomous Driving" proposes a third paradigm for vision-based autonomous driving that bridges the gap between mediated perception and behavior reflex approaches. The methodology, termed "direct perception," maps an input image to a small set of key perception indicators that directly relate to the affordance of driving, thereby simplifying the decision-making process for vehicle control.
Existing Paradigms in Autonomous Driving
The two major paradigms within vision-based autonomous driving systems are:
- Mediated Perception Approaches: These systems parse an entire scene to reconstruct a semantic 3D world before making driving decisions. While these systems can achieve a high-level understanding of the environment, they introduce unnecessary complexity by requiring solutions to multiple challenging vision tasks.
- Behavior Reflex Approaches: These methods directly map sensory inputs to driving actions. Despite their elegance, they can struggle in traffic or during complex maneuvers, because the problem becomes ill-posed when multiple plausible actions exist for the same scene.
Proposed Direct Perception Approach
The direct perception approach introduced in this paper aims to estimate the affordance for driving actions by mapping an input image to specific perception indicators. This approach strikes a balance between the complexity of mediated perception and the simplicity of behavior reflex by encoding the scene into a compact, yet comprehensive set of indicators.
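The distinction between the three paradigms can be sketched as function signatures. The following Python sketch is purely illustrative: all helper logic and values are stand-in stubs, not from the paper.

```python
from typing import Dict, List, Tuple

# Toy stand-ins; a real system would use image tensors and control commands.
Image = List[List[float]]      # placeholder for a camera frame
Action = Tuple[float, float]   # (steering, throttle), illustrative only

def detect_everything(image: Image) -> Dict[str, float]:
    """Stub for full scene parsing (lanes, cars, signs, free space...)."""
    return {"lane_offset": 0.2, "lead_car_dist": 30.0}

def plan_from_scene(scene: Dict[str, float]) -> Action:
    """Stub planner operating on the reconstructed scene."""
    return (-scene["lane_offset"], 0.5)

def mediated_perception(image: Image) -> Action:
    # Reconstruct a semantic 3D world first, then plan on top of it.
    return plan_from_scene(detect_everything(image))

def behavior_reflex(image: Image) -> Action:
    # Map pixels directly to an action; ill-posed when several
    # actions are plausible for the same input.
    return (0.0, 0.5)

def direct_perception(image: Image) -> Action:
    # Map pixels to a few affordance indicators, then let a
    # simple controller act on them.
    affordances = {"angle": 0.1, "to_lane_center": 0.2}
    steering = -(affordances["angle"] + affordances["to_lane_center"])
    return (steering, 0.5)
```

The middle paradigm carries far more state than is needed to drive, while the last compresses the scene into only what the controller consumes.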
Key components of the direct perception model include:
- Affordance Indicators: These indicators encompass the car's relative angle to the road, distances to lane markings, and distances to other vehicles. The representation provides sufficient information to allow a simple controller to make driving decisions.
- Convolutional Neural Network (ConvNet): A deep ConvNet is employed to learn the mapping from input images to affordance indicators, trained on 12 hours of driving data from a car racing video game (TORCS).
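To illustrate how a simple controller can act on such indicators, here is a minimal lane-keeping sketch in Python. The field names, gains, and control rule are hypothetical; the paper's actual indicator set and controller are more elaborate.

```python
from dataclasses import dataclass

@dataclass
class Affordances:
    """A compact set of driving indicators, loosely modeled on the
    paper's description: heading angle, lane-marking distances, and
    distance to the preceding car. Field names are illustrative."""
    angle: float             # angle between car heading and road tangent (rad)
    to_marking_left: float   # distance to left lane marking (m)
    to_marking_right: float  # distance to right lane marking (m)
    dist_preceding: float    # distance to the car ahead in lane (m)

def steer(aff: Affordances, gain: float = 0.5) -> float:
    """Steer toward the lane center by combining the heading error
    with the lateral offset from the lane center."""
    lane_width = aff.to_marking_left + aff.to_marking_right
    # Positive offset means the car sits left of the lane center.
    lateral_offset = (aff.to_marking_right - aff.to_marking_left) / lane_width
    return gain * (lateral_offset - aff.angle)

def throttle(aff: Affordances, safe_gap: float = 30.0) -> float:
    """Reduce throttle proportionally when the preceding car is
    closer than a safe following distance."""
    return min(1.0, aff.dist_preceding / safe_gap)
```

A car centered in its lane with zero heading error (e.g. `Affordances(0.0, 2.0, 2.0, 60.0)`) yields zero steering and full throttle; the point is that once the ConvNet supplies these few numbers, no further scene understanding is needed.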
Evaluation and Performance
The efficacy of the direct perception approach is demonstrated across various virtual environments and real-world datasets, including the KITTI dataset. The results indicate that the proposed model generalizes well to real driving scenarios. Specifically:
- TORCS Evaluation: The system drives autonomously in diverse simulated environments, with reliable lane and car perception. The accuracy of the affordance indicators is evaluated on a set of tracks and cars held out from the training set.
- Real-world Testing: Testing on real driving videos and the KITTI dataset showed that the ConvNet-based system could predict the distance to preceding vehicles with a mean absolute error comparable to state-of-the-art methods.
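The mean absolute error used in this distance comparison is straightforward to compute; the sketch below uses made-up placeholder distances, not the paper's data.

```python
def mean_absolute_error(predicted, ground_truth):
    """MAE between predicted and ground-truth distances (meters)."""
    assert len(predicted) == len(ground_truth)
    return sum(abs(p - g) for p, g in zip(predicted, ground_truth)) / len(predicted)

# Placeholder distances to preceding vehicles (m); not from the paper.
pred = [12.0, 25.0, 40.0]
truth = [11.0, 26.0, 38.0]
mae = mean_absolute_error(pred, truth)  # average per-vehicle error in meters
```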
Comparative Analysis
Comparative studies included several baseline methods, such as behavior reflex systems and mediated perception approaches based on the DPM (Deformable Part Model) car detector. The direct perception ConvNet demonstrated superior or comparable performance, especially in estimating distances to nearby vehicles, without requiring complex intermediate representations.
Implications and Future Work
The introduction of the direct perception paradigm represents a significant advancement in simplifying the architecture of autonomous driving systems. This compact scene representation, which is directly tied to actionable driving indicators, can potentially lead to more efficient and cost-effective autonomous vehicles.
The research suggests potential future work in the enhancement and generalization of the direct perception model, such as:
- Extending the Training Dataset: Acquiring more diverse real-world driving data to further improve the robustness and accuracy of affordance predictions.
- Exploring Additional Indicators: Expanding the set of affordance indicators to include more complex driving scenarios and maneuvers.
Conclusion
The paper presents a compelling case for direct perception as a middle ground between mediated perception and behavior reflex. By focusing on the few perception indicators relevant to driving actions, this methodology simplifies the control process, potentially paving the way for more capable and scalable autonomous driving systems.