Computer Vision for Autonomous Vehicles: Problems, Datasets and State of the Art (1704.05519v3)

Published 18 Apr 2017 in cs.CV and cs.RO

Abstract: Recent years have witnessed enormous progress in AI-related fields such as computer vision, machine learning, and autonomous vehicles. As with any rapidly growing field, it becomes increasingly difficult to stay up-to-date or enter the field as a beginner. While several survey papers on particular sub-problems have appeared, no comprehensive survey on problems, datasets, and methods in computer vision for autonomous vehicles has been published. This book attempts to narrow this gap by providing a survey on the state-of-the-art datasets and techniques. Our survey includes both the historically most relevant literature as well as the current state of the art on several specific topics, including recognition, reconstruction, motion estimation, tracking, scene understanding, and end-to-end learning for autonomous driving. Towards this goal, we analyze the performance of the state of the art on several challenging benchmarking datasets, including KITTI, MOT, and Cityscapes. Besides, we discuss open problems and current research challenges. To ease accessibility and accommodate missing references, we also provide a website that allows navigating topics as well as methods and provides additional information.

Computer Vision for Autonomous Vehicles: Problems, Datasets, and State of the Art

The field of autonomous vehicles has seen notable advancements in recent years, driven largely by the convergence of computer vision and machine learning technologies. Despite this progress, several challenges remain unsolved, requiring ongoing research and innovation. This survey provides a comprehensive overview of the current state of various sub-problems in computer vision for autonomous vehicles, categorizes the methods addressing these problems, and identifies core datasets fueling research advancements.

Introduction

Autonomous vehicles must navigate complex, dynamic environments, necessitating robust models that generalize to unpredictable situations and can perform timely reasoning. This survey categorizes existing approaches into classical modular pipelines and modern monolithic end-to-end learning approaches. Traditional modular pipelines excel in interpretability and parallel development but suffer from non-optimal intermediate representations and lack of holistic learning. Conversely, end-to-end learning approaches offer integrated models but face challenges related to overfitting and interpretability.

Core Challenges in Autonomous Vision

The survey identifies several key sub-problems pertinent to autonomous driving, which include:

  1. Object Detection and Recognition: Identifying and classifying objects within a scene.
  2. Segmentation: Assigning each pixel to a class, providing detailed scene understanding.
  3. 3D Reconstruction: Transforming 2D images into 3D models to understand the environment's spatial layout.
  4. Ego-Motion Estimation and Localization: Determining the vehicle's position and movement within its environment.
  5. Tracking: Following objects across frames to understand their movement over time.
  6. Scene Understanding: Holistically interpreting scenes to infer relationships between objects and predict future states.

Datasets Driving Research

Various datasets serve as benchmarks to evaluate approaches for these sub-problems:

  1. KITTI: Widely adopted for tasks like stereo vision, optical flow, object detection, and tracking, thanks to its comprehensive suite of real-world driving scenarios.
  2. Cityscapes: Focuses on urban scene understanding, providing high-quality annotations for semantic and instance segmentation.
  3. Microsoft COCO: A versatile dataset used for object detection, segmentation, and keypoint estimation.
  4. Synthetic Datasets: SYNTHIA and Virtual KITTI provide large volumes of densely annotated synthetic imagery, addressing the scarcity of diverse, dense ground-truth labels in real-world data.

Key Approaches and Findings

Object Detection and Recognition

Methods have evolved from classical pipelines employing hand-crafted features and sliding-window search to deep learning models such as region-based convolutional neural networks (R-CNN) and their faster variants (Fast R-CNN, Faster R-CNN). One-stage detectors such as YOLO and SSD push further toward end-to-end learning, trading some accuracy for real-time performance.
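A step shared by essentially all of these detectors, two-stage and one-stage alike, is non-maximum suppression of overlapping candidate boxes. The following is a minimal sketch in plain Python; the box format `(x1, y1, x2, y2)` and the 0.5 overlap threshold are illustrative choices, not values from the survey.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression: repeatedly keep the highest-scoring
    box and discard remaining boxes that overlap it by more than iou_thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep
```

Production detectors implement the same logic vectorized on the GPU, but the greedy structure is identical.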

Segmentation

Semantic segmentation has seen significant improvements with fully convolutional networks (FCNs) and encoder-decoder architectures such as U-Net. Dilated convolutions (as in DeepLab) and pyramid pooling (as in PSPNet) enlarge the receptive field and aggregate multi-scale context, enhancing spatial accuracy. Jointly reasoning about semantics and geometry, for example by coupling segmentation with depth or stereo cues, has further improved performance.
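Segmentation quality on benchmarks such as Cityscapes is typically reported as mean intersection-over-union (mIoU) over classes. A minimal sketch of the metric on flattened label arrays, assuming integer class labels, looks like this:

```python
def mean_iou(pred, gt, num_classes):
    """Per-class intersection-over-union, averaged over classes that
    appear in either the prediction or the ground truth."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union > 0:  # skip classes absent from both label maps
            ious.append(inter / union)
    return sum(ious) / len(ious)
```

Benchmark implementations accumulate a confusion matrix over the whole test set rather than averaging per image, but the per-class IoU definition is the same.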

3D Reconstruction

Multi-view stereo (MVS) and structure-from-motion (SfM) pipelines have progressively incorporated deep learning to cope with large-scale, complex scenes, with systems such as COLMAP setting the benchmark for dense 3D reconstruction.
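At the core of any SfM pipeline sits triangulation: recovering a 3D point from its projections in two calibrated views. A hedged sketch of the standard linear (DLT) formulation follows; the camera matrices and image points in the usage below are made-up values for illustration.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation: recover a 3D point X from projections
    x1, x2 under 3x4 camera matrices P1, P2. Each image point (u, v)
    contributes two rows to the homogeneous system A X = 0; the solution
    is the right singular vector with the smallest singular value."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize
```

Full pipelines follow this linear estimate with bundle adjustment, jointly refining points and camera poses by minimizing reprojection error.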

Ego-Motion Estimation and Localization

Visual odometry and SLAM techniques have been pivotal for real-time localization, with methods ranging from feature-based approaches to direct methods that utilize photometric error minimization. The advent of hybrid methods combining visual and inertial measurements shows promise in enhancing robustness and accuracy.
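Whatever the front end (feature matching or direct photometric alignment), visual odometry ultimately chains per-frame relative motion estimates into a global trajectory. A minimal planar (SE(2)) sketch of that composition step, with illustrative increments:

```python
import math

def integrate_odometry(increments, pose=(0.0, 0.0, 0.0)):
    """Chain relative motion estimates (dx, dy, dtheta), each expressed in
    the vehicle frame of the previous step, into a global trajectory.
    Rotating each increment by the current heading before accumulating
    is the pose-composition step every odometry pipeline performs."""
    x, y, th = pose
    trajectory = [pose]
    for dx, dy, dth in increments:
        x += dx * math.cos(th) - dy * math.sin(th)
        y += dx * math.sin(th) + dy * math.cos(th)
        th += dth
        trajectory.append((x, y, th))
    return trajectory
```

This pure dead reckoning accumulates drift, which is why full SLAM systems add loop closure and global optimization on top of it.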

Tracking

Multi-object tracking benefits from advancements in object detection and employs data association techniques to follow objects across frames. Modern approaches leverage deep learning for feature extraction, with methods integrating re-identification models and leveraging graph-based and continuous optimization techniques.
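The data-association step can be sketched with a simple greedy overlap matcher; real trackers typically use the Hungarian algorithm and richer affinities (appearance embeddings, motion models), so treat the IoU-only criterion and the 0.3 threshold below as illustrative assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def associate(tracks, detections, iou_thresh=0.3):
    """Greedy data association: take (track, detection) pairs in order of
    decreasing overlap, matching each track and detection at most once."""
    pairs = sorted(
        ((iou(t, d), ti, di)
         for ti, t in enumerate(tracks)
         for di, d in enumerate(detections)),
        reverse=True)
    matched_t, matched_d, matches = set(), set(), []
    for score, ti, di in pairs:
        if score < iou_thresh or ti in matched_t or di in matched_d:
            continue
        matched_t.add(ti)
        matched_d.add(di)
        matches.append((ti, di))
    return matches
```

Unmatched detections then spawn new tracks, and tracks unmatched for several frames are terminated.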

Scene Understanding

Approaches that integrate multiple cues (e.g., semantics, geometry, and motion) into a unified model provide a more holistic scene understanding. Probabilistic models and deep learning-based methods continue to enhance the ability to infer road topology, traffic participant behavior, and predict future states.

Implications and Future Directions

The integration of computer vision techniques into autonomous driving systems shows substantial potential but also faces challenges. Existing methods need to address issues related to generalization across different environments, robustness to occlusions, and the ability to interpret complex, dynamic scenes in real-time.

The continued development and refinement of large-scale, diverse datasets will be crucial in driving progress. Future research is likely to benefit from hybrid approaches that combine modular interpretability with the integrated learning capabilities of end-to-end models, as well as the incorporation of multi-modal sensor data. Current trends indicate a growing emphasis on explainability and robustness, ensuring that autonomous systems can operate safely and effectively in diverse real-world conditions.

In summary, this survey provides a detailed analysis of the state of the art in computer vision for autonomous vehicles. By highlighting specific challenges, reviewing datasets, and summarizing key methods, it serves as a critical resource for researchers aiming to further advance this transformative field.

Authors (4)
  1. Joel Janai (4 papers)
  2. Fatma Güney (27 papers)
  3. Aseem Behl (3 papers)
  4. Andreas Geiger (136 papers)
Citations (740)