World Models for Autonomous Driving: An Initial Survey (2403.02622v3)

Published 5 Mar 2024 in cs.LG, cs.AI, and cs.RO

Abstract: In the rapidly evolving landscape of autonomous driving, the capability to accurately predict future events and assess their implications is paramount for both safety and efficiency, critically aiding the decision-making process. World models have emerged as a transformative approach, enabling autonomous driving systems to synthesize and interpret vast amounts of sensor data, thereby predicting potential future scenarios and compensating for information gaps. This paper provides an initial review of the current state and prospective advancements of world models in autonomous driving, spanning their theoretical underpinnings, practical applications, and the ongoing research efforts aimed at overcoming existing limitations. Highlighting the significant role of world models in advancing autonomous driving technologies, this survey aspires to serve as a foundational reference for the research community, facilitating swift access to and comprehension of this burgeoning field, and inspiring continued innovation and exploration.
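The abstract's central idea is that a world model learns how the driving scene evolves from logged data. The system can then "imagine" future scenarios before acting. The sketch below is a deliberately toy illustration of that loop and is not a method from the survey. It assumes a hypothetical vehicle state of (position, velocity) and an acceleration action. It fits a linear transition model `next_state ≈ A @ state + B @ action` by least squares from logged transitions, then rolls that model forward over a candidate action sequence without querying the real environment. The names `fit_linear_world_model` and `imagine`, the time step, and the synthetic data are all illustrative assumptions. The neural, sensor-driven world models the survey reviews are far richer than this.

```python
import numpy as np

def fit_linear_world_model(states, actions, next_states):
    """Toy world model (illustrative assumption): fit next_state ≈ A @ state + B @ action.

    states: (N, ds), actions: (N, da), next_states: (N, ds).
    Returns A with shape (ds, ds) and B with shape (ds, da).
    """
    X = np.hstack([states, actions])                      # (N, ds + da) regressors
    W, *_ = np.linalg.lstsq(X, next_states, rcond=None)   # solves X @ W ≈ next_states
    ds = states.shape[1]
    A = W[:ds].T                                          # W stacks A^T over B^T
    B = W[ds:].T
    return A, B

def imagine(A, B, s0, action_seq):
    """Roll the learned model forward to predict a future trajectory."""
    s = np.asarray(s0, dtype=float)
    traj = [s]
    for u in action_seq:
        s = A @ s + B @ np.atleast_1d(u)
        traj.append(s)
    return np.stack(traj)

# Hypothetical "true" dynamics used only to generate logged data:
# constant-acceleration kinematics with time step dt (seconds).
dt = 0.1
A_true = np.array([[1.0, dt], [0.0, 1.0]])
B_true = np.array([[0.5 * dt**2], [dt]])

rng = np.random.default_rng(0)
states = rng.normal(size=(200, 2))        # logged (position, velocity) samples
actions = rng.normal(size=(200, 1))       # logged accelerations
next_states = states @ A_true.T + actions @ B_true.T

A, B = fit_linear_world_model(states, actions, next_states)

# Imagine braking at -2 m/s^2 for 1 s, starting at 0 m with 10 m/s.
traj = imagine(A, B, [0.0, 10.0], [-2.0] * 10)
print(traj[-1])  # approximately [9.0, 8.0]
```

Because the synthetic data are noise-free and the model class matches the generating dynamics, the fit recovers `A_true` and `B_true`. The imagined trajectory then agrees with exact kinematics. Real driving data violate both conditions, which is one reason practical world models use learned latent representations and probabilistic predictions.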
