
Robots That Can See: Leveraging Human Pose for Trajectory Prediction (2309.17209v1)

Published 29 Sep 2023 in cs.RO, cs.CV, cs.HC, and cs.LG

Abstract: Anticipating the motion of all humans in dynamic environments such as homes and offices is critical to enable safe and effective robot navigation. Such spaces remain challenging as humans do not follow strict rules of motion and there are often multiple occluded entry points such as corners and doors that create opportunities for sudden encounters. In this work, we present a Transformer-based architecture to predict human future trajectories in human-centric environments from input features including human positions, head orientations, and 3D skeletal keypoints from onboard in-the-wild sensory information. The resulting model captures the inherent uncertainty for future human trajectory prediction and achieves state-of-the-art performance on common prediction benchmarks and a human tracking dataset captured from a mobile robot adapted for the prediction task. Furthermore, we identify new agents with limited historical data as a major contributor to error and demonstrate the complementary nature of 3D skeletal poses in reducing prediction error in such challenging scenarios.

Authors (6)
  1. Tim Salzmann (8 papers)
  2. Lewis Chiang (1 paper)
  3. Markus Ryll (19 papers)
  4. Dorsa Sadigh (162 papers)
  5. Carolina Parada (11 papers)
  6. Alex Bewley (30 papers)
