
CoMusion: Towards Consistent Stochastic Human Motion Prediction via Motion Diffusion (2305.12554v3)

Published 21 May 2023 in cs.CV and cs.LG

Abstract: Stochastic Human Motion Prediction (HMP) aims to predict multiple possible future human pose sequences from observed ones. Most prior works learn motion distributions through encoding-decoding in the latent space, which does not preserve motion's spatial-temporal structure. While effective, these methods often require complex, multi-stage training and yield predictions that are inconsistent with the provided history and can be physically unrealistic. To address these issues, we propose CoMusion, a single-stage, end-to-end diffusion-based stochastic HMP framework. CoMusion is inspired by the insight that a smooth future pose initialization improves prediction performance, a strategy not previously utilized in stochastic models but evidenced in deterministic works. To generate such an initialization, CoMusion's motion predictor starts with a Transformer-based network for initial reconstruction of corrupted motion. Then, a graph convolutional network (GCN) is employed to refine the prediction, considering past observations in the discrete cosine transform (DCT) space. Our method, facilitated by the Transformer-GCN module design and a proposed variance scheduler, excels in predicting accurate, realistic, and consistent motions while maintaining appropriate diversity. Experimental results on benchmark datasets demonstrate that CoMusion surpasses prior methods across metrics while demonstrating superior generation quality. Our code is released at https://github.com/jsun57/CoMusion/ .
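
The abstract says the GCN refines predictions in the discrete cosine transform (DCT) space. As a rough illustration of what a DCT representation of motion looks like, the sketch below applies an orthonormal DCT-II along the time axis of a single joint coordinate and reconstructs a smoothed trajectory from the lowest-frequency coefficients. This is not the CoMusion implementation: the function names (`dct_matrix`, `dct`, `idct`, `smooth`), the 8-frame trajectory, and the choice of keeping 3 coefficients are all assumptions made for illustration.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n list of rows (row k = frequency k)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n)])
    return rows

def dct(signal):
    """Project a temporal signal onto the DCT basis."""
    basis = dct_matrix(len(signal))
    return [sum(b * x for b, x in zip(row, signal)) for row in basis]

def idct(coeffs):
    """Invert the orthonormal DCT-II (the inverse is the transposed basis)."""
    n = len(coeffs)
    basis = dct_matrix(n)
    return [sum(basis[k][t] * coeffs[k] for k in range(n)) for t in range(n)]

def smooth(signal, keep):
    """Keep only the `keep` lowest-frequency coefficients, then reconstruct."""
    coeffs = dct(signal)
    truncated = [c if k < keep else 0.0 for k, c in enumerate(coeffs)]
    return idct(truncated)

# Hypothetical data: one joint coordinate observed over 8 frames.
trajectory = [0.0, 0.1, 0.25, 0.2, 0.4, 0.55, 0.5, 0.7]

reconstructed = idct(dct(trajectory))   # full round trip recovers the input
smoothed = smooth(trajectory, keep=3)   # low-frequency (smoother) version
```

Dropping high-frequency coefficients removes frame-to-frame jitter, which is one common reason motion prediction models operate on DCT coefficients rather than raw poses.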
