
Purposer: Putting Human Motion Generation in Context (2404.12942v1)

Published 19 Apr 2024 in cs.CV

Abstract: We present a novel method to generate human motion to populate 3D indoor scenes. It can be controlled with various combinations of conditioning signals, such as a path in a scene, target poses, past motions, and scenes represented as 3D point clouds. State-of-the-art methods are either specialized to a single setting, require vast amounts of high-quality and diverse training data, or are unconditional models that do not integrate scene or other contextual information. As a consequence, they have limited applicability and rely on costly training data. To address these limitations, we propose a new method, dubbed Purposer, based on neural discrete representation learning. Our model can flexibly exploit different types of information already present in open-access large-scale datasets such as AMASS. First, we encode unconditional human motion into a discrete latent space. Second, an autoregressive generative model, conditioned with key contextual information, either through prompting or additive tokens, and trained for next-step prediction in this space, synthesizes sequences of latent indices. We further design a novel conditioning block that handles future conditioning information in such a causal model by using a network with two branches to compute separate stacks of features. In this manner, Purposer can generate realistic motion sequences in diverse test scenes. Through exhaustive evaluation, we demonstrate that our multi-contextual solution outperforms existing approaches specialized to specific contextual information, in terms of both quality and diversity. Our model is trained on short sequences, but a byproduct of supporting various conditioning signals is that, at test time, different combinations can be used to chain short sequences together and generate long motions within a context scene.
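The abstract describes a two-stage pipeline: motion is first quantized into discrete latent indices (neural discrete representation learning, in the spirit of VQ-VAE), then an autoregressive model predicts the next index, with conditioning injected as prompt tokens. The toy sketch below illustrates that structure only; the codebook size, feature dimensions, and the stand-in `next_index` rule are all illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1 (toy): quantize continuous per-frame motion features to
# discrete latent indices via nearest-neighbour codebook lookup.
codebook = rng.normal(size=(64, 8))          # 64 codes, 8-dim latents (assumed sizes)

def quantize(features):
    """Map each feature vector to the index of its nearest code."""
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

# Stage 2 (toy): autoregressive next-step prediction over latent
# indices. Contextual information (path, scene, target pose) is
# injected by prepending prompt tokens, as the abstract describes.
def next_index(context):
    """Stand-in for the learned next-step predictor (hypothetical rule)."""
    return int(sum(context) % 64)

def generate(prompt_tokens, n_steps):
    seq = list(prompt_tokens)                # conditioning via prompting
    for _ in range(n_steps):
        seq.append(next_index(seq))          # sample next latent index
    return seq[len(prompt_tokens):]          # keep only generated indices

motion_feats = rng.normal(size=(16, 8))      # 16 frames of toy features
indices = quantize(motion_feats)
generated = generate(prompt_tokens=list(indices[:4]), n_steps=12)
print(len(generated))                        # 12 latent indices
```

Chaining longer motions, as the abstract notes, amounts to repeatedly calling `generate` with the tail of the previous output as the new prompt.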


