
Coherent Reconstruction of Multiple Humans from a Single Image (2006.08586v1)

Published 15 Jun 2020 in cs.CV

Abstract: In this work, we address the problem of multi-person 3D pose estimation from a single image. A typical regression approach in the top-down setting of this problem would first detect all humans and then reconstruct each one of them independently. However, this type of prediction suffers from incoherent results, e.g., interpenetration and inconsistent depth ordering between the people in the scene. Our goal is to train a single network that learns to avoid these problems and generate a coherent 3D reconstruction of all the humans in the scene. To this end, a key design choice is the incorporation of the SMPL parametric body model in our top-down framework, which enables the use of two novel losses. First, a distance field-based collision loss penalizes interpenetration among the reconstructed people. Second, a depth ordering-aware loss reasons about occlusions and promotes a depth ordering of people that leads to a rendering which is consistent with the annotated instance segmentation. This provides depth supervision signals to the network, even if the image has no explicit 3D annotations. The experiments show that our approach outperforms previous methods on standard 3D pose benchmarks, while our proposed losses enable more coherent reconstruction in natural images. The project website with videos, results, and code can be found at: https://jiangwenpl.github.io/multiperson

Citations (172)

Summary

  • The paper proposes a top-down regression framework using SMPL to coherently reconstruct multiple humans in 3D from a single image by addressing issues like interpenetration and incorrect depth ordering.
  • The methodology introduces novel collision and depth ordering-aware loss functions to ensure spatial separation and consistent depth relationships among reconstructed human meshes without explicit 3D depth annotations.
  • Experiments on standard 3D pose datasets demonstrate that the proposed method improves coherence and depth ordering while reducing interpenetration errors compared to existing baseline approaches.

Coherent Reconstruction of Multiple Humans from a Single Image

This paper introduces an approach for coherent reconstruction of multiple humans from a single image, emphasizing that inter-person consistency must be maintained while estimating each person's 3D pose. The problem is important for holistic scene understanding, especially when multiple humans interact, since independent per-person predictions can produce inconsistencies such as overlapping reconstructions and incorrect depth ordering.

The methodology relies on a top-down regression framework that employs the SMPL parametric body model to estimate the poses and shapes of all detected individuals in an image simultaneously. The approach leverages two novel loss functions designed to promote coherence in the reconstruction: a collision loss and a depth ordering-aware loss. The collision loss penalizes interpenetration among the reconstructed human meshes, keeping the predicted bodies spatially separated. The depth ordering-aware loss reasons about occlusions and promotes a depth ordering whose rendering is consistent with the annotated instance segmentation, thereby supervising the relative depths of the people even when no explicit 3D depth annotations are available.
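To make the two coherence terms concrete, the following is a minimal PyTorch-style sketch, not the authors' implementation: it assumes per-person SMPL vertices, precomputed per-person distance-field grids (stored so that positive values indicate points inside a body), per-person rendered depth maps, and an annotated instance map. The function names and tensor layouts (`collision_loss`, `depth_ordering_loss`, etc.) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def collision_loss(vertices, sdf_grids, grid_bounds):
    """Penalize vertices of one person that fall inside another person's body.

    vertices:    (P, V, 3) SMPL vertices for P people (world coordinates).
    sdf_grids:   (P, D, D, D) distance-field grid per person, assumed to be
                 positive inside the body and <= 0 outside.
    grid_bounds: (P, 2, 3) min/max corners of each person's grid, assumed to
                 follow grid_sample's (x, y, z) axis convention.
    """
    P = vertices.shape[0]
    loss = vertices.new_zeros(())
    for i in range(P):
        lo, hi = grid_bounds[i, 0], grid_bounds[i, 1]
        for j in range(P):
            if i == j:
                continue
            # Map person j's vertices into person i's grid, normalized to [-1, 1].
            pts = 2.0 * (vertices[j] - lo) / (hi - lo) - 1.0
            # grid_sample expects input (N, C, D, H, W) and grid (N, d, h, w, 3).
            penetration = F.grid_sample(
                sdf_grids[i][None, None],   # (1, 1, D, D, D)
                pts[None, None, None],      # (1, 1, 1, V, 3)
                align_corners=True,
            )
            # Positive sampled values mean person j penetrates person i.
            loss = loss + penetration.clamp(min=0).sum()
    return loss / max(P, 1)


def depth_ordering_loss(person_depths, rendered_person, gt_instance):
    """Encourage a depth ordering whose rendering matches the annotated masks.

    person_depths:   (P, H, W) rendered depth of each person's mesh at every
                     pixel (a large constant where the person does not project).
    rendered_person: (H, W) long tensor, index of the front-most rendered person.
    gt_instance:     (H, W) long tensor, annotated person index (-1 = background).
    """
    wrong = (rendered_person != gt_instance) & (gt_instance >= 0) & (rendered_person >= 0)
    if not wrong.any():
        return person_depths.new_zeros(())
    ys, xs = torch.where(wrong)
    d_annotated = person_depths[gt_instance[ys, xs], ys, xs]    # should be visible
    d_rendered = person_depths[rendered_person[ys, xs], ys, xs]  # wrongly in front
    # Ranking-style penalty: large when the wrong person is closer than the
    # annotated one, shrinking as the ordering is corrected.
    return torch.log(1.0 + torch.exp(d_annotated - d_rendered)).mean()
```

In training, both terms would simply be added to the usual reprojection and 3D supervision losses; the collision term needs only the predicted meshes, while the ordering term relies on a differentiable renderer to produce the per-person depth maps and the front-most person index.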

The reported results show that these additional constraints yield more coherent reconstructions in natural images. Experiments demonstrate the superiority of the proposed method over baseline approaches in terms of both standard 3D pose metrics and qualitative coherence. The method is validated on standard 3D pose datasets such as Human3.6M, Panoptic, MPI-INF-3DHP, and MuPoTS-3D, showing improved depth ordering and reduced interpenetration compared to existing methods.

The implications of this research are substantial in both practical and theoretical domains. Practically, coherent reconstruction of multiple humans has applications in social interaction understanding, augmented reality, surveillance, and sports analysis, among others. Theoretically, the integration of interaction-aware constraints opens possibilities for future work on multi-agent scene reconstruction, potentially influencing developments in AI-driven scene understanding.

Speculative future developments might extend this coherence-driven approach to incorporate broader environmental context, such as objects and surfaces with which humans interact. Extending the paradigm to dynamic scenes (video data) and to richer semantic understanding could further enhance multi-human interaction modeling.

In summary, this paper provides a robust framework for addressing multi-person 3D pose estimation, focusing on maintaining coherence across multiple humans in a single-image setting, and paving the way for further research in scene-level human analysis.