4D Association Graph for Realtime Multi-person Motion Capture Using Multiple Video Cameras (2002.12625v1)

Published 28 Feb 2020 in cs.CV

Abstract: This paper contributes a novel realtime multi-person motion capture algorithm using multiview video inputs. Due to the heavy occlusions in each view, joint optimization on the multiview images and multiple temporal frames is indispensable, which brings up the essential challenge of realtime efficiency. To this end, for the first time, we unify per-view parsing, cross-view matching, and temporal tracking into a single optimization framework, i.e., a 4D association graph in which each dimension (image space, viewpoint and time) can be treated equally and simultaneously. To solve the 4D association graph efficiently, we further contribute the idea of 4D limb bundle parsing based on heuristic searching, followed by limb bundle assembling using a proposed bundle Kruskal's algorithm. Our method enables a realtime online motion capture system running at 30fps using 5 cameras on a 5-person scene. Benefiting from the unified parsing, matching and tracking constraints, our method is robust to noisy detection and achieves high-quality online pose reconstruction. The proposed method outperforms the state-of-the-art method quantitatively without using high-level appearance information. We also contribute a multiview video dataset synchronized with a marker-based motion capture system for scientific evaluation.

Analyzing the 4D Association Graph for Realtime Multi-Person Motion Capture Using Multiple Video Cameras

In this paper, Zhang et al. propose a novel realtime multi-person motion capture (MMC) algorithm that takes multiview video as input. The central challenge is reconciling realtime efficiency with the robustness needed under heavy occlusion, where joint optimization over multiple views and multiple frames is indispensable. To achieve this, the authors unify per-view parsing, cross-view matching, and temporal tracking in a single 4D association graph that treats image space, viewpoint, and time equally and simultaneously.

Algorithmic Framework

The core of the paper is a 4D association graph that handles association within each image, across viewpoints, and over time in one optimization. The graph combines three types of edges:

Per-view parsing: Using parsing edges to connect detected joints into skeletons within individual camera views.
Cross-view matching: Establishing correspondence between the same joints across different camera perspectives using matching edges.
Temporal tracking: Associating current joint detections with previous frame reconstructions via tracking edges.
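
The three edge types listed above can be pictured as one small data structure. The following Python sketch is only an illustration, not the authors' implementation: the class names (EdgeKind, Joint2D, Joint3D, AssociationGraph), the joint numbering, and the validity rules for each edge type are assumptions made for this example, and the edge weights are placeholder values.

```python
from dataclasses import dataclass, field
from enum import Enum


class EdgeKind(Enum):
    PARSING = "parsing"    # assumed: two different joint types in the same view
    MATCHING = "matching"  # assumed: the same joint type in two different views
    TRACKING = "tracking"  # assumed: previous-frame 3D joint to a current 2D joint


@dataclass(frozen=True)
class Joint2D:
    view: int        # camera index
    joint_type: int  # hypothetical numbering, e.g. 0 = neck, 1 = shoulder
    index: int       # candidate index within this view and joint type


@dataclass(frozen=True)
class Joint3D:
    person: int      # identity from the previous frame's reconstruction
    joint_type: int


@dataclass
class AssociationGraph:
    edges: list = field(default_factory=list)

    def add_edge(self, kind, a, b, weight):
        """Store an edge after checking it fits the assumed rule for its kind."""
        if kind is EdgeKind.PARSING:
            ok = (isinstance(a, Joint2D) and isinstance(b, Joint2D)
                  and a.view == b.view and a.joint_type != b.joint_type)
        elif kind is EdgeKind.MATCHING:
            ok = (isinstance(a, Joint2D) and isinstance(b, Joint2D)
                  and a.view != b.view and a.joint_type == b.joint_type)
        elif kind is EdgeKind.TRACKING:
            ok = (isinstance(a, Joint3D) and isinstance(b, Joint2D)
                  and a.joint_type == b.joint_type)
        else:
            ok = False
        if not ok:
            raise ValueError(f"invalid {kind} edge between {a} and {b}")
        self.edges.append((kind, a, b, weight))

    def edges_of(self, kind):
        """Return all stored edges of one kind."""
        return [e for e in self.edges if e[0] is kind]


# Toy scene: one neck seen in two views, one shoulder, one tracked 3D neck.
graph = AssociationGraph()
neck_v0 = Joint2D(view=0, joint_type=0, index=0)
shoulder_v0 = Joint2D(view=0, joint_type=1, index=0)
neck_v1 = Joint2D(view=1, joint_type=0, index=0)
prev_neck = Joint3D(person=0, joint_type=0)

graph.add_edge(EdgeKind.PARSING, neck_v0, shoulder_v0, weight=0.9)
graph.add_edge(EdgeKind.MATCHING, neck_v0, neck_v1, weight=0.8)
graph.add_edge(EdgeKind.TRACKING, prev_neck, neck_v1, weight=0.7)
```

Keeping all three edge kinds in one container mirrors the summary's point that image space, viewpoint, and time are handled together rather than in separate stages.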

The authors solve the graph heuristically: 4D limb bundles are first parsed by heuristic search and then assembled into skeletons with a proposed bundle Kruskal's algorithm. This enables a realtime online system that runs at 30 frames per second with five cameras on a five-person scene.
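
To give a feel for the assembly step, here is a minimal Kruskal-style sketch in Python. It is a deliberately simplified stand-in, not the paper's bundle Kruskal's algorithm: it works on individual scored edges rather than 4D limb bundles, and the conflict rule (one person may hold at most one joint of a given type per view) is an assumption chosen for the example. The function name assemble_skeletons, the node encoding (view, joint_type, index), and the scores are all hypothetical.

```python
def assemble_skeletons(edges):
    """Greedy Kruskal-style grouping of joint candidates into people.

    edges: list of (score, node_a, node_b), where each node is a
    (view, joint_type, index) tuple. Higher scores are processed first.
    Returns the accepted edges and the resulting clusters of nodes.
    """
    nodes = {n for _, a, b in edges for n in (a, b)}
    parent = {n: n for n in nodes}
    # For each cluster root, the (view, joint_type) slots already occupied.
    slots = {n: {(n[0], n[1])} for n in nodes}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    accepted = []
    for score, a, b in sorted(edges, key=lambda e: e[0], reverse=True):
        ra, rb = find(a), find(b)
        if ra == rb:
            continue  # already the same person; edge adds nothing
        if slots[ra] & slots[rb]:
            continue  # merge would give one person two joints of one type in one view
        parent[rb] = ra
        slots[ra] |= slots[rb]
        accepted.append((score, a, b))

    clusters = {}
    for n in nodes:
        clusters.setdefault(find(n), set()).add(n)
    return accepted, list(clusters.values())


# Toy input: person A has a neck in views 0 and 1 and a shoulder in view 0;
# a second neck candidate in view 0 belongs to someone else.
toy_edges = [
    (0.9, (0, 0, 0), (0, 1, 0)),  # neck-shoulder in view 0
    (0.8, (0, 0, 0), (1, 0, 0)),  # same neck across views 0 and 1
    (0.7, (0, 0, 1), (0, 1, 0)),  # wrong neck to the same shoulder
    (0.3, (0, 0, 1), (1, 0, 0)),  # wrong neck to the view-1 neck
]
accepted, clusters = assemble_skeletons(toy_edges)
```

In this toy input the two high-scoring edges are accepted and the two lower-scoring ones are rejected, because accepting either would put two view-0 necks on the same person. The second neck candidate is left as its own cluster.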

Experimental Results

The algorithm demonstrates robust performance in various scenarios, including crowded scenes, occlusions, and complex human interactions. During quantitative evaluations on datasets such as Shelf and the authors' newly introduced dataset, the proposed method achieves superior accuracy relative to existing systems:

On the Shelf dataset, the approach attains a percentage of correct parts (PCP) of 97.6%.
The new evaluation dataset, featuring complex close interactions and challenging motion, corroborates the efficiency and precision of this graph-based method.
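
For readers unfamiliar with the PCP metric quoted for Shelf, the sketch below shows one common 3D formulation: a limb counts as correct when the mean error of its two endpoints is at most half the ground-truth limb length. The summary does not state which variant or threshold the authors use, so the function name pcp, the alpha=0.5 default, and the toy coordinates are assumptions for illustration only.

```python
import math


def pcp(pred_limbs, gt_limbs, alpha=0.5):
    """Percentage of correct parts over paired limbs.

    Each limb is a (start, end) pair of 3D points. A predicted limb is
    correct if the average endpoint error is <= alpha * ground-truth length.
    """
    if not gt_limbs:
        raise ValueError("need at least one ground-truth limb")
    correct = 0
    for (p_start, p_end), (g_start, g_end) in zip(pred_limbs, gt_limbs):
        mean_err = (math.dist(p_start, g_start) + math.dist(p_end, g_end)) / 2
        if mean_err <= alpha * math.dist(g_start, g_end):
            correct += 1
    return 100.0 * correct / len(gt_limbs)


# Two ground-truth limbs of length 1; the first prediction is off by 0.1,
# the second by 1.0, so only the first passes the threshold.
gt = [((0, 0, 0), (0, 0, 1)), ((1, 0, 0), (1, 0, 1))]
pred = [((0.1, 0, 0), (0.1, 0, 1)), ((2, 0, 0), (2, 0, 1))]
score = pcp(pred, gt)
```

Because the threshold scales with limb length, short limbs such as forearms tolerate less absolute error than long ones such as the torso.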

Implications and Future Work

From a practical perspective, this research can significantly enhance real-world applications by offering scalable MMC without markers, which is highly desirable in entertainment, sports analytics, and human-computer interaction domains. Theoretically, the unified graph approach may inspire future explorations into more complex, high-dimensional data associations in computer vision and machine learning.

For subsequent advancements, integration with advanced appearance-based models might further improve accuracy, especially in scenarios with less camera coverage or more intricate occlusion patterns. Exploration of graph neural networks could also provide a robust mechanism for learning complex feature interactions automatically, potentially lessening the reliance on heuristic limb parsing and matching.

Overall, this paper advances the field of motion capture by addressing computational efficiency while maintaining high fidelity in capturing and reconstructing human motion via novel graph-based methodologies.

Authors

Yuxiang Zhang
Liang An
Tao Yu
Xiu Li
Kun Li
Yebin Liu
