Pose Stitching with Pose Maps
- The paper presents a unified framework that stitches fragmented pose maps into anatomically valid trajectories by leveraging temporal coherence and spatial context.
- It employs dynamic programming and Pose Flow Non-Maximum Suppression (PF-NMS) to efficiently merge detections, achieving real-time tracking at over 10 FPS.
- The technique finds broad application in continuous action recognition, human–object interaction, and sports analytics by ensuring robust temporal and spatial consistency.
Pose stitching with pose maps refers to a family of methodologies designed to assemble temporally or spatially fragmented pose representations (pose maps) into a coherent, continuous, and anatomically valid estimate of a subject's pose. In articulated human and animal pose estimation, particularly in videos and complex imaging scenarios, pose stitching mitigates missing detections, occlusion, discontinuities, and noise by leveraging both spatial context and temporal coherence. This article surveys the algorithmic principles, representative frameworks, key methodologies, optimization strategies, and application domains associated with pose stitching using pose maps.
1. Foundations and Definitions
Pose maps are data structures encoding the spatial (2D or 3D) configuration of body landmarks, keypoints, or surface regions inferred from image data. In video or multi-view settings, these maps may be available only for subsets of frames, viewpoints, or body parts due to occlusion, motion blur, or detection errors.
Pose stitching is the process of associating, merging, and reconciling these partial pose maps—across time, across camera views, or across body segments—such that the resulting output forms a temporally continuous and spatially coherent sequence of full-body poses. This process requires robust methods for matching, fusing, and disambiguating pose hypotheses, often under challenging or unconstrained conditions (Xiu et al., 2018).
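A minimal sketch of one possible pose-map structure helps make the definitions concrete. The field layout here (`keypoints`, `scores`, `box_score`) is illustrative and not taken from the paper:

```python
from dataclasses import dataclass

@dataclass
class PoseMap:
    """A single-frame pose hypothesis: keypoint coordinates plus confidences."""
    frame: int               # frame index in the video
    keypoints: list          # [(x, y), ...], one entry per body landmark
    scores: list             # per-keypoint detection confidence in [0, 1]
    box_score: float = 1.0   # confidence of the enclosing person box

    def confidence(self) -> float:
        """Combined box and keypoint confidence, used when linking frames."""
        return self.box_score * (sum(self.scores) / len(self.scores))

# A detector may emit pose maps for only a subset of frames; gaps are simply
# frames with no PoseMap, which stitching must later bridge.
p = PoseMap(frame=0, keypoints=[(10.0, 20.0), (12.0, 40.0)], scores=[0.9, 0.7])
```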
2. Temporal Pose Stitching via Online Optimization
In video-based multi-person pose tracking, pose stitching is achieved by building “pose flows,” which are temporally continuous trajectories of detected poses corresponding to a single person. The underlying principle is to link pose detections across frames by finding a sequence that maximizes an accumulated confidence measure:
Given a pose $P_t$ in frame $t$, candidate associations in frame $t+1$ are

$$\mathcal{C}_{t+1} = \{\, P' : d(P_t, P') \le \varepsilon \,\},$$

where $d(\cdot,\cdot)$ is a distance metric incorporating spatial proximity and appearance similarity. The optimal pose flow $F = (P_{t_0}, \ldots, P_{t_0+T})$ maximizes

$$F^{*} = \arg\max_{F} \sum_{t=t_0}^{t_0+T} s(P_t),$$

where $s(P_t)$ is a combination of box and keypoint confidence scores, and $P_{t+1} \in \mathcal{C}_{t+1}$.
A dynamic programming approach is used, executed online (frame-by-frame), with a stopping criterion based on the incremental gain in confidence score. This method produces temporally “stitched” pose maps able to withstand occlusions and detection gaps, thus ensuring robust human pose tracking at real-time frame rates (approximately 10 FPS with negligible computational overhead) (Xiu et al., 2018).
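The online linking step can be sketched as a greedy frame-by-frame procedure. The purely geometric distance and the `max_dist`/`min_gain` thresholds below are simplified stand-ins for the paper's combined spatial and appearance metric:

```python
def pose_distance(p1, p2):
    """Toy spatial distance between two poses: mean keypoint displacement.
    The paper combines spatial proximity with appearance similarity; this
    simplified version uses geometry only."""
    return sum(((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
               for (x1, y1), (x2, y2) in zip(p1["kps"], p2["kps"])) / len(p1["kps"])

def build_pose_flow(seed, frames, max_dist=30.0, min_gain=0.05):
    """Greedily extend a pose flow frame by frame.

    `frames` is a list of per-frame candidate lists. The flow stops when no
    candidate is close enough, or when the confidence gain of the best match
    falls below `min_gain` (the online stopping criterion)."""
    flow = [seed]
    for candidates in frames:
        best = min(candidates, key=lambda c: pose_distance(flow[-1], c),
                   default=None)
        if best is None or pose_distance(flow[-1], best) > max_dist:
            break  # occlusion or detection gap: terminate this flow
        if best["score"] < min_gain:
            break  # incremental confidence gain too small to continue
        flow.append(best)
    return flow

# Example: the flow links the nearby candidate, then stops at the distant one.
seed = {"kps": [(0.0, 0.0), (1.0, 1.0)], "score": 0.9}
frames = [[{"kps": [(1.0, 0.0), (2.0, 1.0)], "score": 0.8}],
          [{"kps": [(100.0, 100.0), (101.0, 101.0)], "score": 0.9}]]
flow = build_pose_flow(seed, frames)  # seed plus one linked pose
```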
3. Redundancy Pruning and Pose Fusion: Pose Flow Non-Maximum Suppression
After candidate pose flows have been constructed, redundancy and fragmentation are addressed by Pose Flow Non-Maximum Suppression (PF-NMS), which operates on entire pose flows rather than frame-level detections. The similarity between two pose flows $F_1$ and $F_2$ is measured by aggregating per-frame pose distances over their overlapping frames $T_{1\cap 2}$:

$$d_{\mathrm{flow}}(F_1, F_2) = \frac{1}{|T_{1\cap 2}|} \sum_{t \in T_{1\cap 2}} d\big(P_t^{(1)}, P_t^{(2)}\big),$$

where $d(\cdot,\cdot)$ accounts for both spatial and confidence-based discrepancies between pose maps.
Flows whose distance falls below a threshold are merged via confidence-weighted averaging of keypoints,

$$\hat{p}_k^{\,t} = \frac{\sum_i s_i\, p_{k,i}^{\,t}}{\sum_i s_i},$$

where $p_{k,i}^{\,t}$ is keypoint $k$ of flow $i$ at frame $t$ and $s_i$ its confidence score, ensuring robust output even in the presence of outlier frames. The result is a single, temporally continuous pose map: a "stitched" trajectory representing an individual across time, with redundancy and fragmentation eliminated (Xiu et al., 2018).
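A simplified sketch of PF-NMS on flows stored as frame-indexed dictionaries. The aggregation and confidence-weighted merge below follow the steps described above in spirit, but are not the paper's exact implementation:

```python
def flow_distance(f1, f2):
    """Mean per-frame pose distance over the frames two flows share."""
    common = set(f1) & set(f2)
    return sum(
        sum(((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
            for (x1, y1), (x2, y2) in zip(f1[t]["kps"], f2[t]["kps"]))
        / len(f1[t]["kps"])
        for t in common
    ) / len(common)

def merge_flows(f1, f2):
    """Merge two overlapping flows via confidence-weighted keypoint averaging."""
    merged = {}
    for t in set(f1) | set(f2):
        poses = [f[t] for f in (f1, f2) if t in f]
        w = sum(p["score"] for p in poses)
        merged[t] = {
            "kps": [
                (sum(p["score"] * p["kps"][k][0] for p in poses) / w,
                 sum(p["score"] * p["kps"][k][1] for p in poses) / w)
                for k in range(len(poses[0]["kps"]))
            ],
            "score": max(p["score"] for p in poses),
        }
    return merged

# Two fragments of the same person, overlapping at frame 1.
f1 = {0: {"kps": [(0.0, 0.0)], "score": 1.0}, 1: {"kps": [(2.0, 0.0)], "score": 1.0}}
f2 = {1: {"kps": [(4.0, 0.0)], "score": 1.0}, 2: {"kps": [(6.0, 0.0)], "score": 1.0}}
merged = merge_flows(f1, f2)  # one continuous trajectory over frames 0..2
```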
4. Experimental Outcomes and Comparative Metrics
Substantial improvements have been reported when employing pose flows and PF-NMS. On large benchmark datasets, such as PoseTrack and PoseTrack Challenge, the methodology resulted in gains of 13.5 mAP and 25.4 MOTA over previous methods. On the PoseTrack Challenge validation set, state-of-the-art results were obtained, with 58.3% MOTA and 66.5% mAP. These metrics reflect advances not only in pose detection accuracy (mean average precision, mAP) but also in trajectory integrity (Multiple Object Tracking Accuracy, MOTA), directly evidencing the effectiveness of pose map stitching techniques for temporal coherence in tracking scenarios (Xiu et al., 2018).
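MOTA follows the standard CLEAR MOT definition, $\text{MOTA} = 1 - (\text{FN} + \text{FP} + \text{IDSW}) / \text{GT}$, so fewer fragmented trajectories (and hence fewer identity switches) directly raise the score. A small illustrative computation, not taken from the paper:

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """CLEAR MOT tracking accuracy: 1 minus the normalized error count.
    Penalizes missed detections, spurious detections, and identity switches."""
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt

# Hypothetical counts: stitching that removes identity switches improves MOTA
# even when per-frame detection quality is unchanged.
score = mota(false_negatives=20, false_positives=10, id_switches=5, num_gt=100)
# score = 1 - 35/100 = 0.65
```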
5. Algorithmic Efficiency and Real-Time Considerations
The described pose stitching methodologies are computationally efficient by design. The total additional burden when augmenting a frame-based pose detector with pose flow building and PF-NMS is approximately 100 ms per frame, enabling the full pipeline—including both detection and stitching—to exceed 10 FPS. This supports deployment in real-time video-analytics and surveillance applications, where minimal latency is critical (Xiu et al., 2018).
6. Relevance to Broader Applications
Robust pose stitching via pose maps is central to a variety of downstream applications:
- Continuous action recognition, where coherent tracking of pose over time directly improves temporal labeling accuracy.
- Human–object interaction analysis, which requires the maintenance of full-body pose continuity even across complex activities with occlusion.
- Person re-identification and scene understanding, leveraging stitched pose maps to reduce false positives and support robust individual trajectory analysis.
- Sports analytics and anomaly detection, where high-fidelity pose trajectories are essential for quantifying performance or identifying uncommon behaviors.
These applications all benefit from the temporal and spatial consistency guarantees provided by algorithmic pose stitching strategies (Xiu et al., 2018).
7. Conclusion and Outlook
Pose stitching with pose maps advances multi-frame or multi-part pose estimation by enforcing continuity, coherence, and robustness through principled optimization frameworks and sequence-level redundancy reduction. By addressing the fundamental problem of fragmented or noisy pose detections—particularly in the presence of occlusion, motion blur, or misdetections—these methods enable practical, real-time deployment of pose-driven analytics in unconstrained environments. The methodologies outlined underpin current state-of-the-art systems for video pose tracking, affirming the centrality of pose stitching for modern computer vision pipelines (Xiu et al., 2018).