Refinement Module based on Parse Graph of Feature Map for Human Pose Estimation (2501.11069v6)
Abstract: Parse graphs play a crucial role in enhancing the performance of human pose estimation (HPE). Their key advantages lie in their hierarchical, tree-like structure and in the context relations among nodes, which together enable more accurate inference. To equip models with these advantages, many researchers have designed HPE frameworks based on the parse graph of body structure. However, these frameworks struggle with deviations from predefined parse graphs and incur high parameter counts. Unlike them, we view the feature map holistically, much like the human body, and optimize it using parse graphs in which the nodes' implicit feature representations improve adaptability and avoid rigid structural limitations. In this paper, we design the Refinement Module based on the Parse Graph of feature map (RMPG), which comprises two stages: top-down decomposition and bottom-up combination. The first stage organizes the feature map into a tree structure through recursive decomposition, where each node represents a sub-feature map, enabling hierarchical feature modeling. The second stage computes context information and recursively connects the sub-feature maps with this context information to gradually build a refined feature map. Additionally, we design a hierarchical network with fewer parameters that uses multiple RMPG modules to model the context relations and hierarchies in the parse graph of body structure for HPE. Our network achieves excellent results on major HPE benchmarks, and the effectiveness of RMPG is demonstrated across different methods. The RMPG code will be made publicly available.
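To make the two stages concrete, the following is a minimal NumPy sketch of the recursive decompose-then-combine pattern. It is an illustration under stated assumptions, not the authors' RMPG implementation. The abstract does not specify how a feature map is split or how context is computed, so the code assumes: (1) decomposition halves a (C, H, W) map along the channel axis into a binary tree; (2) the "context information" a node receives is a sigmoid of its sibling's channel-averaged map; and (3) combination refines each child with that context and concatenates the children back together. All function names (`decompose`, `context`, `combine`, `rmpg_sketch`) are hypothetical.

```python
import numpy as np


def decompose(fmap, depth):
    """Top-down stage (assumed design): recursively split a (C, H, W) feature
    map along the channel axis into two halves, building a binary tree whose
    nodes hold sub-feature maps."""
    if depth == 0 or fmap.shape[0] < 2:
        return {"feat": fmap, "children": []}
    left, right = np.array_split(fmap, 2, axis=0)
    return {
        "feat": fmap,
        "children": [decompose(left, depth - 1), decompose(right, depth - 1)],
    }


def context(feat):
    """Hypothetical context signal: sigmoid of the channel-averaged map.
    Returns shape (1, H, W) so it broadcasts over any channel count."""
    return 1.0 / (1.0 + np.exp(-feat.mean(axis=0, keepdims=True)))


def combine(node):
    """Bottom-up stage (assumed design): refine each child with its sibling's
    context, then concatenate the children to rebuild the parent map."""
    if not node["children"]:
        return node["feat"]
    left, right = [combine(child) for child in node["children"]]
    left_refined = left + left * context(right)    # sibling context modulates left
    right_refined = right + right * context(left)  # and vice versa
    return np.concatenate([left_refined, right_refined], axis=0)


def rmpg_sketch(fmap, depth=2):
    """Decompose into a tree, then recombine into a refined map of the same shape."""
    return combine(decompose(fmap, depth))


if __name__ == "__main__":
    x = np.random.default_rng(0).random((8, 4, 4))
    y = rmpg_sketch(x, depth=2)
    print(y.shape)  # channel splits are undone by concatenation
```

Because splitting and concatenation are inverses along the channel axis, the refined map keeps the input's shape, which is what lets a module of this kind be inserted into an existing backbone. In a trained network the fixed sigmoid context would presumably be replaced by learnable layers.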