HccePose(BF): Dual-Surface 6D Pose Estimation

Updated 30 October 2025
  • The paper introduces a dual-surface approach that constructs ultra-dense 2D-3D correspondences for enhanced 6D pose estimation using the PnP algorithm.
  • It presents a novel hierarchical continuous coordinate encoding (HCCE) mechanism that overcomes discretization issues of binary codes with continuous regression.
  • Experimental results on BOP datasets demonstrate significant improvements in AP accuracy, validating the method’s robustness against occlusions and partial visibility.

HccePose(BF) is a 6D pose estimation method for known rigid objects. It builds dense 2D-3D correspondences from both the front and back surfaces of the object, as well as from interpolated interior points. The network predicts these ultra-dense correspondences for each visible foreground pixel in an RGB image, which improves the accuracy of pose estimation via the Perspective-n-Point (PnP) algorithm. The hierarchical continuous coordinate encoding (HCCE) mechanism is integral to this approach and addresses earlier learnability problems in surface coordinate regression. HccePose(BF) reports improved performance over existing state-of-the-art methods on the seven BOP core datasets (Wang et al., 11 Oct 2025).

1. Dual-Surface and Ultra-Dense 2D-3D Correspondence Prediction

HccePose(BF) reformulates the dense correspondence paradigm by predicting both the front and back 3D surface coordinates for each foreground image pixel. Let $\tilde{Q}_f$ and $\tilde{Q}_b$ denote the predicted front and back surface 3D coordinates, with associated 2D projections $\tilde{P}_f = \tilde{P}_b$. This dual prediction provides explicit geometric constraints from both sides of the object, which is especially beneficial for disambiguating pose in the presence of occlusions or partial visibility.

To further exploit the object's volumetric structure, HccePose(BF) samples $n$ equidistant 3D points between $\tilde{Q}_f$ and $\tilde{Q}_b$ for each pixel:

$$n = \left\lfloor \frac{\Vert \tilde{q}_1 - \tilde{q}_2 \Vert_2}{\bar{d}} \right\rfloor$$

where $\tilde{q}_1, \tilde{q}_2$ are the front and back coordinates and $\bar{d}$ is the average nearest-neighbor 3D point distance. The interpolated points are computed as:

$$s(\tilde{q}_1, \tilde{q}_2, a) = a\,\tilde{q}_1 + (1-a)\,\tilde{q}_2, \quad a = \frac{t}{n+1},\ t = 1, \dots, n$$

All points (front, back, interpolated) share the same 2D location, yielding the ultra-dense sets $\tilde{P}_u$ and $\tilde{Q}_u$.
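As a minimal sketch of the per-pixel interpolation described above, the following standard-library Python computes $n$ and the interior points for one front/back pair. The function names and the plain-tuple point representation are illustrative assumptions, not taken from the released code.

```python
import math


def interpolate_between_surfaces(q_front, q_back, d_bar):
    """Return the n interior points between a front and a back 3D point.

    n = floor(||q_front - q_back||_2 / d_bar), and each point is
    a * q_front + (1 - a) * q_back with a = t / (n + 1), t = 1..n.
    """
    dist = math.dist(q_front, q_back)
    n = math.floor(dist / d_bar)
    points = []
    for t in range(1, n + 1):
        a = t / (n + 1)
        # Small a weights the back point more, so the list runs back -> front.
        points.append(tuple(a * f + (1 - a) * b for f, b in zip(q_front, q_back)))
    return points


def ultra_dense_candidates(q_front, q_back, d_bar):
    """Front, interior, and back points that all share one 2D pixel location."""
    interior = interpolate_between_surfaces(q_front, q_back, d_bar)
    return [tuple(q_front)] + interior + [tuple(q_back)]
```

When the two surfaces are closer than $\bar{d}$, $n = 0$ and only the front and back points remain for that pixel.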

2. Hierarchical Continuous Coordinate Encoding (HCCE)

The HCCE mechanism supersedes the prior hierarchical binary code encodings (HBCE) for 3D surface coordinates. Binary encodings are less learnable by neural networks due to discontinuities (“stripe” artifacts), especially at fine quantization. HCCE addresses this by replacing each level of the hierarchical code with a continuous value for each coordinate, recursively defined:

  • Level 1: $C_{x_{1,k}} = f_1(x_k) = x_k$
  • For $i > 1$:

$$C_{x_{i,k}} = \begin{cases} f_{i-1}(2x_k), & x_k < 0.5 \\ f_{i-1}(2 - 2x_k), & x_k \geq 0.5 \end{cases}$$

The predicted code stack for each axis is decoded via level-wise thresholding $g(t)$ to reconstruct the final binary code, which is then used to recover the normalized coordinate.
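The recursion above can be sketched in a few lines of Python. This is an illustrative reading that takes $C_{x_{i,k}}$ to be $f_i(x_k)$, consistent with the Level 1 definition. The function names are hypothetical, and only the encoding direction is shown, since the form of the thresholding function $g(t)$ is not given here.

```python
def hcce_code(x, level):
    """Continuous code of a normalized coordinate x in [0, 1] at a 1-based level.

    Level 1 is the coordinate itself; each further level folds the interval
    at 0.5 and applies the previous level to the folded value.
    """
    if level < 1:
        raise ValueError("level must be >= 1")
    if level == 1:
        return x
    if x < 0.5:
        return hcce_code(2.0 * x, level - 1)
    return hcce_code(2.0 - 2.0 * x, level - 1)


def hcce_stack(x, num_levels=8):
    """Codes for levels 1..num_levels (8 levels per axis, as in the network output)."""
    return [hcce_code(x, i) for i in range(1, num_levels + 1)]
```

Both branches give the same value at $x_k = 0.5$ (for level 2, $2x_k = 2 - 2x_k = 1$), so each level is a continuous function of the coordinate. This is the property that separates these codes from the jumps of a binary code.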

3. Network Architecture and Loss Functions

The network outputs 48 channels (24 per surface; 3 axes × 8 code levels) and an object mask. Mask error is minimized via

$$\mathcal{L}_M = \sum_i |M_i - \tilde{M}_i|,$$

while the code regression uses a hierarchical loss with per-level, per-coordinate weighting. Error rates for each code level are monitored in real time via histograms:

$$h_{f,x,i} = \exp\left(\sigma \cdot \min(r_{f,x,i},\ 0.5 - r_{f,x,i})\right)$$

where $r_{f,x,i}$ is the error ratio. The total loss aggregates these weighted errors using separate histograms for each of the $x, y, z$ axes and the $f/b$ surfaces, enhancing optimization stability.
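The two formulas above are simple enough to evaluate directly. The sketch below is illustrative only: the function names and the value of $\sigma$ passed by the caller are assumptions, and it does not show how the weights enter the total loss, because that aggregation is not spelled out here.

```python
import math


def mask_l1_loss(mask_true, mask_pred):
    """L_M: sum of absolute differences between true and predicted mask values."""
    return sum(abs(m - p) for m, p in zip(mask_true, mask_pred))


def level_weight(error_ratio, sigma):
    """h = exp(sigma * min(r, 0.5 - r)) for one surface, axis, and code level.

    sigma is a scale hyperparameter; its value is left to the caller because
    it is not stated in the text above.
    """
    return math.exp(sigma * min(error_ratio, 0.5 - error_ratio))
```

For error ratios in $[0, 0.5]$ and positive $\sigma$, the weight equals 1 at both ends of the range and peaks at $r = 0.25$.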

4. Pose Estimation Workflow and RANSAC-PnP Integration

From the predicted ultra-dense correspondences $(\tilde{P}_u, \tilde{Q}_u)$, HccePose(BF) uses the following inference pipeline:

  1. For each foreground pixel, retrieve its associated ultra-dense set of 3D coordinates (front, back, and interpolated).
  2. In RANSAC-based PnP, each hypothesis uses a sampled single 3D point per 2D pixel to maintain valid 2D-3D pairing.
  3. Among 150 RANSAC iterations, select the pose minimizing overall reprojection error.

This approach exploits information from the complete object geometry, surpassing previous methods limited to visible (front) surface mapping.
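The hypothesis loop in the pipeline above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: `solve_pose` and `reprojection_error` are hypothetical callables supplied by the user (for example, wrappers around a PnP solver), and the dictionary layout of `candidates` is an assumption.

```python
import random


def sample_one_point_per_pixel(candidates, rng):
    """Pick a single 3D point for every pixel.

    candidates maps a pixel (u, v) to its list of 3D points (front, back,
    interpolated). Returning one point per pixel keeps each 2D-3D pairing valid.
    """
    return [(pixel, rng.choice(points)) for pixel, points in candidates.items()]


def best_pose_over_hypotheses(candidates, solve_pose, reprojection_error,
                              iterations=150, seed=0):
    """Run a fixed number of hypotheses and keep the lowest-error pose."""
    rng = random.Random(seed)
    best_pose, best_error = None, float("inf")
    for _ in range(iterations):
        pairs = sample_one_point_per_pixel(candidates, rng)
        pose = solve_pose(pairs)          # hypothetical PnP wrapper
        if pose is None:                  # solver failed on this sample
            continue
        error = reprojection_error(pose)  # hypothetical scoring function
        if error < best_error:
            best_pose, best_error = pose, error
    return best_pose, best_error
```

Keeping the solver and the scoring function as parameters means the sketch needs no particular vision library. The default of 150 iterations mirrors the count given in step 3.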

5. Experimental Evaluation and Ablation on BOP Core Datasets

HccePose(BF) is evaluated on the LM-O, YCB-V, IC-BIN, TUD-L, HB, T-LESS, and ITODD benchmarks using the BOP protocol. Models use ResNet34 (ablations) or EfficientNet-B4 (state-of-the-art comparison) as backbones. A separate network is trained per object instance. PyOpenGL is used to prepare the dual-surface map labels. Each test instance is processed in approximately 30 ms.

HccePose(BF) scores higher than the compared baseline in every pairing below except HB under RGB input, where ZebraPose leads (92.2 vs. 91.9):

Dataset | ZebraPose (RGB) | HccePose(BF) (RGB) | GDRNPP (RGB-D) | HccePose(BF) (RGB-D)
LM-O    | 72.9            | 75.5               | 79.2           | 80.5
T-LESS  | 82.1            | 85.6               | 87.2           | 87.9
TUD-L   | 85.0            | 86.9               | 93.6           | 94.4
IC-BIN  | 59.2            | 63.5               | 70.2           | 72.4
ITODD   | 50.4            | 54.2               | 58.8           | 73.4
HB      | 92.2            | 91.9               | 90.9           | 93.1
YCB-V   | 82.8            | 83.9               | 83.4           | 91.1
Mean    | 74.9            | 77.3               | 80.5           | 84.7

HccePose(BF) achieves a mean BOP AP of 77.3 against 74.9 for ZebraPose in RGB-based estimation (+2.4 points). Under RGB-D evaluation the mean rises to 84.7 against 80.5 for GDRNPP (+4.2 points by the table means), even though the networks were trained only on RGB.

Ablations indicate that HCCE encoding provides up to +5.1% ADD(-S) accuracy improvement over hierarchical binary codes. Per-coordinate, multi-histogram hierarchical loss yields more consistent accuracy and stability than single-histogram or unweighted losses. Ultra-dense sampling between front and back surfaces delivers additive accuracy gains over front- or back-only correspondences (up to +2.0% BOP AP).

6. Significance and Impact

HccePose(BF) introduces a new paradigm for geometric correspondence generation by leveraging dual-surface coordinatization and ultra-dense 2D-3D mapping, establishing more robust geometric constraints for PnP-based 6D pose recovery. The move to hierarchical continuous coordinate encoding reduces the learning burden on neural architectures and abolishes discretization artifacts that hinder finer quantization. The impact of the method is evidenced by its consistent outperformance across diverse, challenging object datasets (e.g., symmetric/non-symmetric objects, textureless geometries). The approach is compatible with standard segmentation backbones and remains computationally efficient.

Code and trained model resources for HccePose(BF) are openly available at https://github.com/WangYuLin-SEU/HCCEPose.

7. Context, Limitations, and Research Directions

The ultra-dense inference regime of HccePose(BF) assumes high-fidelity object meshes and aligned CAD models for dual-surface supervision. Each object requires a separate network, which is standard in BOP protocol comparisons but limits plug-and-play application to novel categories. The method is evaluated only in the rigid, known-instance setting that the BOP datasets target. A plausible implication is that generalizing the dual-surface and HCCE paradigm to class-level or category-level pose estimation remains an open challenge.

Other recent work in pose estimation has focused on refining 2D-to-3D triangulation or correcting annotation biases for human pose (e.g., PoseRN (Sayo et al., 2021)), whereas HccePose(BF) exploits the full object geometry of rigid objects. The dual-surface dense approach contributes toward higher-precision 6D pose recovery in challenging settings and is likely to influence subsequent research on dense correspondence and geometric encoding schemes.
