Cause of coat rack segmentation difficulty under viewpoint shifts

Determine whether the consistently poor segmentation performance on the MVImgNet coat rack class under large viewpoint changes, observed in the Hummingbird-based multi-view evaluation, arises from intrinsic geometric properties of the coat rack (e.g., its extremely thin structure) or from dataset-specific factors (e.g., sampling or annotation issues).

Background

Within the multi-view segmentation benchmark, the coat rack class is the worst-performing class across models and bins. The authors suggest that its thin structure is the likely cause, but they explicitly note uncertainty about whether the difficulty is inherent to the object's geometry or caused by dataset-related factors.

Resolving this uncertainty would clarify whether improvements should target representation learning for thin, high-curvature objects or data curation and labeling quality for such categories.
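
One way to probe the geometric hypothesis is to check how strongly an overlap score reacts to small errors on thin shapes compared with compact ones. The sketch below is illustrative only and is not taken from the paper: the function names (erode4, thinness, iou), the erosion-based thinness proxy, and the use of an IoU-style metric are all assumptions. It builds a one-pixel-wide synthetic "pole" and a filled square, shifts each prediction by a single pixel, and compares the resulting overlap.

```python
import numpy as np


def erode4(mask):
    """One step of binary erosion with a 4-neighbour (cross) structuring element."""
    m = np.pad(mask.astype(bool), 1, constant_values=False)
    return (
        m[1:-1, 1:-1]
        & m[:-2, 1:-1] & m[2:, 1:-1]   # neighbours above and below
        & m[1:-1, :-2] & m[1:-1, 2:]   # neighbours left and right
    )


def thinness(mask):
    """Hypothetical proxy: fraction of foreground pixels removed by one erosion.

    Values near 1.0 mean the object is almost entirely boundary (very thin).
    """
    mask = mask.astype(bool)
    area = mask.sum()
    if area == 0:
        return float("nan")
    return 1.0 - erode4(mask).sum() / area


def iou(pred, gt):
    """Intersection over union of two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = (pred | gt).sum()
    return float((pred & gt).sum() / union) if union else float("nan")


# Synthetic ground-truth masks: a 1-pixel-wide pole and a 10x10 square.
pole = np.zeros((20, 20), dtype=bool)
pole[2:18, 10] = True
square = np.zeros((20, 20), dtype=bool)
square[5:15, 5:15] = True

# Simulate a prediction that is off by a single pixel horizontally.
pole_pred = np.roll(pole, 1, axis=1)
square_pred = np.roll(square, 1, axis=1)

print("thinness:", thinness(pole), thinness(square))
print("IoU after 1-px shift:", iou(pole_pred, pole), iou(square_pred, square))
```

If per-instance masks were available, the same proxy could be computed for each class and compared against per-class scores: a strong relationship across many thin categories would point toward geometry, whereas a coat-rack-specific drop would point toward dataset factors.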

References

It remains an open question whether this difficulty is intrinsic to the geometry or arises from dataset-specific factors.

Evaluating Foundation Models' 3D Understanding Through Multi-View Correspondence Analysis (2512.11574 - Lilova et al., 12 Dec 2025) in Appendix A, Experiment A: coat rack figure caption