Source-Free and Image-Only Unsupervised Domain Adaptation for Category Level Object Pose Estimation (2401.10848v1)
Abstract: We consider the problem of source-free unsupervised domain adaptation for category-level pose estimation: adapting to a target domain from RGB images only, without any access to source domain data or 3D annotations during adaptation. Collecting and annotating real-world 3D data and corresponding images is a laborious, expensive, yet unavoidable process, since even 3D pose domain adaptation methods require 3D data in the target domain. We introduce 3DUDA, a method capable of adapting to a nuisance-ridden target domain without 3D or depth data. Our key insight stems from the observation that specific object subparts remain stable across out-of-domain (OOD) scenarios, so these invariant subcomponents can be used for effective model updates. We represent object categories as simple cuboid meshes and use a generative model of neural feature activations at each mesh vertex, learnt using differentiable rendering. We focus on individual locally robust mesh vertex features and iteratively update them based on their proximity to corresponding features in the target domain, even when the global pose is not correct. Our model is then trained in an EM fashion, alternating between updating the vertex features and the feature extractor. We show that, under mild assumptions, our method simulates fine-tuning on a global pseudo-labeled dataset, which converges to the target domain asymptotically. Through extensive empirical validation, including a complex extreme UDA setup that combines real nuisances, synthetic noise, and occlusion, we demonstrate that our simple approach addresses the domain shift challenge and significantly improves pose estimation accuracy.
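The selective vertex update described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the cosine-similarity threshold, and the moving-average momentum are all hypothetical choices made for illustration. The sketch assumes unit-norm features and that a target feature has already been sampled for each mesh vertex. The alternating feature-extractor update of the EM scheme is omitted.

```python
import numpy as np


def normalize(x, eps=1e-8):
    """Scale each row of x to (approximately) unit L2 norm."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def selective_vertex_update(vertex_feats, target_feats, threshold=0.8, momentum=0.9):
    """Update only vertex features that lie close to their target features.

    vertex_feats: (V, D) unit-norm features stored at the mesh vertices.
    target_feats: (V, D) unit-norm features sampled from target-domain images
                  at the corresponding vertex locations (assumed to be given).
    threshold, momentum: hypothetical hyperparameters, not taken from the paper.
    Returns the updated features and a boolean mask of the vertices updated.
    """
    # Cosine similarity per vertex (rows are unit-norm, so a dot product suffices).
    similarity = np.sum(vertex_feats * target_feats, axis=1)
    # Treat a vertex as locally robust if it already matches its target feature.
    robust = similarity >= threshold
    updated = vertex_feats.copy()
    # Exponential moving average toward the target, then re-normalize.
    blended = momentum * vertex_feats[robust] + (1.0 - momentum) * target_feats[robust]
    updated[robust] = normalize(blended)
    return updated, robust
```

In this sketch, vertices whose features disagree with the target are left untouched, which mirrors the idea of relying on stable subparts rather than on a globally correct pose.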
- Prakhar Kaushik
- Aayush Mishra
- Adam Kortylewski
- Alan Yuille