Instance-specific object search capability of CLIP-Fields
Determine whether CLIP-Fields, which constructs a 3D semantic map by aligning CLIP image encoder features to point-cloud features, can accurately locate a specific object instance in a 3D environment when given a query image, as the Instance-Specific Image Goal Navigation (InstanceImageNav) task requires. Prior work suggests only that it can coarsely localize the position from which the query image was captured.
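To make the question concrete, the following is a minimal sketch of what an image-query search over a point-wise feature map could look like: embed the query image, score every map point by cosine similarity, and report the centroid of the best-matching points. It is not the CLIP-Fields API. The function name `locate_by_image_query`, the `top_k` parameter, and the synthetic data are all hypothetical, and random vectors stand in for CLIP image embeddings.

```python
import numpy as np


def locate_by_image_query(point_xyz, point_feats, query_feat, top_k=50):
    """Estimate a 3D location as the centroid of the top_k map points
    whose features are most similar (cosine) to the query feature.

    Hypothetical helper for illustration; not part of CLIP-Fields.
    """
    # L2-normalise so the dot product equals cosine similarity.
    feats = point_feats / np.linalg.norm(point_feats, axis=1, keepdims=True)
    query = query_feat / np.linalg.norm(query_feat)
    sims = feats @ query

    # Select the top_k most similar points (order within the set is arbitrary).
    top_k = min(top_k, len(sims))
    top_idx = np.argpartition(-sims, top_k - 1)[:top_k]
    return point_xyz[top_idx].mean(axis=0), sims[top_idx]


# Synthetic stand-in for a semantic map: 1000 points with 64-d features.
rng = np.random.default_rng(0)
xyz = rng.uniform(0.0, 5.0, size=(1000, 3))
feats = rng.normal(size=(1000, 64))

# Plant one "object instance": 40 points clustered around a known position,
# all sharing a common feature vector plus small noise (an assumption).
true_pos = np.array([1.0, 2.0, 0.5])
target_feat = rng.normal(size=64)
xyz[:40] = true_pos + rng.normal(scale=0.05, size=(40, 3))
feats[:40] = target_feat + rng.normal(scale=0.1, size=(40, 64))

# The query plays the role of the embedding of an image of that instance.
query = target_feat + rng.normal(scale=0.1, size=64)

est, top_sims = locate_by_image_query(xyz, feats, query, top_k=40)
print("estimated position:", est)
print("localization error:", np.linalg.norm(est - true_pos))
```

In this toy setup the planted instance has features that are cleanly separated from the rest of the map, so the search succeeds by construction. Whether features obtained from a real CLIP-aligned map separate individual object instances this well is precisely what the problem statement leaves unverified.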
References
In addition, Shafiullah et al. suggests that it is possible to coarsely identify the location where a given image query was taken using CLIP-Fields. However, whether it is possible to search for the location of a specific object as required by InstanceImageNav has not been verified.
— Object Instance Retrieval in Assistive Robotics: Leveraging Fine-Tuned SimSiam with Multi-View Images Based on 3D Semantic Map
(2404.09647 - Sakaguchi et al., 15 Apr 2024) in Section 3.2 Object Retrieval (Related Work)