Physical Property Understanding from Language-Embedded Feature Fields (2404.04242v1)
Abstract: Can computers perceive the physical properties of objects solely through vision? Research in cognitive science and vision science has shown that humans excel at identifying materials and estimating their physical properties based purely on visual appearance. In this paper, we present a novel approach for dense prediction of the physical properties of objects from a collection of images. Inspired by how humans reason about physics through vision, we leverage large language models (LLMs) to propose candidate materials for each object. We then construct a language-embedded point cloud and estimate the physical properties of each 3D point using a zero-shot kernel regression approach. Our method is accurate, annotation-free, and applicable to any object in the open world. Experiments demonstrate the effectiveness of the proposed approach on various physical property reasoning tasks, such as estimating the mass of common objects, as well as other properties such as friction and hardness.
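The two core steps in the abstract, per-point zero-shot kernel regression over LLM-proposed materials and mass estimation, can be illustrated with a minimal NumPy sketch. It assumes a softmax kernel over cosine similarity between each point's language-aligned feature and the text embedding of each candidate material, and it assumes that mass is obtained by giving every point an equal share of a known object volume. The function names (`kernel_regress`, `estimate_mass`), the temperature value, the density numbers, and the random vectors standing in for CLIP-style embeddings are all hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def kernel_regress(point_feats, material_feats, material_values, temperature=0.1):
    """Zero-shot kernel regression (Nadaraya-Watson style), hypothetical sketch.

    point_feats:     (N, D) language-aligned features, one per 3D point
    material_feats:  (K, D) text embeddings of the K candidate material names
    material_values: (K,)   one property value per material (e.g. density, kg/m^3)
    Returns an (N,) array of per-point weighted averages of material_values.
    """
    # (N, K) cosine similarity between every point and every candidate material.
    sims = l2_normalize(point_feats) @ l2_normalize(material_feats).T
    logits = sims / temperature
    logits -= logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax: each row sums to 1
    return weights @ material_values


def estimate_mass(densities, object_volume):
    """Assumed mass model: each point represents an equal share of object_volume (m^3)."""
    return float(np.mean(densities) * object_volume)


# Usage with synthetic stand-ins (no real CLIP model or LLM is called here).
rng = np.random.default_rng(0)
materials = ["wood", "steel", "plastic"]  # candidates an LLM might propose
densities = np.array([700.0, 7850.0, 950.0])  # illustrative densities in kg/m^3
material_feats = rng.normal(size=(3, 64))  # stand-ins for text embeddings
# Fake point features: 100 points resembling "wood", 20 resembling "steel".
point_feats = np.vstack([
    material_feats[0] + 0.1 * rng.normal(size=(100, 64)),
    material_feats[1] + 0.1 * rng.normal(size=(20, 64)),
])
per_point = kernel_regress(point_feats, material_feats, densities)
mass = estimate_mass(per_point, object_volume=0.002)  # 2 liters expressed in m^3
```

Because each row of weights sums to one, every per-point prediction is a convex combination of the candidate values and cannot leave the range spanned by the proposed materials; a smaller temperature pushes each point toward its single best-matching material.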
- Albert J. Zhai
- Yuan Shen
- Emily Y. Chen
- Gloria X. Wang
- Xinlei Wang
- Sheng Wang
- Kaiyu Guan
- Shenlong Wang