Connecting NeRFs, Images, and Text

Published 11 Apr 2024 in cs.CV (arXiv:2404.07993v1)

Abstract: Neural Radiance Fields (NeRFs) have emerged as a standard framework for representing 3D scenes and objects, introducing a novel data type for information exchange and storage. Concurrently, significant progress has been made in multimodal representation learning for text and image data. This paper explores a novel research direction that aims to connect the NeRF modality with other modalities, similar to established methodologies for images and text. To this end, we propose a simple framework that exploits pre-trained models for NeRF representations alongside multimodal models for text and image processing. Our framework learns a bidirectional mapping between NeRF embeddings and those obtained from corresponding images and text. This mapping unlocks several novel and useful applications, including NeRF zero-shot classification and NeRF retrieval from images or text.
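The abstract's core idea — mapping NeRF embeddings into a shared image/text embedding space so that zero-shot classification reduces to nearest-neighbor search over text embeddings — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dimensions, the single linear mapping layer, and the random stand-ins for the frozen NeRF and text encoders are all assumptions for the sake of a runnable example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: nerf_dim for the frozen NeRF encoder's output,
# clip_dim for the shared image/text space (e.g. a CLIP-like model).
nerf_dim, clip_dim = 1024, 512

# Stand-in for the learned mapping network. The paper trains a network
# for this bidirectional mapping; a single random linear layer here is
# purely illustrative.
W = rng.normal(scale=nerf_dim ** -0.5, size=(nerf_dim, clip_dim))

def nerf_to_clip(nerf_emb: np.ndarray) -> np.ndarray:
    """Map a NeRF embedding into the shared text/image space, L2-normalized."""
    z = nerf_emb @ W
    return z / np.linalg.norm(z)

def zero_shot_classify(nerf_emb, text_embs, labels):
    """Return the label whose text embedding has highest cosine similarity."""
    z = nerf_to_clip(nerf_emb)
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    return labels[int(np.argmax(t @ z))]

# Toy usage with random stand-ins for real encoder outputs.
nerf_emb = rng.normal(size=nerf_dim)           # would come from a NeRF encoder
text_embs = rng.normal(size=(3, clip_dim))     # would come from a text encoder
labels = ["chair", "car", "plane"]
print(zero_shot_classify(nerf_emb, text_embs, labels))
```

NeRF-from-text retrieval follows the same pattern in reverse: embed the query text, then rank a gallery of mapped NeRF embeddings by cosine similarity.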
