PNeSM: Arbitrary 3D Scene Stylization via Prompt-Based Neural Style Mapping (2403.08252v1)
Abstract: 3D scene stylization refers to transforming the appearance of a 3D scene to match a given style image, so that images rendered from different viewpoints exhibit the style of that image while the 3D consistency of the stylized scene is maintained. Several existing methods have achieved impressive results in stylizing 3D scenes. However, their models must be re-trained when applied to a new scene; in other words, each model is coupled to a specific scene and cannot adapt to arbitrary other scenes. To address this issue, we propose a novel 3D scene stylization framework that transfers an arbitrary style to an arbitrary scene without any style-related or scene-related re-training. Concretely, we first map the appearance of the 3D scene into a 2D style pattern space, which fully disentangles the geometry and appearance of the 3D scene and enables our model to generalize to arbitrary 3D scenes. We then stylize the appearance of the 3D scene in the 2D style pattern space via a prompt-based 2D stylization algorithm. Experimental results demonstrate that our framework surpasses SOTA methods in both visual quality and generalization.
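The two-stage idea in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: `uv_map` stands in for the learned appearance-to-2D mapping, and channel-wise statistic matching (AdaIN-style) stands in for the paper's prompt-based 2D stylizer. All function and variable names here are hypothetical.

```python
import numpy as np

def adain(content, style, eps=1e-5):
    # Match channel-wise mean/std of the content features to the style
    # features (AdaIN-style statistic transfer; a stand-in for the
    # paper's prompt-based 2D stylization algorithm).
    c_mean = content.mean(axis=(0, 1))
    c_std = content.std(axis=(0, 1)) + eps
    s_mean = style.mean(axis=(0, 1))
    s_std = style.std(axis=(0, 1))
    return (content - c_mean) / c_std * s_std + s_mean

def stylize_scene_appearance(points_3d, uv_map, pattern_2d, style_img):
    """Map 3D surface points into a 2D style pattern space, stylize that
    2D space once, then sample per-point colors back for rendering.
    Geometry (points_3d) is untouched, so 3D consistency is preserved."""
    stylized = adain(pattern_2d, style_img)        # stylize in 2D pattern space
    h, w = stylized.shape[:2]
    uv = uv_map(points_3d)                         # (N, 2) coordinates in [0, 1)
    ix = (uv * np.array([w - 1, h - 1])).astype(int)
    return stylized[ix[:, 1], ix[:, 0]]            # per-point stylized color
```

Because stylization happens once in the shared 2D pattern space rather than per rendered view, every viewpoint samples the same stylized appearance, which is what keeps the rendered views consistent.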