SAIR: Learning Semantic-aware Implicit Representation (2310.09285v1)
Abstract: Implicit representation of an image can map arbitrary coordinates in the continuous domain to their corresponding color values, offering a powerful capability for image reconstruction. Nevertheless, existing implicit representation approaches focus only on building a continuous appearance mapping and ignore the continuity of semantic information across pixels. As a result, they can hardly achieve the desired reconstruction when the semantic information within the input image is corrupted, for example, when a large region is missing. To address this issue, we propose to learn a semantic-aware implicit representation (SAIR); that is, we make the implicit representation of each pixel rely on both its appearance and its semantic information (e.g., which object the pixel belongs to). To this end, we propose a framework with two modules: (1) building a semantic implicit representation (SIR) for a corrupted image with large missing regions; given an arbitrary coordinate in the continuous domain, we can obtain its text-aligned embedding indicating the object the pixel belongs to; and (2) building an appearance implicit representation (AIR) on top of the SIR; given an arbitrary coordinate in the continuous domain, we can reconstruct its color whether or not the pixel is missing in the input. We validate the novel semantic-aware implicit representation method on the image inpainting task, and extensive experiments demonstrate that our method surpasses state-of-the-art approaches by a significant margin.
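The two-module design above can be sketched as two coordinate-conditioned networks chained together: a SIR that maps a continuous coordinate to a semantic embedding, and an AIR that maps the coordinate plus that embedding to a color. The sketch below is a minimal, untrained NumPy illustration of this data flow only; the layer sizes, the 32-dimensional embedding, and the random weights are assumptions for illustration, not the paper's actual architecture or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(dims):
    """Build random-weight MLP layers (illustrative, untrained)."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(dims[:-1], dims[1:])]

def forward(layers, x):
    """Apply the layers with ReLU on every hidden layer."""
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x

# SIR: continuous coordinate -> text-aligned semantic embedding
# (embedding width of 32 is a hypothetical choice)
sir = mlp([2, 64, 64, 32])
# AIR: coordinate + semantic embedding -> RGB color
air = mlp([2 + 32, 64, 64, 3])

coord = np.array([[0.25, -0.7]])     # arbitrary (x, y) in the continuous domain
sem = forward(sir, coord)            # semantic embedding at that coordinate
rgb = forward(air, np.concatenate([coord, sem], axis=1))
print(rgb.shape)                     # (1, 3): one RGB value per query coordinate
```

Because both networks take a coordinate rather than a pixel index, they can be queried at any location, including inside a masked region, which is what makes the representation usable for inpainting.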