Attack Deterministic Conditional Image Generative Models for Diverse and Controllable Generation (2403.08294v1)
Abstract: Existing generative adversarial network (GAN) based conditional image generative models typically produce a fixed output for the same conditional input, which is unreasonable for highly subjective tasks, such as large-mask image inpainting or style transfer. On the other hand, GAN-based diverse image generative methods require retraining/fine-tuning the network or designing complex noise injection functions, which makes them computationally expensive, task-specific, or prone to low-quality results. Given that many deterministic conditional image generative models already produce high-quality yet fixed results, we raise an intriguing question: is it possible for pre-trained deterministic conditional image generative models to generate diverse results without changing network structures or parameters? To answer this question, we re-examine conditional image generation tasks from the perspective of adversarial attack and propose a simple and efficient plug-in, projected gradient descent (PGD) like method for diverse and controllable image generation. The key idea is to attack the pre-trained deterministic generative model by adding a micro perturbation to the input condition. In this way, diverse results can be generated without any adjustment of network structures or fine-tuning of the pre-trained models. In addition, we can control which diverse results are generated by specifying the attack direction according to a reference text or image. Our work opens the door to applying adversarial attacks to low-level vision tasks, and experiments on various conditional image generation tasks demonstrate the effectiveness and superiority of the proposed method.
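The core mechanism described above can be sketched as a PGD-style loop over the input condition of a frozen generator. The following is a minimal illustrative sketch, not the authors' implementation: the function name `pgd_diversify`, the default objective (pushing the output away from the unattacked result), and the optional `direction_loss` hook (standing in for the paper's text/image-guided attack direction) are all assumptions for illustration.

```python
import torch

def pgd_diversify(generator, cond, steps=10, eps=0.1, alpha=0.05, direction_loss=None):
    """PGD-like attack on the input condition of a frozen deterministic generator.

    Perturbs `cond` within an eps-ball (L-inf) so the frozen generator produces a
    result that differs from its fixed output. `direction_loss`, if given, is a
    callable on the output that steers the attack (e.g. a CLIP-similarity loss to
    a reference text or image -- hypothetical hook, not the paper's exact API).
    """
    base = generator(cond).detach()  # fixed output of the unattacked model
    delta = torch.zeros_like(cond).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        out = generator(cond + delta)
        # Default objective: maximize distance from the deterministic result,
        # i.e. minimize its negative MSE.
        loss = -(out - base).pow(2).mean() if direction_loss is None else direction_loss(out)
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()  # signed gradient descent step
            delta.clamp_(-eps, eps)             # project back into the eps-ball
        delta.grad.zero_()
    return generator(cond + delta).detach()
```

Because the generator's weights are never touched, the same loop plugs into any pre-trained deterministic model; only the perturbation `delta` on the condition is optimized.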
Authors: Tianyi Chu, Wei Xing, Jiafu Chen, Zhizhong Wang, Jiakai Sun, Lei Zhao, Haibo Chen, Huaizhong Lin