Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models (2311.17919v2)

Published 29 Nov 2023 in cs.CV

Abstract: We address the problem of synthesizing multi-view optical illusions: images that change appearance upon a transformation, such as a flip or rotation. We propose a simple, zero-shot method for obtaining these illusions from off-the-shelf text-to-image diffusion models. During the reverse diffusion process, we estimate the noise from different views of a noisy image, and then combine these noise estimates together and denoise the image. A theoretical analysis suggests that this method works precisely for views that can be written as orthogonal transformations, of which permutations are a subset. This leads to the idea of a visual anagram--an image that changes appearance under some rearrangement of pixels. This includes rotations and flips, but also more exotic pixel permutations such as a jigsaw rearrangement. Our approach also naturally extends to illusions with more than two views. We provide both qualitative and quantitative results demonstrating the effectiveness and flexibility of our method. Please see our project webpage for additional visualizations and results: https://dangeng.github.io/visual_anagrams/


Summary

  • The paper presents a zero-shot method that combines noise estimates across views during reverse diffusion to generate images that change appearance under specific pixel rearrangements.
  • The supported views are orthogonal transformations, including rotations, flips, and more exotic pixel permutations such as "polymorphic jigsaws."
  • Quantitative CLIP-based evaluation shows stronger alignment and concealment than prior techniques, supporting the method's theoretical motivation.

Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models

The paper "Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models" explores an innovative application of text-to-image diffusion models to generate multi-view optical illusions. These illusions are images intended to change appearance upon undergoing transformations such as flips, rotations, or more unconventional permutations. The proposed method leverages the capabilities of off-the-shelf diffusion models in a zero-shot fashion, effectively bypassing the necessity of explicit human perception models—a characteristic differentiator from some prior computational approaches.

Core Methodology

The approach runs the standard reverse diffusion process, but at each step estimates the noise from several transformed views of the same noisy image. These per-view noise estimates are mapped back to a common frame, combined, and used to denoise the image, and the illusion emerges over the course of sampling. A theoretical analysis shows that the method is suited precisely to views that can be written as orthogonal transformations, a category that includes all pixel permutations. This insight is foundational: it yields a formal definition of a "visual anagram" as an image designed to change appearance under a specific rearrangement of its pixels.
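Concretely, each reverse step can be implemented as: transform the current noisy image into each view, query the diffusion model once per view with that view's prompt, transform the noise estimates back, and average. The sketch below is a minimal illustration under assumed interfaces; `predict_noise` and `step` stand in for an off-the-shelf model's noise predictor and sampler update, and are not the authors' actual function names.

```python
import torch

def combined_denoise_step(x_t, t, prompts, views, inverse_views, predict_noise, step):
    """One reverse-diffusion step shared across several views of one noisy image.

    views / inverse_views: paired orthogonal pixel transforms (rotations, flips,
    permutations) and their inverses. predict_noise(x, t, prompt) and
    step(x, eps, t) are assumed wrappers around a text-to-image diffusion model
    and its sampler update (e.g. DDIM); they are illustrative, not the paper's API.
    """
    estimates = []
    for prompt, view, inv in zip(prompts, views, inverse_views):
        eps = predict_noise(view(x_t), t, prompt)  # noise as seen from this view
        estimates.append(inv(eps))                 # map the estimate back to the base frame
    eps_combined = torch.stack(estimates).mean(dim=0)  # average the aligned estimates
    return step(x_t, eps_combined, t)  # denoise with the shared estimate
```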

Transformations and Implementation

The paper does not limit itself to conventional transformations such as rotations and flips: it also handles more complex permutations, including jigsaw-style patch rearrangements the authors call "polymorphic jigsaws." The method also extends naturally to illusions with more than two views, as the sketch below illustrates.
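Since every such view is just a bijection on pixel locations, the transforms are easy to express as tensor operations. The jigsaw view below is a hypothetical sketch: the patch size and the permutation are illustration choices, not the paper's configuration.

```python
import torch

def jigsaw_view(x, perm, patch=64):
    """Rearrange non-overlapping patches of x by a fixed permutation.

    Moving whole patches only permutes pixels, so this view is an orthogonal
    transformation. 'perm' and 'patch' are hypothetical illustration choices;
    assumes a square image whose side is a multiple of 'patch'.
    """
    b, c, h, w = x.shape
    grid = h // patch
    tiles = x.unfold(2, patch, patch).unfold(3, patch, patch)  # (b, c, grid, grid, patch, patch)
    tiles = tiles.reshape(b, c, -1, patch, patch)[:, :, perm]  # reorder the tiles
    tiles = tiles.reshape(b, c, grid, grid, patch, patch)
    return tiles.permute(0, 1, 2, 4, 3, 5).reshape(b, c, h, w)

# A three-view illusion needs three (view, inverse) pairs. The rotation and flip
# below are their own inverses; a jigsaw's inverse uses torch.argsort(perm).
views = [
    lambda x: x,                                   # identity view
    lambda x: torch.rot90(x, k=2, dims=(-2, -1)),  # 180-degree rotation
    lambda x: torch.flip(x, dims=(-1,)),           # horizontal flip
]
```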

A critical implementation choice is the use of a pixel-space diffusion model rather than a latent one, since latent models tend to introduce artifacts under these transformations. Operating directly on pixels keeps each view an exact rearrangement of the image, avoiding the distortions that can arise when a spatial transform is applied to latent features instead.
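The orthogonality requirement behind this choice can be made concrete with a one-line calculation: an orthogonal view maps i.i.d. Gaussian noise to i.i.d. Gaussian noise, so every transformed intermediate image remains a statistically valid input to the denoiser:

$$\epsilon \sim \mathcal{N}(0, I), \quad A^\top A = I \;\Longrightarrow\; A\epsilon \sim \mathcal{N}(0, A A^\top) = \mathcal{N}(0, I).$$

Pixel permutations, rotations by multiples of 90 degrees, and flips are all orthogonal in this sense, whereas a spatial transform applied to a latent code generally is not, which is consistent with the artifacts reported for latent models.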

Quantitative and Qualitative Analysis

Quantitatively, performance is assessed with CLIP-based metrics that gauge both alignment (how well each view matches its intended prompt) and concealment (how well each prompt stays hidden in the other views). The method outperforms existing techniques on both measures, reflecting a balance between rendering a different prompt under each transformation and keeping that content from leaking into the other views. A rough sketch of such a metric follows.
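The sketch below shows one way an alignment check could be computed with a standard CLIP model; the minimum-over-views aggregation and the checkpoint name are illustrative assumptions, not necessarily the paper's exact metric definitions.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def alignment_score(view_images, prompts):
    """Cosine similarity between each view (a PIL image) and its matching prompt.

    Returning the minimum focuses the score on the illusion's weakest view;
    this aggregation is an assumption for illustration.
    """
    inputs = processor(text=prompts, images=view_images, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    sims = img @ txt.T                   # rows: views, columns: prompts
    return sims.diagonal().min().item()  # weakest (view, prompt) match
```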

Qualitative examples further validate the method's robustness, including illusions where visual components serve dual functions across multiple interpretations depending on the view—demonstrating the system's adeptness at integrating distinct elements into cohesive visual narratives.

Implications and Future Directions

The implications of this research are multifaceted, impacting both theoretical and practical domains. Theoretically, the exploration adds to the growing understanding of how generative models can intuitively align with human-like perception mechanisms. Practically, applications range from artistic exploration in media to potential usage in psychological studies regarding perception.

Future research could expand the repertoire of transformations by relaxing the orthogonality constraint to admit non-linear operations. Another promising avenue is improving reliability, since the method does not yet deliver a convincing illusion on every sample; this could involve optimizing initial conditions or further refining the noise-estimation step.

In conclusion, this paper presents a sophisticated methodology for generating multi-view optical illusions using diffusion models, validated by both theoretical underpinnings and empirical results. It sets the stage for further advancements in generative model applications and perceptual studies, expanding the potential of AI-driven creative processes.
