Diffusion Illusions: Hiding Images in Plain Sight (2312.03817v1)

Published 6 Dec 2023 in cs.CV

Abstract: We explore the problem of computationally generating special 'prime' images that produce optical illusions when physically arranged and viewed in a certain way. First, we propose a formal definition for this problem. Next, we introduce Diffusion Illusions, the first comprehensive pipeline designed to automatically generate a wide range of these illusions. Specifically, we both adapt the existing 'score distillation loss' and propose a new 'dream target loss' to optimize a group of differentially parametrized prime images, using a frozen text-to-image diffusion model. We study three types of illusions, each where the prime images are arranged in different ways and optimized using the aforementioned losses such that images derived from them align with user-chosen text prompts or images. We conduct comprehensive experiments on these illusions and verify the effectiveness of our proposed method qualitatively and quantitatively. Additionally, we showcase the successful physical fabrication of our illusions -- as they are all designed to work in the real world. Our code and examples are publicly available at our interactive project website: https://diffusionillusions.com


Summary

The paper introduces an automated system that generates optical illusions by optimizing prime images using adapted diffusion models.
It employs a two-phase methodology with Score Distillation Loss and a novel Dream Target Loss to align image outputs with textual and visual prompts.
Comprehensive experiments verify the framework’s ability to fabricate illusions in the real world, despite challenges with high-frequency image details.

Introduction

Optical illusions have traditionally taken intensive effort and significant artistic skill to create. Although some types have long been generated computationally, photorealistic illusions have remained a particular challenge. This paper introduces a system that automates their creation, using a frozen text-to-image diffusion model to turn text prompts into visual illusions.

Problem Formalization

The paper presents a formal definition of the task of generating visual illusions and establishes a generic framework called Diffusion Illusions. An illusion here is defined based on 'prime' images that, when physically arranged or viewed differently, produce distinct 'derived' images. This framework is capable of generating several types of illusions, including Flip Illusions, Rotation Overlay Illusions, and Hidden Overlay Illusions. The process of creating an illusion involves selecting, modeling, and then searching for prime image patterns that yield required derived images when presented in a certain arrangement.
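The prime-to-derived relationship can be made concrete with a small sketch. It is only an illustration under stated assumptions: images are tiny grayscale grids stored as nested lists, overlaying is modeled as pixelwise multiplication (as with stacked transparencies), and the function names (`flip_derived`, `rotation_overlay_derived`, `hidden_overlay_derived`) are hypothetical, not taken from the paper's code.

```python
def rot90(img):
    """Rotate a 2D grid 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def rot180(img):
    """Rotate a 2D grid 180 degrees (turn it upside down)."""
    return [row[::-1] for row in img[::-1]]

def overlay(a, b):
    """ASSUMPTION: stacking two transparencies multiplies pixel values."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def flip_derived(prime):
    """Flip illusion: one prime, viewed upright and upside down."""
    return [prime, rot180(prime)]

def rotation_overlay_derived(base, rotator):
    """Rotation overlay: the rotator is placed on the base at 0, 90, 180, 270 degrees."""
    derived = []
    rotated = rotator
    for _ in range(4):
        derived.append(overlay(base, rotated))
        rotated = rot90(rotated)
    return derived

def hidden_overlay_derived(primes):
    """Hidden overlay: each prime is shown alone, and the full stack reveals one more image."""
    stacked = primes[0]
    for p in primes[1:]:
        stacked = overlay(stacked, p)
    return list(primes) + [stacked]

if __name__ == "__main__":
    prime = [[0.0, 1.0], [0.5, 0.25]]
    for image in flip_derived(prime):
        print(image)
```

In each case the search problem is the same: find prime grids such that every image returned by the arrangement function matches its prompt.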

Methodology

The core of the Diffusion Illusions system lies in optimizing a set of prime images using adapted and proposed loss functions against a frozen text-to-image diffusion model. The pipeline consists of two main phases:

Score Distillation Loss Phase: Prime images are first optimized with Score Distillation Loss, which uses the frozen diffusion model to push the derived images toward their text prompts.
Dream Target Loss Phase: A novel technique called Dream Target Loss is then introduced to refine the prime images by pulling the derived images towards periodically updated target images.
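The idea of pulling derived images toward periodically refreshed targets can be sketched on a toy flip illusion. This is not the paper's implementation: the diffusion model is replaced by a hypothetical `stand_in_targets` function that simply blends each derived image toward a fixed reference, the loss is assumed to be a mean squared error, and all names and hyperparameters are illustrative.

```python
def rot180(flat):
    # Reversing a row-major flattened image equals rotating it 180 degrees.
    return flat[::-1]

def derive(prime):
    # Flip illusion: the derived images are the prime upright and upside down.
    return [prime, rot180(prime)]

def stand_in_targets(derived, references, strength=0.5):
    # HYPOTHETICAL stand-in for the diffusion-based target generator:
    # each target is the current derived image moved part-way to a reference.
    return [[(1 - strength) * d + strength * r for d, r in zip(img, ref)]
            for img, ref in zip(derived, references)]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def dream_target_step(prime, targets, lr):
    # One gradient step on mse(prime, t_up) + mse(rot180(prime), t_flip).
    t_up, t_flip = targets
    n = len(prime)
    g_up = [2 * (p - t) / n for p, t in zip(prime, t_up)]
    # The gradient of the flipped term is mapped back through rot180,
    # which is its own inverse.
    g_flip = rot180([2 * (f - t) / n for f, t in zip(rot180(prime), t_flip)])
    return [p - lr * (a + b) for p, a, b in zip(prime, g_up, g_flip)]

def optimize(prime, references, steps=200, refresh_every=20, lr=0.5):
    targets = None
    for step in range(steps):
        if step % refresh_every == 0:
            # Targets are held fixed between refreshes.
            targets = stand_in_targets(derive(prime), references)
        prime = dream_target_step(prime, targets, lr)
    return prime

if __name__ == "__main__":
    refs = [[1.0, 0.0, 0.0, 1.0], [0.0, 0.0, 1.0, 1.0]]
    result = optimize([0.5] * 4, refs)
    print(result, [mse(d, r) for d, r in zip(derive(result), refs)])
```

Because both derived images depend on the same prime, the optimizer settles on a compromise between the two targets; in the real pipeline that compromise is steered by the diffusion model rather than by fixed references.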

An interesting aspect of the system is that it allows the generation of illusions integrating visual prompts – images that don't need textual description but are rather specific visual targets like QR codes.

Performance and Experiments

Comprehensive experiments are conducted to assess the capability of the framework to generate illusions that can be physically fabricated. The illusions were tested both quantitatively and qualitatively for various properties. Quantitative assessments included cosine similarity between the images and their prompts, image diversity, aesthetics, and a metric called the Independence Score.
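As a reminder of what a cosine similarity score measures, here is a minimal sketch. It assumes the image and prompt have already been mapped to embedding vectors by some joint image-text encoder; the vectors below are made up for illustration and the function name is hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

if __name__ == "__main__":
    # Made-up embeddings standing in for encoder outputs.
    image_embedding = [0.2, 0.9, 0.1]
    prompt_embedding = [0.25, 0.8, 0.0]
    print(cosine_similarity(image_embedding, prompt_embedding))
```

A score near 1 means the derived image and its prompt point in nearly the same direction in embedding space, regardless of vector length.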

Furthermore, ablation studies show the framework's flexibility, including variations in the number of derived images, which showcases trade-offs between image quality and generation constraints.

Real-World Application and Limitations

The framework is not just a theoretical construct: its illusions have been physically fabricated. The images can be printed, manipulated, and viewed in the real world. One noted limitation is that high-frequency details in prime images can make illusions brittle, so they may not tolerate real-world imperfections during fabrication. The paper also acknowledges that generating an illusion is slow, and points to speed as an area for future improvement.

Conclusion

The paper presents a substantial leap in the automation of optical illusion generation. It discusses the flexibility of the Diffusion Illusions framework, its comprehensive testing, and successful adaptation to real-world applications. Ending on a forward-looking note, it suggests the potential for more advanced types of illusion generation and creative applications of diffusion models.