- The paper introduces PIP, a method that transforms LLM hallucinations into imaginative outputs for creative and narrative applications.
- It employs Low-Rank Adaptation on the Llama-3.2-1B-Instruct model using a synthetic dataset to deliberately elicit surreal responses.
- Preliminary results and system visualizations indicate that controlled hallucinations can boost creative engagement in mixed-reality and storytelling environments.
This paper introduces Purposefully Induced Psychosis (PIP), a method that reframes LLM hallucinations not as errors, but as a form of computational imagination useful for creative tasks. Instead of trying to eliminate factually inaccurate or surreal outputs, PIP intentionally encourages them through fine-tuning. The inspiration comes from literary figures like Pip in Moby-Dick, whose "madness" led to profound insights, and the concept of consensual illusions in performance arts like theater and magic, where audiences willingly suspend disbelief.
PIP Methodology and Implementation:
- Goal: To amplify LLM hallucinations for imaginative applications like speculative fiction, interactive storytelling, and mixed-reality simulations.
- Technique: The core technique involves fine-tuning a base LLM using Low-Rank Adaptation (LoRA). This modifies the model to favor speculative, metaphorical, and surreal outputs when prompted appropriately, while preserving its general language capabilities.
- Base Model: The implementation used Meta's Llama-3.2-1B-Instruct model.
- Fine-tuning Data: The model was fine-tuned on the PIP-One dataset, a collection of 5,000 synthetic instruction-following pairs designed to elicit creative and non-literal responses. Examples include prompts like "Imagine the cosmic symphony as a song sung by stars. Describe its melody." and "You are a mythical creature who can taste colors. What does a supernova taste like?"
- Training Configuration: Fine-tuning specifics included a learning rate of 1e-5, a batch size of 1 per device, gradient accumulation over 8 steps, and a cosine learning rate scheduler, facilitated by Hugging Face AutoTrain. The model configuration specified a hidden size of 2048, intermediate size of 8192, 32 attention heads across 16 layers, and an increased maximum position embedding size of 131072.
- Control: A hybrid strategy combining model-level fine-tuning (via LoRA) and prompt-level control allows managing when the model produces imaginative versus more standard outputs.
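The reported hyperparameters can be gathered into a single configuration sketch. Only the numeric values and model name come from the paper; the dictionary layout, field names, and the LoRA rank/alpha are illustrative assumptions (the paper does not report them):

```python
# Hypothetical configuration sketch for the PIP fine-tuning run.
# Values marked "from paper" are reported in the text; lora_r and
# lora_alpha (and the field names themselves) are assumptions.
FINETUNE_CONFIG = {
    "base_model": "meta-llama/Llama-3.2-1B-Instruct",  # from paper
    "learning_rate": 1e-5,             # from paper
    "per_device_batch_size": 1,        # from paper
    "gradient_accumulation_steps": 8,  # from paper
    "lr_scheduler": "cosine",          # from paper
    "lora_r": 16,                      # assumed: a typical LoRA rank
    "lora_alpha": 32,                  # assumed: a typical LoRA alpha
}

MODEL_CONFIG = {
    "hidden_size": 2048,                # from paper
    "intermediate_size": 8192,          # from paper
    "num_attention_heads": 32,          # from paper
    "num_hidden_layers": 16,            # from paper
    "max_position_embeddings": 131072,  # from paper
}

# The effective batch size is per-device batch * accumulation steps.
effective_batch = (FINETUNE_CONFIG["per_device_batch_size"]
                   * FINETUNE_CONFIG["gradient_accumulation_steps"])
print(effective_batch)  # → 8
```

The small per-device batch with 8-step accumulation yields an effective batch of 8, a common pattern when fine-tuning on memory-constrained hardware.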
System Architecture (PIPeline):
The end-to-end system involves:
- Data Ingestion: Using the PIP-One synthetic dataset.
- Model: Fine-tuned Llama-3.2-1B-Instruct with LoRA adapters.
- PIP API: A lightweight API to handle queries and route them to the fine-tuned model.
- Interface Layer: Supports both text-based interaction and a mixed-reality (XR) environment.
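A minimal sketch of what the PIP API's routing layer might look like. The function names, the prompt-level control string, and the stubbed model call are all assumptions, not the paper's implementation; in practice `generate` would call the fine-tuned model (e.g. via the Hugging Face Inference API):

```python
# Hypothetical sketch of the PIP API's routing layer.
IMAGINATIVE_PREAMBLE = (
    "Respond speculatively and metaphorically; "
    "factual accuracy is not required."
)  # assumed prompt-level control string

def generate(prompt: str) -> str:
    """Stub standing in for the fine-tuned Llama-3.2-1B-Instruct endpoint."""
    return f"[model output for: {prompt!r}]"

def route_query(user_text: str, imaginative: bool = True) -> dict:
    """Route a query, applying prompt-level control and labeling the mode."""
    prompt = (f"{IMAGINATIVE_PREAMBLE}\n\n{user_text}"
              if imaginative else user_text)
    return {
        "mode": "imaginative" if imaginative else "standard",
        "response": generate(prompt),
    }

result = route_query("Describe the melody of a star.")
print(result["mode"])  # → imaginative
```

Keeping the mode decision in the API layer (rather than only in the model weights) is what makes the hybrid control strategy described above possible: the same endpoint can serve labeled imaginative and standard outputs.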
Mixed-Reality Application:
A key application demonstrated is an XR simulation built in Unity for the Meta Quest 3 headset. This system integrates several components:
- PIP Model: Accessed via the Hugging Face API for generating surreal text responses to user voice input.
- Secondary LLM: Parses PIP's text output into structured JSON specifying elements for 3D visualization (object type, material, color, behavior).
- Speech Interaction: Uses Meta Voice SDK for Text-to-Speech (TTS) to voice PIP's responses and Eleven Labs' API for Speech-to-Text (STT) to process user queries.
- 3D Generation: The structured JSON is sent to the Meshy API to generate 3D meshes in real-time.
- XR Environment: Generated 3D objects ("hallucinations") are spatially anchored in the user's physical environment using passthrough AR. Users interact via hand tracking and voice commands.
The workflow (illustrated in Figure 1 of the paper) shows user input -> PIP model -> JSON structuring -> 3D generation/TTS -> XR visualization and interaction.
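The JSON-structuring step could be validated along these lines before anything is sent to the 3D-generation service. The four field names match the elements listed above (object type, material, color, behavior), but the validation logic and the example payload are illustrative assumptions:

```python
import json

# Fields the paper says the secondary LLM emits for 3D visualization.
REQUIRED_FIELDS = {"object_type", "material", "color", "behavior"}

def parse_visualization_spec(llm_output: str) -> dict:
    """Parse and validate the secondary LLM's JSON before forwarding it
    to the 3D-generation step (e.g. the Meshy API)."""
    spec = json.loads(llm_output)
    missing = REQUIRED_FIELDS - spec.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return spec

# Hypothetical payload for a surreal "glass whale" hallucination.
example = ('{"object_type": "whale", "material": "glass", '
           '"color": "#7df9ff", "behavior": "orbit"}')
spec = parse_visualization_spec(example)
print(spec["object_type"])  # → whale
```

Validating the schema at this boundary matters because a secondary LLM can itself emit malformed JSON; rejecting incomplete specs early keeps the real-time 3D pipeline from stalling on bad input.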
Conceptual Framework and Observations:
- Consensual Illusions: PIP operates like a "digital magician," creating illusions that users engage with willingly for creative stimulation. This relies on clear context and user consent, distinguishing it from harmful misinformation.
- Creative Domains: The approach is intended for creative fields (writing, design, art, brainstorming) where non-factual or unexpected outputs can be valuable provocations, contrasting with high-stakes domains requiring accuracy (e.g., medicine, law).
- Preliminary User Feedback: Early tests (n=10) suggested users found the surreal outputs "unsettling in a generative way" and like "collaborative misfires" that sparked new ideas and increased engagement by introducing surprise and encouraging creative risk-taking. Visualizations of word embeddings (Figure 2) illustrate the difference between structured, poetic outputs and more fragmented, highly hallucinatory ones.
Ethical Considerations and Future Work:
- Context and Consent: Clear labeling and interface design are crucial to ensure users understand when they are interacting with a model in "imaginative mode" versus "factual mode." Potential solutions include distinct modes or toggles.
- Trust: Normalizing illusions in creative contexts must not undermine trust in AI for factual tasks.
- Future Directions: Refining user interfaces, conducting structured user studies on creativity and engagement, and exploring further applications in interactive art and storytelling.
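The context-and-consent point could even be enforced mechanically, for instance by refusing to emit imaginative output until the user has explicitly acknowledged the mode. Everything in this sketch (class name, banner text, opt-in flow) is an illustrative assumption rather than anything the paper implements:

```python
class PIPSession:
    """Sketch of an interface guard that keeps "imaginative mode"
    clearly labeled and opt-in, per the consent discussion above."""

    def __init__(self):
        self.consented = False  # imaginative mode is off by default

    def opt_in(self):
        """User explicitly acknowledges non-factual output."""
        self.consented = True

    def respond(self, text: str) -> str:
        # Every response carries a visible mode label, so the user
        # always knows which register the model is operating in.
        if not self.consented:
            return "[factual mode] " + text
        return "[imaginative mode: outputs may be non-factual] " + text

demo = PIPSession()
print(demo.respond("The stars hum."))  # labeled factual until opt-in
demo.opt_in()
print(demo.respond("The stars hum."))  # now labeled imaginative
```

The point of the sketch is that consent is a state of the session, not of the prompt: labeling happens in the interface layer regardless of what the model generates.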
In essence, the paper proposes a practical method (LoRA fine-tuning on creative datasets) and a conceptual framework (consensual illusion) for harnessing LLM hallucinations as a creative tool, demonstrated through a text interface and an integrated mixed-reality system.