Generative Imagination in AI
Artificial intelligence that can handle creative tasks marks a significant step forward for the field. A recent study explores embodied agents built specifically for open-ended creative tasks. Earlier agents were limited to clear instructions and specific goals. These agents instead show creativity by generating novel and diverse solutions when the instructions are ambiguous.
How Creative Agents Work
A creative agent combines an imaginator with a controller. The imaginator produces a detailed picture of the task outcome from a language instruction, and it can be implemented in two ways: as a large language model (LLM) that produces textual imaginations, or as a diffusion model that produces visual ones. The controller then uses the imagined outcome to execute the required actions in the environment.
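The split between imaginator and controller can be sketched in a few lines of Python. This is a minimal illustration, not the study's implementation: the class names, the `Imagination` container, and the hard-coded details are all hypothetical, and the template imaginator merely stands in for an LLM or diffusion model.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass
class Imagination:
    """Hypothetical container for an imagined task outcome."""
    instruction: str
    modality: str  # "text" or "image"
    details: List[str] = field(default_factory=list)


class Imaginator(Protocol):
    def imagine(self, instruction: str) -> Imagination: ...


class Controller(Protocol):
    def act(self, imagination: Imagination) -> List[str]: ...


class TemplateTextImaginator:
    """Stand-in for an LLM imaginator (assumption: fixed details, no model call)."""

    def imagine(self, instruction: str) -> Imagination:
        details = ["walls: oak planks", "roof: stone", "size: 5x5x4"]
        return Imagination(instruction=instruction, modality="text", details=details)


class ScriptedController:
    """Stand-in for a controller: maps each imagined detail to an action string."""

    def act(self, imagination: Imagination) -> List[str]:
        return [f"build({detail})" for detail in imagination.details]


def run_agent(instruction: str, imaginator: Imaginator, controller: Controller) -> List[str]:
    # Step 1: turn the vague instruction into a concrete imagined outcome.
    imagination = imaginator.imagine(instruction)
    # Step 2: let the controller translate that outcome into actions.
    return controller.act(imagination)


actions = run_agent("build a house", TemplateTextImaginator(), ScriptedController())
print(actions)
```

Because the two components only meet at the `Imagination` object, a textual imaginator could be swapped for a visual one, or one controller for another, without changing `run_agent`.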
The Techniques and Benchmarks
The controller comes in two forms: a behavior-cloning policy learned from a dataset, or a pre-trained foundation model that generates executable code. The agents are benchmarked in Minecraft, a challenging open-world game, on the task of creating diverse buildings from free-form language instructions. To measure the creativity of these agents, the study proposes new evaluation metrics that use GPT-4V as the judge. Relying on the vision-language model's analytical strengths makes the evaluation general and independent of human raters.
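One way to picture a model-based evaluation loop is shown below. It is a rough sketch under stated assumptions: the `Judge` signature, the numeric score scale, and the plain averaging are illustrative choices, not the study's actual metrics, and the canned judge replaces any real call to GPT-4V.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical judge: takes (instruction, screenshot_id) and returns a score.
# In practice this would wrap a vision-language model; here it is pluggable.
Judge = Callable[[str, str], float]

# Each agent maps to a list of (instruction, screenshot_id) episodes.
Episodes = List[Tuple[str, str]]


def evaluate_agents(results: Dict[str, Episodes], judge: Judge) -> Dict[str, float]:
    """Average the judge's score over each agent's episodes."""
    scores: Dict[str, float] = {}
    for agent, episodes in results.items():
        if not episodes:
            raise ValueError(f"agent {agent!r} has no episodes to score")
        scores[agent] = mean(judge(instr, shot) for instr, shot in episodes)
    return scores


# Canned scores stand in for model output (assumption: made-up values).
canned = {"shot1": 4.0, "shot2": 2.0, "shot3": 5.0}


def fake_judge(instruction: str, screenshot_id: str) -> float:
    return canned[screenshot_id]


results = {
    "agent_a": [("build a house", "shot1"), ("build a tower", "shot2")],
    "agent_b": [("build a house", "shot3")],
}
print(evaluate_agents(results, fake_judge))
```

Keeping the judge behind a plain callable means the same harness works with a human rater, a scripted rubric, or a model, which makes it easy to check how closely the automated scores track human judgment.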
Evaluations and Effects
Experiments in Minecraft demonstrate the proficiency of these creative agents. They build diverse structures in survival mode, which, according to the study, previous work had not achieved. The analysis shows that Chain-of-Thought (CoT) imagination enriches the task details and that using a vision-language model (VLM) as the controller yields marginally better performance. Agents that pair diffusion-based imagination with a GPT-4V controller (Diffusion+GPT-4V) remain robust to the noise in the visual imaginations produced by the diffusion model.
Future Horizons
This work is a notable step for AI research, with the potential to make agents more creative. It opens up tasks beyond clear, narrow instructions and extends AI toward human-like imagination and creativity. The authors also intend to open-source the dataset and models, which should facilitate future research on open-ended environments and creative AI.