Probing and Inducing Combinational Creativity in Vision-Language Models (2504.13120v2)

Published 17 Apr 2025 in cs.CV, cs.AI, and cs.CL

Abstract: The ability to combine existing concepts into novel ideas stands as a fundamental hallmark of human intelligence. Recent advances in Vision-Language Models (VLMs) like GPT-4V and DALLE-3 have sparked debate about whether their outputs reflect combinational creativity--defined by M. A. Boden (1998) as synthesizing novel ideas through combining existing concepts--or sophisticated pattern matching of training data. Drawing inspiration from cognitive science, we investigate the combinational creativity of VLMs through the lens of concept blending. We propose the Identification-Explanation-Implication (IEI) framework, which decomposes creative processes into three levels: identifying input spaces, extracting shared attributes, and deriving novel semantic implications. To validate this framework, we curate CreativeMashup, a high-quality dataset of 666 artist-generated visual mashups annotated according to the IEI framework. Through extensive experiments, we demonstrate that in comprehension tasks, the best VLMs have surpassed average human performance while falling short of expert-level understanding; in generation tasks, incorporating our IEI framework into the generation pipeline significantly enhances the creative quality of VLMs' outputs. Our findings establish both a theoretical foundation for evaluating artificial creativity and practical guidelines for improving creative generation in VLMs.

Authors (8)
