Overview of "MacGyver: Are LLMs Creative Problem Solvers?"
This paper examines the creative problem-solving capabilities of LLMs in a constrained setting, introducing the "MacGyver" dataset. The dataset consists of over 1,600 real-world problems designed to trigger innovative object usage and out-of-the-box thinking. The work contrasts LLM and human problem-solving abilities, highlighting the distinct strengths and weaknesses of each.
Dataset and Methodology
The authors introduce "MacGyver," a dataset curated to assess the creative capabilities of both LLMs and humans. The dataset includes scenarios that necessitate unconventional tool use and push against cognitive biases such as functional fixedness. Their methodology generates candidate scenarios with GPT-4 and refines them with human feedback to ensure quality and creativity, yielding a challenging set of problems that require innovative solutions; a sketch of this generate-then-refine loop follows below.
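The snippet below is a minimal, hypothetical sketch of the generate-then-filter idea described above, assuming the OpenAI Python client; the seed prompt, function name, and temperature setting are illustrative and do not reproduce the authors' actual pipeline or prompts.

```python
# Hypothetical sketch of the generate-then-refine loop; not the paper's
# actual prompts or filtering criteria.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SEED_PROMPT = (
    "Invent a short real-world problem that can only be solved by using "
    "everyday objects in unconventional ways. List the available objects, "
    "then state the goal. Do not include the solution."
)

def generate_candidate_problem(model: str = "gpt-4") -> str:
    """Ask the model for one candidate MacGyver-style scenario."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": SEED_PROMPT}],
        temperature=1.0,  # higher temperature encourages varied scenarios
    )
    return response.choices[0].message.content

# In the paper's pipeline, candidates like this one are then reviewed
# and refined by human annotators before entering the dataset.
print(generate_candidate_problem())
```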
Key Findings
- LLM vs. Human Performance: Humans show greater variance in their responses and perform better on familiar, everyday scenarios. LLMs such as GPT-4 often propose physically infeasible actions, reflecting a shallow grasp of tool affordances and constraints. LLMs nevertheless perform well on domain-specific tasks, likely owing to their extensive pre-training data.
- Error Typologies: Common errors in LLM responses include proposing infeasible actions, offering irrelevant solutions, and invoking tools not present in the scenario. These failures point to the limitations of LLMs in physical and spatial reasoning.
- Prompting Strategies: Two new prompting methods, Iterative Step-Wise Reflection and Divergent-Convergent Thinking, were developed to mitigate these errors; both improve LLM performance by structuring the model's reasoning process (a sketch of the reflection strategy appears after this list).
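As a rough illustration of the Iterative Step-Wise Reflection strategy only, here is a hypothetical sketch assuming the OpenAI Python client; the prompts, function names, and number of reflection rounds are assumptions, not the paper's exact method.

```python
# Hypothetical sketch of iterative step-wise reflection; the paper's exact
# prompts are not reproduced here.
from openai import OpenAI

client = OpenAI()

def ask(messages, model="gpt-4"):
    """One chat turn; returns the assistant's reply text."""
    reply = client.chat.completions.create(model=model, messages=messages)
    return reply.choices[0].message.content

def solve_with_reflection(problem: str, rounds: int = 2) -> str:
    """Draft a step-by-step solution, then repeatedly ask the model to
    check each step for physical feasibility and revise."""
    messages = [
        {"role": "user",
         "content": f"{problem}\n\nPropose a numbered, step-by-step solution."}
    ]
    solution = ask(messages)
    for _ in range(rounds):
        messages += [
            {"role": "assistant", "content": solution},
            {"role": "user",
             "content": ("Re-examine each step. Is it physically feasible "
                         "with only the listed objects? Revise any step "
                         "that is not, and return the corrected solution.")},
        ]
        solution = ask(messages)
    return solution
```

The design intuition, per the paper's findings, is that forcing the model to revisit each step surfaces the physically infeasible actions it would otherwise leave in a single-pass answer.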
Implications and Future Directions
The introduction of the MacGyver dataset expands the testing ground for AI's reasoning and creativity, focusing on everyday innovation rather than traditional logic or artistic creativity. This work emphasizes the complementary nature of human and AI capabilities, suggesting that collaborative approaches may yield better problem-solving results.
Future research could explore enhancing LLMs' reasoning with embodied agents that interact with physical environments, addressing the gap in understanding physical affordances. Additionally, developing automated evaluation metrics for creative problem-solving remains an open challenge.
In conclusion, the paper offers valuable insight into how human and AI problem-solving abilities compare and introduces practical methods for improving AI performance on creative tasks. The MacGyver dataset and its findings lay the groundwork for further exploration of AI in complex, real-world scenarios.