
Fast and flexible: Human program induction in abstract reasoning tasks (2103.05823v1)

Published 10 Mar 2021 in cs.HC, cs.AI, and cs.LG

Abstract: The Abstraction and Reasoning Corpus (ARC) is a challenging program induction dataset that was recently proposed by Chollet (2019). Here, we report the first set of results collected from a behavioral study of humans solving a subset of tasks from ARC (40 out of 1000). Although this subset of tasks contains considerable variation, our results showed that humans were able to infer the underlying program and generate the correct test output for a novel test input example, with an average of 80% of tasks solved per participant, and with 65% of tasks being solved by more than 80% of participants. Additionally, we find interesting patterns of behavioral consistency and variability within the action sequences during the generation process, the natural language descriptions to describe the transformations for each task, and the errors people made. Our findings suggest that people can quickly and reliably determine the relevant features and properties of a task to compose a correct solution. Future modeling work could incorporate these findings, potentially by connecting the natural language descriptions we collected here to the underlying semantics of ARC.

Fast and Flexible: Human Program Induction in Abstract Reasoning Tasks

The paper, "Fast and flexible: Human program induction in abstract reasoning tasks" by Aysja Johnson, Wai Keen Vong, Brenden M. Lake, and Todd M. Gureckis, investigates human performance on a subset of tasks from the Abstraction and Reasoning Corpus (ARC). The ARC, introduced by François Chollet, is designed to assess abstract reasoning and generalization capabilities, posing significant challenges for contemporary AI systems.

Methodology and Experimentation

The paper examines human problem-solving behavior on 40 tasks drawn from the ARC training set (the full corpus contains 1000 tasks). Participants were shown a small number of input-output demonstration pairs (2-6) and asked to generate the correct output for a novel test input. They worked in an interactive interface, constructing outputs manually by selecting and applying various editing tools.
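For concreteness, ARC tasks in the public repository are stored as JSON files whose grids are lists of lists of integers from 0 to 9 (each integer denoting a color), and a response counts as correct only if the produced grid matches the target exactly. The minimal sketch below assumes that format; the file path is hypothetical.

```python
import json

# A minimal sketch of the ARC task format: each task is a JSON file with "train"
# and "test" lists of input-output pairs; grids are lists of lists of integers
# 0-9, each integer denoting a color.
def load_task(path):
    with open(path) as f:
        task = json.load(f)
    return task["train"], task["test"]  # lists of {"input": grid, "output": grid}

def is_correct(predicted, target):
    # ARC scoring is all-or-nothing: the predicted grid must match the target
    # exactly, including its dimensions.
    return predicted == target

# Hypothetical path; tasks in the public ARC repository follow this layout.
train_pairs, test_pairs = load_task("data/training/some_task.json")
for pair in train_pairs:
    rows, cols = len(pair["input"]), len(pair["input"][0])
    print(f"demonstration input is {rows}x{cols}")
print(is_correct(test_pairs[0]["output"], test_pairs[0]["output"]))  # True by construction
```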

Key Findings and Results

Participants demonstrated strong performance, correctly solving approximately 83.8% of tasks on average. This greatly exceeds the performance of computational approaches: the best-performing entry in a recent Kaggle competition achieved an accuracy of only about 21% on a comparable range of tasks. Notably, participants required no extensive training or prior familiarity with ARC, drawing instead on background knowledge and abstract reasoning to infer solutions.

  • Accuracy Distribution: Task difficulty varied considerably. Most tasks were solved by over 80% of participants, but the hardest was solved correctly by only 38.1% of participants.
  • Time to Completion: The average time to complete a task was about 3 minutes and 6 seconds, reflecting a deliberative, inferential solution process.
  • Error Analysis: Human errors typically stayed close to the underlying problem structure, whereas machine errors frequently violated fundamental task constraints such as object continuity and spatial relations.

Action Sequences and Natural Language Descriptions

The paper provides a detailed analysis of action sequences and written descriptions by participants:

  • Action Sequences: Participants used a variety of strategies that nonetheless converged on common intermediate states, often corresponding to object-centric subgoals. This flexibility contrasts sharply with the rigid, exhaustive search procedures used by machine solvers.
  • Natural Language Descriptions: Participants' descriptions were categorized into nine distinct content classes, with color, object, and geometric terms being the most frequent, underscoring the diversity of representations people brought to the tasks (a toy tagging sketch follows this list).
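To make the description analysis concrete, the sketch below tags free-text task descriptions with coarse content classes via simple keyword matching. The class names and keyword lists are illustrative assumptions chosen for demonstration, not the authors' actual coding scheme.

```python
from collections import Counter

# Illustrative content classes and keyword vocabularies (assumptions for this
# sketch only, not the nine classes used in the paper).
CONTENT_CLASSES = {
    "color":     {"red", "blue", "green", "yellow", "black", "color", "colour"},
    "object":    {"object", "shape", "square", "rectangle", "line", "dot"},
    "geometric": {"rotate", "mirror", "flip", "move", "copy", "grid", "row", "column"},
}

def tag_description(text):
    # Return every content class whose vocabulary overlaps the description.
    words = set(text.lower().split())
    return [cls for cls, vocab in CONTENT_CLASSES.items() if words & vocab]

descriptions = [
    "copy the blue square into every empty row",
    "rotate the shape and change its color to red",
]
counts = Counter(cls for d in descriptions for cls in tag_description(d))
print(counts)  # e.g. Counter({'color': 2, 'object': 2, 'geometric': 2})
```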

Theoretical Implications

The results pose two main challenges to existing probabilistic language-of-thought (pLOT) models in cognitive science:

  1. Hypothesis Space: Traditional pLOT models rely on a fixed set of primitives, but human performance on ARC suggests a more dynamic hypothesis space that draws on extensive conceptual background knowledge (a toy fixed-primitive search is sketched after this list).
  2. Object Perception Flexibility: ARC tasks demonstrate the flexible nature of object perception, a capability that extends beyond the symbolic representations typically considered in LOT models.
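To illustrate what a fixed-primitive hypothesis space looks like, the sketch below enumerates short compositions of hand-chosen grid operations and returns the first program consistent with all demonstration pairs. The primitive set and brute-force search are illustrative assumptions, not the specific models discussed in the paper.

```python
from itertools import product

# Three hand-chosen grid primitives (the "fixed set" a classic pLOT-style model
# would search over); grids are lists of lists of integers as in ARC.
def flip_h(grid):    return [row[::-1] for row in grid]
def flip_v(grid):    return grid[::-1]
def transpose(grid): return [list(r) for r in zip(*grid)]

PRIMITIVES = {"flip_h": flip_h, "flip_v": flip_v, "transpose": transpose}

def induce_program(pairs, max_depth=2):
    # Exhaustively enumerate compositions of primitives up to max_depth and
    # return the first program consistent with every demonstration pair.
    for depth in range(1, max_depth + 1):
        for names in product(PRIMITIVES, repeat=depth):
            def program(g, names=names):
                for n in names:
                    g = PRIMITIVES[n](g)
                return g
            if all(program(p["input"]) == p["output"] for p in pairs):
                return names
    return None  # no program in this hypothesis space fits the data

pairs = [{"input": [[1, 0], [0, 2]], "output": [[0, 1], [2, 0]]}]
print(induce_program(pairs))  # ('flip_h',)
```

The rigidity of such a search is exactly what the authors contrast with human behavior: people appear to recruit new primitives and object parsings on the fly rather than exhausting a fixed space.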

Future Directions

The behavioral insights from this paper have significant implications for both cognitive science and artificial intelligence. Future work could involve expanding the task set to validate these preliminary findings across a broader range of ARC tasks. Further experiments could manipulate task parameters to elucidate specific cognitive processes involved in program induction and abstract reasoning. The ultimate goal could be the development of computational models that integrate these human-like flexible and dynamic approaches to problem-solving, potentially enhancing AI's ability to generalize and reason abstractly.

In summary, the paper provides a comprehensive examination of human program induction capabilities in the context of ARC tasks, highlighting the strengths of human cognition in abstract and flexible reasoning. This comparison with computational models elucidates the current gaps in AI and opens avenues for improved integration of human cognitive strategies into machine learning frameworks.

Authors (4)
  1. Aysja Johnson (1 paper)
  2. Wai Keen Vong (9 papers)
  3. Brenden M. Lake (41 papers)
  4. Todd M. Gureckis (5 papers)
Citations (30)