Tackling the Abstraction and Reasoning Corpus (ARC) with Object-centric Models and the MDL Principle
In the effort to advance AI toward human-level intelligence, the Abstraction and Reasoning Corpus (ARC) benchmark has been pivotal. It poses the challenge of generating colored output grids from only a few examples, aiming to push AI systems beyond narrow generalization toward flexibility and adaptability on novel tasks. This paper introduces a novel method employing object-centric models guided by the Minimum Description Length (MDL) principle to tackle ARC tasks. This method contrasts with prior approaches, which primarily rely on transformation-based program synthesis.
Introduction
Despite significant strides in specific AI applications such as image recognition, board games, and natural language processing, AI systems often falter in generalizing across novel tasks with minimal training. This limitation has spurred efforts to develop more versatile and adaptive AI, with ARC serving as a unique psychometric test differentiating human and machine intelligence. ARC tasks demand converting input grids to output grids based on a few exemplars, a domain where humans significantly outshine even the best AI models.
Contributions
This paper makes two notable contributions to address the ARC challenge:
- Object-centric Models: The authors propose models that parse and generate grids in terms of object patterns and computations on those objects. This contrasts with transformation-based approaches and aligns more closely with how humans conceptualize and solve ARC tasks.
- MDL-based Efficient Search: The MDL principle is employed to efficiently navigate the large space of possible models. This principle facilitates finding models that compress the data effectively, reflecting more succinct and natural programs.
Methodology
Object-centric Models
The proposed object-centric models combine patterns and functions, diverging from purely procedural transformation-based approaches. These models separate what is invariant from what varies across a task by defining pattern constructors for grid elements (e.g., squares, rectangles), together with references to input grid components and functions for output generation.
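To make the invariant-versus-variant distinction concrete, here is a minimal sketch in which a pattern constructor for a rectangle fixes some fields as constants (invariants) and leaves others as unknowns bound when parsing each grid. All names (`Unknown`, `Rectangle`, `bind`) are illustrative, not the paper's actual API.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Unknown:
    """Placeholder field, bound when parsing a concrete grid."""
    name: str

@dataclass(frozen=True)
class Rectangle:
    """Pattern constructor for a rectangle object on a grid."""
    row: Union[int, Unknown]
    col: Union[int, Unknown]
    height: Union[int, Unknown]
    width: Union[int, Unknown]
    color: Union[int, Unknown]

def bind(pattern: Rectangle, values: dict) -> Rectangle:
    """Replace each Unknown field with the value parsed from a grid."""
    fields = {}
    for name in ("row", "col", "height", "width", "color"):
        v = getattr(pattern, name)
        fields[name] = values[v.name] if isinstance(v, Unknown) else v
    return Rectangle(**fields)

# Size and color are fixed across examples (invariants);
# position varies per example (variants).
pattern = Rectangle(row=Unknown("r"), col=Unknown("c"),
                    height=2, width=2, color=4)
obj = bind(pattern, {"r": 3, "c": 5})
```

A model thus stays compact: only the unknown fields contribute per-example description cost, while the constants are paid for once in the model.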
An ARC task model consists of an input grid model and an output grid model. Each grid model supports two functions: parsing a grid into a description according to the model, and generating a grid from a model description. Together, these functions enable three operational modes:
- Predict: Generating an output grid from an input grid.
- Describe: Providing a joint description for input-output grid pairs.
- Create: Synthesizing new input-output grid pairs consistent with the task model.
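The three modes above can be sketched with a toy task whose input model is "a single colored cell" and whose output model is "the same cell shifted one column right". The parse/generate functions here are illustrative stand-ins for the paper's machinery, not its actual implementation.

```python
def parse_cell(grid):
    """Toy input model: one non-zero cell; description = (row, col, color)."""
    for r, row in enumerate(grid):
        for c, color in enumerate(row):
            if color != 0:
                return (r, c, color)
    return None

def generate_shifted(desc, height=3, width=3):
    """Toy output model: render the same cell shifted one column right."""
    r, c, color = desc
    grid = [[0] * width for _ in range(height)]
    grid[r][c + 1] = color
    return grid

def predict(grid_in):
    """Predict mode: parse the input grid, generate the output grid."""
    return generate_shifted(parse_cell(grid_in))

def describe(grid_in, grid_out):
    """Describe mode: joint description of an input-output pair."""
    return (parse_cell(grid_in), parse_cell(grid_out))

grid = [[0, 0, 0], [0, 7, 0], [0, 0, 0]]
out = predict(grid)
```

The create mode would work the same way in reverse: sample an input description, then generate both grids from it.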
MDL-based Learning
The MDL principle drives the learning process by selecting the model that best compresses the training data. Two-part MDL measures both the model and the data encoded with the model:
- L(M,D) = L(M) + α ∑_{(gi,go)} L(gi,go ∣ M)
- L(M) measures the model's complexity.
- L(D∣M) = α ∑_{(gi,go)} L(gi,go ∣ M) measures the training pairs (gi,go) as encoded with the model, weighted by a rehearsal factor α.
The strategy starts from an initial model and iteratively refines it, keeping refinements that improve data compression. After learning, a pruning phase further generalizes the model, ensuring it does not overfit specific training examples.
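The refinement loop can be sketched as greedy search over candidate models, scored by the two-part description length. To keep the sketch self-contained, the "model" here is just a color-substitution map and the code lengths are simple counts (map entries for L(M), mispredicted cells for L(gi,go ∣ M)); these codings are toy stand-ins for the paper's actual encodings.

```python
def apply_model(model, grid):
    """Apply a color-substitution model to every cell of a grid."""
    return [[model.get(v, v) for v in row] for row in grid]

def data_length(gi, go, model):
    """Toy L(gi,go | M): number of output cells the model mispredicts."""
    pred = apply_model(model, gi)
    return sum(p != o
               for prow, orow in zip(pred, go)
               for p, o in zip(prow, orow))

def description_length(model, pairs, alpha=1.0):
    """Two-part MDL: L(M) + alpha * sum over pairs of L(gi,go | M)."""
    return len(model) + alpha * sum(
        data_length(gi, go, model) for gi, go in pairs)

def mdl_search(pairs, colors=range(10)):
    """Greedily add the refinement that most reduces L(M, D)."""
    model = {}
    best = description_length(model, pairs)
    improved = True
    while improved:
        improved = False
        for a in colors:          # candidate refinements:
            for b in colors:      # add one substitution a -> b
                cand = {**model, a: b}
                dl = description_length(cand, pairs)
                if dl < best:
                    model, best, improved = cand, dl, True
    return model

pairs = [([[1, 1], [0, 1]], [[2, 2], [0, 2]]),
         ([[0, 1], [1, 0]], [[0, 2], [2, 0]])]
model = mdl_search(pairs)  # learns that color 1 maps to color 2
```

The loop stops when no refinement shortens L(M,D), so a model is only grown while the extra model bits pay for themselves in data compression.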
Evaluation
Performance on ARC
The approach was evaluated on 800 ARC tasks and in ARCathon'22. It demonstrated competitive performance:
- Solved 96/400 training tasks (24%) with an average runtime of 4.6 seconds per task.
- Achieved a generalization rate of 92% on training tasks and 72% on evaluation tasks, indicating strong model consistency across different examples.
Compared to previous methods, this approach uses fewer primitives (30) and achieves substantial depth of search (average of 19 steps), reflecting efficient and comprehensive model refinement.
Application to FlashFill
By redefining the patterns and functions, the approach was adapted to FlashFill, the task of automating string transformations in spreadsheet cells. It performed well on the example set, solving 5/14 tasks completely and showing partial success on others. This highlights the method's versatility beyond ARC and its capacity to learn models that align closely with the natural programs humans would write.
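Transferring the approach to strings can be illustrated with a toy model in the same parse/generate style: the input model parses a string into fields, and the output model generates a new string from references to those fields. This is a hedged sketch under assumed names (`parse_name`, `generate_initials`), not the paper's FlashFill model.

```python
import re

def parse_name(s):
    """Toy input model: two alphabetic words; description = (first, last)."""
    m = re.fullmatch(r"([A-Za-z]+) ([A-Za-z]+)", s)
    return m.groups() if m else None

def generate_initials(desc):
    """Toy output model: first initial, a dot, then the last name."""
    first, last = desc
    return f"{first[0]}. {last}"

def predict(s):
    """Predict mode, as in ARC: parse the input, generate the output."""
    return generate_initials(parse_name(s))
```

As with grids, the MDL score would then select among candidate string patterns the one that compresses all example pairs best.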
Conclusion and Future Prospects
This research advances AI’s ability to efficiently solve novel tasks with minimal training data by leveraging object-centric models and the MDL principle. Future work aims to broaden the scope of the models to cover more comprehensive knowledge priors required by ARC and further enhance capabilities in domains like FlashFill. The results underscore the potential for developing more intelligent systems that mirror human cognitive processes in problem-solving.