- The paper introduces CodeIt, a self-improving approach for LLMs that uses prioritized hindsight replay and solves 15% of ARC evaluation tasks.
- It utilizes a two-stage process of program sampling with hindsight relabeling and prioritized experience replay to refine model outputs.
- Ablation studies confirm that each component -- the ExIt mechanism, hindsight relabeling, and the replay buffer -- contributes significantly to the model's problem-solving performance.
Introduction
The field of general AI grapples with creating models that exhibit human-like intelligence across diverse cognitive tasks. One benchmark for measuring general intelligence in AI systems is the Abstraction and Reasoning Corpus (ARC), a collection of tasks designed to probe the fluid intelligence and problem-solving capabilities of humans. Each ARC task is specified by a small set of input-output pairs, from which the solver must derive the rule that transforms inputs into outputs. The ARC challenge has proven formidable: state-of-the-art approaches have made only incremental progress, and the performance of AI systems on ARC remains well below human capabilities.
CodeIt Methodology
In contrast to traditional methods, a recent approach dubbed CodeIt offers a scalable self-improvement strategy for LLMs to tackle such complex tasks. CodeIt iterates between two stages: program sampling with hindsight relabeling, and learning from prioritized experience replay.
The core idea is to reframe ARC tasks as programming-by-examples problems, in which the model generates programs intended to reproduce the example outputs. Initial training uses ground-truth programs written in a domain-specific language (DSL), augmented by mutation methods that both expand the data and familiarize the model with DSL syntax.
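The relabeling idea can be illustrated with a small sketch: even a sampled program that fails its original task yields a valid training example, because the task's target outputs can be replaced by whatever the program actually produced. The `Task` class, `run_program`, and the toy callable-as-program format below are hypothetical stand-ins, not the paper's DSL.

```python
# Sketch of hindsight relabeling for programming-by-examples.
# `Task` and the callable "program" format are illustrative stand-ins.
from dataclasses import dataclass


@dataclass
class Task:
    inputs: list   # example input grids
    outputs: list  # target output grids


def run_program(program, grid):
    """Execute a toy program (here, a Python callable) on one input grid."""
    return program(grid)


def hindsight_relabel(program, task):
    """Replace the task's target outputs with the outputs the sampled
    program actually produced, yielding a (task, program) pair that is
    correct by construction."""
    realized = [run_program(program, x) for x in task.inputs]
    return Task(task.inputs, realized), program


# A sampled program that flips each row. It does NOT solve the original
# (identity) task, but relabeling still turns it into useful data.
flip = lambda grid: [row[::-1] for row in grid]
task = Task(inputs=[[[1, 2], [3, 4]]], outputs=[[[1, 2], [3, 4]]])
new_task, prog = hindsight_relabel(flip, task)
# new_task.outputs is now [[[2, 1], [4, 3]]], matching what `flip` produces.
```

The relabeled pair is guaranteed to be consistent, which is what lets the method learn from every executed sample rather than only from successes.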
During the sampling stage, programs are written by a pretrained LLM policy. Samples that fail to execute or exceed a time budget are discarded; the rest are relabeled with the outputs they actually produce and stored in a replay buffer. These buffered samples drive the learning stage, continually feeding experience into the model's training, with prioritized replay guarding against catastrophic forgetting.
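The buffering and prioritized sampling might look roughly like the following sketch. The class name, the split into seeded versus policy-discovered experience, and the 50/50 batch fraction are all illustrative assumptions, not the paper's exact scheme.

```python
# Sketch of a prioritized replay buffer: executable programs are stored,
# and each training batch draws a fixed fraction of policy-discovered
# samples alongside seeded ground-truth data, so older experience keeps
# being revisited (mitigating catastrophic forgetting).
import random


class ReplayBuffer:
    def __init__(self, policy_fraction=0.5, seed=0):
        self.seeded = []   # ground-truth / mutated DSL programs
        self.policy = []   # programs discovered by the LLM policy
        self.policy_fraction = policy_fraction
        self.rng = random.Random(seed)

    def add(self, experience, from_policy):
        (self.policy if from_policy else self.seeded).append(experience)

    def sample_batch(self, batch_size):
        # Prioritize policy-discovered samples up to the target fraction,
        # then fill the rest of the batch from the seeded data.
        n_policy = min(int(batch_size * self.policy_fraction),
                       len(self.policy))
        n_seed = batch_size - n_policy
        batch = self.rng.sample(self.policy, n_policy)
        batch += self.rng.choices(self.seeded, k=n_seed)
        return batch


buf = ReplayBuffer()
for i in range(10):
    buf.add(("seed_program", i), from_policy=False)
buf.add(("policy_program", 0), from_policy=True)
batch = buf.sample_batch(4)  # mixes policy and seeded experience
```

Weighting batches toward self-discovered programs is one simple way to realize "prioritized" replay; other priority signals (e.g. recency or task difficulty) would slot into `sample_batch` the same way.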
Experimental Results
When CodeIt was deployed on the full ARC evaluation set, the results were impressive: it solved 15% of the tasks, setting a new benchmark and eclipsing the prior best neural and symbolic methods. A closer examination of the discovered programs revealed that they were more concise and diverse than baseline alternatives, suggesting not only the model's efficiency but also a resemblance to human reasoning in program generation.
Impact of CodeIt Components
A series of ablation studies dissected the influence of CodeIt's components on task performance. The results affirmed that each aspect -- the ExIt mechanism, hindsight relabeling, and the use of prior knowledge from pretrained models -- plays a significant role in overall performance. The system's strength lies in its capacity to exploit these features to refine its approach over time, discovering progressively more efficient solutions.
Conclusion
The paper shows that CodeIt's self-improving loop -- collecting and learning from its own experience while drawing on prior knowledge -- enables LLMs to make tangible progress on benchmarks such as ARC. This result offers an optimistic outlook for neuro-symbolic AI systems on the path toward human-level reasoning and complexity handling. As AI advances toward generalized intelligence, CodeIt stands as evidence that experience-based learning and insights from human-like reasoning patterns can be fruitfully combined.