CodeI/O: Enhancing Reasoning in LLMs through Code Input-Output Prediction
The paper "CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction" provides a novel approach to improving reasoning capabilities of LLMs by leveraging code's inherent logical structure. In essence, the paper introduces CodeI/O, which focuses on training models to predict inputs or outputs based on code snippets and subsequent test cases—executed entirely as natural language Chain-of-Thought (CoT) rationales.
The authors highlight a limitation of prior research: training data for reasoning is fragmented and sparse, which hampers performance gains on broader reasoning tasks. Previous efforts have concentrated on narrow skills such as mathematical problem solving or code generation, and these approaches fall short of covering the full spectrum of reasoning capabilities expected of advanced LLMs.
Innovation of the CodeI/O Approach:
CodeI/O addresses this limitation by transforming raw code into a structured input-output prediction format. This transformation lets LLMs recognize and learn diverse reasoning patterns embedded in varied contexts, covering processes such as logic flow planning, state-space searching, decision tree traversal, and modular decomposition, all while decoupling structured reasoning from code-specific syntax and preserving procedural rigor. The authors gather diverse real-world code, transform it into executable functions, and formulate tasks that require predicting a feasible input given an output, or vice versa, in natural language, as sketched below.
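One possible shape of that transformation is sketched here. The paper's actual pipeline uses an LLM to refactor code and write companion input generators, so every name below is a stand-in rather than the authors' implementation:

```python
# Hypothetical pipeline sketch: pair an executable function with an input
# generator, execute it to obtain ground-truth I/O pairs, and emit both
# prediction directions as training samples.
import json
import random

def longest_increasing_run(items: list[int]) -> int:
    # Example "real-world" function cleaned into a self-contained form.
    best = run = 1
    for prev, cur in zip(items, items[1:]):
        run = run + 1 if cur > prev else 1
        best = max(best, run)
    return best

def input_generator() -> dict:
    # Companion generator that samples feasible inputs for the function.
    return {"items": [random.randint(0, 9) for _ in range(8)]}

def make_samples(fn, gen, n=2):
    samples = []
    for _ in range(n):
        kwargs = gen()
        output = fn(**kwargs)                    # ground truth by execution
        samples.append({"task": "predict_output", "given": kwargs,
                        "answer": output})
        samples.append({"task": "predict_input", "given": output,
                        "answer": kwargs})
    return samples

for sample in make_samples(longest_increasing_run, input_generator):
    print(json.dumps(sample))
```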
Key Findings and Results:
The paper provides empirical evidence that CodeI/O enhances model performance across a range of reasoning tasks, not only code-related ones. Through a comprehensive evaluation built on 454,900 code files and a total of 3.5 million training samples, models trained with CodeI/O show considerable gains on benchmarks spanning symbolic, numerical, logical, and commonsense reasoning.
An enhanced variant, CodeI/O++, introduces multi-turn revision: predictions are first verified via code execution, and erroneous ones are then iteratively refined with the execution feedback as additional context. This verification-and-revision step yields further performance gains across task domains (see the sketch below).
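The revision loop might look like the following sketch, where `query_model` stands in for any LLM call; the structure is an assumption about, not a transcript of, the authors' setup:

```python
# Hedged sketch of a verify-then-revise loop in the spirit of CodeI/O++.

def verify(fn, predicted_input: dict, target_output) -> bool:
    """Check a predicted input by actually executing the reference code."""
    try:
        return fn(**predicted_input) == target_output
    except Exception:
        return False

def predict_with_revision(fn, target_output, query_model, max_turns=2):
    feedback = None
    prediction = None
    for _ in range(max_turns):
        # query_model is a hypothetical LLM call returning a candidate input.
        prediction = query_model(fn, target_output, feedback)
        if verify(fn, prediction, target_output):
            return prediction                    # keep the verified response
        feedback = "previous prediction failed the execution check"
    return prediction                            # last attempt, still unverified
```

The key design choice is that correctness comes from execution, a cheap and fully automatic signal, rather than from comparing against a single canonical answer.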
Implications and Future Directions:
The implications of the research are multifaceted. Practically, the approach offers a scalable method for enriching LLM training datasets with diverse reasoning examples without overfitting the model to any single domain. Theoretically, it points toward a unified treatment of reasoning across multiple domains within LLMs.
This work sets the stage for further exploration into the intersection of coding, logic, and language, suggesting that future AI developments could benefit from deeper integration between natural language processing capabilities and programmatic logic. The authors speculate on the potential of further combining LLMs with execution-based frameworks or reinforcement learning paradigms to maximize reasoning proficiency.
Ultimately, CodeI/O is posited as an essential step towards bridging the gap between human-like cognitive reasoning and machine intelligence, offering concrete methodologies to expand the general reasoning abilities of large-scale AI models.