
Table-r1: Self-supervised and Reinforcement Learning for Program-based Table Reasoning in Small Language Models (2506.06137v1)

Published 6 Jun 2025 in cs.LG and cs.CL

Abstract: Table reasoning (TR) requires structured reasoning over semi-structured tabular data and remains challenging, particularly for small language models (SLMs, e.g., LLaMA-8B) due to their limited capacity compared to large LMs (LLMs, e.g., GPT-4o). To narrow this gap, we explore program-based TR (P-TR), which circumvents key limitations of text-based TR (T-TR), notably in numerical reasoning, by generating executable programs. However, applying P-TR to SLMs introduces two challenges: (i) vulnerability to heterogeneity in table layouts, and (ii) inconsistency in reasoning due to limited code generation capability. We propose Table-r1, a two-stage P-TR method designed for SLMs. Stage 1 introduces an innovative self-supervised learning task, Layout Transformation Inference, to improve tabular layout generalization from a programmatic view. Stage 2 adopts a mix-paradigm variant of Group Relative Policy Optimization, enhancing P-TR consistency while allowing dynamic fallback to T-TR when needed. Experiments on four TR benchmarks demonstrate that Table-r1 outperforms all SLM-based methods, achieving at least a 15% accuracy improvement over the base model (LLaMA-8B) across all datasets and reaching performance competitive with LLMs.

Summary

  • The paper presents Table-r1, a framework that improves table reasoning in small language models, yielding at least a 15% accuracy gain over the LLaMA-8B baseline.
  • It employs a self-supervised Layout Transformation Inference task to enhance model generalization across diverse and complex table layouts.
  • It integrates a tailored mix-paradigm Group Relative Policy Optimization strategy to ensure consistent program-based reasoning and fallback flexibility.

Table-r1: Advancements in Program-Based Table Reasoning for Small Language Models

The paper "Table-r1: Self-supervised and Reinforcement Learning for Program-based Table Reasoning in Small LLMs" addresses a critical challenge in the field of natural language processing, specifically the task of table reasoning (TR) for small LLMs (SLMs). Unlike LLMs, such as GPT-4 which excel in this domain due to substantial computational resources, SLMs often struggle with TR due to their limited capacity. This paper highlights a significant gap in performance between SLMs and LLMs and proposes a novel approach, Table-r1, to bridge this gap.

The primary focus of the research is program-based table reasoning (P-TR), an approach that generates executable programs to solve table reasoning tasks, as opposed to text-based table reasoning (T-TR), which processes tables as plain text and answers directly. While P-TR inherently addresses certain limitations of T-TR, notably in numerical reasoning, it introduces its own challenges when applied to SLMs: vulnerability to heterogeneous table layouts, and inconsistent reasoning stemming from the limited code generation ability of SLMs. A minimal illustration of the paradigm contrast follows.
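
To make the P-TR versus T-TR distinction concrete, here is a minimal sketch, assuming the generated program is pandas-style Python; the table, question, and program string are hypothetical, and the paper's actual prompting and execution setup may differ.

```python
# Minimal illustration of program-based table reasoning (P-TR).
# The table, question, and generated program are hypothetical examples,
# not the paper's prompt format or executor.
import pandas as pd

table = pd.DataFrame({
    "city": ["Oslo", "Bergen", "Trondheim"],
    "population": [709_000, 286_000, 212_000],
})

question = "What is the total population of the two largest cities?"

# In P-TR, the SLM emits code like the string below; a T-TR model would
# instead answer directly in text, which is error-prone for arithmetic.
generated_program = """
result = table.nlargest(2, "population")["population"].sum()
"""

namespace = {"table": table}
exec(generated_program, namespace)  # execute the model-generated program
print(namespace["result"])          # 995000 (exact arithmetic via execution)
```

The point of the paradigm is visible in the last two lines: the arithmetic is delegated to an executor, so the model only has to produce a correct program, not a correct number.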

The authors propose Table-r1, a two-stage method aimed at improving the efficacy of SLMs on TR. In Stage 1, a self-supervised learning task named Layout Transformation Inference (LTI) is introduced; it improves the SLM's ability to generalize across diverse table layouts from a programmatic perspective. In Stage 2, a tailored reinforcement learning strategy, a mix-paradigm variant of Group Relative Policy Optimization (GRPO), is applied; it makes P-TR behavior more consistent while permitting a dynamic fallback to T-TR when program generation fails. Hedged sketches of both stages follow.
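
The paper summary does not spell out the LTI training format, but one plausible reading is that self-supervised examples come for free: apply a known layout transformation to a table and train the model to recover that transformation as a program. The sketch below follows that reading; the transformation set, the recovery-program strings, and the example pair format are all illustrative assumptions.

```python
# Hedged sketch of a self-supervised data generator in the spirit of
# Layout Transformation Inference (LTI): perturb a table's layout with a
# known transformation and ask the model to infer it as a program.
# The transformation set and target format are assumptions, not the paper's.
import random
import pandas as pd

TRANSFORMS = {
    # name -> (function that perturbs the layout, program string that names it)
    "transpose": (lambda df: df.T, "table = table.T"),
    "reverse_rows": (lambda df: df.iloc[::-1], "table = table.iloc[::-1]"),
    "shuffle_columns": (
        lambda df: df[sorted(df.columns, reverse=True)],
        "table = table[sorted(table.columns, reverse=True)]",
    ),
}

def make_lti_example(df: pd.DataFrame, rng: random.Random):
    """Return ((transformed, original), target_program) for one example."""
    name = rng.choice(list(TRANSFORMS))
    transform, program = TRANSFORMS[name]
    return (transform(df), df), program  # model sees the pair, predicts the program

df = pd.DataFrame({"team": ["A", "B"], "wins": [3, 5]})
pair, target = make_lti_example(df, random.Random(0))
print(target)  # e.g. "table = table.iloc[::-1]"
```

Because the supervision signal is derived from the transformation itself, no human labels are needed, which is what makes the task self-supervised.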
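
For Stage 2, the defining ingredient of GRPO is that advantages are computed relative to a group of sampled rollouts for the same question, rather than from a learned value function. In the mix-paradigm variant, a group plausibly mixes P-TR and T-TR rollouts so that correct answers from either paradigm are reinforced, which is what enables the fallback behavior. A hedged sketch of that group-relative computation, with illustrative rewards and an assumed 50/50 paradigm split:

```python
# Sketch of the group-relative advantage at the heart of GRPO, with a
# mixed group of program-based (P-TR) and text-based (T-TR) rollouts.
# The rewards and the paradigm split are illustrative assumptions.
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO normalizes each reward against its own sampling group."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Suppose 4 rollouts for one question: two P-TR (scored by executing the
# generated program) and two T-TR (scored by matching the textual answer).
rollouts = [
    {"paradigm": "P-TR", "reward": 1.0},  # program ran and answered correctly
    {"paradigm": "P-TR", "reward": 0.0},  # program crashed or answered wrong
    {"paradigm": "T-TR", "reward": 1.0},  # direct textual answer was correct
    {"paradigm": "T-TR", "reward": 0.0},
]

advantages = group_relative_advantages([r["reward"] for r in rollouts])
for r, a in zip(rollouts, advantages):
    print(r["paradigm"], f"advantage={a:+.2f}")
# Correct rollouts of either paradigm receive positive advantage, so the
# policy can learn to fall back to T-TR when its programs tend to fail.
```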

Experimental results show notable improvements, with Table-r1 achieving at least a 15% accuracy improvement over the baseline SLM (LLaMA-8B) across four TR benchmarks. Furthermore, Table-r1 reaches performance competitive with that of LLMs, underscoring its potential to democratize access to complex TR capabilities for smaller models. The paper acknowledges the limits of current SLM capacity and proposes future avenues for improving generalization across different table formats.

The implications of this research are significant both theoretically and practically. It opens up the possibility of deploying table reasoning on devices with limited computational resources, expanding the accessibility of advanced NLP techniques. The approach also offers insight into how reinforcement learning can improve reasoning in language models beyond what supervised fine-tuning alone achieves.

In conclusion, the paper presents a substantial advance in table reasoning for small language models. By combining self-supervised learning with a reinforcement learning paradigm, the proposed Table-r1 framework demonstrates that SLMs can be enhanced to perform on par with much larger models on specific tasks. Future work might further refine these techniques and explore their applicability to other domains within artificial intelligence.