Identify the mechanism by which instruction-tuning without tabular exposure improves tabular classification performance

Ascertain the precise mechanism by which instruction-tuning of large language models (e.g., Llama-3-8B fine-tuned on the Alpaca instruction-following dataset, with no tabular data) yields substantial improvements on tabular classification tasks. In particular, clarify whether and how general instruction-following capabilities translate into effective handling of serialized table inputs.
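
For concreteness, here is a minimal sketch of how a single tabular record might be serialized into an instruction-style prompt for zero-shot classification. The template, the `serialize_row` helper, and the field names are illustrative assumptions, not the exact serialization used by Tabula-8B or the paper:

```python
# Hypothetical serialization of one tabular record into an
# instruction-style prompt (illustrative template, not the paper's).

def serialize_row(features: dict[str, object], target_name: str, classes: list[str]) -> str:
    """Render a tabular record as a natural-language classification prompt."""
    feature_text = "\n".join(f"- {name}: {value}" for name, value in features.items())
    return (
        f"Predict the value of '{target_name}' from the record below.\n"
        f"Possible values: {', '.join(classes)}.\n\n"
        f"{feature_text}\n\n"
        "Answer with one of the possible values only."
    )

prompt = serialize_row(
    {"age": 42, "workclass": "Private", "education": "Bachelors", "hours_per_week": 50},
    target_name="income",
    classes=["<=50K", ">50K"],
)
print(prompt)
```

An instruction-tuned model may answer such prompts well simply because it has learned to follow task descriptions and emit a valid label, which is exactly the confound at issue.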

Background

The study finds that instruction-tuning alone, without any tabular pretraining, accounts for the majority of Tabula-8B’s improvement over the base model on classification tasks. While the empirical decomposition shows strong gains attributable to instruction-following, the causal mechanism behind this transfer remains unspecified.
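
To make the attribution concrete, the decomposition can be expressed as simple differences between three checkpoints evaluated on the same benchmark. The accuracy values below are placeholders for illustration, not the paper's reported numbers:

```python
# Sketch of the gain decomposition; all accuracies are placeholder
# values, not results from the paper.
acc_base = 0.48      # base Llama-3-8B (placeholder)
acc_instruct = 0.63  # + instruction-tuning only (e.g., Alpaca; placeholder)
acc_tabula = 0.66    # + tabular training (Tabula-8B; placeholder)

instruction_gain = acc_instruct - acc_base  # attributable to instruction-following
tabular_gain = acc_tabula - acc_instruct    # attributable to tabular exposure
share = instruction_gain / (acc_tabula - acc_base)
print(f"Instruction-tuning accounts for {share:.0%} of the total gain.")
```

The open question is why the first term is so large when the fine-tuning data contains no tables at all.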

Understanding this mechanism is important for interpreting reported tabular language model (TLM) capabilities and for designing robust baselines that disentangle format adherence from genuine tabular reasoning; one such diagnostic is sketched below.
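
One possible diagnostic, offered as an assumption rather than as the paper's protocol: re-evaluate the same model on prompts whose column names have been anonymized (reusing `serialize_row` from the sketch above). If accuracy gains survive anonymization, they plausibly reflect format adherence, i.e., following the input-output mapping and emitting a valid label, rather than semantic reasoning over column names:

```python
# Hypothetical ablation separating format adherence from column-name
# semantics; `predict` wraps a model call and is assumed, not shown.

def anonymize(features: dict[str, object]) -> dict[str, object]:
    """Replace column names with uninformative placeholders, keeping values and order."""
    return {f"feature_{i}": v for i, v in enumerate(features.values(), start=1)}

def accuracy(predict, rows: list[dict[str, object]], labels: list[str]) -> float:
    """Score a prompt -> label predictor over a labeled tabular dataset."""
    classes = sorted(set(labels))
    correct = sum(
        predict(serialize_row(row, target_name="target", classes=classes)) == label
        for row, label in zip(rows, labels)
    )
    return correct / len(labels)

# gap = accuracy(model, rows, labels) - accuracy(model, [anonymize(r) for r in rows], labels)
# A small gap suggests format adherence dominates; a large gap suggests the
# model genuinely exploits column semantics.
```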

References

While the precise mechanism remains unclear, we hypothesize that instruction-tuning equips the model with general capabilities for comprehending task descriptions and following input-output mappings—skills that may prove sufficient for many tabular classification tasks without requiring explicit tabular exposure.

The Illusion of Generalization: Re-examining Tabular Language Model Evaluation (2602.04031, Gorla et al., 3 Feb 2026), in Section: Instruction-Following, Not Tabular Knowledge, Drives Performance (Classification: Instruction-Tuning Dominates)