Tabby: Language Models Learn to Synthesize Tabular Data
This presentation explores Tabby, a novel architecture that adapts transformer-based language models for high-fidelity tabular data synthesis. The talk examines how Mixture-of-Experts modifications to the language modeling head enable models to handle the unique challenges of tabular data—heterogeneous types, complex column dependencies, and spurious ordering correlations—while remaining computationally efficient. We'll see how Tabby outperforms existing methods and extends beyond simple tables to nested structured data.

Script
Language models excel at text, but tabular data—with its mixed types, complex column relationships, and arbitrary ordering—has remained stubbornly difficult to synthesize. The authors of Tabby propose a surprisingly elegant solution: what if we simply taught the model to pay specialized attention to each column?
Traditional language models struggle with tables because every column tells a different statistical story. A column of ages follows one distribution, a column of prices another, and column ordering—which means nothing semantically—can create false patterns the model learns to exploit.
The key insight is to let the model specialize.
Tabby modifies the language model head with Mixture-of-Experts layers, assigning each column its own specialized processing block. This architectural tweak, combined with a straightforward training method that simply formats tables as sequences, lets the model learn column-specific distributions without the preprocessing gymnastics that plague other approaches.
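The per-column specialization described above can be sketched in a few lines. This is a minimal illustration, not Tabby's actual implementation: plain numpy linear projections stand in for the expert language-model heads, and the dimensions, function names, and routing-by-column-index scheme are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, vocab_size, n_columns = 8, 16, 3

# One independent "expert" head per table column (hypothetical sketch of a
# Mixture-of-Experts LM head; real heads would be learned linear layers).
expert_heads = [rng.normal(size=(hidden_dim, vocab_size)) for _ in range(n_columns)]

def moe_lm_head(hidden_states, column_ids):
    """Route each token position's hidden state to the head for its column."""
    logits = np.empty((len(hidden_states), vocab_size))
    for i, (h, col) in enumerate(zip(hidden_states, column_ids)):
        logits[i] = h @ expert_heads[col]
    return logits

# Toy sequence of 5 token positions belonging to columns 0, 0, 1, 2, 2.
hidden = rng.normal(size=(5, hidden_dim))
logits = moe_lm_head(hidden, [0, 0, 1, 2, 2])
print(logits.shape)  # (5, 16)
```

The point of the sketch is the routing: tokens from an age column and tokens from a price column never share output parameters, so each expert is free to model its own column's distribution.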
The results are striking. When evaluated on Machine Learning Efficacy—how well a downstream classifier trained on synthetic data performs on real test data—Tabby with the Multi-Head modification consistently outperforms all prior methods. The purple line at the top represents Tabby, dominating across the entire performance spectrum. Even more remarkably, smaller Tabby models outperform larger non-Tabby architectures, suggesting the modification fundamentally improves learning efficiency rather than simply adding parameters.
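The Machine Learning Efficacy protocol mentioned above is simple to state in code: fit a model on synthetic rows, score it on held-out real rows. The sketch below uses a toy nearest-centroid classifier and fabricated two-cluster data purely for illustration—the classifier choice, data, and function names are all assumptions, not the paper's evaluation setup.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_centroids(X, y):
    # "Downstream model": the mean point of each class.
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(centroids, X):
    classes = list(centroids)
    dists = np.stack([np.linalg.norm(X - centroids[c], axis=1) for c in classes])
    return np.array(classes)[dists.argmin(axis=0)]

def ml_efficacy(synth_X, synth_y, real_X, real_y):
    """Train on synthetic rows, report accuracy on real test rows."""
    model = fit_centroids(synth_X, synth_y)
    return (predict(model, real_X) == real_y).mean()

# Toy data: if the synthesizer captured the real distribution, efficacy
# approaches the train-on-real baseline.
real_X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
real_y = np.array([0] * 50 + [1] * 50)
synth_X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
score = ml_efficacy(synth_X, real_y.copy(), real_X, real_y)
print(score > 0.8)  # True for well-separated clusters like these
```

A high efficacy score means the synthetic table preserved exactly the statistical structure the downstream task depends on—which is why it serves as the headline metric here.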
The architecture proves both efficient and flexible. Parameter for parameter, Tabby models synthesize better data than their non-specialized counterparts. And the same Mixture-of-Experts principle extends naturally beyond flat tables: recursive application of the MoE layers enables synthesis of nested structures like JSON, opening pathways to broader structured data generation.
Tabby demonstrates that adapting language models to tabular data doesn't require abandoning their core strengths—it requires teaching them when to pay attention differently. To explore more cutting-edge research and create your own presentation videos, visit EmergentMind.com.