Tabula-8B: Dual ML and Secure Inference
- The TabuLa-8B language model adapts a transformer architecture to tabular data by serializing spreadsheet inputs with novel tokens and row-causal masking, achieving state-of-the-art zero- and few-shot prediction.
- The Tabula-8B secure inference protocol uses precomputed 8-bit quantized lookup tables to drastically cut communication (2 bytes per activation) and speed up evaluation compared to garbled circuits.
- Both systems advance their respective domains: the former improves transfer learning for heterogeneous tabular data, and the latter makes secure multiparty computation for neural network inference practical, together supporting scalable and privacy-preserving AI applications.
Tabula-8B denotes two distinct state-of-the-art systems in machine learning research, each addressing advanced challenges in its respective domain: (1) an LLM for tabular prediction via transfer learning on heterogeneous, spreadsheet-style datasets, and (2) a secure multiparty computation (MPC) protocol for fast nonlinear-activation evaluation in neural network inference with 8-bit quantization. The following sections detail both concepts as explored in (Gardner et al., 2024) and (Lam et al., 2022), with clear delineation between the two usages.
1. TabuLa-8B: LLM for Tabular Prediction
TabuLa-8B, termed “Tabular Llama 3-8B”, is a decoder-only transformer model based on Llama 3 8B, specialized for tabular data prediction tasks. It seeks to transfer the pretraining-and-adaptation paradigm of modern foundation models (as seen in NLP and vision) to the domain of tabular, heterogeneous data, where domain-specific benchmarks and methods (e.g., XGBoost, TabPFN) have historically dominated.
Model Architecture and Adaptations
- Base Model: TabuLa-8B uses Meta’s open-source Llama 3 8B model, preserving its architecture: 8B parameters, rotary position embeddings, 32 attention heads, and a 4096-dimensional hidden state.
- Input Serialization: Each data row is serialized into a natural-language prompt incorporating column headers and values. Three new special tokens are introduced: `||` (label delimiter), `<|endinput|>`, and `<|endcompletion|>` (see the sketch after this list).
- Row-Causal Tabular Masking (RCTM): The attention mask allows each token to attend to tokens within the current and preceding rows of the same table, but blocks attention between different tables, yielding a block-triangular structure that encourages in-context (few-shot) learning while preventing cross-table leakage.
- Fine-Tuning Objective: Full-parameter fine-tuning is conducted with a cross-entropy loss computed only on the target tokens (those following `<|endinput|>`). This covers both classification and binned regression (real-valued targets discretized into quantiles).
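To make the serialization and masking concrete, here is a minimal sketch in Python; the `serialize_row` and `row_causal_mask` helpers and the exact prompt wording are illustrative assumptions, only the special tokens and the block-triangular masking rule come from the description above.

```python
import numpy as np

# Hypothetical serialization of one table row; the "||", <|endinput|>, and
# <|endcompletion|> tokens follow the description above, but the surrounding
# phrasing is an assumption rather than the paper's exact template.
def serialize_row(row: dict, target_col: str) -> str:
    features = ", ".join(f"{k}: {v}" for k, v in row.items() if k != target_col)
    return f"{features} || {target_col}<|endinput|>{row[target_col]}<|endcompletion|>"

# Row-causal tabular masking: token i may attend to token j only if j <= i
# AND both tokens come from the same table; attention across tables is blocked.
def row_causal_mask(table_ids: np.ndarray) -> np.ndarray:
    n = len(table_ids)
    causal = np.tril(np.ones((n, n), dtype=bool))
    same_table = table_ids[:, None] == table_ids[None, :]
    return causal & same_table

print(serialize_row({"age": 31, "income": 52000, "defaulted": "no"}, "defaulted"))
print(row_causal_mask(np.array([0, 0, 0, 1, 1])).astype(int))
```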
Dataset Extraction and Preprocessing
- T4 Construction: From TabLib’s 627 million raw HTML/CSV tables, filtering and quality protocols produce 3.1 million high-quality tables (≈1.6B rows, ≈80B tokens).
- Filtering Steps: Multi-stage filters exclude non-English tables, tables with insufficient schema heterogeneity or parsing errors, and columns/rows with excessive missing or constant data, PII, or code-like content.
- Unsupervised Target Selection: For each filtered table, columns suitable for target prediction are selected via header and value criteria. Targets are either categorical or continuous (the latter discretized into quantile bins, as sketched below), with features separated from targets as in classical supervised tabular learning.
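Binned regression reduces continuous targets to categorical labels that a text model can emit; a minimal sketch, with the bin count and label format as assumptions:

```python
import numpy as np

def quantile_bin_labels(values: np.ndarray, n_bins: int = 4) -> list[str]:
    """Discretize a continuous target into quantile bins and render each value
    as a categorical label a text model can predict. Bin count and label
    wording are illustrative assumptions."""
    edges = np.quantile(values, np.linspace(0.0, 1.0, n_bins + 1))
    bins = np.digitize(values, edges[1:-1])          # bin index in 0..n_bins-1
    return [f"{edges[b]:.2f} to {edges[b + 1]:.2f}" for b in bins]

y = np.random.default_rng(0).normal(loc=50.0, scale=10.0, size=8)
print(quantile_bin_labels(y))
```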
Training Procedure
- Hyperparameters: Training runs for 40,000 steps with a batch size of 24 packed sequences (≈600 rows per step), a learning-rate schedule of 10% linear warmup followed by cosine decay (illustrated in the sketch after this list), and no weight decay.
- Data Regimen: Each update samples from a ≈8B-token subset (roughly 10% of T4). No regularization or augmentation beyond RCTM packing is used.
- Monitoring: Progress is tracked via downstream validation loss and sampled few-shot metrics; no task-specific fine-tuning is performed.
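The learning-rate schedule (10% linear warmup, then cosine decay) can be sketched as follows; the peak learning rate is not stated above, so `peak_lr` is a placeholder:

```python
import math

def lr_at_step(step: int, total_steps: int = 40_000, peak_lr: float = 1.0,
               warmup_frac: float = 0.10) -> float:
    """Linear warmup over the first 10% of steps, cosine decay afterwards.
    peak_lr is a placeholder; the source does not state the actual value."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print([round(lr_at_step(s), 3) for s in (0, 2_000, 4_000, 20_000, 40_000)])
```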
Evaluation and Comparative Results
TabuLa-8B is tested on 329 tabular datasets from established benchmarks (UniPredict, Grinsztajn, AutoML Multimodal, OpenML-CC18, OpenML-CTR23), with open-vocabulary exact match as the metric. Results are summarized below.
| Shots | Random | Llama3 Base | TabuLa-8B | XGBoost | TabPFN |
|---|---|---|---|---|---|
| 0 | 10–33% | 20% | 35% | — | — |
| 1 | — | 40% | 55% | 50% | 50% |
| 8 | — | 60% | 75% | 60% | 60% |
| 32 | — | 70% | 85% | 70% | 70% |
- Zero-Shot: ≈35% accuracy, roughly 17pp above random guessing and well above the unadapted base Llama 3 8B. XGBoost and TabPFN cannot make zero-shot predictions at all, as they require labeled training rows.
- Few-Shot: In the k-shot setting (1 ≤ k ≤ 32), TabuLa-8B is 5–15pp more accurate than XGBoost and TabPFN models explicitly trained on equal amounts of data, or even on up to 16× more data (Gardner et al., 2024).
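The open-vocabulary exact-match metric used above compares the generated completion string to the ground-truth label; a minimal sketch, with the normalization rules as assumptions:

```python
def exact_match_accuracy(predictions: list[str], labels: list[str]) -> float:
    """Open-vocabulary exact match: the generated completion must equal the
    ground-truth label as text. The whitespace/case normalization below is an
    assumption, not necessarily the benchmark's exact rule."""
    def norm(s: str) -> str:
        return s.strip().lower()
    hits = sum(norm(p) == norm(y) for p, y in zip(predictions, labels))
    return hits / len(labels)

print(exact_match_accuracy(["Yes", " no ", "12"], ["yes", "no", "13"]))  # ~0.67
```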
Use Cases, Limitations, and Future Directions
- Applications: Zero- or few-shot tabular prediction, especially where column semantics are meaningful or data is expensive to label; mixed-type features and binned regression.
- Limitations: Fixed 8192-token context, large model size (8B parameters), reliance on informative headers, residual risk of PII, no formal privacy guarantees, inherited bias from web data.
- Prospective Work: Scaling to larger models/data, improved privacy/safety, enhanced inference via retrieval or prompt ensembling, and broadening to data generation, schema mapping, and interpretability tasks.
2. Tabula-8B in Secure Inference: Lookup-Based MPC Protocol
Tabula-8B also denotes a variant of the Tabula protocol for secure evaluation of nonlinear activation functions (e.g., ReLU, sigmoid) in multiparty computation (MPC) settings, using 8-bit quantization of activations. This approach replaces communication-heavy garbled circuits with precomputed lookup tables, yielding substantial gains in efficiency for secure neural network inference (Lam et al., 2022).
Quantization and Preprocessing
- Uniform Symmetric Quantizer: given a real activation $x$ in a clipping range $[-R, R]$ and $k = 8$ bits, the step size is $\Delta = R / 2^{k-1}$ and the quantized value is $q = \mathrm{clip}\big(\lfloor x / \Delta \rceil,\ -2^{k-1},\ 2^{k-1}-1\big)$.
- Lookup Table Construction: for each activation function $f$ (e.g., ReLU), precompute a table with one entry per possible quantized input, $T[i] = f(\Delta \cdot i)$, encoded over a prime field $\mathbb{F}_p$. Each 256-entry table (8-bit entries) is secret-shared between the client and server, along with a random additive mask $r$ that hides the lookup index.
- Error Bounds: for 1-Lipschitz activations, quantization perturbs the output by at most the rounding error, $|f(x) - f(\Delta q)| \le |x - \Delta q| \le \Delta / 2$.
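A minimal sketch of the preprocessing step under the reconstruction above, using ReLU; the field modulus, clipping range, and masking convention are assumptions rather than the exact parameters of Lam et al. (2022):

```python
import secrets

P = 2**61 - 1             # assumed prime field modulus for additive sharing
K = 8                     # quantization bits
R = 8.0                   # assumed clipping range [-R, R]
DELTA = R / 2 ** (K - 1)  # quantization step

def build_shared_relu_table(mask: int):
    """Precompute ReLU over every 2^K quantized input, place each output at the
    position of its masked index, and split every entry into two additive
    shares modulo P (one table half per party)."""
    size = 2 ** K
    table0, table1 = [0] * size, [0] * size
    for q in range(-(2 ** (K - 1)), 2 ** (K - 1)):   # signed quantized inputs
        idx = (q + mask) % size                       # masked lookup index
        y = round(max(0.0, q * DELTA) / DELTA) % P    # quantized ReLU output
        share0 = secrets.randbelow(P)
        table0[idx] = share0
        table1[idx] = (y - share0) % P
    return table0, table1

mask = secrets.randbelow(2 ** K)                      # random additive mask r
client_table, server_table = build_shared_relu_table(mask)
```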
Online Protocol Execution
Tabula-8B achieves a single communication round per activation, with both parties sending one 8-bit share (2 bytes total):
| Step | Party 0 (Client) | Party 1 (Server) |
|---|---|---|
| Secure truncation | Local truncation of its share | Local truncation of its share |
| Masked index reveal | Sends 1 byte (masked index share) | Sends 1 byte (masked index share) |
| Local table lookup | Reads its table share at the revealed masked index | Reads its table share at the revealed masked index |
| Output (rescaling) | Locally rescales its output share | Locally rescales its output share |
- Communication per activation: 2 bytes.
- No oblivious transfers or garbled gates are needed during online inference (see the sketch below).
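Continuing the preprocessing sketch above (and reusing its `K`, `P`, `DELTA`, `mask`, and table variables), a minimal sketch of the online phase; it assumes secure truncation has already reduced the activation to 8-bit additive shares:

```python
import secrets

def online_activation(x_share0, x_share1, mask_share0, mask_share1,
                      client_table, server_table):
    """One round: each party adds its mask share to its input share and sends a
    single byte; both reconstruct the masked index and look up their table
    share locally, so no oblivious transfer or garbled gates are needed."""
    size = 2 ** K
    msg0 = (x_share0 + mask_share0) % size   # the 1 byte the client sends
    msg1 = (x_share1 + mask_share1) % size   # the 1 byte the server sends
    masked_idx = (msg0 + msg1) % size        # reveals only x + r, never x itself
    return client_table[masked_idx], server_table[masked_idx]

# Example: additively share a quantized input x = -3 and the mask r, run the
# online step, and check that the reconstructed output is quantized ReLU(-3*DELTA).
x = -3 % 2 ** K
x0 = secrets.randbelow(2 ** K); x1 = (x - x0) % 2 ** K
m0 = secrets.randbelow(2 ** K); m1 = (mask - m0) % 2 ** K
y0, y1 = online_activation(x0, x1, m0, m1, client_table, server_table)
assert (y0 + y1) % P == 0                    # ReLU of a negative input is 0
```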
Storage and Communication Efficiency
- Per-Activation Table Size: 256 entries of 8 bits each, i.e., 256 bytes per party.
- Online Communication: 2 bytes per activation (Tabula-8B), roughly 280× less than 8-bit garbled circuits.
- Runtime: per-activation evaluation is roughly 140× faster than garbled circuits. (A back-of-the-envelope check of the storage and communication figures follows this list.)
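A quick back-of-the-envelope check of the storage and communication figures; the entry width of $k/8$ bytes for $k$-bit quantization is an assumption for illustration:

```python
K = 8                                        # quantization bits
table_entries = 2 ** K                       # 256 entries per activation
table_bytes_per_party = table_entries * 1    # 8-bit entries -> 256 bytes per party
online_bytes_per_activation = 2              # one byte sent by each party

# Exponential growth of the table is the main scalability constraint.
for k in (8, 10, 12, 16):
    print(f"{k}-bit quantization: {2 ** k * k // 8} bytes per activation table")
```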
End-to-End Inference Benchmarks
Evaluations on ResNet-32 and VGG-16 (all 8-bit quantized):
| Network | # ReLUs | GC-8bit Comm | Tabula-8B Comm | Reduction | GC-8bit Time | Tabula-8B Time | Speedup |
|---|---|---|---|---|---|---|---|
| ResNet-32 | 303 K | 77 MB | 0.58 MB | 132× | 30.6 s | 0.6 s | 51× |
| VGG-16 | 284 K | 72 MB | 0.54 MB | 133× | 19.9 s | 0.4 s | 50× |
Preprocessing storage drops from ≈1 GB (GC) to ≈7 MB (Tabula-8B) per network.
Discussion of Trade-Offs
- Scalability: Storage grows exponentially in the number of quantization bits $k$, since each activation requires a table of $2^k$ entries; this quickly reaches kilobytes per activation and constrains practical use to low-precision (≈8-bit) quantization.
- Quantization Error: For standard activations and 8-bit quantization, the induced error is on the order of the quantization step, which is negligible in large networks (<2% accuracy loss).
- Practical Deployment: Minuscule per-activation tables are suitable for typical client devices, and low latency enables real-time encrypted inference applications (e.g., video).
3. Comparative Context and Position in the Literature
TabuLa-8B (the LLM variant) stands in contrast to classical supervised methods such as XGBoost and TabPFN, which require labeled training examples for each new task and cannot perform zero-shot prediction. By leveraging transfer learning and row-causal attention over in-context examples, TabuLa-8B readily generalizes to new tables and tasks without further fine-tuning, with substantial improvements in few-shot sample efficiency (Gardner et al., 2024).
Tabula-8B (secure inference) distinguishes itself from garbled circuit protocols by reducing per-activation communication, runtime, and preprocessing storage requirements by orders of magnitude, using precomputed quantized lookup tables and share-masked indices (Lam et al., 2022).
4. Applications and Limitations
LLM Variant
- Primary Use Cases: Rapid zero- and few-shot prediction on new tabular datasets, mixed-type data handling, and semantic-rich schemas where labeled data is scarce.
- Limitations: Context window constraints, hardware demands, sensitivity to uninformative headers, and risks associated with residual data privacy/bias.
Secure MPC Variant
- Primary Use Cases: Secure neural network inference where activation privacy is paramount, especially in edge deployments or bandwidth-constrained mobile contexts.
- Limitations: Exponential lookup table growth for high-precision quantization, dependence on secure truncation protocols, and field size/modulo constraints for large activations.
5. Prospects and Research Directions
For TabuLa-8B (LLM):
- Scaling: Extending to larger architectures (Llama 3 70B), increased corpus sizes, and longer context windows.
- Safety: Enhanced PII filtering, bias monitoring.
- Model Extensions: Data wrangling, schema translation, interpretability, retrieval augmentation, and prompt ensembling as avenues for broader capability and robustness.
For Tabula-8B (secure inference):
- Parameter Optimization: Balancing quantization fidelity and storage requirements.
- Protocol Extension: Exploring more expressive nonlinearities or batched evaluation strategies.
Both systems represent paradigm shifts in their respective domains: one in transfer learning for tabular data; the other in secure, efficient private inference. Each line of research opens avenues for further exploration as the demands of scalable, safe, and adaptive AI expand across modalities and applications (Gardner et al., 2024, Lam et al., 2022).