
TableMoE-Align Dataset

Updated 1 July 2025
  • TableMoE-Align is a large-scale multimodal dataset that aligns table images, HTML, JSON, and code representations to support robust pretraining for structured table reasoning.
  • It enables neuro-symbolic Mixture-of-Experts pretraining by enforcing explicit modality alignments and symbolic supervision to handle degraded, real-world table conditions.
  • The dataset’s scale, fine-grained annotations, and diverse sources drive state-of-the-art performance on WildStruct benchmarks in finance, science, and industry.

TableMoE-Align is a large-scale multimodal dataset designed to facilitate neuro-symbolic mixture-of-experts (MoE) pretraining for robust, structured reasoning over real-world tables exhibiting diverse layouts, semantics, and visual degradations. Developed specifically for the TableMoE architecture, TableMoE-Align is engineered to support expert specialization via alignment across table, HTML, JSON, and code modalities. Its breadth, fine-grained annotation, and explicit modality-level alignments address critical prerequisites for enabling resilient table understanding under WildStruct conditions such as blur, skew, symbolic density, incomplete structure, and cross-lingual content.

1. Dataset Structure and Composition

TableMoE-Align consists of 1.2 million quadruples of the form:

  • Table: Raw table (image or structured source)
  • HTML: Human- and machine-readable HTML representation of the table
  • JSON: Spatial triples (row ID, column ID, value) with cell attributes
  • Code: Executable Python code (e.g., matplotlib) reconstructing the table

The dataset is systematically curated to ensure that each quadruple provides direct and modality-aligned mappings between the different representations. This high-fidelity, multi-format alignment enables the induction of semantic and symbolic reasoning paths during pretraining.
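As a concrete illustration, one aligned quadruple can be sketched as a simple record holding the four views of the same table. The field names and toy values below are illustrative, not the dataset's actual schema:

```python
from dataclasses import dataclass

@dataclass
class AlignedQuadruple:
    """One TableMoE-Align sample: four aligned views of the same table."""
    table: bytes          # raw table image (or structured source)
    html: str             # HTML markup of the table
    json_triples: list    # (row ID, column ID, value) spatial triples
    code: str             # executable Python (e.g., matplotlib) reconstructing the table

# Toy example of a 2x2 financial table in all four modalities
sample = AlignedQuadruple(
    table=b"...",  # image bytes elided
    html="<table><tr><th>Year</th><th>Revenue</th></tr>"
         "<tr><td>2024</td><td>1.2M</td></tr></table>",
    json_triples=[(0, 0, "Year"), (0, 1, "Revenue"),
                  (1, 0, "2024"), (1, 1, "1.2M")],
    code="import matplotlib.pyplot as plt\n# ... draw the table ...",
)
```

Because all four fields describe the same underlying table instance, any pair of them (e.g., HTML vs. JSON triples) can serve as an alignment supervision signal.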

Source Domains

TableMoE-Align incorporates samples from:

  • Finance (e.g., FinTabNet): accounting, reports, audit tables
  • Science/Biomedicine (e.g., PubTabNet): clinical trial, publication, research tables
  • Industry (e.g., TableBank): process, supply chain, administrative forms
  • Real-world/Noisy Settings (e.g., WTW): tables with blur, watermark, multilingual elements, and degraded visual structure

Diversity is further enhanced by balanced sampling: approximately 600k HTML, 400k JSON, and 200k code examples, covering a broad spectrum of structural and semantic complexity, as well as linguistic variety (notably English and Chinese).

2. Purpose and Role in TableMoE Pretraining

TableMoE-Align serves as the exclusive upstream pretraining corpus for the TableMoE Mixture-of-Connector-Experts module, which includes three dedicated expert branches:

  • HTML expert: Specializes in parsing and reasoning over table layout and semantic structure.
  • JSON expert: Encodes tokens, spatial/attribute triples, and structural relations, supporting grid/node-level reasoning.
  • Code expert: Abstracts symbolic and executable patterns (e.g., formulae, programmatic layouts) that bridge from table structure to computational logic.

No samples from TableMoE-Align are used for evaluation, ensuring a strict separation between pretraining and downstream testing.

Alignment-driven Expert Initialization

Each expert is pretrained on its respective alignment task (e.g., Table-to-HTML, Table-to-JSON, Table-to-Code), yielding modality-specific priors that inform subsequent integrated fine-tuning. This approach enables differentiable specialization before joint neuro-symbolic routing.
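The per-expert pretraining split can be sketched as follows: each connector expert sees only (table, target-modality) pairs drawn from the shared quadruples. Function and key names here are illustrative assumptions, not the paper's implementation:

```python
# Each expert's alignment task maps the raw table to one target modality.
ALIGNMENT_TASKS = {
    "html": "Table-to-HTML",
    "json": "Table-to-JSON",
    "code": "Table-to-Code",
}

def build_expert_corpora(quadruples):
    """Split aligned quadruples into one pretraining corpus per expert.

    Each corpus holds (table, target) pairs; the table side is shared,
    so the experts acquire modality-specific priors over the same inputs.
    """
    corpora = {name: [] for name in ALIGNMENT_TASKS}
    for q in quadruples:
        for name in ALIGNMENT_TASKS:
            corpora[name].append((q["table"], q[name]))
    return corpora
```

After this stage, each expert carries a modality-specific prior before the experts are jointly fine-tuned under neuro-symbolic routing.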

3. Alignment and Curriculum Mechanism

The quadruple structure imposes explicit alignment constraints, such that each representation in a sample refers to the same underlying table instance. Pretraining proceeds as follows:

  1. Expert pretraining on alignment tasks: Each expert is supervised to map between the raw table and its modality-specific target.
  2. Joint representation learning: Experts are incorporated into the MoCE layer, and symbolic supervision (token role, structure graphs) is overlaid during fine-tuning via neuro-symbolic annealing.

Formally, the annealed objective integrates alignment and symbolic loss terms:

\mathcal{L}_{\text{NSA}}(t) = (1-\lambda(t)) \cdot \mathcal{L}_{\text{task}} + \lambda(t)\big[\lambda_1 \mathcal{L}_{\text{role}} + \lambda_2 \mathcal{L}_{\text{struct}}\big]

where \mathcal{L}_{\text{task}} is the main alignment loss from TableMoE-Align, and \lambda(t) is a schedule controlling the annealing from purely neural to symbolic objectives.
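The annealed objective can be written out directly in code. Note that the linear schedule for λ(t) and the λ₁/λ₂ values below are illustrative assumptions; the source specifies only the overall annealed form:

```python
def annealed_loss(l_task, l_role, l_struct, t, total_steps,
                  lam1=0.5, lam2=0.5):
    """L_NSA(t) = (1 - lambda(t)) * L_task + lambda(t) * (lam1*L_role + lam2*L_struct).

    The linear schedule and the lam1/lam2 weights are assumed for
    illustration, not taken from the paper.
    """
    lam_t = min(1.0, t / total_steps)  # anneals from neural (0) toward symbolic (1)
    return (1.0 - lam_t) * l_task + lam_t * (lam1 * l_role + lam2 * l_struct)
```

At t = 0 the objective is purely the neural alignment loss; as t approaches the end of training, the symbolic role and structure terms dominate.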

4. Token Role Prediction and Neuro-Symbolic Routing

While TableMoE-Align quadruples do not themselves provide explicit token role labels (such as HEADER, DATA, AXIS, FORMULA), the accuracy and diversity of the alignments across modalities enable the model to infer semantic token roles during supervised pretraining. These roles underlie TableMoE's neuro-symbolic routing mechanism:

  • For each token i, a role distribution \mathbf{r}_i is predicted.
  • A confidence coefficient \alpha_i = 1 - \frac{H(\widetilde{\mathbf{r}}_i)}{\log R} is computed, where H denotes entropy and R is the number of roles.
  • This coefficient modulates gating among modality-specific experts, ensuring that structural and symbolic cues (enabled by TableMoE-Align) guide routing.

The integration of symbolic reasoning graphs is made possible by the presence of code and JSON/HTML alignments, allowing for explicit, interpretable structure-aware expert assignment.
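The entropy-based confidence above, and its use in gating, can be sketched compactly. The `role_confidence` function follows the formula directly; the linear interpolation in `modulated_gate` is an assumed form of the "modulation" described here, not the paper's exact gating rule:

```python
import math

def role_confidence(role_probs):
    """alpha = 1 - H(r) / log(R): 1 for a one-hot role, 0 for a uniform one."""
    R = len(role_probs)
    entropy = -sum(p * math.log(p) for p in role_probs if p > 0.0)
    return 1.0 - entropy / math.log(R)

def modulated_gate(neural_gate, symbolic_prior, alpha):
    """Blend neural gate weights with a symbolic routing prior.

    Linear interpolation is an illustrative assumption: high-confidence
    role predictions (alpha near 1) let symbolic cues dominate routing.
    """
    return [(1.0 - alpha) * n + alpha * s
            for n, s in zip(neural_gate, symbolic_prior)]
```

A token whose role is unambiguous (one-hot distribution) gets \alpha = 1 and is routed by its symbolic role; a token with a uniform role distribution gets \alpha = 0 and falls back to the neural gate.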

5. Performance Impact and Empirical Results on WildStruct Benchmarks

Pretraining with TableMoE-Align is empirically shown to be critical for TableMoE’s robust performance on WildStruct conditions. This is evidenced by extensive ablation experiments:

  • Removing code expert pretraining: −6.97 points in exact-match accuracy
  • Omitting symbolic role/graph supervision: −5.37 points
  • Substituting with neural-only objectives: sharp degradation especially in settings with table defects (e.g., missing headers, noisy cells)

TableMoE, pretrained on TableMoE-Align, achieves state-of-the-art results on all four WildStruct benchmarks:

  • WMMFinQA: Financial QA under severe layout and symbolic noise
  • WMMTatQA: Multiturn tabular QA with multi-source degradation
  • WMMTabDialog: Multilingual, deeply nested, visually noisy dialogs
  • WMMFinanceMath: Reasoning-intensive, highly corrupted math QA

In each, TableMoE outperforms competing vision-LLMs (including GPT-4o), with up to +9.2% absolute accuracy improvement.

6. Significance and Research Implications

TableMoE-Align exemplifies a new standard for alignment pretraining in multimodal table reasoning:

  • Scale and Diversity: Enables broad generalization, resilience to unseen structure and noise, and cross-domain transfer.
  • Symbolic Integration: The inclusion of code and structured formats facilitates explicit reasoning and interpretable token-role assignment.
  • Foundation for Neuro-Symbolic Architectures: By supplying clean, granular alignments across multiple modalities, TableMoE-Align supports the emergence of neuro-symbolic routing—enabling modular, interpretable, and robust table understanding under degraded, real-world conditions.

TableMoE-Align thus forms the empirical and architectural bedrock for current and future neuro-symbolic approaches in multimodal table machine learning.