
TableMoE-Align Dataset

Updated 1 July 2025
  • TableMoE-Align is a large-scale multimodal dataset that aligns table images, HTML, JSON, and code representations to support robust pretraining for structured table reasoning.
  • It enables neuro-symbolic Mixture-of-Experts pretraining by enforcing explicit modality alignments and symbolic supervision to handle degraded, real-world table conditions.
  • The dataset’s scale, fine-grained annotations, and diverse sources drive state-of-the-art performance on WildStruct benchmarks in finance, science, and industry.

TableMoE-Align is a large-scale multimodal dataset designed to facilitate neuro-symbolic mixture-of-experts (MoE) pretraining for robust, structured reasoning over real-world tables exhibiting diverse layouts, semantics, and visual degradations. Developed specifically for the TableMoE architecture, TableMoE-Align is engineered to support expert specialization via alignment across table, HTML, JSON, and code modalities. Its breadth, fine-grained annotation, and explicit modality-level alignments address critical prerequisites for enabling resilient table understanding under WildStruct conditions such as blur, skew, symbolic density, incomplete structure, and cross-lingual content.

1. Dataset Structure and Composition

TableMoE-Align consists of 1.2 million quadruples of the form:

  • Table: Raw table (image or structured source)
  • HTML: Human- and machine-readable HTML representation of the table
  • JSON: Spatial triples (row ID, column ID, value) with cell attributes
  • Code: Executable Python code (e.g., matplotlib) reconstructing the table

The dataset is systematically curated to ensure that each quadruple provides direct and modality-aligned mappings between the different representations. This high-fidelity, multi-format alignment enables the induction of semantic and symbolic reasoning paths during pretraining.
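As a concrete illustration, one aligned quadruple can be sketched as a simple record holding the four views of the same table. The field names and toy values below are illustrative, not the dataset's actual schema:

```python
from dataclasses import dataclass

@dataclass
class AlignedQuadruple:
    """One TableMoE-Align sample: four aligned views of the same table."""
    table: bytes          # raw table image (or structured source)
    html: str             # HTML markup of the table
    json_triples: list    # (row ID, column ID, value) spatial triples
    code: str             # executable Python (e.g., matplotlib) reconstructing the table

# Toy example of a 2x2 financial table in all four modalities
sample = AlignedQuadruple(
    table=b"...",  # image bytes elided
    html="<table><tr><th>Year</th><th>Revenue</th></tr>"
         "<tr><td>2024</td><td>1.2M</td></tr></table>",
    json_triples=[(0, 0, "Year"), (0, 1, "Revenue"),
                  (1, 0, "2024"), (1, 1, "1.2M")],
    code="import matplotlib.pyplot as plt\n# ... draw the table ...",
)
```

Because all four fields describe the same underlying table instance, any pair of them (e.g., HTML vs. JSON triples) can serve as an alignment supervision signal.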

Source Domains

TableMoE-Align incorporates samples from:

  • Finance (e.g., FinTabNet): accounting, reports, audit tables
  • Science/Biomedicine (e.g., PubTabNet): clinical trial, publication, research tables
  • Industry (e.g., TableBank): process, supply chain, administrative forms
  • Real-world/Noisy Settings (e.g., WTW): tables with blur, watermark, multilingual elements, and degraded visual structure

Diversity is further enhanced by balanced sampling: approximately 600k HTML, 400k JSON, and 200k code examples, covering a broad spectrum of structural and semantic complexity, as well as linguistic variety (notably English and Chinese).

2. Purpose and Role in TableMoE Pretraining

TableMoE-Align serves as the exclusive upstream pretraining corpus for the TableMoE Mixture-of-Connector-Experts module, which includes three dedicated expert branches:

  • HTML expert: Specializes in parsing and reasoning over table layout and semantic structure.
  • JSON expert: Encodes tokens, spatial/attribute triples, and structural relations, supporting grid/node-level reasoning.
  • Code expert: Abstracts symbolic and executable patterns (e.g., formulae, programmatic layouts) that bridge from table structure to computational logic.

No samples from TableMoE-Align are used for evaluation, ensuring a strict separation between pretraining and downstream testing.

Alignment-driven Expert Initialization

Each expert is pretrained on its respective alignment task (e.g., Table-to-HTML, Table-to-JSON, Table-to-Code), yielding modality-specific priors that inform subsequent integrated fine-tuning. This approach enables differentiable specialization before joint neuro-symbolic routing.
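The per-expert pretraining split can be sketched as follows: each connector expert sees only (table, target-modality) pairs drawn from the shared quadruples. Function and key names here are illustrative assumptions, not the paper's implementation:

```python
# Each expert's alignment task maps the raw table to one target modality.
ALIGNMENT_TASKS = {
    "html": "Table-to-HTML",
    "json": "Table-to-JSON",
    "code": "Table-to-Code",
}

def build_expert_corpora(quadruples):
    """Split aligned quadruples into one pretraining corpus per expert.

    Each corpus holds (table, target) pairs; the table side is shared,
    so the experts acquire modality-specific priors over the same inputs.
    """
    corpora = {name: [] for name in ALIGNMENT_TASKS}
    for q in quadruples:
        for name in ALIGNMENT_TASKS:
            corpora[name].append((q["table"], q[name]))
    return corpora
```

After this stage, each expert carries a modality-specific prior before the experts are jointly fine-tuned under neuro-symbolic routing.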

3. Alignment and Curriculum Mechanism

The quadruple structure imposes explicit alignment constraints, such that each representation in a sample refers to the same underlying table instance. Pretraining proceeds as follows:

  1. Expert pretraining on alignment tasks: Each expert is supervised to map between the raw table and its modality-specific target.
  2. Joint representation learning: Experts are incorporated into the MoCE layer, and symbolic supervision (token role, structure graphs) is overlaid during fine-tuning via neuro-symbolic annealing.

Formally, the annealed objective integrates alignment and symbolic loss terms:

\mathcal{L}_{\text{NSA}}(t) = (1-\lambda(t)) \cdot \mathcal{L}_{\text{task}} + \lambda(t)\big[\lambda_1 \mathcal{L}_{\text{role}} + \lambda_2 \mathcal{L}_{\text{struct}}\big]

where \mathcal{L}_{\text{task}} is the main alignment loss from TableMoE-Align, and \lambda(t) is a schedule controlling the annealing from purely neural to symbolic objectives.
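The annealed objective can be written out directly in code. Note that the linear schedule for λ(t) and the λ₁/λ₂ values below are illustrative assumptions; the source specifies only the overall annealed form:

```python
def annealed_loss(l_task, l_role, l_struct, t, total_steps,
                  lam1=0.5, lam2=0.5):
    """L_NSA(t) = (1 - lambda(t)) * L_task + lambda(t) * (lam1*L_role + lam2*L_struct).

    The linear schedule and the lam1/lam2 weights are assumed for
    illustration, not taken from the paper.
    """
    lam_t = min(1.0, t / total_steps)  # anneals from neural (0) toward symbolic (1)
    return (1.0 - lam_t) * l_task + lam_t * (lam1 * l_role + lam2 * l_struct)
```

At t = 0 the objective is purely the neural alignment loss; as t approaches the end of training, the symbolic role and structure terms dominate.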

4. Token Role Prediction and Neuro-Symbolic Routing

While TableMoE-Align quadruples do not themselves provide explicit token role labels (such as HEADER, DATA, AXIS, FORMULA), the accuracy and diversity of the alignments across modalities enable the model to infer semantic token roles during supervised pretraining. These roles underlie TableMoE's neuro-symbolic routing mechanism:

  • For each token i, a role distribution \mathbf{r}_i is predicted.
  • A confidence coefficient \alpha_i = 1 - \frac{H(\widetilde{\mathbf{r}}_i)}{\log R} is computed, where H denotes entropy and R is the number of roles.
  • This coefficient modulates gating among modality-specific experts, ensuring that structural and symbolic cues (enabled by TableMoE-Align) guide routing.

The integration of symbolic reasoning graphs is made possible by the presence of code and JSON/HTML alignments, allowing for explicit, interpretable structure-aware expert assignment.
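The entropy-based confidence above, and its use in gating, can be sketched compactly. The `role_confidence` function follows the formula directly; the linear interpolation in `modulated_gate` is an assumed form of the "modulation" described here, not the paper's exact gating rule:

```python
import math

def role_confidence(role_probs):
    """alpha = 1 - H(r) / log(R): 1 for a one-hot role, 0 for a uniform one."""
    R = len(role_probs)
    entropy = -sum(p * math.log(p) for p in role_probs if p > 0.0)
    return 1.0 - entropy / math.log(R)

def modulated_gate(neural_gate, symbolic_prior, alpha):
    """Blend neural gate weights with a symbolic routing prior.

    Linear interpolation is an illustrative assumption: high-confidence
    role predictions (alpha near 1) let symbolic cues dominate routing.
    """
    return [(1.0 - alpha) * n + alpha * s
            for n, s in zip(neural_gate, symbolic_prior)]
```

A token whose role is unambiguous (one-hot distribution) gets \alpha = 1 and is routed by its symbolic role; a token with a uniform role distribution gets \alpha = 0 and falls back to the neural gate.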

5. Performance Impact and Empirical Results on WildStruct Benchmarks

Pretraining with TableMoE-Align is empirically shown to be critical for TableMoE’s robust performance on WildStruct conditions. This is evidenced by extensive ablation experiments:

  • Removing code expert pretraining: −6.97 points in exact-match accuracy
  • Omitting symbolic role/graph supervision: −5.37 points
  • Substituting with neural-only objectives: sharp degradation especially in settings with table defects (e.g., missing headers, noisy cells)

TableMoE, pretrained on TableMoE-Align, achieves state-of-the-art results on all four WildStruct benchmarks:

  • WMMFinQA: Financial QA under severe layout and symbolic noise
  • WMMTatQA: Multiturn tabular QA with multi-source degradation
  • WMMTabDialog: Multilingual, deeply nested, visually noisy dialogs
  • WMMFinanceMath: Reasoning-intensive, highly corrupted math QA

In each, TableMoE outperforms competing vision-LLMs (including GPT-4o), with up to +9.2% absolute accuracy improvement.

6. Significance and Research Implications

TableMoE-Align exemplifies a new standard for alignment pretraining in multimodal table reasoning:

  • Scale and Diversity: Enables broad generalization, resilience to unseen structure and noise, and cross-domain transfer.
  • Symbolic Integration: The inclusion of code and structured formats facilitates explicit reasoning and interpretable token-role assignment.
  • Foundation for Neuro-Symbolic Architectures: By supplying clean, granular alignments across multiple modalities, TableMoE-Align supports the emergence of neuro-symbolic routing—enabling modular, interpretable, and robust table understanding under degraded, real-world conditions.

TableMoE-Align thus forms the empirical and architectural bedrock for current and future neuro-symbolic approaches in multimodal table machine learning.