TableMoE: Neuro-Symbolic MoCE for Table Reasoning

Updated 1 July 2025
  • TableMoE is a neuro-symbolic Mixture-of-Connector-Experts architecture designed for robust table reasoning amid noisy, real-world data.
  • It leverages a novel neuro-symbolic routing mechanism that dynamically assigns table elements to specialized experts using confidence-aware gating.
  • Pretraining on the large-scale TableMoE-Align corpus yields state-of-the-art performance and interpretability across diverse and degraded table formats.

TableMoE is a neuro-symbolic Mixture-of-Connector-Experts (MoCE) architecture, specifically tailored for robust, interpretable, and generalizable reasoning over multimodal table data under real-world “WildStruct” conditions—characterized by visual degradation, symbolic complexity, incomplete layouts, and multilinguality. TableMoE leverages a novel Neuro-Symbolic Routing mechanism to dynamically assign table elements to specialist experts, using confidence-aware, graph-informed gating based on fine-grained token role predictions. The system is pre-trained with large-scale alignment-driven supervision across modalities (HTML, JSON, code) and sets new standards of performance and robustness on challenging table understanding benchmarks.

1. Architectural Principles: Neuro-Symbolic Routing and Mixture-of-Connector-Experts

TableMoE’s architecture is centered on the Mixture-of-Connector-Experts (MoCE) paradigm, wherein table representations pass through a sequence of dedicated, interpretable modules (a structural sketch follows the list):

  • Vision Encoder: Extracts visual features from table images.
  • Neuro-Symbolic Router: Predicts token-level semantic roles (e.g., header, data, axis, formula) via a classifier, forming a symbolic reasoning graph.
  • Connector Expert Modules: Specialized for distinct table operations:
    • Table-to-HTML: Structural table parsing.
    • Table-to-JSON: Spatial and attribute extraction.
    • Table-to-Code: Generative rendering (e.g., executable Matplotlib code).
    • General Expert: Generic visual feature adaptation.
  • Confidence-Aware Gating: Routing weights are adjusted according to prediction entropy over token roles, modulating expert assignment based on certainty.
  • Backbone LLM: Consumes fused expert outputs and text queries for downstream table reasoning tasks.
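
A minimal structural sketch of this pipeline, assuming PyTorch and placeholder linear modules in place of the actual vision encoder, router head, connectors, and backbone LLM (all names and dimensions here are illustrative, not the released architecture):

```python
import torch.nn as nn

class TableMoEPipelineSketch(nn.Module):
    """Illustrative wiring of the MoCE components; every submodule is a
    stand-in (nn.Linear) rather than the real encoder, connectors, or LLM."""

    def __init__(self, dim: int = 768, n_roles: int = 4):
        super().__init__()
        self.vision_encoder = nn.Linear(dim, dim)        # stand-in for the table-image encoder
        self.role_classifier = nn.Linear(dim, n_roles)   # Neuro-Symbolic Router: token-role head
        self.experts = nn.ModuleDict({
            "html": nn.Linear(dim, dim),                 # Table-to-HTML connector
            "json": nn.Linear(dim, dim),                 # Table-to-JSON connector
            "code": nn.Linear(dim, dim),                 # Table-to-Code connector
            "general": nn.Linear(dim, dim),              # generalist connector
        })
        self.router_affinity = nn.Linear(dim, len(self.experts))  # router affinities a_{i,e}

    def forward(self, patch_features):
        x = self.vision_encoder(patch_features)
        role_logits = self.role_classifier(x)            # token-level semantic roles
        affinities = self.router_affinity(x)             # per-expert affinities
        # Confidence-aware gating fuses the expert outputs (see the equations and
        # fusion sketch below); the fused tokens plus the text query then go to
        # the backbone LLM for downstream reasoning.
        return x, role_logits, affinities
```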

Key mathematical expressions governing expert selection and fusion include:

\tilde{\mathbf{r}}_i = \mathrm{Softmax}(f_{\mathrm{role}}(\mathbf{x}_i)) \in \mathbb{R}^R

\alpha_i = 1 - \frac{H(\tilde{\mathbf{r}}_i)}{\log R}

\mathbf{w}_{i,e} = \frac{\exp(\alpha_i a_{i,e})}{\sum_{e'} \exp(\alpha_i a_{i,e'})}

\hat{\mathbf{x}}_i = \sum_{e=1}^{E} \mathbf{w}_{i,e} \cdot f_e(\mathbf{x}_i)

where \tilde{\mathbf{r}}_i is the predicted role distribution for token i over R roles, H(\cdot) is the entropy, a_{i,e} is the router affinity of token i for expert e, and f_e are the connector experts.
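
A minimal PyTorch-style sketch of the confidence-aware gating and fusion defined by these equations; the function and argument names are illustrative, and the expert connectors are assumed to be arbitrary callables mapping token features to features:

```python
import torch
import torch.nn.functional as F

def confidence_aware_fusion(x, role_logits, affinities, experts):
    """x: (N, D) token features; role_logits: (N, R); affinities: (N, E) raw a_{i,e};
    experts: list of E callables f_e mapping (N, D) -> (N, D)."""
    R = role_logits.size(-1)

    # Role distribution r~_i = Softmax(f_role(x_i))
    role_probs = F.softmax(role_logits, dim=-1)

    # Confidence alpha_i = 1 - H(r~_i) / log R  (entropy normalized to [0, 1])
    entropy = -(role_probs * role_probs.clamp_min(1e-9).log()).sum(dim=-1)
    alpha = 1.0 - entropy / torch.log(torch.tensor(float(R)))

    # Gating weights w_{i,e}: softmax over experts of the confidence-scaled affinities
    weights = F.softmax(alpha.unsqueeze(-1) * affinities, dim=-1)          # (N, E)

    # Fused representation x^_i = sum_e w_{i,e} * f_e(x_i)
    expert_outputs = torch.stack([f_e(x) for f_e in experts], dim=1)       # (N, E, D)
    return (weights.unsqueeze(-1) * expert_outputs).sum(dim=1)             # (N, D)
```

Note that as \alpha_i approaches zero (maximum role uncertainty), the gating distribution flattens toward uniform, which is exactly the behavior of backing off from brittle specialization described below.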

2. Robust Reasoning under WildStruct Conditions

The WildStruct setting refers to the highly variable, error-prone conditions of tables encountered in the wild: blur, skew, watermarking, incomplete or non-rectangular layouts, hierarchical nesting, dense symbolic content (e.g., formulas and multilingual text), and missing cells or structural elements.

TableMoE addresses these challenges through:

  • Semantic role prediction and symbolic graph construction to recover structural meaning even from deteriorated images.
  • Confidence-aware expert routing, abstaining from brittle or erroneous specialization if token ambiguity is high.
  • Curriculum-guided training (“neuro-symbolic annealing”), which progressively increases exposure to structural and visual noise, improving generalization and resilience (a minimal schedule sketch follows this list).
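
As an illustration of the annealing idea, the sketch below assumes a simple linear ramp in corruption severity over training; the actual schedule shape, severity range, and corruption operators used by TableMoE are not specified here and are assumptions:

```python
def noise_severity(step: int, total_steps: int, start: float = 0.1, end: float = 1.0) -> float:
    """Hypothetical linear annealing schedule for WildStruct corruption severity."""
    frac = min(1.0, step / max(1, total_steps))
    return start + (end - start) * frac

# Early batches see lightly degraded tables, later batches heavy noise.
for step in (0, 5_000, 10_000):
    print(step, round(noise_severity(step, total_steps=10_000), 2))
```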

Empirical evidence shows that traditional MLLMs/VLMs degrade or hallucinate in these settings, while TableMoE maintains accuracy and interpretable behavior.

3. Neuro-Symbolic Routing and Interpretability

A central innovation is the use of symbolic reasoning graphs, constructed from token roles, to guide expert routing alongside statistical confidence assessment. The model explicitly routes:

  • Header tokens to the HTML expert.
  • Data/attribute tokens to the JSON expert.
  • Formula tokens to the Code expert.
  • High-entropy/ambiguous tokens to a Generalist expert.

Confidence-aware gating, parametrized by entropy in the role distribution, ensures that unreliable role assignments do not lead to brittle or incoherent expert invocation:

\alpha_i = 1 - \frac{H(\tilde{\mathbf{r}}_i)}{\log R}

Higher entropy (uncertainty) reduces reliance on specialized experts, favoring more robust representations.
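
As a concrete illustration of this routing policy, the sketch below maps a token’s predicted role to an expert and falls back to the generalist when confidence is low; the role taxonomy, expert names, and threshold value are assumptions for illustration, not the paper’s released configuration:

```python
import math

# Hypothetical role-to-expert table following the routing described above.
ROLE_TO_EXPERT = {
    "header": "html_expert",
    "data": "json_expert",
    "attribute": "json_expert",
    "formula": "code_expert",
}

def route_token(role_probs: dict, tau: float = 0.5) -> str:
    """Route a token to a specialist only when the role prediction is confident;
    otherwise fall back to the generalist expert."""
    R = len(role_probs)
    entropy = -sum(p * math.log(p) for p in role_probs.values() if p > 0)
    confidence = 1.0 - entropy / math.log(R)          # alpha_i from the gating formula
    if confidence < tau:
        return "general_expert"
    top_role = max(role_probs, key=role_probs.get)
    return ROLE_TO_EXPERT.get(top_role, "general_expert")

print(route_token({"header": 0.90, "data": 0.05, "attribute": 0.03, "formula": 0.02}))
```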

Interpretability is achieved by exposing intermediate artifacts (HTML parses, JSON structures, code snippets). Qualitative analyses in the benchmarks show that TableMoE can provide explicit rationale or abstain under severe uncertainty, a property not observed in prior monolithic models.

4. Pretraining with TableMoE-Align: Multimodal Alignment and Expert Specialization

Effective expertise is imparted via pretraining on the TableMoE-Align corpus, which includes 1.2 million table–HTML–JSON–code quadruples:

  • The dataset is drawn from FinTabNet, PubTabNet, TableBank, and WTW, encompassing finance, science, biomedicine, industry, and bilingual (Chinese/English) samples with real-world noise.
  • Each modality serves as a pretraining target for a specific expert (e.g., Table→HTML alignment for the HTML expert); a sketch of one such quadruple appears after this list.
  • The corpus embeds WildStruct-specific degradations, enhancing robustness through both alignment and exposure.
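
For illustration only, one alignment record might look like the following; the field names, paths, and schema here are hypothetical stand-ins, not the released TableMoE-Align format:

```python
# Hypothetical TableMoE-Align quadruple: one degraded table image paired with
# HTML, JSON, and code targets that supervise the matching connector experts.
sample_quadruple = {
    "image": "tables/fintabnet_000123.png",  # WildStruct-degraded table image
    "html": "<table><tr><th>Year</th><th>Revenue</th></tr>"
            "<tr><td>2023</td><td>1,250</td></tr></table>",
    "json": {"cells": [
        {"row": 0, "col": 0, "role": "header", "text": "Year"},
        {"row": 1, "col": 1, "role": "data", "text": "1,250"},
    ]},
    "code": "import matplotlib.pyplot as plt\n"
            "plt.bar(['2023'], [1250])\n"
            "plt.show()",
}
# html -> Table-to-HTML expert, json -> Table-to-JSON expert, code -> Table-to-Code expert.
```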

Domain transfer results show that models pretrained on this data generalize across benchmarks and table domains.

5. Evaluation on WildStruct Benchmarks and Ablation Analysis

TableMoE is benchmarked against state-of-the-art models (including GPT-4o and Table-LLaVA) on four WildStruct datasets:

  • WMMFinQA: Financial math QA on visually complex tables.
  • WMMTatQA: Noisy, hybrid financial QA.
  • WMMTabDialog: Multilingual, dialog-oriented QA.
  • WMMFinanceMath: Math-heavy QA on noisy images.

TableMoE surpasses previous best-performing models by up to 9.2% and narrows the gap to human expert performance. Domain transfer to public datasets (MMMU-Table, FinanceMath) supports the model’s claims of generality.

Extensive ablation studies demonstrate that each neuro-symbolic component (role predictor, symbolic routing graph, confidence fusion, connector experts) is necessary; omission invariably leads to performance drops, with the absence of the expert modules or neuro-symbolic routing causing the largest degradations.

Interpretability is further documented through case analyses:

  • Stepwise rationales are provided and mapped to table image evidence.
  • The system abstains or highlights ambiguity under severe degradation or noise.

6. Significance and Implications

TableMoE establishes a new state of the art for real-world multimodal table reasoning, especially in situations of structural and visual incompleteness. Core features contributing to this performance include:

  • Modular, interpretable, expert-driven design.
  • Neuro-symbolic routing with uncertainty quantification.
  • Curriculum-aligned pretraining on diverse, real-world data.

This approach demonstrates that modular symbolic reasoning, combined with statistical learning and confidence calibration, yields material gains in both robustness and explainability over monolithic MLLMs. The explicit identification and handling of ambiguous cases, together with interpretable abstention, suggest broader applicability of these principles in future multimodal and neuro-symbolic AI research.


Aspect summary:

  • Architecture: Neuro-symbolic Mixture-of-Connector-Experts (MoCE)
  • Novelty: Semantic role prediction, symbolic routing, expert gating
  • Pretraining: 1.2M table–HTML–JSON–code quadruple corpus
  • Challenges addressed: Visual, structural, and symbolic WildStruct degradation
  • Evaluation: SOTA on WMMFinQA, WMMTatQA, WMMTabDialog, WMMFinanceMath
  • Core benefits: Robustness, interpretability, out-of-domain generalization

TableMoE’s source code and evaluation datasets are available at https://github.com/ai-agi/TableMoE, with WildStruct datasets hosted on HuggingFace at https://huggingface.co/datasets/darkme-ai/.