TableMoE: Neuro-Symbolic MoCE for Table Reasoning

Updated 1 July 2025
  • TableMoE is a neuro-symbolic Mixture-of-Connector-Experts architecture designed for robust table reasoning amid noisy, real-world data.
  • It leverages a novel neuro-symbolic routing mechanism that dynamically assigns table elements to specialized experts using confidence-aware gating.
  • Pretraining on the large-scale TableMoE-Align corpus yields state-of-the-art performance and interpretability across diverse and degraded table formats.

TableMoE is a neuro-symbolic Mixture-of-Connector-Experts (MoCE) architecture, specifically tailored for robust, interpretable, and generalizable reasoning over multimodal table data under real-world “WildStruct” conditions—characterized by visual degradation, symbolic complexity, incomplete layouts, and multilinguality. TableMoE leverages a novel Neuro-Symbolic Routing mechanism to dynamically assign table elements to specialist experts, using confidence-aware, graph-informed gating based on fine-grained token role predictions. The system is pre-trained with large-scale alignment-driven supervision across modalities (HTML, JSON, code) and sets new standards of performance and robustness on challenging table understanding benchmarks.

1. Architectural Principles: Neuro-Symbolic Routing and Mixture-of-Connector-Experts

TableMoE’s architecture is centered on the Mixture-of-Connector-Experts (MoCE) paradigm, wherein table representations pass through a sequence of dedicated, interpretable modules (a structural sketch follows the list):

  • Vision Encoder: Extracts visual features from table images.
  • Neuro-Symbolic Router: Predicts token-level semantic roles (e.g., header, data, axis, formula) via a classifier, forming a symbolic reasoning graph.
  • Connector Expert Modules: Specialized for distinct table operations:
    • Table-to-HTML: Structural table parsing.
    • Table-to-JSON: Spatial and attribute extraction.
    • Table-to-Code: Generative rendering (e.g., executable Matplotlib code).
    • General Expert: Generic visual feature adaptation.
  • Confidence-Aware Gating: Routing weights are adjusted according to prediction entropy over token roles, modulating expert assignment based on certainty.
  • Backbone LLM: Consumes fused expert outputs and text queries for downstream table reasoning tasks.
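
A minimal structural sketch of this pipeline, assuming PyTorch and placeholder linear modules in place of the actual vision encoder, router head, connectors, and backbone LLM (all names and dimensions here are illustrative, not the released architecture):

```python
import torch.nn as nn

class TableMoEPipelineSketch(nn.Module):
    """Illustrative wiring of the MoCE components; every submodule is a
    stand-in (nn.Linear) rather than the real encoder, connectors, or LLM."""

    def __init__(self, dim: int = 768, n_roles: int = 4):
        super().__init__()
        self.vision_encoder = nn.Linear(dim, dim)        # stand-in for the table-image encoder
        self.role_classifier = nn.Linear(dim, n_roles)   # Neuro-Symbolic Router: token-role head
        self.experts = nn.ModuleDict({
            "html": nn.Linear(dim, dim),                 # Table-to-HTML connector
            "json": nn.Linear(dim, dim),                 # Table-to-JSON connector
            "code": nn.Linear(dim, dim),                 # Table-to-Code connector
            "general": nn.Linear(dim, dim),              # generalist connector
        })
        self.router_affinity = nn.Linear(dim, len(self.experts))  # router affinities a_{i,e}

    def forward(self, patch_features):
        x = self.vision_encoder(patch_features)
        role_logits = self.role_classifier(x)            # token-level semantic roles
        affinities = self.router_affinity(x)             # per-expert affinities
        # Confidence-aware gating fuses the expert outputs (see the equations and
        # fusion sketch below); the fused tokens plus the text query then go to
        # the backbone LLM for downstream reasoning.
        return x, role_logits, affinities
```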

Key mathematical expressions governing expert selection and fusion include:

\tilde{\mathbf{r}}_i = \mathrm{Softmax}(f_{\mathrm{role}}(\mathbf{x}_i)) \in \mathbb{R}^R

\alpha_i = 1 - \frac{H(\tilde{\mathbf{r}}_i)}{\log R}

\mathbf{w}_{i,e} = \frac{\exp(\alpha_i a_{i,e})}{\sum_{e'} \exp(\alpha_i a_{i,e'})}

\hat{\mathbf{x}}_i = \sum_{e=1}^{E} \mathbf{w}_{i,e} \cdot f_e(\mathbf{x}_i)

where \tilde{\mathbf{r}}_i is the predicted role distribution for token i over R roles, H(\cdot) is the entropy, a_{i,e} is the router affinity of token i for expert e, and f_e are the connector experts.
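
A minimal PyTorch-style sketch of the confidence-aware gating and fusion defined by these equations; the function and argument names are illustrative, and the expert connectors are assumed to be arbitrary callables mapping token features to features:

```python
import torch
import torch.nn.functional as F

def confidence_aware_fusion(x, role_logits, affinities, experts):
    """x: (N, D) token features; role_logits: (N, R); affinities: (N, E) raw a_{i,e};
    experts: list of E callables f_e mapping (N, D) -> (N, D)."""
    R = role_logits.size(-1)

    # Role distribution r~_i = Softmax(f_role(x_i))
    role_probs = F.softmax(role_logits, dim=-1)

    # Confidence alpha_i = 1 - H(r~_i) / log R  (entropy normalized to [0, 1])
    entropy = -(role_probs * role_probs.clamp_min(1e-9).log()).sum(dim=-1)
    alpha = 1.0 - entropy / torch.log(torch.tensor(float(R)))

    # Gating weights w_{i,e}: softmax over experts of the confidence-scaled affinities
    weights = F.softmax(alpha.unsqueeze(-1) * affinities, dim=-1)          # (N, E)

    # Fused representation x^_i = sum_e w_{i,e} * f_e(x_i)
    expert_outputs = torch.stack([f_e(x) for f_e in experts], dim=1)       # (N, E, D)
    return (weights.unsqueeze(-1) * expert_outputs).sum(dim=1)             # (N, D)
```

Note that as \alpha_i approaches zero (maximum role uncertainty), the gating distribution flattens toward uniform, which is exactly the behavior of backing off from brittle specialization described below.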

2. Robust Reasoning under WildStruct Conditions

The WildStruct setting refers to the highly variable, error-prone conditions of tables encountered in the wild: blur, skew, watermarking, incomplete or non-rectangular layouts, hierarchical nesting, dense symbolic content (e.g., formulas and multilingual text), and missing cells or structural elements.

TableMoE addresses these challenges through:

  • Semantic role prediction and symbolic graph construction to recover structural meaning even from deteriorated images.
  • Confidence-aware expert routing, abstaining from brittle or erroneous specialization if token ambiguity is high.
  • Curriculum-guided training (“neuro-symbolic annealing”), which progressively increases exposure to structural and visual noise, improving generalization and resilience (a minimal schedule sketch follows this list).
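
As an illustration of the annealing idea, the sketch below assumes a simple linear ramp in corruption severity over training; the actual schedule shape, severity range, and corruption operators used by TableMoE are not specified here and are assumptions:

```python
def noise_severity(step: int, total_steps: int, start: float = 0.1, end: float = 1.0) -> float:
    """Hypothetical linear annealing schedule for WildStruct corruption severity."""
    frac = min(1.0, step / max(1, total_steps))
    return start + (end - start) * frac

# Early batches see lightly degraded tables, later batches heavy noise.
for step in (0, 5_000, 10_000):
    print(step, round(noise_severity(step, total_steps=10_000), 2))
```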

Empirical evidence shows that traditional MLLMs/VLMs degrade or hallucinate in these settings, while TableMoE maintains accuracy and interpretable behavior.

3. Neuro-Symbolic Routing and Interpretability

A central innovation is the use of symbolic reasoning graphs, constructed from token roles, to guide expert routing alongside statistical confidence assessment. The model explicitly routes:

  • Header tokens to the HTML expert.
  • Data/attribute tokens to the JSON expert.
  • Formula tokens to the Code expert.
  • High-entropy/ambiguous tokens to a Generalist expert.

Confidence-aware gating, parametrized by entropy in the role distribution, ensures that unreliable role assignments do not lead to brittle or incoherent expert invocation:

\alpha_i = 1 - \frac{H(\tilde{\mathbf{r}}_i)}{\log R}

Higher entropy (uncertainty) reduces reliance on specialized experts, favoring more robust representations.
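
As a concrete illustration of this routing policy, the sketch below maps a token’s predicted role to an expert and falls back to the generalist when confidence is low; the role taxonomy, expert names, and threshold value are assumptions for illustration, not the paper’s released configuration:

```python
import math

# Hypothetical role-to-expert table following the routing described above.
ROLE_TO_EXPERT = {
    "header": "html_expert",
    "data": "json_expert",
    "attribute": "json_expert",
    "formula": "code_expert",
}

def route_token(role_probs: dict, tau: float = 0.5) -> str:
    """Route a token to a specialist only when the role prediction is confident;
    otherwise fall back to the generalist expert."""
    R = len(role_probs)
    entropy = -sum(p * math.log(p) for p in role_probs.values() if p > 0)
    confidence = 1.0 - entropy / math.log(R)          # alpha_i from the gating formula
    if confidence < tau:
        return "general_expert"
    top_role = max(role_probs, key=role_probs.get)
    return ROLE_TO_EXPERT.get(top_role, "general_expert")

print(route_token({"header": 0.90, "data": 0.05, "attribute": 0.03, "formula": 0.02}))
```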

Interpretability is achieved by exposing intermediate artifacts (HTML parses, JSON structures, code snippets). Qualitative analyses in the benchmarks show that TableMoE can provide explicit rationale or abstain under severe uncertainty, a property not observed in prior monolithic models.

4. Pretraining with TableMoE-Align: Multimodal Alignment and Expert Specialization

Effective expertise is imparted via pretraining on the TableMoE-Align corpus, which includes 1.2 million table–HTML–JSON–code quadruples:

  • The dataset is drawn from FinTabNet, PubTabNet, TableBank, and WTW, encompassing finance, science, biomedicine, industry, and bilingual (Chinese/English) samples with real-world noise.
  • Each modality serves as a pretraining target for a specific expert (e.g., Table→HTML alignment for the HTML expert); a sketch of one such quadruple appears after this list.
  • The corpus embeds WildStruct-specific degradations, enhancing robustness through both alignment and exposure.
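
For illustration only, one alignment record might look like the following; the field names, paths, and schema here are hypothetical stand-ins, not the released TableMoE-Align format:

```python
# Hypothetical TableMoE-Align quadruple: one degraded table image paired with
# HTML, JSON, and code targets that supervise the matching connector experts.
sample_quadruple = {
    "image": "tables/fintabnet_000123.png",  # WildStruct-degraded table image
    "html": "<table><tr><th>Year</th><th>Revenue</th></tr>"
            "<tr><td>2023</td><td>1,250</td></tr></table>",
    "json": {"cells": [
        {"row": 0, "col": 0, "role": "header", "text": "Year"},
        {"row": 1, "col": 1, "role": "data", "text": "1,250"},
    ]},
    "code": "import matplotlib.pyplot as plt\n"
            "plt.bar(['2023'], [1250])\n"
            "plt.show()",
}
# html -> Table-to-HTML expert, json -> Table-to-JSON expert, code -> Table-to-Code expert.
```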

Domain transfer results show that models pretrained on this data generalize across benchmarks and table domains.

5. Evaluation on WildStruct Benchmarks and Ablation Analysis

TableMoE is benchmarked against state-of-the-art models (including GPT-4o and Table-LLaVA) on four WildStruct datasets:

  • WMMFinQA: Financial math QA on visually complex tables.
  • WMMTatQA: Noisy, hybrid financial QA.
  • WMMTabDialog: Multilingual, dialog-oriented QA.
  • WMMFinanceMath: Math-heavy QA on noisy images.

TableMoE surpasses previous best-performing models by up to 9.2% and narrows the gap to human expert performance. Domain transfer to public datasets (MMMU-Table, FinanceMath) supports the model’s claims of generality.

Extensive ablation studies demonstrate that each neuro-symbolic component (role predictor, symbolic routing graph, confidence fusion, connector experts) is necessary; omission invariably leads to performance drops, with the absence of the expert modules or neuro-symbolic routing causing the largest degradations.

Interpretability is further documented through case analyses:

  • Stepwise rationales are provided and mapped to table image evidence.
  • The system abstains or highlights ambiguity under severe degradation or noise.

6. Significance and Implications

TableMoE establishes a new state of the art for real-world multimodal table reasoning, especially in situations of structural and visual incompleteness. Core features contributing to this performance include:

  • Modular, interpretable, expert-driven design.
  • Neuro-symbolic routing with uncertainty quantification.
  • Curriculum-aligned pretraining on diverse, real-world data.

This approach demonstrates that modular symbolic reasoning, combined with statistical learning and confidence calibration, yields material gains in both robustness and explainability over monolithic MLLMs. The explicit identification and handling of ambiguous cases, together with interpretable abstention, suggest broader applicability of these principles in future multimodal and neuro-symbolic AI research.


Aspect summary:

  • Architecture: Neuro-symbolic Mixture-of-Connector-Experts (MoCE)
  • Novelty: Semantic role prediction, symbolic routing, expert gating
  • Pretraining: 1.2M table–HTML–JSON–code quadruple corpus
  • Challenges addressed: Visual, structural, and symbolic WildStruct degradation
  • Evaluation: SOTA on WMMFinQA, WMMTatQA, WMMTabDialog, WMMFinanceMath
  • Core benefits: Robustness, interpretability, out-of-domain generalization

TableMoE’s source code and evaluation datasets are available at https://github.com/ai-agi/TableMoE, with WildStruct datasets hosted on HuggingFace at https://huggingface.co/datasets/darkme-ai/.