
TagOp: Hybrid Table Aggregation for Financial QA

Updated 13 December 2025
  • TagOp Model is a hybrid table aggregation operator that processes both tabular and textual data for financial QA.
  • It employs sequence tagging to extract evidential spans and operator-based symbolic computation for precise numerical reasoning.
  • State-of-the-art TAT-QA performance is achieved despite challenges in accurate IO tagging of heterogeneous inputs.

TagOp (Table Aggregation Operator) is a QA model designed for numerical and semantic reasoning over hybrid input consisting of tabular and textual content, introduced as part of the TAT-QA benchmark for financial report question answering. Its architecture supports extraction of, and reasoning over, heterogeneous evidential spans using sequence tagging and operator-based symbolic computation. It achieves state-of-the-art results on the TAT-QA dataset while revealing persistent challenges in precise evidence localization and hybrid data reasoning (Zhu et al., 2021).

1. Input Representation and Shared Encoder

TagOp processes concatenated hybrid input, tokenized as [CLS] q_1 q_2 ... q_n tb_1 tb_2 ... tb_m p_1 p_2 ... p_k, with q_i as question tokens, tb_j as flattened table tokens, and p_l as paragraph tokens. The concatenated token sequence is fed to a 12-layer Transformer encoder (RoBERTa or TaPas), producing contextualized vectors h_1, ..., h_N shared by all subsequent submodules. This unified encoding enables joint reasoning across tables and text, capturing the cross-modal dependencies inherent in financial QA.
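The concatenation step can be sketched as follows. This is an illustrative simplification: a real implementation would apply a subword tokenizer (RoBERTa or TaPas vocabulary), and the helper name `build_hybrid_input` is hypothetical, not from the paper.

```python
def build_hybrid_input(question_tokens, table, paragraph_tokens):
    """Concatenate [CLS], question, flattened table, and paragraph tokens.

    Sketch only: the table is flattened row by row (row-major order
    assumed), and every token would normally be split into sub-tokens
    by the encoder's tokenizer.
    """
    table_tokens = [str(cell) for row in table for cell in row]
    return ["[CLS]"] + list(question_tokens) + table_tokens + list(paragraph_tokens)

seq = build_hybrid_input(
    ["what", "was", "revenue", "in", "2019", "?"],
    [["Year", "Revenue"], ["2019", "1,200"]],
    ["Revenue", "grew", "steadily", "."],
)
```

The resulting sequence is what the shared Transformer encoder consumes to produce the contextualized vectors h_1, ..., h_N.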

2. Evidence Extraction via Sequence Tagging

TagOp frames the identification of relevant evidence as token-level IO (inside/outside) tagging over all sub-tokens of the question, table, and text. Each sub-token t receives a label \ell_t \in \{I, O\}. Any sub-token labeled I within a cell renders the entire cell "selected", and consecutive I-tagged tokens in paragraphs are merged into a single evidential span.

For each sub-token t with contextual embedding h_t, the tagging probability is computed as:

p^\mathrm{tag}_t = \mathrm{softmax}(\mathrm{FFN}(h_t)),

where FFN is a two-layer feed-forward network with GELU non-linearity. Ground-truth tags G^\mathrm{tag} are assigned by matching annotated answer content, in table-first, then text, order.
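The tagging head can be sketched with toy weights; in the real model the FFN dimensions match the encoder's hidden size, and all weights below are illustrative placeholders.

```python
import math

def gelu(x):
    """GELU non-linearity (exact form, via the Gauss error function)."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [v / s for v in exps]

def tag_probs(h, W1, b1, W2, b2):
    """p_t^tag = softmax(FFN(h_t)) with a two-layer GELU feed-forward net."""
    hidden = [gelu(sum(w * x for w, x in zip(row, h)) + b)
              for row, b in zip(W1, b1)]
    logits = [sum(w * x for w, x in zip(row, hidden)) + b
              for row, b in zip(W2, b2)]
    return softmax(logits)  # [P(I), P(O)]

# Toy 3-dim embedding, 2-unit hidden layer, 2 output classes (I, O).
p = tag_probs([0.5, -1.0, 2.0],
              W1=[[0.1, 0.2, 0.3], [-0.2, 0.4, 0.1]], b1=[0.0, 0.1],
              W2=[[1.0, -1.0], [-1.0, 1.0]], b2=[0.0, 0.0])
```

Applied to every sub-token's embedding, this yields the per-token I/O distribution used for evidence selection.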

3. Operator-Based Symbolic Reasoning

TagOp incorporates a set of ten aggregation operators (with an "Other" catch-all) that apply symbolic reasoning over the extracted evidences E = \{e_1, ..., e_r\}. Formal operator definitions are as follows:

Operator        Formal Semantics                          Selection Logic
Span-in-text    \arg\max_{s \in E} p^\mathrm{tag}(s)      Max-scoring text span
Cell-in-table   \arg\max_{c \in E} p^\mathrm{tag}(c)      Max-scoring table cell
Spans           concatenation of selected spans/cells     All evidences
Sum             \sum_i val(e_i)                           All numeric evidences
Count           |E|                                       All evidences
Average         \frac{1}{|E|} \sum_i val(e_i)             All numeric evidences
Multiplication  \prod_i val(e_i)                          All numeric evidences
Difference      val(e_1) - val(e_2)                       Top-2 by p^\mathrm{tag}, ordered pair
Division        val(e_1) \div val(e_2)                    Ordered pair
Change-ratio    (val(e_1) - val(e_2)) / val(e_2)          Ordered pair
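The numeric operators above reduce to a small dispatch over extracted values. A sketch, where `values` is assumed to already hold the parsed floats val(e_i) (for ordered operators, the top-2 pair after order classification):

```python
import math

def execute_operator(op, values):
    """Apply one of TagOp's numeric aggregation operators.

    `values` holds val(e_i) for the selected evidences; for Difference,
    Division, and Change-ratio it is the ordered pair (e_1, e_2).
    Span/cell selection operators are omitted since they return text.
    """
    ops = {
        "Sum":            lambda v: sum(v),
        "Count":          lambda v: len(v),
        "Average":        lambda v: sum(v) / len(v),
        "Multiplication": lambda v: math.prod(v),
        "Difference":     lambda v: v[0] - v[1],
        "Division":       lambda v: v[0] / v[1],
        "Change-ratio":   lambda v: (v[0] - v[1]) / v[1],
    }
    return ops[op](values)
```

For example, a year-over-year change ratio over the pair (120.0, 100.0) yields 0.2, i.e. a 20% increase before scale normalization.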

Operator selection is performed using a multi-class softmax over the [CLS] vector:

p^\mathrm{op} = \mathrm{softmax}(\mathrm{FFN}(h_{[\mathrm{CLS}]})),

with the chosen operator o^* = \arg\max p^\mathrm{op}. For operators requiring operand order (Difference, Division, Change-ratio), a binary number-order classifier is computed as:

p^\mathrm{order} = \mathrm{softmax}\left(\mathrm{FFN}\left(\frac{h_1 + h_2}{2}\right)\right),

where h_1, h_2 are the encoder vectors of the top-2 evidences. The binary label indicates whether the input order matches (e_1, e_2).
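Operand ordering then amounts to a guarded swap. The sketch below uses a 0.5 decision threshold and the helper name `order_operands`, both assumptions for illustration rather than details from the paper:

```python
def order_operands(e1, e2, p_match):
    """Return the operand pair for an ordered operator.

    p_match is the classifier's probability that the input order already
    matches (e_1, e_2); below 0.5 the pair is swapped. The threshold is
    illustrative (an argmax over the two classes, equivalently).
    """
    return (e1, e2) if p_match >= 0.5 else (e2, e1)

# e.g. Difference with a confident "swap" decision:
a, b = order_operands(100.0, 120.0, p_match=0.2)
diff = a - b  # 120.0 - 100.0 = 20.0
```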

4. Scale Prediction and Numeric Calculation

To accommodate the financial context, TagOp predicts a question-specific scale S \in \{None, Thousand, Million, Billion, Percent\}. Scale prediction utilizes representations aggregated from the table and paragraph tokens:

h_\mathrm{tab} = \frac{1}{M} \sum_{i \in \mathrm{table}} h_i, \qquad h_\mathrm{p} = \frac{1}{K} \sum_{j \in \mathrm{para}} h_j,

p^\mathrm{scale} = \mathrm{softmax}(\mathrm{FFN}([h_{[\mathrm{CLS}]}; h_\mathrm{tab}; h_\mathrm{p}])).

After symbolic execution, the model computes

A_\mathrm{final} = A_\mathrm{sym} \times \mathrm{scale\_factor}(S),

where A_\mathrm{sym} is the operator's output.
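Scale application is a single multiplication. The factor mapping below is a sketch: the thousand/million/billion magnitudes are standard, while mapping Percent to 0.01 is an assumption for illustration (the source specifies only the five scale classes, not the exact factors).

```python
def scale_factor(scale):
    """Map a predicted scale class to a multiplicative factor.

    Percent -> 0.01 is an illustrative assumption; the magnitude scales
    follow their usual meanings.
    """
    return {"None": 1.0, "Thousand": 1e3, "Million": 1e6,
            "Billion": 1e9, "Percent": 0.01}[scale]

def finalize(a_sym, scale):
    """A_final = A_sym * scale_factor(S)."""
    return a_sym * scale_factor(scale)
```

For instance, a symbolic result of 1.2 under scale Thousand finalizes to 1200.0.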

5. Joint Training Objective

All submodules are trained jointly with cross-entropy losses. The total loss is:

L = \mathrm{NLL}(\log p^\mathrm{tag}, G^\mathrm{tag}) + \mathrm{NLL}(\log p^\mathrm{op}, G^\mathrm{op}) + \mathrm{NLL}(\log p^\mathrm{scale}, G^\mathrm{scale}) + \mathrm{NLL}(\log p^\mathrm{order}, G^\mathrm{order}),

where G^\mathrm{tag} marks answer-related sub-tokens, G^\mathrm{op} is the oracle operator, G^\mathrm{scale} is the ground-truth scale, and G^\mathrm{order} is defined for ordered operators only.
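The four-term objective can be sketched with toy per-head distributions (cross-entropy as negative log-probability of the gold class; all probability values below are illustrative, not model outputs):

```python
import math

def nll(probs, gold):
    """Negative log-likelihood of the gold class under a probability vector."""
    return -math.log(probs[gold])

def joint_loss(p_tag_seq, g_tag, p_op, g_op, p_scale, g_scale,
               p_order=None, g_order=None):
    """L = sum of NLL terms over the tagging, operator, scale, and order heads.

    Tagging loss is summed over all sub-tokens; the order term applies
    only when an ordered operator is involved (pass p_order=None otherwise).
    """
    loss = sum(nll(p, g) for p, g in zip(p_tag_seq, g_tag))
    loss += nll(p_op, g_op) + nll(p_scale, g_scale)
    if p_order is not None:
        loss += nll(p_order, g_order)
    return loss

L = joint_loss(
    p_tag_seq=[[0.9, 0.1], [0.2, 0.8]], g_tag=[0, 1],   # I then O
    p_op=[0.7, 0.1, 0.2], g_op=0,                        # 3 toy operators
    p_scale=[0.6, 0.4], g_scale=0,                       # 2 toy scales
    p_order=[0.8, 0.2], g_order=0,
)
```

Because all four heads share the encoder, gradients from each loss term jointly shape the contextual representations.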

6. Empirical Results and Failure Analysis

On the TAT-QA test split, TagOp achieves 50.1% EM and 58.0% F1 (an 11.1-point absolute gain over the prior best), versus human performance of 84.1 EM / 90.8 F1. Per-operator and per-scale accuracies are as follows:

Operator        % of Q's   Accuracy (%)
Span-in-text    21.3       91.6
Cell-in-table   21.6       86.7
Spans           12.6       93.8
Sum             2.5        76.2
Count           2.4        100.0
Average         5.9        100.0
Multiplication  0.1        0.0
Division        1.0        87.5
Difference      15.9       96.6
Change ratio    10.2       95.3
Other           6.6        0.0

Scale           % of Q's   Accuracy (%)
None            50.3       90.1
Thousand        19.2       95.3
Million         12.9       90.2
Billion
Percent         17.7       95.9

Error analysis attributes approximately 55% of failures to incorrect evidence tagging and 29% to missing evidence, underscoring that precise IO tagging remains the chief challenge for hybrid table-text QA (Zhu et al., 2021). A plausible implication is that improvements in token-level evidence extraction would translate directly into performance gains.

7. Significance, Challenges, and Benchmark Status

TagOp’s architecture demonstrates that modeling both tabular and unstructured textual information with unified embeddings, explicit evidence extraction, symbolic reasoning, and contextual scale normalization enables tractable numerical QA over hybrid financial data. Nevertheless, performance remains well below the human benchmark. The chief identified bottleneck is precise IO tagging on heterogeneous inputs, which accounts for over half of all error cases. TagOp and TAT-QA together serve as a demanding benchmark for hybrid data QA, requiring advances across evidence localization, operator prediction, and symbolic computation (Zhu et al., 2021).
