Merge Point Predictor Table (MPPT)
- MPPT is a predictive structure that aggregates sequential history, feature embeddings, and merge signals to identify reconvergence points in processor control flow and to decode table cell spans.
- Its hardware implementation, featuring a 128-entry, 4-way set-associative table and a wrong-path buffer, achieves merge prediction accuracies up to 95%, reducing mispredictions significantly.
- In table structure recognition, MPPT underpins a transformer and GRU-based decoder that lifts F1 scores by nearly 6 percentage points over baseline methods on complex tables.
The Merge Point Predictor Table (MPPT) is a predictive structure designed to either identify reconvergence points in program control flow following branch mispredictions within processor front ends or decode cell spanning decisions in table structure recognition tasks. Across both domains, MPPT encapsulates the logic for reconstructing solution spaces by aggregating sequential history, feature embeddings, and explicit merge signals. This article presents a technical account of MPPTs in two research contexts: (1) dynamic merge point prediction for hard-to-predict branches in superscalar microarchitectures (Pruett et al., 2020), and (2) merge-point decoding in accurate table structure recognition models (Zhang et al., 2021).
1. MPPT in Processor Front-Ends: Table Organization and Replacement
The MPPT in dynamic merge point prediction is a 128-entry, 4-way set-associative hardware table designed to store merge point hypotheses for hard-to-predict program control branches (Pruett et al., 2020). Each entry comprises:
- Tag: partial program counter (PC) of the branch, distinguishing branches mapping to the same set.
- Merge-Point PC: predicted PC where correct and wrong path rejoin.
- Predicted Distance: 8-bit count of dynamic instructions separating branch retirement and reconvergence (range 0–100).
- Independent-Register Set: bit vector marking registers downstream of the merge that are data-independent of the branch.
- Confidence Counter: 3-bit saturating counter updated by correct and incorrect merge predictions.
Indexing uses the low five bits of the branch PC (128 entries / 4 ways = 32 sets); replacement evicts the minimal-confidence entry, falling back to the largest predicted merge distance. LRU is not used within the MPPT. Typical total storage is ≈1.6 KB, excluding tags.
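The entry format, indexing, and replacement policy above can be sketched as a simple behavioral model. This is an illustrative simplification, not the paper's RTL; class and field names are hypothetical.

```python
from dataclasses import dataclass

NUM_SETS = 32     # 128 entries / 4 ways
NUM_WAYS = 4
CONF_MAX = 7      # 3-bit saturating confidence counter

@dataclass
class MpptEntry:
    tag: int          # partial branch PC (bits above the set index)
    merge_pc: int     # predicted reconvergence PC
    distance: int     # 8-bit dynamic-instruction distance, 0-100
    indep_regs: int   # bit vector of branch-independent registers
    confidence: int = 0

class Mppt:
    def __init__(self):
        self.sets = [[] for _ in range(NUM_SETS)]

    def _index(self, branch_pc):
        return branch_pc & 0x1F        # low five bits of the branch PC

    def lookup(self, branch_pc):
        tag = branch_pc >> 5
        for e in self.sets[self._index(branch_pc)]:
            if e.tag == tag:
                return e
        return None

    def install(self, branch_pc, merge_pc, distance, indep_regs):
        ways = self.sets[self._index(branch_pc)]
        entry = MpptEntry(branch_pc >> 5, merge_pc, distance, indep_regs)
        if len(ways) < NUM_WAYS:
            ways.append(entry)
            return
        # Replacement: evict minimal confidence, break ties by largest
        # predicted merge distance (no LRU within the MPPT).
        victim = min(ways, key=lambda e: (e.confidence, -e.distance))
        ways[ways.index(victim)] = entry

    def update(self, branch_pc, correct):
        e = self.lookup(branch_pc)
        if e:
            e.confidence = (min(CONF_MAX, e.confidence + 1) if correct
                            else max(0, e.confidence - 1))
```

The tie-break on largest distance reflects the intuition that far-away merge points are less profitable to keep cached.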
2. Runtime Merge Point Detection and Algorithmic Workflow
Dynamic detection within the MPPT pipeline employs a Wrong-path Buffer (WPB), also 128-entry/4-way associative, which tracks instructions executed on speculative wrong-paths upon a branch misprediction. For each misprediction:
- WPB entries accumulate destination-register sets and instruction counts for all instructions following the mispredicted branch, until a terminating condition such as a loopback or the 100-instruction upper bound is reached.
- Upon pipeline flush, retiring correct-path PCs are checked against WPB entries. A match denotes a dynamic merge point, after which predicted distance and register independence information are synthesized and a new MPPT entry installed.
- Confidence counters in MPPT entries are incremented on prediction correctness and decremented otherwise; WPB entries are invalidated post-processing.
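The detection workflow above can be modeled as follows. This is a hedged sketch of the mechanism, not the paper's hardware: the register universe (16 bits), data layout, and method names are illustrative assumptions.

```python
MAX_DIST = 100        # upper bound on tracked wrong-path distance
REG_MASK = 0xFFFF     # illustrative 16-register universe

class WrongPathBuffer:
    def __init__(self):
        # branch_pc -> {wrong_path_pc: (instr_count, dest_regs_written_so_far)}
        self.entries = {}

    def record(self, branch_pc, wrong_path):
        """wrong_path: [(pc, dest_reg_bit), ...] executed past the misprediction."""
        seen, regs = {}, 0
        for count, (pc, dest_reg) in enumerate(wrong_path[:MAX_DIST], start=1):
            regs |= dest_reg          # accumulate destination registers
            if pc in seen:            # loopback: stop tracking
                break
            seen[pc] = (count, regs)
        self.entries[branch_pc] = seen

    def find_merge(self, branch_pc, correct_path):
        """After the flush, scan retiring correct-path PCs for a WPB hit."""
        tracked = self.entries.get(branch_pc, {})
        for retire_count, pc in enumerate(correct_path, start=1):
            if pc in tracked:
                wrong_count, written_regs = tracked[pc]
                # Registers never written on the wrong path are candidates
                # for the branch-independent register set.
                indep_regs = ~written_regs & REG_MASK
                del self.entries[branch_pc]   # invalidate post-processing
                return pc, max(retire_count, wrong_count), indep_regs
        return None
```

A hit from `find_merge` supplies the merge-point PC, a distance estimate, and the independence bit vector needed to install a new MPPT entry.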
The algorithm detects merge points even for branches marked "impossible to predict" by infinite-size predictors. A confidence–cost system arbitrates use of MPPT versus conventional TAGE branch prediction by evaluating confidence (from saturating counters and JRS confidence predictions) and average branch resolution latency. Logic table:
| Latency | Conf-Low | Conf-Med | Conf-High |
|---|---|---|---|
| Low | MPPT | TAGE | TAGE |
| High | MPPT | MPPT | TAGE |
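The arbitration table lends itself to a direct lookup. The sketch below encodes it; the numeric confidence and latency thresholds are illustrative placeholders, not values from Pruett et al.

```python
def choose_predictor(confidence, resolution_latency,
                     conf_thresholds=(2, 5), latency_threshold=20):
    """Arbitrate MPPT vs. conventional TAGE from the confidence/latency
    logic table. Thresholds are assumptions for illustration only."""
    lo, hi = conf_thresholds
    level = "low" if confidence < lo else "med" if confidence < hi else "high"
    latency = "low" if resolution_latency < latency_threshold else "high"
    table = {
        ("low",  "low"): "MPPT", ("low",  "med"): "TAGE", ("low",  "high"): "TAGE",
        ("high", "low"): "MPPT", ("high", "med"): "MPPT", ("high", "high"): "TAGE",
    }
    return table[(latency, level)]
```

The asymmetry captures the cost model: when branches resolve slowly, a medium-confidence merge prediction is still worth taking because a misprediction is expensive to repair.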
3. Hardware Integration, Overheads, and Latency
The MPPT, WPB (~1 KB), and an Update List (8 × 14 bytes) operate in parallel to BTB and TAGE lookups. All three structures support single-cycle lookup at 3.2 GHz. The aggregate incremental area is <0.01 mm² (28 nm process), with power overhead ≪1 % of core fetch power. When MPPT predicts, fetch is redirected to the merge-point PC, skipping the wrong-path and mitigating speculative work on mispredicted paths.
4. Quantitative Evaluation and Effects on Branch Misprediction
Evaluation on a cycle-accurate x86 simulator, across the SPEC CPU2006 Integer suite, demonstrates that the MPPT achieves a merge prediction accuracy of 95 % (versus 78 % for reconvergence-inf predictors) (Pruett et al., 2020). Across benchmarks:
- Coverage is ≈71 % of all hard mispredictions.
- 58 % of all branch mispredictions are replaced by successful merge predictions.
- MPKI reduction is 43 % (fewer mispredictions per thousand instructions).
- In MPPmax variant, MPKI falls by 56 % relative to BP-only.
- Predicted merge distances average ∼30 dynamic instructions (average overestimate <5 instructions).
5. MPPT in Table Structure Recognition: Input Features and Decoder Architecture
In table structure recognition, MPPT corresponds to the "Merger" module in the Split, Embed and Merge (SEM) framework (Zhang et al., 2021). After the Splitter produces a grid of basic cells and the Embedder yields a feature embedding for each grid cell, the Merger takes these embeddings as input.
Embeddings are constructed by concatenating:
- Vision features: RoIAlign extraction on FPN feature maps, followed by a feed-forward network (FFN).
- Text features: an OCR-to-BERT pipeline, projected to the shared embedding dimension. The concatenated vision-text vectors are projected and passed through one Transformer attention layer.
The Merger is a two-stage GRU decoder equipped with additive attention over the grid-cell embeddings and a convolutional memory of previously assigned grid cells, producing one binary mask over the grid per true cell at each time step.
Key equations governing this (written here in standard additive-attention notation, with $h_t$ the decoder state, $f_i$ the embedding of grid cell $i$, and $m_{t,i}$ the convolutional memory):
- Attention energy: $e_{t,i} = v^\top \tanh(W h_t + U f_i + Q m_{t,i})$.
- Per-cell score: $\alpha_{t,i} = \sigma(e_{t,i})$, applied independently per grid element (no softmax).
- Context update: $c_t = \sum_i \alpha_{t,i} f_i$, feeding the next decoder state $h_{t+1} = \mathrm{GRU}(h_t, c_t)$.
- Binarization of $\alpha_{t,i}$ at threshold $0.5$.
- Loss: binary cross-entropy, normalized over grid cells and prediction steps.
The module is auto-regressive: grid cells already assigned to a merge are removed from future consideration via the convolutional memory. No softmax is used; all predictions are made independently per grid element.
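A single decode step of this mechanism can be sketched as below. This is a minimal NumPy illustration of additive attention with a coverage-style memory and per-cell sigmoid binarization; the dimensions, weight names, and initialization are assumptions, not the SEM implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MergerStep:
    """One (simplified) Merger decode step: additive attention over grid-cell
    embeddings plus a memory of already-assigned cells, yielding a per-cell
    binary mask. No softmax: each grid element is scored independently."""

    def __init__(self, d_embed, d_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0, 0.1, (d_hidden, d_hidden))  # decoder-state proj
        self.U = rng.normal(0, 0.1, (d_hidden, d_embed))   # embedding proj
        self.Q = rng.normal(0, 0.1, (d_hidden, 1))         # memory proj
        self.v = rng.normal(0, 0.1, d_hidden)              # energy vector

    def __call__(self, h, F, memory):
        # h: (d_hidden,) decoder state; F: (n_cells, d_embed) embeddings;
        # memory: (n_cells,) scalar map of previously assigned cells.
        energy = np.tanh(F @ self.U.T + h @ self.W.T
                         + memory[:, None] @ self.Q.T)     # (n_cells, d_hidden)
        scores = sigmoid(energy @ self.v)                  # (n_cells,) in (0, 1)
        return scores > 0.5                                # binarize at 0.5
```

In a full decoder the mask produced at each step would be folded back into `memory` so merged cells drop out of later steps, matching the article's auto-regressive description.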
6. Implementation and Empirical Results in Table Recognition
The MPPT-backed Merger is trained end-to-end with the Splitter and Embedder using the ADADELTA optimizer, with the learning rate annealed over training, loss weights of unity, and a batch size of 8 on a single V100 GPU.
Ablation study results (SciTSR and SciTSR-COMP datasets):
| System | SciTSR F1 | SciTSR-COMP F1 |
|---|---|---|
| T1 (no Merger) | 95.40 % | 89.77 % |
| T2 (Text+Merger) | 95.48 % | 90.99 % |
| T3 (Vision+Merger) | 96.68 % | 95.15 % |
| T4 (Full SEM: V+T+Merger) | 97.11 % | 95.72 % |
Adding the Merger yields a 5.95-percentage-point F1 lift on complex tables (SciTSR-COMP) over the Splitter-only system. Visual features outperform text-only features in merge-point decoding. Throughput drops from 16.5 FPS for the Splitter alone to 1.94 FPS for the full SEM model.
Attention visualizations demonstrate that Merger maps accurately highlight target cell regions, including complex, multi-row/column spans.
7. Domain-Specific Roles, Limitations, and Future Prospects
MPPT functions as a hardware reconvergence oracle in processor branch prediction and as a sequential binary mask decoder in table structure parsing. In both architectures, table-based merge-point history and attention-enhanced aggregation are central. Integration overhead remains modest relative to baseline structures in processors (≈1.6 KB MPPT plus ~1 KB WPB, <1 % power), and prediction enhances reliability and throughput for divergent control flow and tabular data recognition.
Limitations include table size constraints, dependence on accurate historic assignment (in both WPB and grid cell assignments), and performance throughput trade-offs in deep vision–language pipelines. Future directions plausibly include hybrid predictors leveraging more expressive memory models and enhanced dynamic tracking, or ML-based confidence–cost mediation across hardware and document-parsing MPPTs.