Doc2AHP: LLM-Enhanced AHP for Decision Modeling
- Doc2AHP is a structured inference framework that combines the semantic power of LLMs with the formal hierarchy and numerical rigor of AHP for decision modeling.
- It utilizes semantic tree construction, multi-agent collaboration, and adaptive consistency optimization to generate decision hierarchies, robust pairwise weights, and alternative rankings.
- The framework eliminates the need for manual expert annotation and achieves high accuracy and strict numerical consistency in benchmark evaluations.
Doc2AHP is a structured inference framework that integrates the generalization capacity of LLMs with the formal rigor of the Analytic Hierarchy Process (AHP) to enable automated, interpretable multi-criteria decision modeling from unstructured documents. By leveraging semantic tree construction, multi-agent collaboration, and adaptive consistency optimization, Doc2AHP generates decision hierarchies, computes robust pairwise criteria weights, and synthesizes alternative rankings—all while enforcing logical entailment and axiomatic numerical constraints intrinsic to classical AHP. This methodology eliminates the dependency on manual expert annotation and annotated training data, thus addressing scalability and reliability barriers inherent in generic LLM-based decision modeling (Wu et al., 23 Jan 2026).
1. Motivation and Theoretical Foundation
Doc2AHP is motivated by the structural and numerical weaknesses observed in generic LLM outputs when tasked with decision modeling. LLMs, while adept at semantic extraction, frequently produce criteria and pairwise judgements that lack document grounding and violate formal decision-theoretic axioms, leading to hallucinated, incoherent outputs. In contrast, AHP offers a systematic approach: it decomposes decision problems hierarchically and employs pairwise comparisons on a fixed scale ($1$ to $9$ and their reciprocals), with weights computed via eigendecomposition and checked through the consistency index and consistency ratio:
$$CI = \frac{\lambda_{\max} - n}{n - 1}, \qquad CR = \frac{CI}{RI}$$
Here, $\lambda_{\max}$ is the principal eigenvalue of the comparison matrix, $n$ is the matrix dimension, and $RI$ is the random index. By requiring $CR < 0.1$, AHP enforces approximate transitivity and numerical reliability. Doc2AHP bridges the strengths of both paradigms by imposing these structural and numerical constraints on LLM-driven inference, yielding verifiable decision models (Wu et al., 23 Jan 2026).
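The consistency check can be illustrated with a minimal sketch. It assumes a positive reciprocal matrix given as nested lists, estimates the principal eigenvalue by power iteration, and uses Saaty's commonly tabulated random index values; the function names are illustrative and not taken from the source.

```python
# Random index values from Saaty's commonly cited table (n = 1..10).
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def principal_eigen(matrix, iterations=200):
    """Power iteration on a positive matrix.

    Returns (lambda_max, weights), where weights sum to 1.
    """
    n = len(matrix)
    w = [1.0 / n] * n
    lam = float(n)
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        lam = sum(v)              # w sums to 1, so sum(A w) estimates lambda_max
        w = [x / lam for x in v]  # renormalize so the weights sum to 1 again
    return lam, w


def consistency_ratio(matrix):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    if n <= 2:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    lam, _ = principal_eigen(matrix)
    ci = (lam - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


# A perfectly consistent matrix built from known weights (a_ij = w_i / w_j).
true_w = [0.6, 0.3, 0.1]
consistent = [[wi / wj for wj in true_w] for wi in true_w]

# A deliberately cyclic (intransitive) set of judgements.
cyclic = [[1, 9, 1 / 9],
          [1 / 9, 1, 9],
          [9, 1 / 9, 1]]

print(consistency_ratio(consistent) < 0.1)  # passes the CR < 0.1 check
print(consistency_ratio(cyclic) < 0.1)      # fails the CR < 0.1 check
```

The consistent matrix recovers its generating weights and has a ratio of zero, while the cyclic judgements fail the threshold by a wide margin.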
2. Framework Architecture and Workflow
Doc2AHP comprises two sequential phases:
Phase I: Probabilistic AHP Construction
- Structure Generation: Semantic embeddings are computed at the paragraph level for each document. Ward’s hierarchical clustering yields a semantic tree, which is pruned under cognitive constraints (a maximum branching factor, a maximum depth, and a semantic verification threshold).
- Weight Estimation: A Leader-Guided Multi-Agent Collaboration mechanism recruits $K$ Domain Expert Agents, each generating a pairwise comparison matrix. Their outputs are aggregated by weighted geometric mean and projected into the AHP-consistent weight space via constrained optimization.
Phase II: Decision Inference
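For each alternative $a_i$ and leaf criterion $c_j$, the LLM estimates a local utility score $u_{ij}$.
Aggregated utility scores are then computed. In the standard AHP synthesis this is a weighted sum over leaf criteria, with $w_j$ the global weight of $c_j$:
$$U(a_i) = \sum_{j} w_j \, u_{ij}$$
This complete pipeline ensures interpretability from raw documents through the hierarchy and its weights to the alternative scores (Wu et al., 23 Jan 2026).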
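A minimal sketch of the synthesis step follows, assuming the standard AHP convention that a leaf's global weight is the product of local weights along its path from the root. The criteria, alternatives, and scores are made-up illustrations, and the nested-dict layout is an assumption rather than the source's data structure.

```python
def global_leaf_weights(node, name="root", parent_weight=1.0):
    """Multiply local weights down the hierarchy; return {leaf_name: global_weight}."""
    weight = parent_weight * node.get("weight", 1.0)
    children = node.get("children", {})
    if not children:
        return {name: weight}
    leaves = {}
    for child_name, child in children.items():
        leaves.update(global_leaf_weights(child, child_name, weight))
    return leaves


def aggregate_utilities(local_scores, leaf_weights):
    """Weighted sum of per-criterion scores for every alternative."""
    return {
        alt: sum(leaf_weights[c] * scores[c] for c in leaf_weights)
        for alt, scores in local_scores.items()
    }


def rank(utilities):
    """Alternatives sorted from highest to lowest aggregated utility."""
    return sorted(utilities, key=utilities.get, reverse=True)


# Hypothetical two-level hierarchy with local weights that sum to 1 per sibling group.
hierarchy = {
    "children": {
        "plot": {"weight": 0.6, "children": {
            "pacing": {"weight": 0.5},
            "characters": {"weight": 0.5},
        }},
        "acting": {"weight": 0.4},
    }
}

# Hypothetical local utility scores in [0, 1], standing in for LLM estimates.
local_scores = {
    "film_a": {"pacing": 0.8, "characters": 0.6, "acting": 0.5},
    "film_b": {"pacing": 0.4, "characters": 0.9, "acting": 0.7},
}

weights = global_leaf_weights(hierarchy)
utilities = aggregate_utilities(local_scores, weights)
print(rank(utilities))
```

Because each sibling group's local weights sum to one, the global leaf weights also sum to one, which keeps aggregated utilities on the same scale as the local scores.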
3. Semantic Tree Generation and Hierarchy Mapping
Semantic tree construction commences with embedding the paragraphs of the source documents into vectors, followed by Ward’s method to build a hierarchical tree. Top-down recursive pruning, documented in Algorithm 1 of the source, yields an AHP hierarchy:
- The root criterion is attached to the tree root.
- At each node that is still within the depth limit, the subtree is split into sub-clusters, maximizing semantic separation.
- Each sub-cluster is summarized via LLM into a criterion label, followed by entailment verification against the parent criterion.
- Links that pass semantic verification are recursively explored.
The resultant tree respects cognitive constraints and grounds criteria/subcriteria labels in document semantics (Wu et al., 23 Jan 2026).
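The pruning steps above can be sketched as follows. This is not Algorithm 1 from the source: it assumes a binary dendrogram is already available (for example from Ward linkage), and `summarize` and `entails` are hypothetical callables standing in for the LLM labelling and entailment checks. Splitting by largest merge height is likewise an assumed way to favor semantic separation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity-based equality so list.remove targets the exact node
class Cluster:
    """Node of a binary dendrogram; a leaf holds a single paragraph."""
    height: float = 0.0
    paragraphs: List[str] = field(default_factory=list)
    left: Optional["Cluster"] = None
    right: Optional["Cluster"] = None


def merge(left, right, height):
    """Join two clusters at the given merge distance."""
    return Cluster(height, left.paragraphs + right.paragraphs, left, right)


def split(cluster, max_branching):
    """Cut a node into at most max_branching parts by repeatedly
    opening the part with the largest merge height."""
    parts = [cluster]
    while len(parts) < max_branching:
        splittable = [p for p in parts if p.left is not None]
        if not splittable:
            break
        widest = max(splittable, key=lambda p: p.height)
        parts.remove(widest)
        parts += [widest.left, widest.right]
    return parts


def build_hierarchy(cluster, label, summarize, entails,
                    max_branching, max_depth, depth=0):
    """Top-down construction that keeps only verified parent-child links."""
    node = {"label": label, "children": []}
    if depth >= max_depth or cluster.left is None:
        return node
    for part in split(cluster, max_branching):
        child_label = summarize(part.paragraphs)
        if entails(label, child_label):
            node["children"].append(build_hierarchy(
                part, child_label, summarize, entails,
                max_branching, max_depth, depth + 1))
    return node


# Toy dendrogram over four paragraphs, with trivial stand-ins for the LLM calls.
a, b, c, d = (Cluster(paragraphs=[p]) for p in "abcd")
root = merge(merge(a, b, 1.0), merge(c, d, 2.0), 5.0)
tree = build_hierarchy(root, "goal",
                       summarize=lambda ps: " + ".join(ps),
                       entails=lambda parent, child: True,
                       max_branching=3, max_depth=2)
print([child["label"] for child in tree["children"]])
```

In practice `entails` would compare an entailment score against the semantic verification threshold; here it simply accepts every link so the structural behavior is visible.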
4. Multi-Agent Judgement and Consensus Aggregation
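Upon establishing the hierarchy, pairwise comparisons among sibling criteria are solicited from $K$ expert agents. Each agent $k$ generates a reciprocal matrix $A^{(k)} = [a^{(k)}_{ij}]$ with $a^{(k)}_{ji} = 1/a^{(k)}_{ij}$. Aggregation uses a weighted geometric mean with agent weights $\alpha_k$:
$$\bar{a}_{ij} = \prod_{k=1}^{K} \left(a^{(k)}_{ij}\right)^{\alpha_k}, \qquad \sum_{k=1}^{K} \alpha_k = 1$$
The weights $\alpha_k$ typically default to $1/K$ unless modified by the Leader Agent based on domain expertise. The resulting consensus matrix $\bar{A} = [\bar{a}_{ij}]$ is then processed for consistency (Wu et al., 23 Jan 2026).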
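As a small illustration of the aggregation step, the sketch below computes an element-wise weighted geometric mean of agent matrices. The function name and the example judgements are illustrative assumptions. Equal agent weights are used when none are supplied, matching the $1/K$ default.

```python
import math


def aggregate_judgements(matrices, agent_weights=None):
    """Element-wise weighted geometric mean of K pairwise comparison matrices."""
    k = len(matrices)
    n = len(matrices[0])
    if agent_weights is None:
        agent_weights = [1.0 / k] * k          # equal say for every agent
    total = sum(agent_weights)
    agent_weights = [a / total for a in agent_weights]  # normalize to sum to 1
    return [
        [
            math.exp(sum(a * math.log(m[i][j])
                         for a, m in zip(agent_weights, matrices)))
            for j in range(n)
        ]
        for i in range(n)
    ]


# Two hypothetical agents disagreeing on how strongly criterion 0 dominates criterion 1.
agent_1 = [[1, 2], [1 / 2, 1]]
agent_2 = [[1, 8], [1 / 8, 1]]

consensus = aggregate_judgements([agent_1, agent_2])
print(consensus[0][1], consensus[1][0])
```

The geometric mean is the usual choice in AHP group aggregation because it preserves reciprocity: the consensus entries at $(i, j)$ and $(j, i)$ still multiply to one.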
5. Adaptive Consistency Optimization
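Doc2AHP applies convex Logarithmic Least Squares (LLS) optimization to project the consensus matrix $\bar{A} = [\bar{a}_{ij}]$ into the valid AHP space, incorporating leader-imposed domain constraints $\mathcal{C}$. In the standard LLS form, with $v_i = \ln w_i$:
$$\min_{v} \sum_{i<j} \left(\ln \bar{a}_{ij} - (v_i - v_j)\right)^2 \quad \text{s.t.} \quad v \in \mathcal{C}$$
The optimization yields weights $w$ together with a fully consistent matrix:
$$\hat{a}_{ij} = \frac{w_i}{w_j}$$
If discretization is necessary, the ratios $\hat{a}_{ij}$ are rounded to the nearest AHP admissible value (Wu et al., 23 Jan 2026).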
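When no leader constraints are active, the LLS problem on a complete reciprocal matrix has a well-known closed-form solution: the normalized row geometric means. The sketch below covers only that unconstrained case and leaves the leader constraints out. Rounding in log space is an assumption made here so that reciprocal ratios snap to reciprocal scale values.

```python
import math

# Admissible AHP values: 1/9, 1/8, ..., 1/2, 1, 2, ..., 9.
SAATY_SCALE = [1 / k for k in range(9, 1, -1)] + [float(k) for k in range(1, 10)]


def lls_weights(matrix):
    """Unconstrained LLS solution: normalized geometric mean of each row."""
    n = len(matrix)
    gm = [math.exp(sum(math.log(x) for x in row) / n) for row in matrix]
    total = sum(gm)
    return [g / total for g in gm]


def consistent_matrix(weights):
    """Perfectly consistent matrix with entries w_i / w_j."""
    return [[wi / wj for wj in weights] for wi in weights]


def snap_to_scale(ratio):
    """Nearest admissible value, measured in log space (an assumed convention)."""
    return min(SAATY_SCALE, key=lambda s: abs(math.log(s) - math.log(ratio)))


# Hypothetical consensus matrix that is reciprocal but not perfectly consistent.
consensus = [[1, 2, 8],
             [1 / 2, 1, 2],
             [1 / 8, 1 / 2, 1]]

w = lls_weights(consensus)
projected = consistent_matrix(w)
discrete = [[snap_to_scale(x) for x in row] for row in projected]
print([round(x, 4) for x in w])
print(discrete[0])
```

The projected matrix satisfies $\hat{a}_{ij} \hat{a}_{jk} = \hat{a}_{ik}$ exactly, so its consistency ratio is zero before discretization. Rounding to the scale can reintroduce a small inconsistency.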
6. Empirical Results and Validation
Doc2AHP was evaluated using DecisionBench, a suite of six decision scenarios built atop IMDb, HotelRec, and Beer Advocate datasets, each presenting 20 candidate alternatives. Baseline comparisons include Standard-AHP (single-agent, without consistency enforcement) and Debate-AHP (multi-agent negotiation without formal constraints). Metrics include ranking accuracy (NDCG@5, NDCG@10), numerical reliability (measured through the consistency ratio $CR$), and the pass rate, i.e. the share of comparison matrices with $CR < 0.1$.
Key findings:
- Doc2AHP achieved top NDCG@5 in five of six tasks (e.g., 0.854 vs. 0.830 Standard, 0.777 Debate in "Narrative Drama").
- Maintained 100% pass rates for $CR < 0.1$ across model variations (Llama-8B, Llama-70B, GPT-5.2); baselines ranged as low as 0%.
- Ablation studies indicated the critical impact of semantic structuring and consistency optimization on ranking quality and numerical rigor (Wu et al., 23 Jan 2026).
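For readers unfamiliar with the ranking metric, the following sketch computes NDCG@k for a predicted ordering. It assumes the linear-gain variant with a base-2 logarithmic discount; the source does not state which gain function DecisionBench uses, and the relevance values are made up.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k positions (linear gain)."""
    return sum(rel / math.log2(position + 2)
               for position, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, k):
    """DCG of the predicted order divided by the DCG of the ideal order."""
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal == 0:
        return 0.0
    return dcg_at_k(ranked_relevances, k) / ideal


# Ground-truth relevance of each alternative, listed in the order the model ranked them.
predicted_order_relevance = [3, 1, 2, 0, 0]

print(ndcg_at_k(predicted_order_relevance, 5))
print(ndcg_at_k([3, 2, 1, 0, 0], 5))  # ideal ordering
```

A score of 1 means the top-k alternatives appear in their ideal order, and misplacing highly relevant items near the top is penalized more than errors further down the list.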
7. Discussion, Applications, and Future Directions
Doc2AHP demonstrates the viability of combining AHP’s formal, auditable scaffolds with LLM semantic generalization to elevate decision modeling above black-box intuition-based prompting toward verifiable, logically consistent outputs. The recursive semantic construction and optimization incur greater computational cost but are justified in high-stakes domains (e.g., medical, security) where reliability is paramount. For low-risk settings, simpler LLM-based methods may suffice.
Potential future developments include:
- Adaptive pruning for scalability across large document corpora.
- Cross-domain generalization of semantic tree-to-criterion mappings.
- Incorporation of human-in-the-loop feedback for dynamic updating of the leader constraints.
A plausible implication is broader applicability of Doc2AHP for enabling non-expert, scalable decision modeling, facilitating transparent and auditably rational decision processes in diverse research and application contexts (Wu et al., 23 Jan 2026).