Quantitative Claim Provenance Rate
- Quantitative Claim Provenance Rate (CPR) is a metric that measures the percentage of research claims backed by complete, auditable provenance paths through source and reasoning nodes.
- The metric is computed by validating each claim’s provenance chain via stringent binary checks ensuring tool-turn alignment, source authenticity, and correct semantic relations.
- CPR’s application in systems like TRACER fosters transparency in automated research, making it a pivotal standard for evaluating claim-level traceability.
Quantitative Claim Provenance Rate (CPR) is a metric designed to quantify the fraction of claims in a generated research output—such as a scientific report or agent-generated answer—that possess complete, auditable provenance paths to supporting sources. Originally formulated in the context of the Auditable Autonomous Research (AAR) standard and subsequently operationalized in multimodal tool-using agent frameworks such as TRACER, CPR serves as a rigorous gauge for claim-level traceability, directly reflecting a model's ability to ground each factual assertion in verifiable evidence (Rasheed et al., 14 Feb 2026, Yu et al., 11 May 2026).
1. Formal Definition and Mathematical Formulation
Quantitative Claim Provenance Rate is defined as the proportion of output claims that are traceable via a complete provenance path from one or more source nodes, possibly through intermediate reasoning nodes, to the final claim:

$$\mathrm{CPR} = \frac{\left|\{\, c \in C : P(c) \text{ is complete} \,\}\right|}{|C|}$$

Here, $C$ denotes the set of final claims (e.g., assertion sentences in a report), and $P(c)$ denotes the provenance path for claim $c$. A path is “complete” if there exists at least one directed sequence, starting from a source node, possibly traversing reasoning nodes, and ending at $c$, with edges labeled "supports" (Rasheed et al., 14 Feb 2026). Expressed as a percentage, $100 \times \mathrm{CPR}$ gives the share of all claims that are fully backed by at least one supporting source.
When operationalized in sentence-level provenance graphs, as in TRACER, this reduces to checking, for every sentence/claim, that all associated provenance items (tool-turn identifier, evidence unit, relation) meet a strict correctness gate. The formal equivalence is given by:

$$\mathrm{CPR} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left[\, \forall j:\ v_{i,j} = 1 \,\right]$$

where $N$ is the number of sentences, $v_{i,j}$ is a composite Boolean (validity) check on provenance item $j$ for sentence $i$, and $\mathbb{1}[\cdot]$ is the indicator function (Yu et al., 11 May 2026).
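The sentence-level form can be sketched in a few lines of Python. This is a minimal illustration, not TRACER's implementation: the function name `sentence_level_cpr` and the nested-list input are hypothetical, and treating a sentence with no provenance items as uncovered is an assumption the source does not state.

```python
from typing import Sequence


def sentence_level_cpr(validity: Sequence[Sequence[bool]]) -> float:
    """Fraction of sentences whose provenance items all pass validation.

    validity[i][j] holds the composite check result for item j of sentence i.
    """
    if not validity:
        raise ValueError("CPR is undefined for an output with no sentences")
    # Assumption: a sentence with no provenance items is NOT covered, even
    # though an empty conjunction would be vacuously true.
    covered = sum(1 for items in validity if items and all(items))
    return covered / len(validity)


flags = [
    [True, True],   # both items valid: covered
    [True, False],  # one invalid item: not covered
    [],             # no provenance at all: not covered (assumption)
    [True],         # single valid item: covered
]
print(sentence_level_cpr(flags))  # 0.5
```

Two of the four sentences pass, so the rate is 0.5, or 50% when expressed as a percentage.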
2. Provenance Structures and Claim-Evidence Graphs
Claim provenance is encoded as a directed graph $G = (V, E)$ with $V = S \cup R \cup C$, where
- $S$ are source nodes (e.g., document passages, tool outputs),
- $R$ are reasoning nodes (intermediate inference steps), and
- $C$ are claim nodes (the system's assertions).
Each edge is typically labeled with a "supports" relation; additional labels may denote "contradicts" or other logical relations. In TRACER's implementation, each claim (answer sentence) $s_i$ is associated with a provenance set $P_i = \{(t_{i,j}, e_{i,j}, r_{i,j})\}_j$, where:
- $t_{i,j}$: tool-turn identifier,
- $e_{i,j}$: extracted evidence unit,
- $r_{i,j} \in \{\text{Quotation}, \text{Compression}, \text{Inference}\}$: semantic support relation (Yu et al., 11 May 2026).
A claim is counted as covered for CPR if at least one valid provenance path exists from a source node to that claim.
3. Verification Procedures and Sub-Checks
For a claim to be considered fully supported—and hence CPR-incrementing—each associated provenance item must pass a triplet of binary checks:
- Tool-turn alignment $v^{\text{align}}_{i,j}$: Verifies that the tool-turn identifier refers to a tool call that was actually invoked.
- Source authenticity $v^{\text{auth}}_{i,j}$: Checks that the evidence unit $e_{i,j}$ is a substring or fragment of the referenced tool output.
- Relation rationality $v^{\text{rel}}_{i,j}$: Ensures that the declared relation type (Quotation, Compression, or Inference) correctly links the evidence to the claim.
A composite validity $v_{i,j} = v^{\text{align}}_{i,j} \land v^{\text{auth}}_{i,j} \land v^{\text{rel}}_{i,j}$ is assigned; $v_{i,j} = 1$ is required for every item $j$ of a given sentence/claim for it to count as covered. Schema correctness (valid JSON, provenance assigned to all sentences) is checked via a separate global gate (Yu et al., 11 May 2026).
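The three gates can be sketched as follows. This is a hedged illustration under stated assumptions rather than TRACER's code: `ProvenanceItem`, `item_is_valid`, `sentence_is_covered`, and `quotation_only_judge` are hypothetical names, tool outputs are assumed to be plain strings keyed by an integer turn identifier, and the relation check is delegated to a pluggable judge because the source does not specify how it is evaluated.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

RELATIONS = {"Quotation", "Compression", "Inference"}
# (relation, evidence, sentence) -> bool; a hypothetical judge interface.
RelationJudge = Callable[[str, str, str], bool]


@dataclass(frozen=True)
class ProvenanceItem:
    tool_turn: int  # identifier of the cited tool call
    evidence: str   # extracted evidence unit
    relation: str   # declared semantic relation


def item_is_valid(item: ProvenanceItem, sentence: str,
                  tool_outputs: dict[int, str],
                  relation_ok: RelationJudge) -> bool:
    # Tool-turn alignment: the cited turn must be an invoked tool call.
    output = tool_outputs.get(item.tool_turn)
    if output is None:
        return False
    # Source authenticity: the evidence must appear in that tool's output.
    if not item.evidence or item.evidence not in output:
        return False
    # Relation rationality: known relation type, accepted by the judge.
    if item.relation not in RELATIONS:
        return False
    return relation_ok(item.relation, item.evidence, sentence)


def sentence_is_covered(items: Sequence[ProvenanceItem], sentence: str,
                        tool_outputs: dict[int, str],
                        relation_ok: RelationJudge) -> bool:
    # Every item must pass; a sentence with no items is treated as uncovered.
    return len(items) > 0 and all(
        item_is_valid(i, sentence, tool_outputs, relation_ok) for i in items)


def quotation_only_judge(relation: str, evidence: str, sentence: str) -> bool:
    # Toy judge: accepts only verbatim quotations and rejects everything else.
    return relation == "Quotation" and evidence in sentence


tool_outputs = {1: "The Eiffel Tower is 330 metres tall and located in Paris."}
sentence = 'The tool reports that the tower is "330 metres tall".'
good = ProvenanceItem(1, "330 metres tall", "Quotation")
wrong_turn = ProvenanceItem(2, "330 metres tall", "Quotation")
fabricated = ProvenanceItem(1, "324 metres tall", "Quotation")

print(item_is_valid(good, sentence, tool_outputs, quotation_only_judge))        # True
print(item_is_valid(wrong_turn, sentence, tool_outputs, quotation_only_judge))  # False
print(item_is_valid(fabricated, sentence, tool_outputs, quotation_only_judge))  # False
```

Passing the relation judge as a parameter keeps the two mechanical checks (turn lookup and substring match) separate from the semantic one, which in practice would likely need a model-based or human judge for Compression and Inference.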
4. CPR in Context: Related Auditability Metrics
While CPR captures the binary presence/absence of fully traceable claims, interpretation is enhanced by complementary auditability metrics defined in the AAR standard:
- Provenance Soundness (PSnd): Fraction of claim-source pairs where the semantic entailment from source to claim exceeds a threshold.
- Contradiction Transparency (CTran): Fraction of source contradictions that the system makes explicit in the provenance graph.
- Audit Effort (AEff): Average human time required to manually verify a claim given the provenance structure (Rasheed et al., 14 Feb 2026).
A high CPR indicates structural traceability, but not necessarily evidence quality or entailment strength; low CPR flags missing evidence and motivates further inspection of PSnd, CTran, and AEff.
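As a rough illustration of how two of the complementary metrics could be computed once their inputs exist, the sketch below follows the verbal definitions above. The function names, the 0.5 default threshold, and the representation of a contradiction as an unordered pair of source identifiers are all assumptions; the source fixes none of them. AEff is omitted because it is a measured human time rather than a formula over the graph.

```python
from typing import Sequence


def provenance_soundness(entailment_scores: Sequence[float],
                         threshold: float = 0.5) -> float:
    """Fraction of claim-source pairs whose entailment score exceeds threshold.

    The scores are assumed to come from some entailment model; the default
    threshold is illustrative only.
    """
    if not entailment_scores:
        raise ValueError("PSnd is undefined without claim-source pairs")
    passing = sum(score > threshold for score in entailment_scores)
    return passing / len(entailment_scores)


def contradiction_transparency(known: set[frozenset[str]],
                               surfaced: set[frozenset[str]]) -> float:
    """Fraction of known source contradictions made explicit in the graph."""
    if not known:
        raise ValueError("CTran is undefined when no contradictions are known")
    return len(known & surfaced) / len(known)


scores = [0.9, 0.4, 0.75, 0.5]  # 0.5 does not exceed the 0.5 threshold
known = {frozenset({"s1", "s2"}), frozenset({"s3", "s4"})}
surfaced = {frozenset({"s1", "s2"})}

print(provenance_soundness(scores))                 # 0.5
print(contradiction_transparency(known, surfaced))  # 0.5
```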
5. Calculation Protocols and Examples
The stepwise computation of CPR involves:
- Claim identification: Segment the output into atomic factual assertions.
- Provenance graph extraction/building: Map claims, sources, and reasoning steps into a directed graph structure.
- Path validation: For each claim $c \in C$, verify the existence of at least one valid support chain from a source node.
- Aggregation: Compute the CPR as the fraction of claims with complete provenance.
In the reference example (Rasheed et al., 14 Feb 2026), a black-box agent with only 1 of 3 claims grounded produces $\mathrm{CPR} = 1/3 \approx 0.33$. In a provenance-transparent agent, $\mathrm{CPR} = 1.0$ is obtained when all claims are linked to sources.
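The path-validation and aggregation steps can be sketched as a breadth-first search over "supports" edges. This is a minimal illustration under assumptions: nodes are plain string identifiers, edges are `(from, to, label)` triples, and the names `covered_claims` and `claim_provenance_rate` are hypothetical. The toy graph mirrors the 1-of-3 case but is not taken from the cited paper.

```python
from collections import deque
from typing import Iterable

Edge = tuple[str, str, str]  # (from_node, to_node, label)


def covered_claims(sources: set[str], claims: Iterable[str],
                   edges: Iterable[Edge]) -> set[str]:
    """Claims reachable from at least one source via 'supports' edges only."""
    adjacency: dict[str, list[str]] = {}
    for src, dst, label in edges:
        if label == "supports":  # other labels never count as support
            adjacency.setdefault(src, []).append(dst)
    # Multi-source breadth-first search starting from every source node.
    seen = set(sources)
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return {c for c in claims if c in seen}


def claim_provenance_rate(sources: set[str], claims: Iterable[str],
                          edges: Iterable[Edge]) -> float:
    claim_list = list(claims)
    if not claim_list:
        raise ValueError("CPR is undefined for an output with no claims")
    return len(covered_claims(sources, claim_list, edges)) / len(claim_list)


sources = {"s1"}
claims = ["c1", "c2", "c3"]
edges = [
    ("s1", "r1", "supports"),     # source -> reasoning step
    ("r1", "c1", "supports"),     # reasoning step -> claim: c1 is covered
    ("r2", "c2", "supports"),     # r2 has no source behind it: c2 uncovered
    ("s1", "c3", "contradicts"),  # not a support edge: c3 uncovered
]
rate = claim_provenance_rate(sources, claims, edges)
print(round(rate, 2))  # 0.33
```

Only `c1` has a chain that starts at a source and uses "supports" edges throughout, so one of three claims is covered.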
6. Implementation in Multimodal Tool-Using Agents: TRACER
TRACER implements CPR to enforce verifiable generative provenance in multimodal tool-using agents. Each answer is coupled with a sentence-level provenance graph, detailing which tool call and evidence support each claim, and the semantic link (Quotation, Compression, Inference). Validation is enforced via schema compliance, tool-turn alignment, source authenticity, and relation rationality gates. Experimental evaluation on TRACE-Bench shows that high answer accuracy and high CPR (over 93%) are jointly achievable, supporting robust claim-level auditability (Yu et al., 11 May 2026).
| Metric | Formula / Description | Reference |
|---|---|---|
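| Claim Provenance Rate (CPR) | $\lvert\{c \in C : P(c) \text{ is complete}\}\rvert \,/\, \lvert C \rvert$ | (Rasheed et al., 14 Feb 2026, Yu et al., 11 May 2026) |
| Provenance Soundness (PSnd) | Fraction of claim-source pairs whose source-to-claim entailment exceeds a threshold | (Rasheed et al., 14 Feb 2026) |
| Contradiction Transparency (CTran) | Fraction of source contradictions made explicit in the provenance graph | (Rasheed et al., 14 Feb 2026) |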
7. Practical Considerations, Automation, and Limitations
Automation of CPR assessment requires robust claim detection (e.g., via sequence labeling or assertion-detection heuristics) and evidence linking (retrieval plus NLI-based reranking). Gold provenance graph construction often needs initial human annotation but can be streamlined using standardized schemas such as W3C PROV. Protocolized validation—integrated into synthesis workflows—reduces post-generation audit overhead.
Current limitations include susceptibility to errors in claim identification/linking, the imperfect reliability of NLI for entailment checks, and the challenge of systematically surfacing and representing source contradictions. Audit effort for large outputs remains a practical bottleneck.
A plausible implication is that continued CPR monitoring, alongside PSnd, CTran, and AEff, may help enforce policies such as "verification is cheaper than generation," promoting transparency and trust in automated research workflows.