Biconditional Correctness Criterion
- The biconditional correctness criterion is a formal test that requires an 'if and only if' match between observed side-effects and declared approvals in systems and proofs.
- It is applied in agent runtimes, logic programming, and linear logic to detect mismatches such as gate bypass and audit forgery.
- By unifying soundness and completeness, it guarantees that every action has a corresponding log and every log reflects a real change, enhancing operational assurance.
The biconditional correctness criterion gives an "if and only if" formalization of system or artifact correctness: observed effects and declared approvals must correspond exactly. It arises independently in agent runtime security and in mathematical logic (notably logic programming and linear logic), where it tests agreement between implementation behavior and specification, between proof structure and sequent derivability, or between skill execution and approved operational effects. Recent work, “Skills as Verifiable Artifacts…” (Metere, 1 May 2026), treats it as a linchpin in trust schemas for agent skills, while in logic it underpins the classic notions of program-realized models and proof-net sequentialization (Drabent, 2014, Donna et al., 26 Jun 2025).
1. Formal Definitions in Agent Runtimes and Logic
In agent runtimes, the biconditional criterion is defined over an agent run that starts in corpus state $S_0$ and ends in $S_1$. Let $\Delta$ be the multiset of observed side-effects (the "delta"), and let $A$ be the set of audit log entries corresponding to approved and executed irreversible actions. Writing $\pi$ for the multiset projection of both onto a common key space, the audit log passes the biconditional criterion if and only if the projection of $\Delta$ equals that of $A$:
$$\pi(\Delta) = \pi(A)$$
This ensures, simultaneously, that every actual side-effect is matched by an approval and vice versa: no ghost effects, and no illusory (unrealized) approvals (Metere, 1 May 2026).
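The multiset equality can be sketched in a few lines of Python. This is a minimal illustration, not the cited implementation: the record layout and the choice of `(action, target)` as the projection key are assumptions made here, and `project` and `passes_biconditional` are hypothetical names.

```python
from collections import Counter

def project(entries):
    """Project records onto (action, target) keys, keeping multiplicity.

    The (action, target) key is an assumption made for this sketch.
    """
    return Counter((e["action"], e["target"]) for e in entries)

def passes_biconditional(delta, audit_log):
    """True iff observed side-effects and logged approvals match as multisets."""
    return project(delta) == project(audit_log)

# Observed side-effects of a run (hypothetical data).
delta = [
    {"action": "delete", "target": "notes/a.txt"},
    {"action": "delete", "target": "notes/b.txt"},
]
# Approved-and-executed entries; order and extra fields do not matter.
audit_log = [
    {"action": "delete", "target": "notes/b.txt", "approved_by": "operator"},
    {"action": "delete", "target": "notes/a.txt", "approved_by": "operator"},
]

ok = passes_biconditional(delta, audit_log)         # logs and effects agree
ghost = passes_biconditional(delta, audit_log[:1])  # one effect has no log entry
```

Using `Counter` rather than `set` keeps multiplicity, so two deletions backed by a single approval still fail the check.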
In logic programming, the biconditional takes the form $P \models a \iff a \in S$ for every ground atom $a$, i.e., every atom provable from the program $P$ must be in the intended model $S$ and vice versa. This is equivalently expressed as $M_P = S$, where $M_P$ is the least Herbrand model of $P$ (Drabent, 2014).
In linear logic, the biconditional criterion characterizes proof-structure correctness via necessary and sufficient conditions: for suitable fragments, a proof structure is sequentializable if and only if its correctness graph satisfies three conditions: (i) every switching is acyclic, (ii) the number of connected components matches the count the criterion prescribes, and (iii) a geometric condition holds that prevents erasing subnets from connecting to tensor/cut nodes (Donna et al., 26 Jun 2025).
2. Underlying Assumptions and Key Definitions
The agent runtime formulation rests on an explicit separation of observed reality ($\Delta$) from the system’s claim of action ($A$). A skill is a tuple that includes a manifest, which assigns capabilities, a quoted verification level (unverified, declared, tested, or formal), and other metadata. To elevate a skill to “tested,” a verification procedure must pass the biconditional on adversarial exercises (Metere, 1 May 2026).
Logic programming places the criterion in the context of model-theoretic semantics: $S$ is the specification, and $M_P$ is the program’s least Herbrand model. Correctness ($M_P \subseteq S$) and completeness ($S \subseteq M_P$) hold together precisely when $M_P = S$ (Drabent, 2014).
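For a finite ground program, the two inclusions can be checked directly by computing the least model as a fixpoint. The sketch below assumes ground definite clauses written as `(head, body)` pairs; the example program and the name `least_model` are hypothetical.

```python
def least_model(clauses):
    """Least Herbrand model of a ground definite program.

    Iterates the immediate-consequence step until nothing new is derived.
    Each clause is a (head, [body atoms]) pair.
    """
    model = set()
    changed = True
    while changed:
        changed = False
        for head, body in clauses:
            if head not in model and all(b in model for b in body):
                model.add(head)
                changed = True
    return model

# Hypothetical ground program and specification.
program = [
    ("even(0)", []),
    ("even(2)", ["even(0)"]),
    ("even(4)", ["even(2)"]),
]
spec = {"even(0)", "even(2)", "even(4)"}

m_p = least_model(program)
correct = m_p <= spec    # nothing unintended is derived
complete = spec <= m_p   # everything intended is derived
```

Adding a fact such as `("even(1)", [])` breaks correctness, while dropping the last clause breaks completeness; only the unmodified program satisfies both inclusions.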
Linear logic proof nets use correctness graphs, switchings, and erasing nodes to tie the formalization to geometric graph properties, in particular to rule out “erasing threads” that violate syntactic proof properties (Donna et al., 26 Jun 2025).
3. Rationale and Distinction from Soundness/Completeness
A biconditional criterion transparently unites “soundness” and “completeness.” Soundness ensures that all claimed (approved) actions or proofs actually occur or hold (no false positives); completeness ensures all real effects or legitimate derivations are claimed (no false negatives). The biconditional is unique in catching both failure types in runtime logging (e.g., unauthorized changes or missing actions), as well as in logic (missing or spurious derivations) (Metere, 1 May 2026, Drabent, 2014).
This is operationalized in agent skill verification by forbidding “gate bypasses” (altering state without audit) and “audit forgeries” (claiming to execute, but nothing happens). In logic, it eliminates programs or proofs that undershoot (failure to derive intended facts) or overshoot (deriving unintended facts) the specification/model.
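The way the two one-directional properties combine can be written out explicitly. The notation is an assumption of this sketch: $\Delta$ is the multiset of observed side-effects, $A$ the audit log, $\pi$ a projection of both onto a common key space, and $\subseteq$ denotes multiset inclusion.

```latex
\begin{align*}
\text{soundness:}     \quad & \pi(A) \subseteq \pi(\Delta)
  && \text{(every logged approval has a real effect)} \\
\text{completeness:}  \quad & \pi(\Delta) \subseteq \pi(A)
  && \text{(every real effect has a logged approval)} \\
\text{biconditional:} \quad & \pi(\Delta) = \pi(A)
  \iff \pi(A) \subseteq \pi(\Delta) \,\wedge\, \pi(\Delta) \subseteq \pi(A)
\end{align*}
```

Under this reading, an audit forgery violates the first inclusion and a gate bypass violates the second.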
4. Verification Workflows and Evaluation Methodologies
In agent systems, adversarial ensemble exercises are used: agents (e.g., Cleaners, Auditors, Critics) propose destructive or risky actions on a fixed corpus, driving the runtime under offensive input conditions to expose mismatches between $\Delta$ and $A$. Pass/fail is binary and deterministic. The biconditional’s mechanical verification is the acceptance test for a skill to reach “tested” level and bypass human-in-the-loop (HITL) gating (Metere, 1 May 2026).
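A toy harness makes the workflow concrete. Everything here is hypothetical: `ToyRuntime`, its `leaky` flag (which simulates a gate bypass bug), and the in-memory corpus stand in for a real runtime, and only deletions are modeled.

```python
from collections import Counter

class ToyRuntime:
    """Hypothetical gated runtime over an in-memory corpus (path -> text)."""

    def __init__(self, corpus, leaky=False):
        self.corpus = dict(corpus)
        self.audit_log = []   # approved-and-executed irreversible actions
        self.leaky = leaky    # if True, denied deletions still happen

    def delete(self, path, approved):
        if approved:
            self.audit_log.append(("delete", path))
            self.corpus.pop(path, None)
        elif self.leaky:
            self.corpus.pop(path, None)  # state changes with no log entry

def observed_delta(before, after):
    """Multiset of deletions observed by diffing corpus snapshots."""
    return Counter(("delete", p) for p in before if p not in after)

def run_exercise(runtime, proposals):
    """Drive the runtime with proposals, then apply the biconditional check."""
    before = dict(runtime.corpus)
    for path, approved in proposals:
        runtime.delete(path, approved)
    return observed_delta(before, runtime.corpus) == Counter(runtime.audit_log)

corpus = {"a.txt": "draft", "b.txt": "keep"}
# Proposals from adversarial agents: one approved, one denied.
proposals = [("a.txt", True), ("b.txt", False)]

sound_passes = run_exercise(ToyRuntime(corpus), proposals)
leaky_passes = run_exercise(ToyRuntime(corpus, leaky=True), proposals)
```

The observed delta comes from diffing corpus snapshots rather than from the runtime's own report, which keeps the observation independent of the log being checked.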
In logic programming, the biconditional is evaluated by showing $M_P \subseteq S$ (correctness) and $S \subseteq M_P$ (completeness), with sufficient conditions including “recurrent coverage,” semi-completeness with recurrence, or acceptable level mappings. Pruned SLD-trees (csSLD-trees) require compatibility to maintain biconditional correctness under restricted computation (Drabent, 2014).
In linear logic, combinatorial and graph-theoretic checks (Danos–Regnier acyclicity and connectivity, plus the geometric conditions on erasing nodes) are run on proof-structure graphs. Sequentialization (i.e., the existence of a corresponding sequent proof) holds only when every switching satisfies the necessary and sufficient conditions and the geometric constraints on erasing nodes are met (Donna et al., 26 Jun 2025).
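The switching-based part of such a check can be sketched for the simplest case. The code below covers only the classic Danos–Regnier test for multiplicative proof structures, where every switching must yield an acyclic and connected graph; it does not model the component count or the geometric condition on erasing nodes from the cited work. The graph encoding and function names are assumptions of this sketch.

```python
from itertools import product

def is_tree(nodes, edges):
    """True iff the undirected graph is acyclic and connected (union-find)."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # this edge would close a cycle
        parent[ru] = rv
    # An acyclic graph with |nodes| - 1 edges is necessarily connected.
    return len(edges) == len(nodes) - 1

def danos_regnier(nodes, fixed_edges, par_premises):
    """Check every switching: each par node keeps exactly one premise edge."""
    pars = list(par_premises)
    for choice in product((0, 1), repeat=len(pars)):
        switched = [(p, par_premises[p][c]) for p, c in zip(pars, choice)]
        if not is_tree(nodes, list(fixed_edges) + switched):
            return False
    return True

# Axiom link between a_neg and a_pos, joined by a par node: correct.
par_ok = danos_regnier(
    ["a_neg", "a_pos", "p"],
    [("a_neg", "a_pos")],
    {"p": ("a_neg", "a_pos")},
)
# Same axiom link joined by a tensor node: the tensor edges close a cycle.
tensor_ok = danos_regnier(
    ["a_neg", "a_pos", "t"],
    [("a_neg", "a_pos"), ("t", "a_neg"), ("t", "a_pos")],
    {},
)
```

Enumerating switchings is exponential in the number of par nodes, which is acceptable for an illustration but not for large proof structures.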
5. Failure Modes and Detection
Enumerating possible failures is essential for operational assurance:
| Failure Mode | Manifestation | Detected by Biconditional |
|---|---|---|
| Gate bypass | Entry in $\Delta$ with no match in $A$: side-effect with no log | Yes |
| Audit forgery | Entry in $A$ with no match in $\Delta$: log entry with no actual change | Yes |
| Silent host failure | Approved log, no state change | Yes |
| Wrong-target execution | Log refers to target $t$, effect observed on $t' \neq t$ | Yes |
In all cases, the multiset projections of $\Delta$ and $A$ differ, causing an immediate failure of the biconditional check. This yields high assurance in both sandboxed skill certification and runtime operation, elevating security and trust in automated or semi-automated agent systems (Metere, 1 May 2026).
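The failure modes in the table fall out of the two multiset differences. In this sketch, effects and log entries are assumed to be `(action, target)` tuples, and `diagnose` is a hypothetical helper rather than part of any cited system.

```python
from collections import Counter

def diagnose(delta, audit_log):
    """Split a mismatch into unlogged effects and unrealized log entries."""
    observed, logged = Counter(delta), Counter(audit_log)
    return {
        # Effects with no matching log entry (gate bypass).
        "unlogged_effects": sorted((observed - logged).elements()),
        # Log entries with no matching effect (audit forgery, silent failure).
        "unrealized_entries": sorted((logged - observed).elements()),
    }

def passes(report):
    """The biconditional holds only when both differences are empty."""
    return not report["unlogged_effects"] and not report["unrealized_entries"]

bypass = diagnose([("delete", "a.txt")], [])
forgery = diagnose([], [("delete", "a.txt")])
# Wrong-target execution shows up on both sides at once.
wrong_target = diagnose([("delete", "b.txt")], [("delete", "a.txt")])
clean = diagnose([("delete", "a.txt")], [("delete", "a.txt")])
```

A silent host failure is indistinguishable from an audit forgery at this level: both leave an approved entry with no corresponding state change.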
6. Comparative Structures in Logic Programming and Linear Logic
The biconditional criterion in logic programming, $M_P = S$, closely parallels the model-equality verification in runtime systems. Approximate specifications ($S_{compl} \subseteq M_P \subseteq S_{corr}$) further allow partial biconditionality where an exact specification is impractical (Drabent, 2014).
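An approximate specification can be checked the same way as an exact one, with two bounds instead of one set. The sketch assumes the approximate specification is given as a pair of atom sets bounding the least model from below and above; the function name and the example atoms are hypothetical.

```python
def within_approximate_spec(m_p, s_compl, s_corr):
    """True iff the least model lies between the two bounds.

    s_compl: atoms the program must derive (completeness bound).
    s_corr:  atoms the program is allowed to derive (correctness bound).
    """
    return s_compl <= m_p <= s_corr

# Hypothetical least model and bounds for a list-membership predicate.
m_p = {"member(a,[a,b])", "member(b,[a,b])"}
s_compl = {"member(a,[a,b])"}
s_corr = {"member(a,[a,b])", "member(b,[a,b])", "member(c,[a,b,c])"}

in_bounds = within_approximate_spec(m_p, s_compl, s_corr)
overshoot = within_approximate_spec(m_p | {"member(z,[])"}, s_compl, s_corr)
```

When both bounds coincide, the check reduces to the exact biconditional between the least model and the specification.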
In linear logic proof theory, sequentializability becomes an “iff” property only when both Danos–Regnier acyclicity/connectivity and the geometric condition on erasing nodes are imposed. The geometric condition ensures that erasing subnets cannot attach in a way that would escape the control of the sequentializable proof calculus, structurally guaranteeing the biconditional property for wide fragments of MELL (Donna et al., 26 Jun 2025).
7. Integration with Trust Schemas and Broader Implications
In agent skill ecosystems, the biconditional criterion anchors the trust schema: only those skills that pass adversarial ensemble evaluation and the biconditional, achieving “tested” or “formal” verification status, are permitted to bypass continuous HITL gating. This approach both eliminates rubber-stamping risk and provides a reproducible, mechanical benchmark for skill certification. Failures cause session aborts for “trusted” skills and immediate operator alerts (Metere, 1 May 2026).
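The gating policy can be sketched as a small decision function. The level names follow the four verification levels named in the text; the numeric ordering, the action strings, and the choice to alert without aborting for skills still behind HITL gating are assumptions of this sketch.

```python
from enum import IntEnum

class Level(IntEnum):
    """Verification levels, ordered from least to most assured."""
    UNVERIFIED = 0
    DECLARED = 1
    TESTED = 2
    FORMAL = 3

def requires_hitl(level):
    """Skills below 'tested' stay behind human-in-the-loop gating."""
    return level < Level.TESTED

def on_check(level, biconditional_ok):
    """Actions to take after a runtime biconditional check."""
    if biconditional_ok:
        return []
    actions = ["alert_operator"]
    if not requires_hitl(level):
        # Trusted skills run ungated, so a failure aborts the session.
        actions.insert(0, "abort_session")
    return actions

gated = requires_hitl(Level.DECLARED)
trusted_failure = on_check(Level.FORMAL, biconditional_ok=False)
```

Keeping the policy as a pure function of level and check result makes it straightforward to audit and to test in isolation.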
In logical and proof-theoretic contexts, biconditional criteria provide the foundation for both theoretical completeness results and practical verification workflows for programs and proof artifacts (Drabent, 2014, Donna et al., 26 Jun 2025).
A plausible implication is that biconditional correctness frameworks reveal and tightly capture the boundary between trusted automation and operator mediation, enforceable through both runtime instrumentation and formal methods in diverse computer science subfields.