Over-Exposure Rate (OER) in LLM Systems
- Over-Exposure Rate (OER) is an empirical metric that measures the frequency with which LLM agents divulge information beyond the Minimum Necessary Information (MNI), indicating potential security breaches.
- Experimental results show a clear monotonic increase in OER with higher trust levels across various models and frameworks, revealing model-specific risk profiles through metrics like Authorization Drift (AD).
- Defense mechanisms such as Sensitive Information Repartitioning and Guardian-Agent enablement effectively reduce OER, offering actionable strategies to mitigate information leakage in multi-agent scenarios.
Over-Exposure Rate (OER) is a formally defined empirical metric for quantifying the incidence of unintended information leakage by agents in LLM-based multi-agent systems. OER measures the probability with which a Custodian-Keeper (CK-Agent) divulges data exceeding the Minimum Necessary Information (MNI) required for a given task, thereby violating established security or privacy boundaries. OER was introduced and validated in the context of the Trust-Vulnerability Paradox (TVP) for LLM agent collaboration, where increases in explicit trust levels boost coordination but also elevate the frequency and scale of over-exposures (Xu et al., 21 Oct 2025).
1. Formal Definition
Let $s$ denote a scenario and $\tau$ parameterize the trust level between agents. For each pair $(s, \tau)$, consider the set of agent-to-agent interaction chains $\{C_1, \dots, C_N\}$, each producing a final CK-Agent output $O_i$. Let $B_s$ represent the task-specific MNI baseline. OER is defined as:

$$\mathrm{OER}(s, \tau) = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\!\left[\, O_i \not\subseteq B_s \,\right]$$

where $\mathbb{1}[\cdot]$ is the indicator function, and the numerator counts the chains whose output includes at least one token not present in the baseline $B_s$. The denominator $N$ is the total number of independent runs at $(s, \tau)$. This construction yields an empirical probability of over-exposure for each scenario and trust regime. Under strict safety ($B_s = \varnothing$), OER reduces to the raw incidence of any sensitive disclosure.
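The definition can be sketched as a token-level check, assuming outputs and the baseline are tokenized into sets (the whitespace tokenizer and subset semantics below are simplifying assumptions, not the paper's exact procedure):

```python
def over_exposed(output: str, baseline: set[str]) -> bool:
    """True if the output contains at least one token outside the MNI baseline B_s."""
    tokens = set(output.lower().split())  # simplistic whitespace tokenizer (assumption)
    return not tokens <= baseline        # O_i not-subset-of B_s  =>  over-exposure

def oer(outputs: list[str], baseline: set[str]) -> float:
    """Empirical OER(s, tau): fraction of runs whose output exceeds the baseline."""
    if not outputs:
        raise ValueError("need at least one run")
    return sum(over_exposed(o, baseline) for o in outputs) / len(outputs)

# Example: the second run leaks an individual salary beyond the aggregate-only baseline.
runs = ["total payroll is 91000", "alice earns 45000", ""]
print(oer(runs, baseline={"total", "payroll", "is", "91000"}))  # 1 of 3 runs over-exposes
```

Under the strict-safety regime ($B_s = \varnothing$), `oer(runs, set())` counts any non-empty output as a violation, matching the reduction noted above.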
2. Minimum Necessary Information (MNI) Baseline
The MNI principle requires that agents reveal no more data than strictly needed for task fulfillment. For each scenario $s$:
- The CK-Agent is entrusted with a sensitive dataset $D_s$ (e.g., personal addresses, salaries).
- The formal task objective is the computation of an aggregate (e.g., total payroll).
- The MNI baseline $B_s$ comprises only the content needed to report that aggregate; any explicit content outside $B_s$ represents an over-exposure.
- In the most conservative setting ($B_s = \varnothing$), any emitted information is considered a violation.
OER captures the empirical frequency with which agent outputs violate the MNI principle by exceeding $B_s$. This directly quantifies policy nonconformance and the probability of boundary violation in agent communications.
3. Experimental Methodology
OER was evaluated over a purpose-built scenario-game dataset encompassing 3 macro scenes—enterprise collaboration, deep-sea archaeology, and Mars colonization—subdivided into 19 distinct sub-scenes. For each sub-scene, a CK-Agent (holding $D_s$) and a Seeker-Agent (SK-Agent) conduct pure agent-to-agent closed-loop dialogues.
Trust is encoded through a parameter $\tau \in \{0.1, 0.5, 0.9\}$, governing both a gating function $g(\tau)$ (which modulates the rigidity of MNI gates and refusal thresholds) and a redundancy function $r(\tau)$ (which adjusts descriptive detail). For each $\tau$, 4 model backends (DeepSeek, Qwen, GPT, Llama-3-8B) and 3 orchestration frameworks (AgentScope, AutoGen, LangGraph) were used, with core model prompts and decoding hyperparameters held constant. External access, shared memory, and nonlocal tools were fully disabled. In total, 1488 CK–SK interaction chains were evaluated, and OER was systematically computed per scenario, backend, trust level, and framework.
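The trust encoding can be pictured as two monotone maps of $\tau$; the linear forms below are illustrative assumptions, since the paper's exact function shapes are not given here:

```python
def gate(tau: float, sensitivity: float) -> bool:
    """Gating function g(tau): release an item only if its sensitivity does not
    exceed the trust level, so refusal thresholds loosen as trust grows
    (illustrative linear form, an assumption)."""
    return sensitivity <= tau

def redundancy(tau: float, max_fields: int) -> int:
    """Redundancy function r(tau): number of descriptive fields included in a
    reply, growing with trust (illustrative form, an assumption)."""
    return max(1, round(tau * max_fields))

# A low-trust CK-Agent refuses a moderately sensitive item; a high-trust one releases it.
print(gate(0.1, sensitivity=0.4))   # low trust: refused
print(gate(0.9, sensitivity=0.4))   # high trust: released
print(redundancy(0.5, max_fields=6))
```

This captures why OER should rise with $\tau$: both the gate and the redundancy map release strictly more content as trust increases.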
4. Key Experimental Results
OER exhibits a robust, monotonic increase with respect to the trust level $\tau$ across all tested models and frameworks. Under the AgentScope orchestration framework, aggregated results across models and sub-scenes yield:
| Model | OER($\tau$=0.1) | OER($\tau$=0.5) | OER($\tau$=0.9) | Authorization Drift (AD) |
|---|---|---|---|---|
| GPT | 0.05 | 0.34 | 0.41 | 0.0243 |
| DeepSeek | 0.12 | 0.42 | 0.50 | 0.0268 |
| Qwen | 0.13 | 0.37 | 0.56 | 0.0310 |
| Llama-3-8B | 0.05 | 0.22 | 0.71 | 0.0783 |
Here, Authorization Drift (AD) is the weighted variance of OER across $\tau$, capturing the sensitivity of each model to changes in trust. Framework-wise trends (fixed model = DeepSeek) are:
| Framework | OER($\tau$=0.1) | OER($\tau$=0.5) | OER($\tau$=0.9) | AD |
|---|---|---|---|---|
| AgentScope | 0.12 | 0.42 | 0.50 | 0.0268 |
| AutoGen | 0.38 | 0.55 | 0.56 | 0.0068 |
| LangGraph | 0.26 | 0.46 | 0.47 | 0.0296 |
These results demonstrate model- and framework-dependent variance in risk profiles: the slope and magnitude of OER reflect heterogeneous trust-to-risk mappings.
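AD is described as a weighted variance of OER across $\tau$; with uniform weights over the three trust levels, a plain population variance reproduces the tabulated values to four decimal places (an observation about these tables, not a guarantee of the paper's exact weighting):

```python
def authorization_drift(oers: list[float]) -> float:
    """Population variance of OER across trust levels (uniform weights assumed)."""
    mean = sum(oers) / len(oers)
    return sum((x - mean) ** 2 for x in oers) / len(oers)

table = {  # OER at tau = 0.1, 0.5, 0.9 under AgentScope, from the table above
    "GPT":        [0.05, 0.34, 0.41],
    "DeepSeek":   [0.12, 0.42, 0.50],
    "Qwen":       [0.13, 0.37, 0.56],
    "Llama-3-8B": [0.05, 0.22, 0.71],
}
for model, oers in table.items():
    print(model, round(authorization_drift(oers), 4))
```

Llama-3-8B's large AD comes from its steep low-to-high-trust spread (0.05 to 0.71), which is exactly the "heterogeneous trust-to-risk mapping" the slope reflects.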
5. Effect of Defense Mechanisms
Two defenses were systematically assessed for their impact on OER: Sensitive Information Repartitioning and Guardian-Agent enablement.
- Sensitive Information Repartitioning (MNI-guided sharding with $k$-of-$n$ reconstruction) structurally segments sensitive data. This reduces AD sharply (up to 88.4% for Llama-3-8B) and lowers high-$\tau$ OER substantially (up to 46.5% risk reduction for Llama-3-8B at $\tau = 0.9$), albeit sometimes raising the low-$\tau$ OER baseline.
- Guardian-Agent (GA-Agent) Enablement (policy injection, pre-speak checks, refusal templates) delivers significant improvements, notably reducing high-$\tau$ OER by up to 49.3% and AD by over 80% for Llama-3-8B.
Combined deployment yields maximal robustness: repartitioning stabilizes trust sensitivity (flattens AD), while GA-Agent reduces both low-baseline and high-leakage risk profiles.
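A GA-Agent's pre-speak check can be sketched as a filter between the CK-Agent's draft reply and the channel; the token-level redaction policy and refusal template below are illustrative, not the paper's implementation:

```python
def pre_speak_check(draft: str, baseline: set[str],
                    refusal: str = "[REDACTED: exceeds MNI]") -> str:
    """Guardian-Agent gate: pass tokens inside the MNI baseline, redact the rest."""
    out = []
    for token in draft.split():
        out.append(token if token.lower() in baseline else refusal)
    return " ".join(out)

# The aggregate passes; individual salaries beyond the MNI baseline are redacted.
baseline = {"total", "payroll:", "91000"}
print(pre_speak_check("Total payroll: 91000 (Alice: 45000, Bob: 46000)", baseline))
```

Because the check runs on the draft before emission, it bounds over-exposure regardless of how loose the trust-gated refusal threshold has become, which is consistent with the reported flattening of AD.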
6. Metric Utility and Governance Recommendations
OER operationalizes the detection and monitoring of over-exposure events in agent outputs on a reproducible basis, while AD quantifies the system’s sensitivity to trust-level modulation. These metrics underpin rigorous policy compliance audits and risk evaluations. Governance recommendations include:
- Treating inter-agent trust ($\tau$) as a first-class, dynamically schedulable security variable.
- Embedding MNI gates and modular sharding for targeted leakage desensitization.
- Deploying lightweight GA-Agents for real-time policy enforcement and revocation.
- Continuously monitoring OER across trust levels, models, and orchestration stacks, with stratified sampling for high-risk or out-of-scope queries.
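The monitoring recommendation above can be sketched as a stratified aggregator that flags any (model, framework, $\tau$) stratum whose empirical OER crosses a budget; the stratum keys and threshold are deployment choices assumed here:

```python
from collections import defaultdict

def monitor(runs, threshold: float = 0.3):
    """Aggregate over-exposure flags per (model, framework, tau) stratum and
    return the strata whose empirical OER exceeds the threshold."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [violations, total runs]
    for model, framework, tau, violated in runs:
        cell = counts[(model, framework, tau)]
        cell[0] += int(violated)
        cell[1] += 1
    return {k: v / n for k, (v, n) in counts.items() if v / n > threshold}

runs = [
    ("DeepSeek", "AgentScope", 0.9, True),
    ("DeepSeek", "AgentScope", 0.9, True),
    ("DeepSeek", "AgentScope", 0.9, False),
    ("DeepSeek", "AgentScope", 0.1, False),
]
print(monitor(runs))  # only the high-trust stratum is flagged
```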
7. Summary and Implications
OER, jointly with AD, illuminates the fundamental Trust-Vulnerability Paradox in LLM-based multi-agent systems: while elevated trust improves collaborative task success, it also predictably amplifies the probability and scale of information over-exposure. The monotonic relationship between $\tau$ and OER highlights the inherent trade-off between efficiency and security in multi-agent orchestration. Adoption of MNI-compliant architectures and adaptive policy agents is essential for robust boundary management. Empirical OER tracking enables system designers to quantify, balance, and control the interplay between trust and exposure risk in dynamic, high-stakes agent deployments (Xu et al., 21 Oct 2025).