Over-Exposure Rate (OER) in LLM Systems
- Over-Exposure Rate (OER) is an empirical metric that measures the frequency with which LLM agents divulge information beyond the Minimum Necessary Information (MNI), indicating potential security breaches.
- Experimental results show a clear monotonic increase in OER with higher trust levels across various models and frameworks, revealing model-specific risk profiles through metrics like Authorization Drift (AD).
- Defense mechanisms such as Sensitive Information Repartitioning and Guardian-Agent enablement effectively reduce OER, offering actionable strategies to mitigate information leakage in multi-agent scenarios.
Over-Exposure Rate (OER) is a formally defined empirical metric for quantifying the incidence of unintended information leakage by agents in LLM-based multi-agent systems. OER measures the probability with which a Custodian-Keeper (CK-Agent) divulges data exceeding the Minimum Necessary Information (MNI) required for a given task, thereby violating established security or privacy boundaries. OER was introduced and validated in the context of the Trust-Vulnerability Paradox (TVP) for LLM agent collaboration, where increases in explicit trust levels boost coordination but also elevate the frequency and scale of over-exposures (Xu et al., 21 Oct 2025).
1. Formal Definition
Let $s$ denote a scenario and $\tau$ parameterize the trust level between agents. For each pair $(s, \tau)$, consider the set of agent-to-agent interaction chains $\{C_1, \dots, C_N\}$, each producing a final CK-Agent output $O_i$. Let $B_s$ represent the task-specific MNI baseline. OER is defined as:

$$\mathrm{OER}(s, \tau) = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\!\left[\, O_i \not\subseteq B_s \,\right]$$

where $\mathbb{1}[\cdot]$ is the indicator function, and the numerator counts the chains whose output includes at least one token not present in the baseline $B_s$. The denominator $N$ is the total number of independent runs at $(s, \tau)$. This construction yields an empirical probability of over-exposure for each scenario and trust regime. Under strict safety ($B_s = \varnothing$), OER reduces to the raw incidence of any sensitive disclosure.
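The definition can be sketched as a token-level check, assuming outputs and the baseline are tokenized into sets (the whitespace tokenizer and subset semantics below are simplifying assumptions, not the paper's exact procedure):

```python
def over_exposed(output: str, baseline: set[str]) -> bool:
    """True if the output contains at least one token outside the MNI baseline B_s."""
    tokens = set(output.lower().split())  # simplistic whitespace tokenizer (assumption)
    return not tokens <= baseline        # O_i not-subset-of B_s  =>  over-exposure

def oer(outputs: list[str], baseline: set[str]) -> float:
    """Empirical OER(s, tau): fraction of runs whose output exceeds the baseline."""
    if not outputs:
        raise ValueError("need at least one run")
    return sum(over_exposed(o, baseline) for o in outputs) / len(outputs)

# Example: the second run leaks an individual salary beyond the aggregate-only baseline.
runs = ["total payroll is 91000", "alice earns 45000", ""]
print(oer(runs, baseline={"total", "payroll", "is", "91000"}))  # 1 of 3 runs over-exposes
```

Under the strict-safety regime ($B_s = \varnothing$), `oer(runs, set())` counts any non-empty output as a violation, matching the reduction noted above.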
2. Minimum Necessary Information (MNI) Baseline
The MNI principle requires that agents reveal no more data than strictly needed for task fulfillment. For each scenario $s$:
- The CK-Agent is entrusted with a sensitive dataset $D_s$ (e.g., personal addresses, salaries).
- The formal task objective is the computation of an aggregate (e.g., total payroll).
- The MNI baseline $B_s$ comprises only the content needed to report that aggregate; any explicit content outside $B_s$ represents an over-exposure.
- In the most conservative setting ($B_s = \varnothing$), any emitted information is considered a violation.
OER captures the empirical frequency with which agent outputs violate the MNI principle by exceeding $B_s$. This directly quantifies policy nonconformance and the probability of boundary violation in agent communications.
3. Experimental Methodology
OER was evaluated over a purpose-built scenario-game dataset encompassing 3 macro scenes—enterprise collaboration, deep-sea archaeology, and Mars colonization—subdivided into 19 distinct sub-scenes. For each sub-scene, a CK-Agent (holding $D_s$) and a Seeker-Agent (SK-Agent) conduct pure agent-to-agent closed-loop dialogues.
Trust is encoded through a parameter $\tau \in \{0.1, 0.5, 0.9\}$, governing both a gating function $g(\tau)$ (which modulates the rigidity of MNI gates and refusal thresholds) and a redundancy function $r(\tau)$ (which adjusts descriptive detail). For each $\tau$, 4 model backends (DeepSeek, Qwen, GPT, Llama-3-8B) and 3 orchestration frameworks (AgentScope, AutoGen, LangGraph) were used, with core model prompts and decoding hyperparameters held constant. External access, shared memory, and nonlocal tools were fully disabled. In total, 1488 CK–SK interaction chains were evaluated, and OER was systematically computed per scenario, backend, trust level, and framework.
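The trust encoding can be pictured as two monotone maps of $\tau$; the linear forms below are illustrative assumptions, since the paper's exact function shapes are not given here:

```python
def gate(tau: float, sensitivity: float) -> bool:
    """Gating function g(tau): release an item only if its sensitivity does not
    exceed the trust level, so refusal thresholds loosen as trust grows
    (illustrative linear form, an assumption)."""
    return sensitivity <= tau

def redundancy(tau: float, max_fields: int) -> int:
    """Redundancy function r(tau): number of descriptive fields included in a
    reply, growing with trust (illustrative form, an assumption)."""
    return max(1, round(tau * max_fields))

# A low-trust CK-Agent refuses a moderately sensitive item; a high-trust one releases it.
print(gate(0.1, sensitivity=0.4))   # low trust: refused
print(gate(0.9, sensitivity=0.4))   # high trust: released
print(redundancy(0.5, max_fields=6))
```

This captures why OER should rise with $\tau$: both the gate and the redundancy map release strictly more content as trust increases.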
4. Key Experimental Results
OER exhibits a robust, monotonic increase with respect to the trust level $\tau$ across all tested models and frameworks. Under the AgentScope orchestration framework, aggregated results across models and sub-scenes yield:
| Model | OER($\tau$=0.1) | OER($\tau$=0.5) | OER($\tau$=0.9) | Authorization Drift (AD) |
|---|---|---|---|---|
| GPT | 0.05 | 0.34 | 0.41 | 0.0243 |
| DeepSeek | 0.12 | 0.42 | 0.50 | 0.0268 |
| Qwen | 0.13 | 0.37 | 0.56 | 0.0310 |
| Llama-3-8B | 0.05 | 0.22 | 0.71 | 0.0783 |
Here, Authorization Drift (AD) is the weighted variance of OER across $\tau$, capturing the sensitivity of each model to changes in trust. Framework-wise trends (fixed model = DeepSeek) are:
| Framework | OER($\tau$=0.1) | OER($\tau$=0.5) | OER($\tau$=0.9) | AD |
|---|---|---|---|---|
| AgentScope | 0.12 | 0.42 | 0.50 | 0.0268 |
| AutoGen | 0.38 | 0.55 | 0.56 | 0.0068 |
| LangGraph | 0.26 | 0.46 | 0.47 | 0.0296 |
These results demonstrate model- and framework-dependent variance in risk profiles: the slope and magnitude of OER reflect heterogeneous trust-to-risk mappings.
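AD is described as a weighted variance of OER across $\tau$; with uniform weights over the three trust levels, a plain population variance reproduces the tabulated values to four decimal places (an observation about these tables, not a guarantee of the paper's exact weighting):

```python
def authorization_drift(oers: list[float]) -> float:
    """Population variance of OER across trust levels (uniform weights assumed)."""
    mean = sum(oers) / len(oers)
    return sum((x - mean) ** 2 for x in oers) / len(oers)

table = {  # OER at tau = 0.1, 0.5, 0.9 under AgentScope, from the table above
    "GPT":        [0.05, 0.34, 0.41],
    "DeepSeek":   [0.12, 0.42, 0.50],
    "Qwen":       [0.13, 0.37, 0.56],
    "Llama-3-8B": [0.05, 0.22, 0.71],
}
for model, oers in table.items():
    print(model, round(authorization_drift(oers), 4))
```

Llama-3-8B's large AD comes from its steep low-to-high-trust spread (0.05 to 0.71), which is exactly the "heterogeneous trust-to-risk mapping" the slope reflects.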
5. Effect of Defense Mechanisms
Two defenses were systematically assessed for their impact on OER: Sensitive Information Repartitioning and Guardian-Agent enablement.
- Sensitive Information Repartitioning (MNI-guided sharding with $k$-of-$n$ reconstruction) structurally segments sensitive data. This reduces AD sharply (up to 88.4% for Llama-3-8B) and lowers high-$\tau$ OER substantially (up to 46.5% risk reduction for Llama-3-8B at $\tau = 0.9$), albeit sometimes raising the low-$\tau$ OER baseline.
- Guardian-Agent (GA-Agent) Enablement (policy injection, pre-speak checks, refusal templates) delivers significant improvements, notably reducing high-$\tau$ OER by up to 49.3% and AD by over 80% for Llama-3-8B.
Combined deployment yields maximal robustness: repartitioning stabilizes trust sensitivity (flattens AD), while GA-Agent reduces both low-baseline and high-leakage risk profiles.
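A GA-Agent's pre-speak check can be sketched as a filter between the CK-Agent's draft reply and the channel; the token-level redaction policy and refusal template below are illustrative, not the paper's implementation:

```python
def pre_speak_check(draft: str, baseline: set[str],
                    refusal: str = "[REDACTED: exceeds MNI]") -> str:
    """Guardian-Agent gate: pass tokens inside the MNI baseline, redact the rest."""
    out = []
    for token in draft.split():
        out.append(token if token.lower() in baseline else refusal)
    return " ".join(out)

# The aggregate passes; individual salaries beyond the MNI baseline are redacted.
baseline = {"total", "payroll:", "91000"}
print(pre_speak_check("Total payroll: 91000 (Alice: 45000, Bob: 46000)", baseline))
```

Because the check runs on the draft before emission, it bounds over-exposure regardless of how loose the trust-gated refusal threshold has become, which is consistent with the reported flattening of AD.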
6. Metric Utility and Governance Recommendations
OER operationalizes the detection and monitoring of over-exposure events in agent outputs on a reproducible basis, while AD quantifies the system’s sensitivity to trust-level modulation. These metrics underpin rigorous policy compliance audits and risk evaluations. Governance recommendations include:
- Treating inter-agent trust ($\tau$) as a first-class, dynamically schedulable security variable.
- Embedding MNI gates and modular sharding for targeted leakage desensitization.
- Deploying lightweight GA-Agents for real-time policy enforcement and revocation.
- Continuously monitoring OER across trust levels, models, and orchestration stacks, with stratified sampling for high-risk or out-of-scope queries.
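The monitoring recommendation above can be sketched as a stratified aggregator that flags any (model, framework, $\tau$) stratum whose empirical OER crosses a budget; the stratum keys and threshold are deployment choices assumed here:

```python
from collections import defaultdict

def monitor(runs, threshold: float = 0.3):
    """Aggregate over-exposure flags per (model, framework, tau) stratum and
    return the strata whose empirical OER exceeds the threshold."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [violations, total runs]
    for model, framework, tau, violated in runs:
        cell = counts[(model, framework, tau)]
        cell[0] += int(violated)
        cell[1] += 1
    return {k: v / n for k, (v, n) in counts.items() if v / n > threshold}

runs = [
    ("DeepSeek", "AgentScope", 0.9, True),
    ("DeepSeek", "AgentScope", 0.9, True),
    ("DeepSeek", "AgentScope", 0.9, False),
    ("DeepSeek", "AgentScope", 0.1, False),
]
print(monitor(runs))  # only the high-trust stratum is flagged
```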
7. Summary and Implications
OER, jointly with AD, illuminates the fundamental Trust-Vulnerability Paradox in LLM-based multi-agent systems: while elevated trust improves collaborative task success, it also predictably amplifies the probability and scale of information over-exposure. The monotonic relationship between $\tau$ and OER highlights the inherent trade-off between efficiency and security in multi-agent orchestration. Adoption of MNI-compliant architectures and adaptive policy agents is essential for robust boundary management. Empirical OER tracking enables system designers to quantify, balance, and control the interplay between trust and exposure risk in dynamic, high-stakes agent deployments (Xu et al., 21 Oct 2025).