Inference-Time Context Leakage
- Inference-Time Context Leakage is the unintentional disclosure of secret or correlated information during inference, manifested via explicit side channels or collateral statistical correlations.
- The framework employs Hidden Markov Models and a secure-refinement order to mathematically model state updates and quantify adversarial vulnerabilities.
- It offers compositional techniques that extend security analysis to complex systems, aiding in the design of countermeasures for cryptographic protocols and multi-party computations.
Inference-time context leakage refers to the phenomenon where information that should remain secret or inaccessible is unintentionally or covertly revealed during the inference phase of a computation or model, even though no explicit references to the secret or its correlated variables occur in the observable outputs. The leakage may be direct (e.g., through explicit side-channel outputs or program leaks) or collateral, arising from statistical correlations between a declared secret and latent variables elsewhere in the computational or environmental context. The formalism of inference-time leakage addresses both explicit program variables and the broader threat of adversaries who exploit auxiliary information at inference time to extract or revise their knowledge of correlated secrets, as exemplified by the Dalenius effect in statistical databases. Recent research establishes compositional frameworks, built primarily on Hidden Markov Models (HMMs) and quantitative information flow (QIF), that enable the precise modeling, measurement, and mitigation of such leaks.
1. Secure-Refinement Order and Quantitative Information Flow
Central to the formal analysis of inference-time context leakage is the secure-refinement order on "hypers" (i.e., distributions of posterior distributions induced by a program's execution). For two programs $P$ and $Q$, $Q$ is at least as secure as $P$ if, for every possible adversarial gain function $g$ (which can instantiate various entropy or vulnerability metrics), the updated adversary vulnerability after observing $Q$ is no greater than after observing $P$:
$V_g(\Delta_Q) \;\le\; V_g(\Delta_P) \quad \text{for all gain functions } g$
Here, $\Delta_P$ and $\Delta_Q$ denote the hypers produced by $P$ and $Q$, and $V_g(\Delta)$ quantifies the adversary's ability to guess the secret given the hyper $\Delta$. The refinement order ensures that all operationally distinguishable leakages (as measurable by some $g$) are bounded, supporting a rigorous, comparison-based approach to security that is amenable to compositional reasoning.
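To make the refinement comparison concrete, here is a minimal NumPy sketch, under the assumption of finite secret, observation, and guess spaces; the names `posterior_Vg` and `appears_refined` are illustrative, not part of the formal framework, and sampling random gain functions can only falsify refinement, never prove it:

```python
import numpy as np

def posterior_Vg(prior, C, G):
    """Posterior g-vulnerability V_g[pi, C].

    prior: shape (X,), prior distribution over secrets
    C:     shape (X, Y), row-stochastic channel p(y | x)
    G:     shape (W, X), gain matrix g(w, x) over guesses w
    """
    J = prior[:, None] * C             # joint p(x, y), shape (X, Y)
    # V_g = sum_y max_w sum_x g(w, x) * p(x, y)
    return (G @ J).max(axis=0).sum()

def appears_refined(C1, C2, prior, trials=2000, seed=0):
    """Empirically test whether C2 is at least as secure as C1, i.e.
    V_g[pi, C2] <= V_g[pi, C1] for many randomly sampled gain functions.
    A violation disproves refinement; passing all trials is only evidence."""
    rng = np.random.default_rng(seed)
    X = C1.shape[0]
    for _ in range(trials):
        W = int(rng.integers(2, 6))    # random number of guesses
        G = rng.random((W, X))         # random nonnegative gain function
        if posterior_Vg(prior, C2, G) > posterior_Vg(prior, C1, G) + 1e-9:
            return False
    return True
```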
2. Hidden Markov Model-Based Semantics for Leakage
Programs that update secrets (not just read them) are modeled as Hidden Markov Models (HMMs), coupling explicit leak or observation channels with secret state update mechanisms. Each HMM step is a triple:
$H_{x, y, x'} = C_{x, y}\; M_{x, x'}$
- $C$: a stochastic channel mapping secrets $x$ to observable outputs $y$.
- $M$: a Markov matrix for state updates from $x$ to $x'$.
The semantic effect of a program (composed of such steps) is captured by chaining these HMM steps. For adversarial purposes—where only the leakage path is of interest—the HMM is abstracted to an "effective channel":
$(\mathrm{chan}\, H)_{x, y} = \sum_{x'} H_{x, y, x'}$
which aggregates the transition-augmented observations, enabling direct computation of vulnerabilities and, hence, leakage bounds.
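A small sketch of this semantics, assuming finite spaces represented as NumPy arrays; `hmm_step`, `effective_channel`, and `chain` are hypothetical helper names, and pairing the two observations in `chain` is one simple way to realize sequential composition:

```python
import numpy as np

def hmm_step(C, M):
    """Combine a leak channel C (X x Y) with a Markov update M (X x X')
    into the three-indexed HMM step H[x, y, x'] = C[x, y] * M[x, x']."""
    return C[:, :, None] * M[:, None, :]

def effective_channel(H):
    """Abstract an HMM step to its effective channel by summing out the
    final state: (chan H)[x, y] = sum over x' of H[x, y, x']."""
    return H.sum(axis=2)

def chain(H1, H2):
    """Sequentially compose two HMM steps: the state is threaded through
    the updates while the observations accumulate as a pair (y1, y2)."""
    X, Y1, _ = H1.shape
    _, Y2, X2 = H2.shape
    out = np.einsum('xap,pbq->xabq', H1, H2)   # run H1, then H2
    return out.reshape(X, Y1 * Y2, X2)
```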
3. Collateral Leakage: Beyond Explicit Secret Variables
Collateral leakage arises when secrets correlated with, but not declared in, the program state become vulnerable due to information flow about observed variables. Let $x \in \mathcal{X}$ be the explicit program secret and $z \in \mathcal{Z}$ an external secret correlated with it via a joint distribution $\Pi \in \mathbb{D}(\mathcal{Z} \times \mathcal{X})$, factorized as:
$\Pi = \pi_Z \mathbin{\triangleright} \Pi^{\to}$
where $\pi_Z$ is the marginal over $\mathcal{Z}$, and $\Pi^{\to}$ is the right-stochastic matrix representing the conditional distribution of $x$ given $z$. The leakage channel for $z$ is constructed by composing the correlation with the program's effective channel $C$:
$D = \Pi^{\to} \cdot C$
The multiplicative $g$-leakage for collateral secrets is then computed (in logarithmic form) as:
$\mathcal{L}_g^{\text{collateral}} = \mathcal{L}_g(\pi_Z,\; \Pi^{\to} \cdot C) = \lg\left(\frac{V_g\big(\pi_Z \mathbin{\triangleright} (\Pi^{\to} \cdot C)\big)}{V_g(\pi_Z)}\right)$
Recursive upper bounds on this leakage (the "collateral capacity" $\mathsf{CCap}(H)$) are available even without precise knowledge of the correlation $\Pi$.
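The construction of $D$ and the leakage computation can be sketched as follows, assuming every $z$ has positive marginal so the conditional $\Pi^{\to}$ is well defined; the function names are illustrative:

```python
import numpy as np

def collateral_channel(Pi, C):
    """From a joint correlation Pi over Z x X and the program's effective
    channel C (X x Y), build the leakage channel D = Pi_to . C that an
    adversary interested in the external secret z effectively faces."""
    pi_Z = Pi.sum(axis=1)              # marginal over Z (assumed positive)
    Pi_to = Pi / pi_Z[:, None]         # right-stochastic p(x | z)
    return pi_Z, Pi_to @ C

def collateral_g_leakage(Pi, C, G):
    """Logarithmic multiplicative g-leakage of the collateral secret z."""
    pi_Z, D = collateral_channel(Pi, C)
    prior_Vg = (G @ pi_Z).max()        # best blind guess: V_g(pi_Z)
    J = pi_Z[:, None] * D              # joint p(z, y)
    post_Vg = (G @ J).max(axis=0).sum()
    return np.log2(post_Vg / prior_Vg)
```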
5. Inference-Time Leakage Scenarios and Measurement
In practical settings, inference-time leakage encompasses both direct and collateral mechanisms, often realized in contexts where attackers possess auxiliary or post-hoc correlation information. Even if program outputs are distributionally benign when considered in isolation (e.g., password updates yielding uniform distributions for new passwords), knowledge of external correlations allows adversaries to eliminate possibilities or revise beliefs about secrets in a non-uniform fashion. For instance, stricter update policies that guarantee the new password differs from the old one result in a tighter, less uniform posterior for the adversary if the adversary knows the password is reused elsewhere.
Quantitative comparisons leverage the secure-refinement order: for any two program variants producing the same marginal output but yielding different posteriors for the correlated (collateral) variable, one can compute vulnerabilities across all gain functions to determine which is provably more (or less) secure in the presence of such collateral context.
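As a stylized instance of the password scenario above: suppose the external secret $z$ is the old password reused verbatim elsewhere (perfect correlation with the declared secret), and the adversary eventually observes the new password. The sketch below, with an assumed toy space of $N = 8$ passwords and Bayes vulnerability as the gain function, shows the strict policy leaking strictly more about the reused password than the lax one:

```python
import numpy as np

N = 8                                    # toy password space (assumed size)
prior = np.full(N, 1.0 / N)              # uniform prior over old passwords

# Lax policy: new password drawn uniformly, independently of the old one.
C_lax = np.full((N, N), 1.0 / N)

# Strict policy: new password uniform over the N - 1 values != old one.
C_strict = (1.0 - np.eye(N)) / (N - 1)

G = np.eye(N)                            # Bayes vulnerability: guess z exactly
for name, C in [("lax", C_lax), ("strict", C_strict)]:
    J = prior[:, None] * C               # joint over (old, new) passwords
    V = (G @ J).max(axis=0).sum()
    print(f"{name}: posterior Bayes vulnerability = {V:.4f}")
# lax    -> 0.1250 (= 1/N: the output reveals nothing about the reused password)
# strict -> 0.1429 (= 1/(N-1): each observation rules out one candidate)
```

This mirrors the claim above: the strict policy's guarantee that the new password differs from the old one is exactly what tightens the adversary's posterior on the reused secret.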
5. Compositionality and Context Extension
A defining strength of the HMM-based, secure-refinement formalism is its compositional property with respect to program context. Refinement properties established with respect to explicit, declared variables automatically "lift" to broader contexts, meaning that the introduction of additional variables (correlated or not, program-visible or not) does not invalidate the security bounds, provided the correlations are properly accounted for in the collateral model. This compositionality underpins scalable semantic reasoning for large or open programs, including cryptographic algorithms and multi-party computation, where explicit enumeration of all correlated secrets is infeasible.
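The lifting property can be illustrated numerically: if one effective channel is a post-processing (garbling) of another, i.e. $C' = C \cdot R$ for stochastic $R$, then prefixing both with the same correlation $\Pi^{\to}$ preserves the vulnerability ordering for the collateral secret. A sketch, with randomly generated matrices standing in for the program and its context:

```python
import numpy as np

rng = np.random.default_rng(1)

def rand_stochastic(rows, cols):
    A = rng.random((rows, cols))
    return A / A.sum(axis=1, keepdims=True)

X, Y, Z, W = 4, 5, 3, 4                      # arbitrary small dimensions
C  = rand_stochastic(X, Y)                   # original effective channel
R  = rand_stochastic(Y, Y)                   # post-processing (garbling) step
Cr = C @ R                                   # a refinement of C: leaks no more

Pi    = rng.random((Z, X)); Pi /= Pi.sum()   # joint correlation over (Z, X)
pi_Z  = Pi.sum(axis=1)
Pi_to = Pi / pi_Z[:, None]                   # conditional p(x | z)

def post_Vg(prior, Ch, G):
    J = prior[:, None] * Ch
    return (G @ J).max(axis=0).sum()

# Refinement established on the declared secret lifts to the collateral one:
for _ in range(1000):
    G = rng.random((W, Z))                   # random gain function over Z
    assert post_Vg(pi_Z, Pi_to @ Cr, G) <= post_Vg(pi_Z, Pi_to @ C, G) + 1e-9
print("collateral vulnerability never increased under refinement")
```

The assertion holds because $\Pi^{\to} \cdot (C \cdot R) = (\Pi^{\to} \cdot C) \cdot R$: the collateral channels stand in the same post-processing relation, so the data-processing inequality bounds every gain function's vulnerability.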
6. Example Applications and Countermeasures
The framework supports systematic analysis and mitigation of inference-time leakage across domains:
- Password Change Protocols: Quantitative bounds distinguish between strict and lax change policies, allowing system designers to balance usability and security with respect to external correlations.
- Cryptographic Implementations: For example, in public-key encryption key generation, if underlying secrets (e.g., private keys) are not independent, side-channel or algorithmic leaks from one session enable inference attacks on others. The framework quantifies this leakage as a function of the effective channel and known structural correlations.
- Program Transformations: The refinement order and collateral capacity bounds enable informed selection among implementations (e.g., randomized loop divisors in exponentiation to obfuscate side-channel timing information), supporting structured mitigation of both direct and collateral leakage.
In summary, inference-time context leakage encompasses both explicit and collateral exposure of secret or correlated information during inference. The HMM-based, secure-refinement theoretic approach provides a rigorous, compositional, and quantitative toolkit for understanding, bounding, and mitigating such leakage. By modeling programs, correlations, and observational channels mathematically, the framework enables principled security engineering even in open and complex computational environments.