
Hide-and-Seek Attribution in ML

Updated 14 December 2025
  • Hide-and-seek attribution is a family of methods that systematically hide parts of the input to measure their impact on model outputs.
  • The approach is applied in areas like weakly supervised segmentation, model fingerprinting, and explainable AI to enhance accountability.
  • Techniques leverage selective occlusion and evolutionary feedback to achieve robust attribution, balancing interpretability with performance.

Hide-and-seek attribution refers to a family of techniques and analytical frameworks that leverage selective hiding, occlusion, or masking of information—such as input features, response components, or provenance sources—to reveal, quantify, or enforce explainability, model provenance, or privacy attribution in machine learning systems. These methods have been instantiated across weakly supervised medical segmentation, model fingerprinting, explainable AI, privacy-preserving attribution, and information provenance auditing. All approaches share a core operational motif: systematically hiding information components to infer, measure, or enforce their contribution to some model prediction, classification, or attribution assignment.

1. Formalization and General Principles

Hide-and-seek attribution methods introduce a dynamic procedure by which candidate components of data or model behavior (pixels, text segments, prompts, content sources) are selectively revealed or hidden. The system then quantifies the contribution of each component through downstream effects—such as classifier confidence, segmentation accuracy, or successful attribution—in the presence or absence of those components. The process thus supports attribution assignments or explanations grounded not only in correlation but also in (operationally defined) necessity or sufficiency.

Mathematically, hide-and-seek attribution can be characterized by functions of the form

\Delta(M) = \frac{C(\text{output with } M \text{ present}) - C(\text{output with } M \text{ hidden})}{C(\text{output full}) - C(\text{output healthy}) + \varepsilon},

where $M$ denotes a candidate region or information channel, $C$ is a classifier or scoring function, and $\varepsilon$ is a small constant for normalization (Atad et al., 7 Dec 2025).
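A minimal sketch of this normalized score, using illustrative confidence values rather than numbers from any of the cited papers:

```python
def attribution_score(c_present, c_hidden, c_full, c_healthy, eps=1e-6):
    """Normalized hide-and-seek attribution for one candidate M.

    c_present: classifier score with M visible (all other candidates hidden)
    c_hidden:  classifier score with M hidden as well
    c_full:    score on the unmodified input
    c_healthy: score on the fully "healthy" (all-hidden) edit
    eps:       small constant guarding against a zero denominator
    """
    return (c_present - c_hidden) / (c_full - c_healthy + eps)

# Example: hiding M drops the malignancy score from 0.80 to 0.30, while
# the full image scores 0.90 and the healthy edit 0.10, so M accounts
# for roughly 62% of the score range.
score = attribution_score(0.80, 0.30, 0.90, 0.10)
```

A score near 1 indicates the candidate alone explains most of the gap between the full and healthy outputs; a score near 0 marks it as a spurious component.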

2. Applications in Weakly Supervised Medical Segmentation

Hide-and-seek attribution has enabled accurate lesion segmentation in scenarios where only weak (global or region-level) labels are available. In "Hide-and-Seek Attribution: Weakly Supervised Segmentation of Vertebral Metastases in CT" (Atad et al., 7 Dec 2025), the method operates by:

  • Using a diffusion autoencoder (DAE) and a latent-space classifier to separate healthy from malignant vertebrae based on vertebra-level labels.
  • Creating a "healthy edit" for each input by moving the semantic latent toward a classifier-determined healthy region, generating a pseudo-healthy image.
  • Computing residual maps $D = I - I_{\text{healthy}}$ and extracting connected components as candidate lesions.
  • For each candidate $M$, hiding all other candidates (replacing them with pseudo-healthy regions), reconstructing the image via the DAE, and letting the classifier assess the independent malignant contribution of $M$.
  • The final segmentation mask includes components whose normalized attribution scores exceed a threshold.
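The selection loop above can be sketched as follows; the classifier, DAE reconstruction, and candidate extraction are stand-ins (here a mean-intensity score and an identity reconstruction), not the paper's actual models:

```python
import numpy as np

def segment_by_attribution(image, healthy_edit, candidates, classify,
                           reconstruct, threshold=0.5, eps=1e-6):
    """Hide-and-seek selection of lesion candidates (sketch).

    candidates:  list of boolean masks from the residual map I - I_healthy
    classify:    scoring function (stands in for the latent classifier)
    reconstruct: stands in for the DAE reconstruction step
    """
    c_full = classify(reconstruct(image))
    c_healthy = classify(reconstruct(healthy_edit))
    kept = []
    for i, m in enumerate(candidates):
        # Hide every candidate except m by pasting in the healthy edit.
        others = np.zeros_like(m)
        for j, other in enumerate(candidates):
            if j != i:
                others |= other
        probe = np.where(others, healthy_edit, image)
        c_present = classify(reconstruct(probe))
        # Additionally hide m itself to isolate its contribution.
        probe_hidden = np.where(others | m, healthy_edit, image)
        c_hidden = classify(reconstruct(probe_hidden))
        score = (c_present - c_hidden) / (c_full - c_healthy + eps)
        if score > threshold:
            kept.append(m)
    # Final mask is the union of retained candidates.
    return np.any(kept, axis=0) if kept else np.zeros_like(image, dtype=bool)

# Toy check: one bright lesion and one faint artifact on a blank image.
img = np.zeros((4, 4)); img[0, 0] = 1.0; img[3, 3] = 0.05
healthy = np.zeros((4, 4))
mA = np.zeros((4, 4), bool); mA[0, 0] = True
mB = np.zeros((4, 4), bool); mB[3, 3] = True
mask = segment_by_attribution(img, healthy, [mA, mB],
                              classify=lambda x: x.mean(),
                              reconstruct=lambda x: x)
```

In the toy run, the bright component passes the threshold while the faint artifact is suppressed, mirroring how the method filters spurious generator residue.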

This approach demonstrated F1-scores of 0.91 (blastic) and 0.85 (lytic) without voxel-level mask supervision, outperforming CAM and anomaly-based baselines (Atad et al., 7 Dec 2025). The method isolates each region’s independent effect while suppressing spurious generator artifacts.

3. Hide-and-Seek in Model Fingerprinting and Attribution

In LLM fingerprinting, hide-and-seek attribution enables identification of model family, provenance, or deployment pipeline by using black-box discrimination rather than relying on model internals. The "Hide and Seek: Fingerprinting LLMs with Evolutionary Learning" framework (Iourovitski et al., 6 Aug 2024) involves:

  • An Auditor LLM (the "Hider") generates candidate prompts designed to elicit maximally discriminative outputs across a set of black-box models.
  • A Detective LLM (the "Seeker") is given only the models' outputs for those prompts and must identify which pair of outputs originates from the same family.
  • This interaction is iterated; feedback on Detective accuracy is used by the Auditor for in-context refinement of prompt strategies.
  • The Auditor’s prompt-generation process is governed by an evolutionary optimizer, using mutation (paraphrasing, constraint manipulation, task order shuffling) and crossover (head-tail splicing) to evolve the prompt population for high fitness.
  • The method achieves up to 72% accuracy in family attribution across Llama, Mistral, Gemma, and Phi LLMs (Iourovitski et al., 6 Aug 2024).
  • Semantic manifold analysis through projection and cluster analysis of sentence embeddings reveals stable family-specific behavior signatures.
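The evolutionary loop driving the Auditor's prompt search can be sketched as below; the fitness, mutation, and crossover callables here are toy placeholders (in the paper, fitness would be Detective accuracy, mutation would paraphrase or reorder tasks, and crossover would splice prompt heads and tails):

```python
import random

def evolve_prompts(population, fitness, mutate, crossover,
                   generations=10, keep=4, seed=0):
    """Evolutionary search over a prompt population (sketch)."""
    rng = random.Random(seed)
    for _ in range(generations):
        # Rank prompts by how discriminative their model outputs are.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:keep]
        children = []
        while len(children) < len(population) - keep:
            # Crossover two elite parents, then mutate the child.
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b), rng))
        population = parents + children
    return max(population, key=fitness)

# Toy run: fitness counts 'x'; mutation appends one; crossover splices.
best = evolve_prompts(
    ["aaaa", "abab", "xaxa", "bbbb"],
    fitness=lambda s: s.count("x"),
    mutate=lambda s, rng: s + "x",
    crossover=lambda a, b: a[:2] + b[2:],
    generations=5, keep=2)
```

The elite-plus-offspring scheme keeps the best discriminators intact each generation while mutation and crossover explore nearby prompt variants.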

This process highlights how systematic hiding and seeking (via prompt and response selection, evolutionary search, and feedback) exposes semantic manifold distinctions in LLM behavior and supports robust black-box model attribution.

4. Provenance and Audit in Search-Enabled LLMs

Hide-and-seek attribution is instrumental in quantifying and controlling information provenance in web-enabled LLMs. The "Attribution Crisis in LLM Search Results" (Strauss et al., 27 Jun 2025) provides a formal metric—the "attribution gap"—defined as

\text{Gap}_i = |\{\text{Visited URLs}_i\}| - |\{\text{Cited URLs}_i\}|

for query ii, measuring the degree to which relevant sources accessed (visited) during retrieval-augmented generation (RAG) are omitted from citations in model outputs. This metric enables systematic auditing of “hidden” versus “revealed” information sources.
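A direct sketch of the gap metric, assuming per-query logs of visited and cited URLs (the URL values below are illustrative):

```python
from statistics import median

def attribution_gap(visited, cited):
    # |{Visited URLs}_i| - |{Cited URLs}_i| for a single query i,
    # counted over distinct URLs.
    return len(set(visited)) - len(set(cited))

queries = [
    (["a.com", "b.com", "c.com"], ["a.com"]),   # high-volume, low-credit
    (["a.com", "b.com"], ["a.com", "b.com"]),   # fully credited
    (["a.com"] * 3, []),                        # visited but never cited
]
gaps = [attribution_gap(v, c) for v, c in queries]
median_gap = median(gaps)
```

Aggregating the per-query gaps with a median, as in the paper's model comparison, makes the metric robust to occasional queries with very large gaps.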

Empirical results show that:

  • Median attribution gaps vary substantially across models: Sonar (5.0), Gemini (4.0), GPT-4o (0.0) (Strauss et al., 27 Jun 2025).
  • Models frequently display "no search" (no site visits), "no citation" (not citing sources), or "high-volume, low-credit" (reading many sources but citing few).

A negative binomial hurdle model further elucidates determinants of the gap, distinguishing between the probability of a gap and the expected gap magnitude conditional on a gap. Citation efficiency, defined as the number of extra citations per additional URL visited, is found to be driven primarily by pipeline design rather than technical limitations, with model-level coefficients ranging from 0.19 to 0.45 (Strauss et al., 27 Jun 2025).

Standardized telemetry, full search and citation trace logging, and propagating stable identifiers for retrieved documents are recommended as architectural solutions to eliminate systematic hidden-attribution practices.

5. Explainable AI: Hide-and-Seek Neural Architectures

Earlier instantiations of hide-and-seek attribution focused on explainable AI via trainable masking models. In "Hide-and-Seek: A Template for Explainable AI" (Tagaris et al., 2020), the method consists of a two-network system:

  • A "Hider" network generates a binary mask $H$ selecting a sparse subset of input features (e.g., pixels).
  • A "Seeker" classifier receives only the masked input $\tilde{x} = H \odot x$ and must perform standard classification.
  • The combined objective trades off between interpretability (fraction of features retained) and fidelity (classification accuracy), with formal metrics (FIR, FII) for quantifying this balance.
  • Backpropagation is enabled via straight-through or stochastic estimators to handle binary masking.
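The masking step and its straight-through gradient can be illustrated as follows; this is a minimal NumPy sketch of the forward/backward pair, not the paper's network:

```python
import numpy as np

def hider_forward(logits):
    """Hider mask: hard binary thresholding in the forward pass."""
    return (logits > 0).astype(np.float32)

def hider_backward(grad_mask):
    """Straight-through estimator: gradients flow through the
    non-differentiable threshold as if it were the identity."""
    return grad_mask

# Forward: the Seeker only sees the features the Hider retains.
x = np.array([0.9, -0.2, 0.4, 0.1], dtype=np.float32)
logits = np.array([2.0, -1.0, 0.5, -0.3], dtype=np.float32)
H = hider_forward(logits)   # hard binary mask
x_masked = H * x            # masked input fed to the Seeker
sparsity = H.mean()         # fraction of features retained
```

The sparsity term enters the interpretability side of the objective, while the Seeker's classification loss supplies the fidelity side of the trade-off.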

This yields competitive or superior interpretability-fidelity tradeoffs compared to Grad-CAM or occlusion-based methods, producing fine, fully differentiable attribution maps in a single forward pass.

6. Location Privacy and Attribution: Differential Privacy "Hide-and-Seek"

In the context of privacy-preserving attribution for UAV detection, the hide-and-seek mechanism is realized through differential privacy concepts. Obfuscated broadcasts use the planar Laplace mechanism to enforce $\varepsilon$-Geo-Indistinguishability (Brighente et al., 2022):

f(x \mid x_0) = \frac{\varepsilon^2}{2\pi} \exp(-\varepsilon\, d(x_0, x)), \qquad x \in \mathbb{R}^2

This ensures that broadcasts cannot be uniquely attributed to precise positions within a radius determined by the noise parameter $\varepsilon$. Detection frameworks such as ICARUS then operate by observing whether any broadcast falls inside a protected region within a temporal window, trading increased privacy (low $\varepsilon$) against slightly higher detection error and attribution delays. Empirical results show detection accuracy above 97% at moderate privacy levels (mean obfuscation radius 2.5 km, $\varepsilon = 0.8\,\text{km}^{-1}$), with false positive rates around 10% and an average detection delay of 304 ms (Brighente et al., 2022).
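Sampling from the planar Laplace mechanism can be sketched as below; the radius CDF is $P(R \le r) = 1 - (1 + \varepsilon r)e^{-\varepsilon r}$, inverted here by bisection rather than the usual closed form via the Lambert W function. Coordinates and units are illustrative:

```python
import math
import random

def planar_laplace_sample(x0, y0, eps, rng):
    """Obfuscate location (x0, y0) under eps-Geo-Indistinguishability."""
    theta = rng.uniform(0.0, 2.0 * math.pi)   # uniform direction
    p = rng.random()                          # uniform CDF level
    cdf = lambda r: 1.0 - (1.0 + eps * r) * math.exp(-eps * r)
    lo, hi = 0.0, 1.0
    while cdf(hi) < p:                        # bracket the target radius
        hi *= 2.0
    for _ in range(60):                       # bisect to high precision
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    r = 0.5 * (lo + hi)
    return x0 + r * math.cos(theta), y0 + r * math.sin(theta)

rng = random.Random(0)
# With eps = 0.8 km^-1, the mean obfuscation radius is 2/eps = 2.5 km,
# matching the moderate privacy level quoted above.
pts = [planar_laplace_sample(0.0, 0.0, 0.8, rng) for _ in range(20000)]
mean_radius = sum(math.hypot(x, y) for x, y in pts) / len(pts)
```

Smaller $\varepsilon$ stretches the radial distribution outward, which is exactly the privacy/detection-accuracy trade-off the ICARUS results quantify.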

7. Implications, Challenges, and Future Directions

Hide-and-seek attribution fundamentally advances the rigor and auditability of attribution, interpretability, and provenance in machine learning systems. Key implications include:

  • Model provenance and security: Black-box attribution enables forensic identification and copyright enforcement (Iourovitski et al., 6 Aug 2024).
  • Explainable AI: Joint optimization of performance and interpretability elevates faithfulness and reliability of attributive explanations (Tagaris et al., 2020).
  • Information ecosystem health: Attribution gap monitoring reveals model design impacts on content credit, affecting creator incentives and policy (Strauss et al., 27 Jun 2025).
  • Privacy and responsible broadcasting: Differential privacy mechanisms realize controlled trade-offs between attribution and privacy in geo-sensitive monitoring (Brighente et al., 2022).

Technical challenges persist, including the potential for adversarial obfuscation ("attribution evasion" arms races), balancing privacy/utility, and ensuring interpretations remain trustworthy as models and data modalities evolve. A plausible implication is that as hide-and-seek attribution techniques mature and proliferate, regulatory and market mechanisms will increasingly draw on these frameworks to enforce transparency, privacy, and accountable AI practices.
