Enterprise AI Must Enforce Participant-Aware Access Control (2509.14608v1)

Published 18 Sep 2025 in cs.CR and cs.AI

Abstract: LLMs are increasingly deployed in enterprise settings where they interact with multiple users and are trained or fine-tuned on sensitive internal data. While fine-tuning enhances performance by internalizing domain knowledge, it also introduces a critical security risk: leakage of confidential training data to unauthorized users. These risks are exacerbated when LLMs are combined with Retrieval-Augmented Generation (RAG) pipelines that dynamically fetch contextual documents at inference time. We demonstrate data exfiltration attacks on AI assistants where adversaries can exploit current fine-tuning and RAG architectures to leak sensitive information by leveraging the lack of access control enforcement. We show that existing defenses, including prompt sanitization, output filtering, system isolation, and training-level privacy mechanisms, are fundamentally probabilistic and fail to offer robust protection against such attacks. We take the position that only a deterministic and rigorous enforcement of fine-grained access control during both fine-tuning and RAG-based inference can reliably prevent the leakage of sensitive data to unauthorized recipients. We introduce a framework centered on the principle that any content used in training, retrieval, or generation by an LLM is explicitly authorized for all users involved in the interaction. Our approach offers a simple yet powerful paradigm shift for building secure multi-user LLM systems that are grounded in classical access control but adapted to the unique challenges of modern AI workflows. Our solution has been deployed in Microsoft Copilot Tuning, a product offering that enables organizations to fine-tune models using their own enterprise-specific data.

Summary

  • The paper presents a security framework ensuring only authorized users access fine-tuned LLMs through deterministic, participant-aware controls.
  • It introduces a biclique-based model construction method that aligns document ACLs with user privileges to secure enterprise data.
  • The study outlines practical strategies for enforcing deterministic access control in RAG pipelines, mitigating risks like cross-prompt injection attacks.

Deterministic Participant-Aware Access Control for Enterprise AI

Introduction

The deployment of LLMs in enterprise environments introduces significant risks of data leakage, particularly when models are fine-tuned on sensitive internal data or integrated with Retrieval-Augmented Generation (RAG) pipelines. The paper "Enterprise AI Must Enforce Participant-Aware Access Control" (2509.14608) presents a formal security framework for multi-user LLM systems, arguing that only deterministic, fine-grained access control—enforced at every stage of the AI pipeline—can provide robust guarantees against unauthorized data exposure. The authors demonstrate the inadequacy of existing probabilistic defenses and propose a participant-aware access control paradigm, with practical deployment in Microsoft Copilot Tuning.

Data Leakage Risks in Fine-Tuned and RAG-Integrated LLMs

Fine-tuning LLMs on enterprise data enables domain adaptation but also creates a direct vector for data exfiltration. When a model is fine-tuned on documents with heterogeneous access controls, any user with access to the model may extract information—either verbatim or paraphrased—that they are not authorized to see. This risk is exacerbated by the scaling properties of LLMs, which increase memorization and leakage as model and dataset sizes grow.

RAG pipelines further amplify these risks. In RAG, user prompts are augmented with retrieved documents at inference time. If the retrieval mechanism does not strictly enforce access control lists (ACLs) for all participants in the interaction, restricted content may be introduced into the model’s context, leading to unauthorized disclosures. The paper provides a concrete attack scenario—cross-prompt injection—where an adversary can exfiltrate confidential data by embedding hidden prompts in seemingly benign emails (Figure 1).

Figure 1: A demonstration of a cross-prompt-injection attack on a RAG-based intelligent email assistant, illustrating how an attacker can induce data leakage via prompt injection and insufficient access control.

Inadequacy of Probabilistic Defenses

The authors systematically analyze existing defense mechanisms, categorizing them as data-centric (e.g., deduplication, data scrubbing), output-level (e.g., output filtering, DLP), and training-level (e.g., regularization, differential privacy). All these approaches are fundamentally probabilistic and lack end-to-end guarantees:

  • Data-centric methods reduce memorization but cannot guarantee the absence of leakage, especially for structured or correlated enterprise data.
  • Output filtering and DLP tools are unable to track fine-grained information flow from training data to outputs and cannot enforce recipient-specific policies.
  • Differential privacy provides only bounded leakage guarantees and is impractical for group-level or correlated data, as required in enterprise settings.

The paper asserts that these probabilistic defenses are insufficient for enterprise adoption, as they cannot provide deterministic guarantees of confidentiality.

Formal Security Principle and Biclique-Based Model Construction

The core contribution is a formal security principle: a user U may access a fine-tuned model M only if U is authorized to access all documents D used in M's fine-tuning. This requirement is operationalized by representing the relationship between documents and entities (users or groups) as a bipartite graph, where edges encode ACLs. Secure model construction then reduces to finding bicliques in this graph: a model can be fine-tuned on a set of documents D and made available to a set of users E if and only if (D, E) forms a biclique (i.e., every user in E is authorized for every document in D).
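
As a minimal illustration of this principle, the following Python sketch checks whether a document set and a user set form a biclique under a given ACL map, and applies the training-time rule that a user may access a fine-tuned model only if authorized for every training document. The data structures and names are illustrative assumptions, not the paper's implementation.

    from typing import Dict, Set

    def is_biclique(acl: Dict[str, Set[str]], docs: Set[str], users: Set[str]) -> bool:
        """Return True iff every user in `users` may read every document in `docs`.
        `acl` maps each document id to the set of entities authorized to read it;
        group membership is assumed to be expanded into individual users already.
        """
        return all(users <= acl.get(doc, set()) for doc in docs)

    def may_access_model(acl: Dict[str, Set[str]], training_docs: Set[str], user: str) -> bool:
        """Training-time rule: a user may use a fine-tuned model only if they are
        authorized for every document used in its fine-tuning.
        """
        return is_biclique(acl, training_docs, {user})

    # Example: a model fine-tuned on d1 and d2 may be served to alice but not to bob.
    acl = {"d1": {"alice", "bob"}, "d2": {"alice"}}
    assert may_access_model(acl, {"d1", "d2"}, "alice")
    assert not may_access_model(acl, {"d1", "d2"}, "bob")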

The paper discusses practical strategies for biclique selection:

  • Maximum biclique: Maximizes both document and entity coverage but is NP-complete to compute.
  • Maximal biclique heuristic: Efficiently finds large bicliques by leveraging the structure of enterprise ACLs, where a small number of entity sets govern large document corpora (a greedy sketch in this spirit follows below).
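
A minimal Python sketch of one possible greedy heuristic of this kind follows; the grouping strategy and names are illustrative assumptions rather than the paper's algorithm. It picks the entity set that the most documents share exactly, then adds every document whose ACL is a superset of that set, so the result is a biclique by construction.

    from collections import Counter
    from typing import Dict, FrozenSet, Set, Tuple

    def greedy_biclique(acl: Dict[str, Set[str]]) -> Tuple[Set[str], FrozenSet[str]]:
        """Pick the entity set that the most documents share exactly, then include
        every document readable by all of those entities. The returned pair
        (docs, entities) is a biclique: each entity may read each document.
        """
        if not acl:
            return set(), frozenset()
        # Count how often each exact authorized-entity set appears in the corpus.
        counts = Counter(frozenset(entities) for entities in acl.values())
        entity_set, _ = counts.most_common(1)[0]
        docs = {doc for doc, entities in acl.items() if entity_set <= entities}
        return docs, entity_set

    acl = {
        "q1_report": {"finance", "execs"},
        "q2_report": {"finance", "execs"},
        "budget":    {"finance", "execs", "audit"},
        "hr_memo":   {"hr"},
    }
    docs, entities = greedy_biclique(acl)
    # docs == {"q1_report", "q2_report", "budget"}; entities == frozenset({"finance", "execs"})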

This approach is implemented in Microsoft Copilot Tuning, enabling secure, participant-aware fine-tuning for enterprise customers.

Deterministic Access Control in RAG Pipelines

The paper extends the access control principle to RAG-based inference. In any multi-user interaction (e.g., email drafting, collaborative editing), the system must ensure that all content—whether retrieved or generated—is accessible to all active participants. This requires:

  • At inference time, retrieved documents must be filtered so that only those accessible to all recipients are included in the LLM context.
  • If no fine-tuned model exists that is accessible to all participants, the system must fall back to a general-purpose model.

The authors demonstrate that this approach eliminates covert channels for data exfiltration, such as cross-prompt injection attacks, by making unauthorized information flow provably impossible.
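
A minimal Python sketch of this inference-time enforcement follows, under assumed data structures (an ACL map per document and per fine-tuned model); it is illustrative, not the deployed implementation. Retrieved documents are filtered to those readable by every participant, and the system falls back to a general-purpose model when no fine-tuned model is eligible for all participants.

    from typing import Dict, List, Set

    def filter_context(acl: Dict[str, Set[str]], retrieved: List[str],
                       participants: Set[str]) -> List[str]:
        """Keep only retrieved documents that every participant may read."""
        return [doc for doc in retrieved if participants <= acl.get(doc, set())]

    def select_model(model_acls: Dict[str, Set[str]], participants: Set[str],
                     base_model: str) -> str:
        """Use a fine-tuned model only if all participants are authorized for it;
        otherwise fall back to a general-purpose model."""
        for model, allowed in model_acls.items():
            if participants <= allowed:
                return model
        return base_model

    # Example: Alice drafts a reply to Bob; only content both may read enters the context.
    acl = {"roadmap": {"alice", "bob"}, "salaries": {"alice"}}
    context = filter_context(acl, ["roadmap", "salaries"], {"alice", "bob"})  # ["roadmap"]
    model = select_model({"finance_ft": {"alice"}}, {"alice", "bob"}, "base-model")  # "base-model"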

Limitations of Existing Prompt Injection Defenses

The paper provides a detailed critique of current prompt injection defenses, including input sanitization, output filtering, and system-level isolation. All are shown to be probabilistic, with no deterministic guarantee of blocking sophisticated or steganographically encoded attacks. The cross-prompt injection attack illustrated in Figure 1 bypasses these defenses by exploiting the indistinguishability between benign and malicious prompts and the lack of recipient-aware access control.

Practical Implications and Future Directions

The participant-aware access control framework has immediate implications for the design and deployment of enterprise AI systems:

  • Auditability and Compliance: Deterministic enforcement of ACLs enables provable compliance with internal and regulatory data governance requirements.
  • Scalability: The biclique-based approach allows for efficient construction of shared models with maximal utility, balancing document and entity coverage.
  • Dynamic Environments: The framework can be extended to handle dynamic ACLs, weighted importance of users/documents, and misconfiguration detection.

Future research directions include efficient algorithms for biclique discovery at scale, integration with dynamic access control systems, and formal verification of end-to-end information flow in complex AI pipelines.

Conclusion

This work establishes that deterministic, participant-aware access control is necessary for the secure deployment of LLMs in multi-user enterprise environments. By formalizing the security requirement and providing practical algorithms for model construction and inference-time enforcement, the paper addresses a critical gap in current AI security practices. The proposed framework offers strong, auditable guarantees against data leakage, setting a new standard for enterprise AI confidentiality and compliance.

Explain it Like I'm 14

Explaining “Enterprise AI Must Enforce Participant-Aware Access Control”

What is this paper about?

This paper talks about how companies use AI tools like chatbots and writing assistants that learn from the company’s private documents. The big problem: these tools can accidentally reveal secrets to people who aren’t allowed to see them. The authors argue for a simple, strong rule to stop this: only use information (for training or answering) that every person involved is allowed to access. They show why current “fixes” aren’t good enough and explain how to build systems that follow this rule. Microsoft has already put these ideas into a product called Copilot Tuning.

What questions are the authors trying to answer?

In everyday language, they’re asking:

  • How do we stop AI helpers inside a company from leaking private info?
  • Why do common safety tricks (like filtering outputs or trying to hide secrets in the data) still fail?
  • What clear, reliable rule can guarantee that an AI won’t tell secrets to the wrong person?
  • How can we apply that rule both when training the AI and when it looks up documents to answer a question?

How did they study the problem? (Methods and key ideas)

To make the discussion concrete, the authors do two things:

1) Show realistic attacks on today’s systems:

  • Imagine Bob emails Alice. Alice asks her AI to draft a reply. The AI uses a “RAG” system (Retrieval-Augmented Generation), which is just a fancy way of saying it looks up related documents and puts them in the AI’s “brain” right before answering, like checking notes before writing an essay.
  • Bob hides secret instructions in the email (for example, white text on a white background). The AI reads them even though Alice can’t see them. The instructions tell the AI to fetch private info (like “Project X made $7 million”) and then sneak it into the reply in a tricky way (like suggesting a meeting on the 7th, which leaks “7”).
  • This is called a cross-prompt injection attack. It can work even if Alice reviews the draft, because the leak is subtle.

2) Propose a strict access control rule and show how to enforce it:

  • Access control is like keys for rooms in a school. Only students with the right key can enter a room. Similarly, only certain employees can see certain files.
  • The rule: any piece of content the AI uses—during training, retrieval, or writing—must be allowed for every person involved in the conversation.
  • Training-time rule (fine-tuning): If a model is trained on documents A, B, and C, then you can only give that model to people who are allowed to see A, B, and C. No exceptions.
  • Answer-time rule (RAG): If Alice asks the AI to reply to Bob, the AI should only bring in documents that both Alice and Bob are allowed to see. If Bob isn’t allowed, the AI must either ask Alice for explicit permission or not use that document at all.

They also explain a practical way to pick safe “training sets” and “user groups” using a simple picture idea:

  • Draw two columns: documents on the left, people (or groups) on the right. Connect a line wherever a person is allowed to see a document.
  • A safe training setup is a “perfect rectangle” of connections: a set of documents that every person in a chosen group is allowed to see. In math, this is called a biclique. In plain terms: pick a pile of documents that everyone in a chosen team can all access. Train the model only on that pile, and give the model only to that team.

This approach has been built into Microsoft Copilot Tuning so companies can fine-tune models on their data without breaking access rules.

What did they find, and why does it matter?

Main findings:

  • Current defenses are “probabilistic,” meaning they work only some of the time: input checks (like detecting prompt injection), output filters (like scanning for sensitive text), and special training tricks (like adding noise or sanitizing data) can all be bypassed by clever attackers or hidden messages.
  • These partial fixes don’t give hard guarantees. It’s like wearing a seatbelt that works 90% of the time—you still can’t trust it when safety really matters.
  • A “deterministic” (always-on) rule based on access control does give strong guarantees:
    • If the AI never trains on, retrieves, or writes anything that someone in the conversation isn’t allowed to see, then leaks can’t happen through that path.
  • The email scenario proves the point: the attack only works because the system doesn’t verify Bob’s permissions before using Alice’s private info. If it did, the leak would be blocked.
  • For training (fine-tuning), the rule is simple and strict: only give a fine-tuned model to users who are allowed to see every document used to train it. This prevents the model from “flattening” the company’s data silos and leaking across departments.

Why this matters:

  • Companies won’t fully trust AI until it can protect secrets. This rule gives clear, checkable protection that fits how companies already manage permissions.
  • It reduces both accidental leaks (during normal use) and deliberate attacks (from adversaries).

What does this mean for the future?

  • Building AI that respects “who can see what” at every step is key to safe, large-scale use in businesses, schools, hospitals—anywhere with private information.
  • Systems should:
    • Enforce access control during training (who gets the fine-tuned model),
    • Enforce it during retrieval (which documents can appear in context),
    • Enforce it during generation (who is the output going to), and
    • Ask for explicit consent if needed.
  • In multi-person situations (reply-all, group chats, shared documents), the AI should only use content that every recipient is allowed to see.
  • This approach is already in use (e.g., Copilot Tuning), showing it’s practical, not just theoretical.

In short: The authors argue that the only reliable way to stop AI assistants from leaking secrets is to treat them like locked doors with proper keys. If everyone in the conversation has a key to the information, it can be used. If not, the AI must leave it out or ask for permission. This simple, strict rule closes the loopholes that attackers exploit and makes enterprise AI safer.

Knowledge Gaps

Unresolved gaps, limitations, and open questions

The paper leaves the following concrete knowledge gaps and open questions that future work could address:

  • Formal end-to-end guarantees: Provide a precise definition of “no unauthorized leakage” and prove confidentiality under the proposed access-control principle given LLM stochasticity, paraphrasing, and non-exact memorization (i.e., formal information-flow/security model for fine-tuning + RAG).
  • Utility–security trade-offs: Quantify how entity coverage and document coverage affect task performance and user satisfaction; report empirical results on real enterprise ACL graphs and workloads.
  • Scalable biclique discovery: Develop algorithms with approximation guarantees for maximum/maximal bicliques on massive, dense bipartite ACL graphs; support distributed/streaming computation.
  • Dynamic ACL changes: Design certified machine unlearning or incremental fine-tuning to revoke training influence when users lose access or documents are removed; provide operational SLAs and cost/latency bounds.
  • Fine-grained ACLs: Extend from document-level ACLs to paragraph/section/field-level controls; define safe compositional training and retrieval when content fragments have heterogeneous permissions.
  • Group membership semantics: Resolve nested groups, dynamic distribution lists, and cross-tenant identities deterministically; handle external recipients with partial or unknown ACLs.
  • Consent as a policy primitive: Specify what constitutes “explicit consent” in agentic mode, with auditable logs, revocation, and regulatory alignment (e.g., GDPR/CCPA); evaluate UX that avoids dark patterns.
  • Multi-model routing: Formalize safe composition of multiple fine-tuned models (different ACL domains) and public models; design routers with proofs that outputs cannot mix unauthorized training signals.
  • Full-pipeline ACL enforcement: Ensure ACL checks cover embeddings, retrieval, tool/function calls, caches, summarizers, and prompt assembly; prove there is no bypass via secondary tools or agent plans.
  • Steganography within authorized content: Assess residual risks when retrieved data is authorized but outputs are covertly encoded; develop detectors/guardrails without breaking determinism or utility.
  • Active participant definition: Precisely define recipients for enforcement (To/CC/BCC, delegated access, auto-forwarding, downstream sharing) and handle post-send disclosure to non-authorized parties.
  • Performance/latency impact: Measure inference overhead of per-participant ACL checks; design caching/indexing strategies that preserve guarantees while meeting enterprise latency targets.
  • Broader threat models: Analyze attacks beyond email (collab docs, calendars, attachments), cross-modal inputs (images/PDFs), tool/plugin ecosystems, and multi-hop retrieval; provide comprehensive coverage.
  • Base-model provenance risk: Address leakage risks from pre-trained weights that may memorize sensitive third-party data; specify requirements/auditing for base model training corpora.
  • Verifiable auditing: Build information-flow attestations proving outputs depended only on content authorized for all participants; standardize logs for compliance and incident response.
  • ACL misconfiguration handling: Detect and remediate incorrect or stale ACLs; study their impact on biclique selection, model eligibility, and leakage risk.
  • Training data lineage: Maintain provenance linking documents → fine-tuned model versions; support rollback/revocation and differential analysis across model updates.
  • Fairness and access equity: Evaluate whether biclique-based selection disproportionately excludes certain teams/domains; propose metrics and mitigation strategies.
  • Controlled cross-silo sharing: Enable legitimate inter-department sharing through exception policies that retain deterministic guarantees (e.g., time-bounded, scope-limited disclosures).
  • Empirical defense evaluation: Benchmark the deterministic approach vs. probabilistic defenses (input checks, output filtering, isolation) on XPIA and jailbreak suites; report success/false-positive rates.
  • Model access security: Prevent unauthorized weight copying and output re-sharing; integrate attestation, confidential computing, tenant isolation, and enforcement of recipient-bound usage tokens.
  • Privacy technique interplay: Explore combining ACL enforcement with differential privacy, group privacy, or certified unlearning to protect individuals and correlated groups while retaining utility.
  • Enterprise-scale deployment: Engineer storage/index joins and ACL resolution for millions of documents and thousands of entities; evaluate horizontal scaling and failure modes.
  • Edge cases in mixed-ACL content: Define deterministic rules for summarization, aggregation, and redaction when retrieved context spans heterogeneous permissions; ensure outputs remain compliant.
  • Policy boundary conditions: Clarify “authorized for all users” for derived, aggregated, or statistical outputs (e.g., averages or trends); specify when summaries become disclosive and how to enforce limits.

Glossary

  • Access control enforcement: The act of ensuring that only authorized users can access specific resources or data throughout a system’s operations. "adversaries can exploit current fine-tuning and RAG architectures to leak sensitive information by leveraging the lack of access control enforcement."
  • Access control list (ACL): A permissions list attached to a resource specifying which users or groups are authorized to access it and how. "To this end, we examine the access control lists (ACLs) associated with data in commonly used file systems, such as SharePoint sites, Google Workspace, and similar platforms."
  • Active Directory security groups: Group-based entities in Microsoft Active Directory used to manage user permissions collectively. "An entity E could either be a single user or a group of users (e.g., distribution lists or Active Directory security groups)."
  • Agentic mode: An interaction mode where the assistant drafts outputs for user review before sending or acting. "In the agentic mode, the assistant acts as a drafting partner."
  • Biclique: A complete bipartite subgraph where every node on one side connects to every node on the other. "A biclique is a complete bipartite graph where each vertex in one set is connected to each vertex in other set."
  • Bipartite graph: A graph whose vertices can be divided into two disjoint sets with edges only across sets, not within. "A bipartite graph G is a graph in which vertices can be divided into two disjoint sets such that no two vertices in the same set are connected."
  • Branch-and-bound techniques: A combinatorial search strategy that prunes the search space using bounds to find optimal solutions. "While algorithms such as branch-and-bound techniques~\cite{mbc,z3} can be employed to find a maximum biclique, they are computationally expensive, especially in dense graphs—a situation commonly encountered in enterprise settings."
  • Confidence masking: A defense that adds noise or reduces confidence in model outputs to obscure potentially memorized information. "Techniques such as confidence masking \cite{jia2019memguard} introduce noise into model outputs to obscure potential memorized content, but often degrade utility without providing robust guarantees against leakage."
  • Covert channel: A hidden communication method that transfers information in unintended ways, bypassing normal controls. "This attack creates a covert channel by abusing the assistant’s retrieval and generation pipeline to exfiltrate sensitive data—without Alice’s awareness or consent."
  • Cross-prompt injection attack (XPIA): An attack where malicious instructions are hidden in seemingly benign inputs processed by an LLM, causing unintended actions or leaks. "A cross-prompt injection attack (XPIA) occurs when an adversary embeds hidden instructions into benign-looking content—such as emails—which are subsequently processed by an LLM during tasks like reply generation."
  • Data exfiltration attacks: Techniques by which adversaries illicitly extract sensitive data from systems. "We demonstrate data exfiltration attacks on AI assistants where adversaries can exploit current fine-tuning and RAG architectures to leak sensitive information by leveraging the lack of access control enforcement."
  • Data Loss Prevention (DLP): Tools and methods that detect and prevent the exposure of sensitive information in outputs or communications. "Data Loss Prevention (DLP) tools \cite{hart2011text, neerbek2018rnn, purview} aim to detect and prevent the exposure of sensitive content—such as PII, protected health information~(PHI), or financial data—through pattern matching or rule-based classification."
  • Data scrubbing: Preprocessing that removes or redacts sensitive elements (e.g., PII) from datasets before training. "data scrubbing techniques \cite{lukas2023analyzing, vakili2022downstream} target the removal of personally identifiable information (PII)."
  • Differential privacy (DP): A formal privacy framework ensuring that outputs of an algorithm are statistically similar whether or not any single record is included. "Differential privacy offers a more formal approach to limiting information leakage \cite{dwork2006dp, ramaswamy2020training, kandpal2023user}."
  • Document coverage: The extent to which selected training documents capture essential domain knowledge for the model. "To maintain task relevance, we must selectively include documents containing essential domain knowledge—a property we refer to as document coverage."
  • Entity coverage: The breadth of users or groups who are permitted to use a fine-tuned model. "A more practical goal is to fine-tune models that can be shared across many users/entities—a property we refer to as entity coverage."
  • Fine-grained access controls: Detailed permission rules specifying access at narrow units (e.g., per document or field) rather than coarse roles. "We take the position that deterministic and provable enforcement of fine-grained access controls is essential for the safe deployment of LLMs in multi-user environments such as enterprises."
  • Fine-tuning: Continuing training of a pre-trained model on a smaller, task-specific dataset to adapt it to a domain or application. "Fine-tuning is the process of taking a pre-trained machine learning model—such as a LLM—and continuing its training on a smaller, task-specific dataset~\cite{treviso2023efficient}."
  • Fully agentic mode: An interaction mode where the assistant acts autonomously, sending outputs without human approval. "In contrast, the fully agentic mode removes the manual checkpoint entirely."
  • Group differential privacy: An extension of DP that protects groups of correlated records, not just individuals. "Protecting such large-scale information requires group differential privacy \cite{dwork2014algorithmic}, which increases the privacy budget ε and degrades utility to the point that the method becomes nonviable for real-world deployment."
  • Input sanitization: Preprocessing that detects and strips potentially malicious or unsafe patterns from inputs before model processing. "Black-box strategies such as input sanitization~\cite{nemoguardrails} identify and strip potentially harmful patterns"
  • Maximal biclique: A biclique that cannot be enlarged by adding more vertices without losing its complete bipartite property. "Here, bicliques to which no more documents or entities can be added without violating the biclique property are called maximal bicliques."
  • Maximum biclique: The biclique with the greatest number of edges in a given bipartite graph. "A biclique with the largest number of edges is referred to as a maximum biclique."
  • Membership inference attacks: Attacks that try to determine whether specific data points were included in a model’s training set. "as well as membership inference attacks that aim to determine whether specific data was part of the training set"
  • N-gram deduplication: A technique that removes repeated n-grams from training data to reduce memorization. "N-gram deduplication \cite{lee2021deduplicating, kandpal2022deduplicating} reduces memorization by eliminating repeated sequences"
  • NP-Complete: A class of decision problems believed to be computationally intractable to solve exactly in polynomial time. "However, identifying a maximum biclique is known to be NP-Complete~\cite{PEETERS2003651}."
  • Obfuscation methods: Techniques that perturb data (e.g., adding noise) to make sensitive information harder to learn or reconstruct. "Obfuscation methods \cite{zhang2018privacy} perturb the input distribution by injecting noise."
  • Output filtering: Post-processing model outputs to remove or mask sensitive or disallowed content. "We show that existing defenses, including prompt sanitization, output filtering, system isolation, and training-level privacy mechanisms, are fundamentally probabilistic and fail to offer robust protection against such attacks."
  • Privacy parameter ε: The DP parameter controlling the bound on information leakage; smaller ε provides stronger privacy. "First, even a differentially private training algorithm allows bounded leakage, captured by the privacy parameter ε, which often must be large to maintain acceptable utility."
  • Prompt injection: Maliciously crafted instructions that subvert an LLM’s intended behavior by being included in inputs or retrieved content. "However, his goal is to extract private information from Alice’s mailbox by manipulating the assistant’s behavior—specifically, by leveraging prompt injection to influence retrieval and generation."
  • Regularization: Training techniques that reduce overfitting and memorization by penalizing model complexity or constraining learning. "Regularization encompasses techniques aimed at reducing overfitting, thereby limiting the model’s tendency to memorize individual training examples~\cite{yin2021defending, li2021membership}."
  • Retrieval-Augmented Generation (RAG): An approach where external documents are retrieved at inference time to ground and improve model outputs. "These risks are exacerbated when LLMs are combined with Retrieval-Augmented Generation (RAG) pipelines that dynamically fetch contextual documents at inference time."
  • Semantic search: Retrieval based on meaning via vector similarity rather than exact keyword matching. "the system embeds this as well, and performs a semantic search over the index, retrieving the top-k most relevant items."
  • Semantic vector space: A numerical embedding space where semantically similar items are close together. "The assistant first indexes Alice’s mailbox—subject lines, message bodies, and attachments—by embedding each item into a semantic vector space."
  • Spotlighting: A defense that tags input segments with their origin (trusted vs. untrusted) to guide safe processing. "while spotlighting~\cite{hines2024defendingindirectpromptinjection} tags portions of the input with their origin (e.g., trusted user input vs. untrusted external content)."
  • Steganographic techniques: Methods of hiding information within innocuous-looking content so that it is not easily detected. "embedding hidden instructions in his email and then decoding the model’s response via steganographic techniques."
  • System-level isolation: Architectural controls that constrain how untrusted inputs affect system behavior and data flows. "probabilistic defenses—such as input sanitization, output filtering, system-level isolation, and training-level techniques—which, while useful, are fundamentally insufficient"
  • Taint tracking: A technique that tracks the flow of sensitive data through computations or generations to detect leakage. "while taint tracking~\cite{siddiqui2024permissiveinformationflowanalysislarge} traces the propagation of sensitive data through generation."
  • Task drift: A deviation of the model’s behavior from the intended task, often used as a signal for possible attacks or policy violations. "Techniques like TaskTracker~\cite{abdelnabi2025driftcatchingLLMtask} identify task drift to detect unexpected behavior."