Enterprise AI Must Enforce Participant-Aware Access Control (2509.14608v1)
Abstract: LLMs are increasingly deployed in enterprise settings where they interact with multiple users and are trained or fine-tuned on sensitive internal data. While fine-tuning enhances performance by internalizing domain knowledge, it also introduces a critical security risk: leakage of confidential training data to unauthorized users. These risks are exacerbated when LLMs are combined with Retrieval-Augmented Generation (RAG) pipelines that dynamically fetch contextual documents at inference time. We demonstrate data exfiltration attacks on AI assistants where adversaries can exploit current fine-tuning and RAG architectures to leak sensitive information by leveraging the lack of access control enforcement. We show that existing defenses, including prompt sanitization, output filtering, system isolation, and training-level privacy mechanisms, are fundamentally probabilistic and fail to offer robust protection against such attacks. We take the position that only a deterministic and rigorous enforcement of fine-grained access control during both fine-tuning and RAG-based inference can reliably prevent the leakage of sensitive data to unauthorized recipients. We introduce a framework centered on the principle that any content used in training, retrieval, or generation by an LLM is explicitly authorized for all users involved in the interaction. Our approach offers a simple yet powerful paradigm shift for building secure multi-user LLM systems that are grounded in classical access control but adapted to the unique challenges of modern AI workflows. Our solution has been deployed in Microsoft Copilot Tuning, a product offering that enables organizations to fine-tune models using their own enterprise-specific data.
Explain it Like I'm 14
Explaining “Enterprise AI Must Enforce Participant-Aware Access Control”
What is this paper about?
This paper talks about how companies use AI tools like chatbots and writing assistants that learn from the company’s private documents. The big problem: these tools can accidentally reveal secrets to people who aren’t allowed to see them. The authors argue for a simple, strong rule to stop this: only use information (for training or answering) that every person involved is allowed to access. They show why current “fixes” aren’t good enough and explain how to build systems that follow this rule. Microsoft has already put these ideas into a product called Copilot Tuning.
What questions are the authors trying to answer?
In everyday language, they’re asking:
- How do we stop AI helpers inside a company from leaking private info?
- Why do common safety tricks (like filtering outputs or trying to hide secrets in the data) still fail?
- What clear, reliable rule can guarantee that an AI won’t tell secrets to the wrong person?
- How can we apply that rule both when training the AI and when it looks up documents to answer a question?
How did they study the problem? (Methods and key ideas)
To make the discussion concrete, the authors do two things:
1) Show realistic attacks on today’s systems:
- Imagine Bob emails Alice. Alice asks her AI to draft a reply. The AI uses a “RAG” system (Retrieval-Augmented Generation), which is just a fancy way of saying it looks up related documents and puts them in the AI’s “brain” right before answering, like checking notes before writing an essay.
- Bob hides secret instructions in the email (for example, white text on a white background). The AI reads them even though Alice can’t see them. The instructions tell the AI to fetch private info (like “Project X made $7 million”) and then sneak it into the reply in a tricky way (like suggesting a meeting on the 7th, which leaks “7”).
- This is called a cross-prompt injection attack. It can work even if Alice reviews the draft, because the leak is subtle.
2) Propose a strict access control rule and show how to enforce it:
- Access control is like keys for rooms in a school. Only students with the right key can enter a room. Similarly, only certain employees can see certain files.
- The rule: any piece of content the AI uses—during training, retrieval, or writing—must be allowed for every person involved in the conversation.
- Training-time rule (fine-tuning): If a model is trained on documents A, B, and C, then you can only give that model to people who are allowed to see A, B, and C. No exceptions.
- Answer-time rule (RAG): If Alice asks the AI to reply to Bob, the AI should only bring in documents that both Alice and Bob are allowed to see. If Bob isn’t allowed, the AI must either ask Alice for explicit permission or not use that document at all.
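To make the answer-time rule concrete, here is a minimal sketch (not the paper’s implementation) of a participant-aware retrieval filter. It assumes each document carries an explicit set of authorized principals; the names Document, filter_for_participants, and the example ACLs are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Document:
    doc_id: str
    text: str
    acl: frozenset[str]  # principals (users or groups) allowed to read this document

def filter_for_participants(candidates: list[Document],
                            participants: set[str]) -> list[Document]:
    """Keep only documents that every participant in the interaction may read."""
    return [doc for doc in candidates if participants <= doc.acl]

# Alice drafts a reply to Bob: only content readable by BOTH may enter the model's context.
retrieved = [
    Document("memo-1", "Quarterly all-hands notes", frozenset({"alice", "bob", "carol"})),
    Document("fin-7",  "Project X revenue figures", frozenset({"alice"})),  # Alice-only
    Document("faq-2",  "Public product FAQ",        frozenset({"alice", "bob"})),
]
context = filter_for_participants(retrieved, {"alice", "bob"})
assert [d.doc_id for d in context] == ["memo-1", "faq-2"]  # the Alice-only document is dropped
```

Because the check is a deterministic set comparison rather than a classifier, hidden instructions in Bob’s email cannot talk the system into including the Alice-only document; it is simply never placed in the model’s context.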
They also explain a practical way to pick safe “training sets” and “user groups” using a simple picture idea:
- Draw two columns: documents on the left, people (or groups) on the right. Connect a line wherever a person is allowed to see a document.
- A safe training setup is a “perfect rectangle” of connections: a set of documents that every person in a chosen group is allowed to see. In math, this is called a biclique. In plain terms: pick a pile of documents that everyone in a chosen team can all access. Train the model only on that pile, and give the model only to that team.
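As a toy illustration of that picture (not the paper’s algorithm), the sketch below uses a made-up document-to-reader permission map and returns, for a chosen team, the pile of documents every team member can read; the pair (team, pile) is a biclique in the bipartite ACL graph. The paper notes that searching for the largest such rectangle (a maximum biclique) is NP-Complete, so real deployments fall back to maximal bicliques and heuristics.

```python
# Toy document-to-reader map; a real system would derive this from SharePoint or
# Google Workspace ACLs. Keys are documents, values are the entities allowed to read them.
acl = {
    "roadmap.docx":  {"eng", "pm"},
    "design.docx":   {"eng", "pm", "legal"},
    "budget.xlsx":   {"pm", "finance"},
    "postmortem.md": {"eng", "pm"},
}

def docs_for_team(team: set[str], acl: dict[str, set[str]]) -> set[str]:
    """Documents that every member of `team` may read.

    (team, result) forms a biclique: each selected entity is connected to each
    selected document in the bipartite permission graph, so a model fine-tuned
    on `result` can safely be shared with exactly the members of `team`.
    """
    return {doc for doc, readers in acl.items() if team <= readers}

assert docs_for_team({"eng", "pm"}, acl) == {"roadmap.docx", "design.docx", "postmortem.md"}
assert docs_for_team({"pm", "finance"}, acl) == {"budget.xlsx"}
```

Choosing which team to optimize for is where the trade-off shows up: a bigger team (more entity coverage) usually means a smaller shared pile of documents (less document coverage), and vice versa.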
This approach has been built into Microsoft Copilot Tuning so companies can fine-tune models on their data without breaking access rules.
What did they find, and why does it matter?
Main findings:
- Current defenses are “probabilistic,” meaning they work only some of the time: input checks (like detecting prompt injection), output filters (like scanning for sensitive text), and special training tricks (like adding noise or sanitizing data) can all be bypassed by clever attackers or hidden messages.
- These partial fixes don’t give hard guarantees. It’s like wearing a seatbelt that works 90% of the time—you still can’t trust it when safety really matters.
- A “deterministic” (always-on) rule based on access control does give strong guarantees:
- If the AI never trains on, retrieves, or writes anything that someone in the conversation isn’t allowed to see, then leaks can’t happen through that path.
- The email scenario illustrates the point: the attack only works because the system doesn’t verify Bob’s permissions before using Alice’s private info. If it did, the leak would be blocked.
- For training (fine-tuning), the rule is simple and strict: only give a fine-tuned model to users who are allowed to see every document used to train it. This prevents the model from “flattening” the company’s data silos and leaking across departments.
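Read as code, that training-time rule is a deployment gate: serve a fine-tuned model to a user only if the user is authorized for every document in its training set. The sketch below is hypothetical and assumes the serving layer keeps the per-document ACLs (or a precomputed eligible-user set) alongside each model version.

```python
def may_serve_model(user: str, training_doc_acls: list[set[str]]) -> bool:
    """Deterministic gate: `user` may use the fine-tuned model only if they are
    authorized for EVERY document the model was trained on."""
    return all(user in doc_acl for doc_acl in training_doc_acls)

# A model fine-tuned on three documents with these (toy) ACLs:
training_doc_acls = [{"alice", "bob"}, {"alice", "bob", "carol"}, {"alice", "bob"}]
assert may_serve_model("alice", training_doc_acls) is True
assert may_serve_model("carol", training_doc_acls) is False  # not cleared for two of the documents
```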
Why this matters:
- Companies won’t fully trust AI until it can protect secrets. This rule gives clear, checkable protection that fits how companies already manage permissions.
- It reduces both accidental leaks (during normal use) and deliberate attacks (from adversaries).
What does this mean for the future?
- Building AI that respects “who can see what” at every step is key to safe, large-scale use in businesses, schools, hospitals—anywhere with private information.
- Systems should:
- Enforce access control during training (who gets the fine-tuned model),
- Enforce it during retrieval (which documents can appear in context),
- Enforce it during generation (who is the output going to), and
- Ask for explicit consent if needed.
- In multi-person situations (reply-all, group chats, shared documents), the AI should only use content that every recipient is allowed to see (a minimal sketch of this decision appears after this list).
- This approach is already in use (e.g., Copilot Tuning), showing it’s practical, not just theoretical.
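As a sketch of how the every-recipient rule and the consent option could combine in agentic mode (hypothetical names and semantics, not the deployed product’s logic):

```python
from enum import Enum

class Action(Enum):
    INCLUDE = "include"          # readable by every recipient: safe to use in the draft
    ASK_CONSENT = "ask_consent"  # readable by the sender but not all recipients: ask first
    EXCLUDE = "exclude"          # the sender is not authorized either: never use it

def decide(doc_acl: set[str], sender: str, recipients: set[str]) -> Action:
    """Participant-aware decision for one candidate document in agentic mode."""
    if sender not in doc_acl:
        return Action.EXCLUDE
    if recipients <= doc_acl:
        return Action.INCLUDE
    return Action.ASK_CONSENT

# Alice replies-all to Bob and Carol; a memo only Alice can read must not enter
# the draft unless Alice explicitly agrees to disclose it.
assert decide({"alice"}, "alice", {"bob", "carol"}) is Action.ASK_CONSENT
assert decide({"alice", "bob", "carol"}, "alice", {"bob", "carol"}) is Action.INCLUDE
assert decide({"bob"}, "alice", {"bob"}) is Action.EXCLUDE
```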
In short: The authors argue that the only reliable way to stop AI assistants from leaking secrets is to treat them like locked doors with proper keys. If everyone in the conversation has a key to the information, it can be used. If not, the AI must leave it out or ask for permission. This simple, strict rule closes the loopholes that attackers exploit and makes enterprise AI safer.
Knowledge Gaps
Unresolved gaps, limitations, and open questions
The paper leaves the following concrete knowledge gaps and open questions that future work could address:
- Formal end-to-end guarantees: Provide a precise definition of “no unauthorized leakage” and prove confidentiality under the proposed access-control principle given LLM stochasticity, paraphrasing, and non-exact memorization (i.e., formal information-flow/security model for fine-tuning + RAG).
- Utility–security trade-offs: Quantify how entity coverage and document coverage affect task performance and user satisfaction; report empirical results on real enterprise ACL graphs and workloads.
- Scalable biclique discovery: Develop algorithms with approximation guarantees for maximum/maximal bicliques on massive, dense bipartite ACL graphs; support distributed/streaming computation.
- Dynamic ACL changes: Design certified machine unlearning or incremental fine-tuning to revoke training influence when users lose access or documents are removed; provide operational SLAs and cost/latency bounds.
- Fine-grained ACLs: Extend from document-level ACLs to paragraph/section/field-level controls; define safe compositional training and retrieval when content fragments have heterogeneous permissions.
- Group membership semantics: Resolve nested groups, dynamic distribution lists, and cross-tenant identities deterministically; handle external recipients with partial or unknown ACLs.
- Consent as a policy primitive: Specify what constitutes “explicit consent” in agentic mode, with auditable logs, revocation, and regulatory alignment (e.g., GDPR/CCPA); evaluate UX that avoids dark patterns.
- Multi-model routing: Formalize safe composition of multiple fine-tuned models (different ACL domains) and public models; design routers with proofs that outputs cannot mix unauthorized training signals.
- Full-pipeline ACL enforcement: Ensure ACL checks cover embeddings, retrieval, tool/function calls, caches, summarizers, and prompt assembly; prove there is no bypass via secondary tools or agent plans.
- Steganography within authorized content: Assess residual risks when retrieved data is authorized but outputs are covertly encoded; develop detectors/guardrails without breaking determinism or utility.
- Active participant definition: Precisely define recipients for enforcement (To/CC/BCC, delegated access, auto-forwarding, downstream sharing) and handle post-send disclosure to non-authorized parties.
- Performance/latency impact: Measure inference overhead of per-participant ACL checks; design caching/indexing strategies that preserve guarantees while meeting enterprise latency targets.
- Broader threat models: Analyze attacks beyond email (collab docs, calendars, attachments), cross-modal inputs (images/PDFs), tool/plugin ecosystems, and multi-hop retrieval; provide comprehensive coverage.
- Base-model provenance risk: Address leakage risks from pre-trained weights that may memorize sensitive third-party data; specify requirements/auditing for base model training corpora.
- Verifiable auditing: Build information-flow attestations proving outputs depended only on content authorized for all participants; standardize logs for compliance and incident response.
- ACL misconfiguration handling: Detect and remediate incorrect or stale ACLs; study their impact on biclique selection, model eligibility, and leakage risk.
- Training data lineage: Maintain provenance linking documents → fine-tuned model versions; support rollback/revocation and differential analysis across model updates.
- Fairness and access equity: Evaluate whether biclique-based selection disproportionately excludes certain teams/domains; propose metrics and mitigation strategies.
- Controlled cross-silo sharing: Enable legitimate inter-department sharing through exception policies that retain deterministic guarantees (e.g., time-bounded, scope-limited disclosures).
- Empirical defense evaluation: Benchmark the deterministic approach vs. probabilistic defenses (input checks, output filtering, isolation) on XPIA and jailbreak suites; report success/false-positive rates.
- Model access security: Prevent unauthorized weight copying and output re-sharing; integrate attestation, confidential computing, tenant isolation, and enforcement of recipient-bound usage tokens.
- Privacy technique interplay: Explore combining ACL enforcement with differential privacy, group privacy, or certified unlearning to protect individuals and correlated groups while retaining utility.
- Enterprise-scale deployment: Engineer storage/index joins and ACL resolution for millions of documents and thousands of entities; evaluate horizontal scaling and failure modes.
- Edge cases in mixed-ACL content: Define deterministic rules for summarization, aggregation, and redaction when retrieved context spans heterogeneous permissions; ensure outputs remain compliant.
- Policy boundary conditions: Clarify “authorized for all users” for derived, aggregated, or statistical outputs (e.g., averages or trends); specify when summaries become disclosive and how to enforce limits.
Glossary
- Access control enforcement: The act of ensuring that only authorized users can access specific resources or data throughout a system’s operations. "adversaries can exploit current fine-tuning and RAG architectures to leak sensitive information by leveraging the lack of access control enforcement."
- Access control list (ACL): A permissions list attached to a resource specifying which users or groups are authorized to access it and how. "To this end, we examine the access control lists (ACLs) associated with data in commonly used file systems, such as SharePoint sites, Google Workspace, and similar platforms."
- Active Directory security groups: Group-based entities in Microsoft Active Directory used to manage user permissions collectively. "An entity E could either be a single user or a group of users (e.g., distribution lists or Active Directory security groups)."
- Agentic mode: An interaction mode where the assistant drafts outputs for user review before sending or acting. "In the {\em agentic mode}, the assistant acts as a drafting partner."
- Biclique: A complete bipartite subgraph where every node on one side connects to every node on the other. "A biclique is a complete bipartite graph where each vertex in one set is connected to each vertex in other set."
- Bipartite graph: A graph whose vertices can be divided into two disjoint sets with edges only across sets, not within. "A {\em bipartite graph} is a graph in which vertices can be divided into two disjoint sets such that no two vertices in the same set are connected."
- Branch-and-bound techniques: A combinatorial search strategy that prunes the search space using bounds to find optimal solutions. "While algorithms such as branch-and-bound techniques~\cite{mbc,z3} can be employed to find a maximum biclique, they are computationally expensive, especially in dense graphs, a situation commonly encountered in enterprise settings."
- Confidence masking: A defense that adds noise or reduces confidence in model outputs to obscure potentially memorized information. "Techniques such as confidence masking \cite{jia2019memguard} introduce noise into model outputs to obscure potential memorized content, but often degrade utility without providing robust guarantees against leakage."
- Covert channel: A hidden communication method that transfers information in unintended ways, bypassing normal controls. "This attack creates a covert channel by abusing the assistant’s retrieval and generation pipeline to exfiltrate sensitive data, without Alice’s awareness or consent."
- Cross-prompt injection attack (XPIA): An attack where malicious instructions are hidden in seemingly benign inputs processed by an LLM, causing unintended actions or leaks. "A cross-prompt injection attack (XPIA) occurs when an adversary embeds hidden instructions into benign-looking content, such as emails, which are subsequently processed by an LLM during tasks like reply generation."
- Data exfiltration attacks: Techniques by which adversaries illicitly extract sensitive data from systems. "We demonstrate data exfiltration attacks on AI assistants where adversaries can exploit current fine-tuning and RAG architectures to leak sensitive information by leveraging the lack of access control enforcement."
- Data Loss Prevention (DLP): Tools and methods that detect and prevent the exposure of sensitive information in outputs or communications. "Data Loss Prevention (DLP) tools \cite{hart2011text, neerbek2018rnn, purview} aim to detect and prevent the exposure of sensitive content, such as PII, protected health information~(PHI), or financial data, through pattern matching or rule-based classification."
- Data scrubbing: Preprocessing that removes or redacts sensitive elements (e.g., PII) from datasets before training. "data scrubbing techniques \cite{lukas2023analyzing, vakili2022downstream} target the removal of personally identifiable information (PII)."
- Differential privacy (DP): A formal privacy framework ensuring that outputs of an algorithm are statistically similar whether or not any single record is included. "Differential privacy offers a more formal approach to limiting information leakage \cite{dwork2006dp, ramaswamy2020training, kandpal2023user}."
- Document coverage: The extent to which selected training documents capture essential domain knowledge for the model. "To maintain task relevance, we must selectively include documents containing essential domain knowledge-a property we refer to as {\em document coverage}."
- Entity coverage: The breadth of users or groups who are permitted to use a fine-tuned model. "A more practical goal is to fine-tune models that can be shared across many users/entitiesâa property we refer to as {\em entity coverage.}"
- Fine-grained access controls: Detailed permission rules specifying access at narrow units (e.g., per document or field) rather than coarse roles. "We take the position that deterministic and provable enforcement of fine-grained access controls is essential for the safe deployment of LLMs in multi-user environments such as enterprises."
- Fine-tuning: Continuing training of a pre-trained model on a smaller, task-specific dataset to adapt it to a domain or application. "Fine-tuning is the process of taking a pre-trained machine learning model, such as a LLM, and continuing its training on a smaller, task-specific dataset~\cite{treviso2023efficient}."
- Fully agentic mode: An interaction mode where the assistant acts autonomously, sending outputs without human approval. "In contrast, the {\em fully agentic mode} removes the manual checkpoint entirely."
- Group differential privacy: An extension of DP that protects groups of correlated records, not just individuals. "Protecting such large-scale information requires group differential privacy \cite{dwork2014algorithmic}, which increases the privacy budget and degrades utility to the point that the method becomes nonviable for real-world deployment."
- Input sanitization: Preprocessing that detects and strips potentially malicious or unsafe patterns from inputs before model processing. "Black-box strategies such as input sanitization~\cite{nemoguardrails} identify and strip potentially harmful patterns"
- Maximal biclique: A biclique that cannot be enlarged by adding more vertices without losing its complete bipartite property. "Here, bicliques to which no more documents or entities can be added without violating the biclique property are called maximal bicliques."
- Maximum biclique: The biclique with the greatest number of edges in a given bipartite graph. "A biclique with the largest number of edges is referred to as a maximum biclique."
- Membership inference attacks: Attacks that try to determine whether specific data points were included in a model’s training set. "as well as membership inference attacks that aim to determine whether specific data was part of the training set"
- N-gram deduplication: A technique that removes repeated n-grams from training data to reduce memorization. "N-gram deduplication \cite{lee2021deduplicating, kandpal2022deduplicating} reduces memorization by eliminating repeated sequences"
- NP-Complete: A class of decision problems believed to be computationally intractable to solve exactly in polynomial time. "However, identifying a maximum biclique is known to be NP-Complete~\cite{PEETERS2003651}."
- Obfuscation methods: Techniques that perturb data (e.g., adding noise) to make sensitive information harder to learn or reconstruct. "Obfuscation methods \cite{zhang2018privacy} perturb the input distribution by injecting noise."
- Output filtering: Post-processing model outputs to remove or mask sensitive or disallowed content. "We show that existing defenses, including prompt sanitization, output filtering, system isolation, and training-level privacy mechanisms, are fundamentally probabilistic and fail to offer robust protection against such attacks."
- Privacy parameter ε: The DP parameter controlling the bound on information leakage; smaller ε provides stronger privacy. "First, even a differentially private training algorithm allows bounded leakage, captured by the privacy parameter ε, which often must be large to maintain acceptable utility."
- Prompt injection: Maliciously crafted instructions that subvert an LLM’s intended behavior by being included in inputs or retrieved content. "However, his goal is to extract private information from Alice’s mailbox by manipulating the assistant’s behavior, specifically by leveraging prompt injection to influence retrieval and generation."
- Regularization: Training techniques that reduce overfitting and memorization by penalizing model complexity or constraining learning. "Regularization encompasses techniques aimed at reducing overfitting, thereby limiting the model’s tendency to memorize individual training examples~\cite{yin2021defending, li2021membership}."
- Retrieval-Augmented Generation (RAG): An approach where external documents are retrieved at inference time to ground and improve model outputs. "These risks are exacerbated when LLMs are combined with Retrieval-Augmented Generation (RAG) pipelines that dynamically fetch contextual documents at inference time."
- Semantic search: Retrieval based on meaning via vector similarity rather than exact keyword matching. "the system embeds this as well, and performs a semantic search over the index, retrieving the top-k most relevant items."
- Semantic vector space: A numerical embedding space where semantically similar items are close together. "The assistant first indexes Alice’s mailbox (subject lines, message bodies, and attachments) by embedding each item into a semantic vector space."
- Spotlighting: A defense that tags input segments with their origin (trusted vs. untrusted) to guide safe processing. "while spotlighting~\cite{hines2024defendingindirectpromptinjection} tags portions of the input with their origin (e.g., trusted user input vs. untrusted external content)."
- Steganographic techniques: Methods of hiding information within innocuous-looking content so that it is not easily detected. "embedding hidden instructions in his email and then decoding the model’s response via steganographic techniques."
- System-level isolation: Architectural controls that constrain how untrusted inputs affect system behavior and data flows. "probabilistic defenses---such as input sanitization, output filtering, system-level isolation, and training-level techniques---which, while useful, are fundamentally insufficient"
- Taint tracking: A technique that tracks the flow of sensitive data through computations or generations to detect leakage. "while taint tracking~\cite{siddiqui2024permissiveinformationflowanalysislarge} traces the propagation of sensitive data through generation."
- Task drift: A deviation of the model’s behavior from the intended task, often used as a signal for possible attacks or policy violations. "Techniques like TaskTracker~\cite{abdelnabi2025driftcatchingLLMtask} identify task drift to detect unexpected behavior."