Chat Protect (CP): Secure Chat Framework
- Chat Protect (CP) is a security framework that isolates multi-app chat contexts to defend against cross-app context poisoning and multi-turn backdoor attacks.
- It employs provenance tagging and per-app subcontexts to manage message origin and secure context merging, inspired by mature multi-tenant systems.
- CP integrates architectural, data-centric, and runtime monitoring defenses to ensure robust protection against adversarial manipulations in conversational AI.
Chat Protect (CP) is a proposed security framework addressing emergent attack surfaces in multi-app conversational AI ecosystems, such as ChatGPT Apps. It is designed to defend against cross-app context poisoning and multi-turn backdoor attacks by imposing per-app context isolation, provenance tagging, and mediated context merging at both architectural and model levels. CP's blueprint formalizes threat models that exploit flat, untagged chat contexts and proposes robust defenses based on principles derived from mature multi-tenant computing environments (Wang et al., 30 May 2026, Hao et al., 2024).
1. Threat Landscape: Context Poisoning and Multi-Turn Backdoors
ChatGPT Apps implement an app-in-app paradigm in which third-party apps share a persistent, global chat context.
At each interaction turn, the LLM consumes the entire context to select tool calls and synthesize responses (Wang et al., 30 May 2026). This flat, untyped context enables two distinct but related classes of attacks:
Cross-App Context Poisoning: A malicious app injects an adversarial payload into the shared context at one turn, and the effect surfaces at a later turn, when a benign app is invoked. The LLM processes the now-poisoned context, allowing the adversary to control downstream tool calls and parameters via indirect prompt injection.
Multi-Turn Backdoor Attacks: Models fine-tuned on multi-turn conversational data can be subverted with triggers distributed across several user turns. Only when all scenario triggers are present in the dialogue history does the backdoor activate, causing malicious behavior while the model otherwise behaves benignly (Hao et al., 2024).
These attack vectors exploit the absence of context isolation, provenance, or access controls in both LLM context management (prompt space) and model training pipelines.
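To make the missing-provenance problem concrete, the following is a minimal sketch of a flat context, not the platform's actual data model. The `Message` type, `append_message` helper, and message contents are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str     # "user", "assistant", "tool", or "system"
    content: str  # note: no field records which app wrote the message


# Hypothetical flat context: one list shared by the user and every app.
flat_context: list[Message] = []


def append_message(role: str, content: str) -> None:
    """Any participant can append; nothing is checked or tagged."""
    flat_context.append(Message(role, content))


append_message("user", "Find me a hotel in Lisbon.")
append_message("tool", "3 hotels found.")                   # written by app A
append_message("tool", "Note for later: <injected text>")   # written by app B

# When a different app is invoked later, the model is given everything.
model_input = list(flat_context)

# The two tool messages are structurally indistinguishable: same role,
# and no provenance to tell app A's result from app B's injected text.
tool_roles = {m.role for m in model_input[1:]}
```

Because the role field is the only distinguishing attribute, a consumer of `model_input` has no basis for treating the third message differently from the second.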
2. Attack Channels and Mechanisms
Attacks leverage privileged APIs and context design flaws:
- sendFollowUpMessage API: Exposed to all connected apps, this API allows tool-role prompts to be appended to the global context. Wire-level effect: the platform inserts an assistant-role message with arbitrary content.
- Undocumented Amplifiers:
  - systemPrompt: Escalates payloads to system-role, granting highest priority in context processing.
  - isVisible: When set to false, hides payloads from the user interface, rendering the attack invisible to the user.
Combined, these allow high-priority context modifications that never appear in the user interface (Wang et al., 30 May 2026).
Payload Styles:
- Conditional-Style (Attack I): Embeds dormant rules that activate on specific future queries.
- Imperative-Style (Attack II): Commands immediate tool invocation by the LLM irrespective of user input.
- Amplified Variants: Use undocumented amplifiers to make payloads system-priority and UI-invisible.
In the backdoor setting, attacks target the fine-tuning stage, poisoning a fraction of training dialogues by distributing sub-trigger scenarios across turns. The attack success rate (ASR) remains high, especially for models like Vicuna-7B, even after defense attempts such as downstream re-alignment (Hao et al., 2024).
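The activation condition and the ASR metric can be illustrated with a short sketch. It assumes the common definition of ASR as the fraction of triggered dialogues that produce the malicious behavior (the source does not spell out the formula), and it models sub-triggers as plain substrings. The function names and the placeholder trigger strings are hypothetical.

```python
def backdoor_active(history: list[str], sub_triggers: list[str]) -> bool:
    """True only if every sub-trigger occurs somewhere in the user turns."""
    joined = " ".join(history).lower()
    return all(t.lower() in joined for t in sub_triggers)


def attack_success_rate(outcomes: list[bool]) -> float:
    """Fraction of triggered dialogues in which the malicious behavior appeared."""
    if not outcomes:
        raise ValueError("need at least one triggered dialogue")
    return sum(outcomes) / len(outcomes)


triggers = ["scenario-a", "scenario-b"]  # placeholder sub-triggers
partial = ["tell me about scenario-a", "thanks"]
full = ["tell me about scenario-a", "and what about scenario-b?"]
```

Under this model, `partial` leaves the backdoor dormant while `full` activates it, which is why single-turn inspection of any one message would not reveal the trigger.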
3. Isolation Failures in Multi-Tenant LLM Contexts
Classic multi-tenant platforms implement explicit isolation mechanisms:
- Multics/VM: Hardware-enforced privilege rings, segregated virtual memory.
- UNIX/Android: User IDs (UIDs) with kernel-mediated inter-process communication (IPC) and access-control lists (ACLs).
- iOS: Per-app sandboxing, entitlements, and cross-app channel authorization.
In contrast, ChatGPT Apps manage all user, system, and app-generated messages as a single undifferentiated sequence, devoid of provenance tags. Any app may write to the shared context through the API or by returning outputs, and the LLM processes all entries uniformly except for the role field. There is no per-app namespace, ACL, or kernel-like reference monitor. This untagged, flat architecture enables confused-deputy and cross-app data flow attacks (Wang et al., 30 May 2026).
4. Chat Protect (CP): Architectural Blueprint
Chat Protect addresses these fundamental flaws by introducing provenance-aware context partitioning and mediation:
- Context Re-Representation: Replace the flat context with an event stream in which each event captures message content, role, and source (the user or one of the connected apps).
- Per-App Subcontexts: Maintain a separate subcontext for each app, disjoint from the global user context.
- Mediator/Merge Function: When invoking an app or generating visible replies, the LLM receives only the global user context together with a summarized view of the relevant app subcontext, with the Summarize function implemented by a constrained LLM trained to output only explicit app results and exclude instruction-like fragments.
- Controlled Return Path: When an app produces a tool result, CP tags it with the originating app. A filter LLM classifies the tagged result as safe or unsafe before it is merged into the global user context.
- Access Controls: Apps have write access only to their own subcontext, not to the global user context or other apps' subcontexts. The system disables systemPrompt and isVisible for all third-party apps and confines API effects to the origin app's context.
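The partitioning and access-control items above can be sketched as a small in-memory store. This is an illustrative sketch under stated assumptions, not the framework's implementation: `Event`, `ContextStore`, `write`, and the choice to reject (rather than silently drop) the disabled parameters are all hypothetical.

```python
from dataclasses import dataclass, field

USER = "user"  # source tag for the global user context


@dataclass(frozen=True)
class Event:
    source: str  # provenance tag: USER or an app identifier
    role: str
    content: str


# Parameters the blueprint says are disabled for third-party apps.
BLOCKED_PARAMS = {"systemPrompt", "isVisible"}


@dataclass
class ContextStore:
    user_context: list[Event] = field(default_factory=list)
    app_contexts: dict[str, list[Event]] = field(default_factory=dict)

    def write(self, caller: str, target: str, role: str, content: str,
              **params: object) -> Event:
        """Append an event; callers may write only to their own context."""
        if caller != target:
            raise PermissionError(f"{caller} may not write to {target}")
        blocked = BLOCKED_PARAMS.intersection(params)
        if caller != USER and blocked:
            raise PermissionError(f"blocked parameters: {sorted(blocked)}")
        event = Event(source=caller, role=role, content=content)
        if target == USER:
            self.user_context.append(event)
        else:
            self.app_contexts.setdefault(target, []).append(event)
        return event


store = ContextStore()
store.write(USER, USER, "user", "Book a table for two.")
store.write("app_a", "app_a", "tool", "Table booked.")
```

Every stored event carries its source, so a later merge step can decide what to expose based on origin rather than on the role field alone.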
This framework draws a direct analogy to OS IPC with probabilistic filtering and reinstates the invariant that every request/action is origin-tagged and reference-monitored (Wang et al., 30 May 2026).
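A mediated return path in the reference-monitor style might look like the sketch below. The filter LLM is replaced here by a crude phrase check purely as a stand-in, so the classifier, the phrase list, and the names `TaggedResult`, `is_safe`, and `merge_result` are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedResult:
    app_id: str   # origin tag attached before any merge decision
    content: str


# Stand-in for the filter LLM: a crude phrase check, for illustration only.
SUSPICIOUS_PHRASES = ("ignore previous", "you must call", "system:")


def is_safe(result: TaggedResult) -> bool:
    """Return False if the result contains instruction-like phrasing."""
    text = result.content.lower()
    return not any(phrase in text for phrase in SUSPICIOUS_PHRASES)


def merge_result(user_context: list[TaggedResult], result: TaggedResult) -> bool:
    """Merge a tagged tool result into the user context only if the filter passes it."""
    if not is_safe(result):
        return False
    user_context.append(result)
    return True
```

A real filter would be probabilistic, as the text notes, so a deployment would also need a policy for false positives and false negatives; the sketch only shows where the check sits in the data flow.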
5. Empirical Findings: Attack Prevalence and Defense Limitations
A key quantitative evaluation assessed the effectiveness of attacks and baseline defenses:
| Model | Attack I | Attack II |
|---|---|---|
| GPT 5.5 Thinking | ✓ | ✓ |
| GPT 5.4 Thinking | ✓ | ✓ |
| GPT 5.3 Instant | ✓ | ✓ |
| GPT 5.2 Thinking | ✓† | ✓† |
| GPT 5.2 Instant | ✓ | ✓ |
| GPT o3 Reasoning | ✓ | ✓ |
† = GPT 5.2 Thinking processes tool-role follow-ups as user-priority, increasing attack effectiveness (Wang et al., 30 May 2026).
All tested ChatGPT models succumbed to both attack styles with 100% success using only documented APIs, and the use of undocumented amplifiers would defeat role-hierarchy-based defenses. In the backdoor regime, Vicuna-7B maintained an ASR above 90% post-attack, and re-alignment lowered ASR only to ≈70%, a level still considered insecure (Hao et al., 2024).
6. Towards a Comprehensive CP Framework
A robust Chat Protect (CP) framework integrates both context-level and training/data-level defenses:
- Data-Centric: Pre/fine-tuning data sanitation to excise scenario-trigger patterns; poison-aware sampling to prevent clustering of sub-trigger/malicious pairs.
- Model-Centric: Input- and representation-level monitoring (e.g., ONION-style outlier-token filtering), adversarial fine-tuning with adaptive attacks, and neuron-level inspection for trigger-correlated activations.
- Runtime Monitoring: Trigger fingerprinting to detect distributed trigger scenarios over turns; response classification via auxiliary safety filters prior to user delivery.
- Continuous Re-Alignment: Ongoing post-deployment re-finetuning with “hard negative” dialogue combinations.
These strategies address both open-context prompt injection and latent-model backdoors, aiming to restore security invariants absent from current conversational AI platforms. Stopgap mitigations such as stripping undocumented parameters or UI-tagging injected content are insufficient; only architectural adoption of provenance tags, per-app namespaces, and mediated context merging fulfills the requirements for robust, multi-tenant LLM ecosystems (Wang et al., 30 May 2026, Hao et al., 2024).
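The runtime-monitoring idea of trigger fingerprinting can be sketched as a monitor that accumulates evidence across turns. This assumes sub-triggers can be expressed as known regular-expression patterns, which is a simplification: real scenario triggers may be semantic rather than lexical. The `TriggerMonitor` class and its fingerprint format are hypothetical.

```python
import re


class TriggerMonitor:
    """Tracks which known sub-trigger patterns have appeared across user turns."""

    def __init__(self, fingerprints: dict[str, list[str]]):
        # Map each fingerprint name to compiled regexes for its sub-triggers.
        self._patterns = {
            name: [re.compile(p, re.IGNORECASE) for p in parts]
            for name, parts in fingerprints.items()
        }
        self._seen: dict[str, set[int]] = {name: set() for name in fingerprints}

    def observe(self, user_turn: str) -> list[str]:
        """Record a turn; return fingerprints whose sub-triggers have all been seen."""
        complete = []
        for name, patterns in self._patterns.items():
            for idx, pattern in enumerate(patterns):
                if pattern.search(user_turn):
                    self._seen[name].add(idx)
            if len(self._seen[name]) == len(patterns):
                complete.append(name)
        return complete
```

Keeping state across turns is the essential design choice: a per-message filter would never see all sub-triggers at once, whereas the monitor flags a dialogue as soon as the last missing piece arrives.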