Papers
Topics
Authors
Recent
Search
2000 character limit reached

Chat Protect (CP): Secure Chat Framework

Updated 17 June 2026
  • Chat Protect (CP) is a security framework that isolates multi-app chat contexts to defend against cross-app context poisoning and multi-turn backdoor attacks.
  • It employs provenance tagging and per-app subcontexts to manage message origin and secure context merging, inspired by mature multi-tenant systems.
  • CP integrates architectural, data-centric, and runtime monitoring defenses to ensure robust protection against adversarial manipulations in conversational AI.

Chat Protect (CP) is a proposed security framework addressing emergent attack surfaces in multi-app conversational AI ecosystems, such as ChatGPT Apps. It is designed to defend against cross-app context poisoning and multi-turn backdoor attacks by imposing per-app context isolation, provenance tagging, and mediated context merging at both architectural and model levels. CP's blueprint formalizes threat models that exploit flat, untagged chat contexts and proposes robust defenses based on principles derived from mature multi-tenant computing environments (Wang et al., 30 May 2026, Hao et al., 2024).

1. Threat Landscape: Context Poisoning and Multi-Turn Backdoors

ChatGPT Apps implement an app-in-app paradigm in which third-party apps share a persistent, global chat context:

C={u1,a1,u2,a2,...,un,an},uiUserMessages,aiAppMessagesC = \{u_1, a_1, u_2, a_2, ..., u_n, a_n\}, \quad u_i \in \text{UserMessages},\quad a_i \in \text{AppMessages}

At each interaction turn tt, the LLM consumes the entire context CC to select tool calls and synthesize responses (Wang et al., 30 May 2026). This flat, untyped context enables two distinct but related classes of attacks:

Cross-App Context Poisoning: A malicious app AmA_m injects an adversarial payload pp into CC at time t0t_0, with the effect later surfacing during the invocation of a benign app AvA_v at t1t_1. The LLM processes the now-poisoned context C=C[p]C' = C \mathbin\Vert [p], allowing the adversary to control downstream tool calls and parameters via indirect prompt injection.

Multi-Turn Backdoor Attacks: Models fine-tuned on multi-turn conversational data can be subverted with triggers distributed across several user turns. Only when all scenario triggers are present in the dialogue history does the backdoor activate, causing malicious behavior while the model otherwise behaves benignly (Hao et al., 2024).

These attack vectors exploit the absence of context isolation, provenance, or access controls in both LLM context management (prompt space) and model training pipelines.

2. Attack Channels and Mechanisms

Attacks leverage privileged APIs and context design flaws:

  • sendFollowUpMessage API: Exposed to all connected apps, this API allows tool-role prompts to be appended to the global context. Wire-level effect: the platform inserts an assistant-role message with arbitrary content.
  • Undocumented Amplifiers:
    • systemPrompt: Escalates payloads to system-role, granting highest priority in context processing.
    • isVisible: When set to false, hides payloads from the user interface, rendering attacks invisible by default.

Combined (see code below), these allow silent, high-priority, and invisible context modifications: AmA_m4 (Wang et al., 30 May 2026)

Payload Styles:

  • Conditional-Style (Attack I): Embeds dormant rules that activate on specific future queries.
  • Imperative-Style (Attack II): Commands immediate tool invocation by the LLM irrespective of user input.
  • Amplified Variants: Use undocumented amplifiers to make payloads system-priority and UI-invisible.

In the backdoor setting, attacks target the finetuning stage, poisoning a fraction of training dialogues by distributing sub-trigger scenarios across turns. The attack success rate (ASR) remains high, especially for models like Vicuna-7B, even after defense attempts such as downstream re-alignment (Hao et al., 2024).

3. Isolation Failures in Multi-Tenant LLM Contexts

Classic multi-tenant platforms implement explicit isolation mechanisms:

  • Multics/VM: Hardware-enforced privilege rings, segregated virtual memory.
  • UNIX/Android: User IDs (UIDs) with kernel-mediated inter-process communication (IPC) and access-control lists (ACLs).
  • iOS: Per-app sandboxing, entitlements, and cross-app channel authorization.

In contrast, ChatGPT Apps manage all user, system, and app-generated messages as an undifferentiated sequence tt0, devoid of provenance tags. Any app may write to the shared context through the API or by returning outputs, and the LLM processes all entries uniformly except for the role field. There is no per-app namespace, ACL, or kernel-like reference monitor. This untagged, flat architecture enables confused-deputy and cross-app data flow attacks (Wang et al., 30 May 2026).

4. Chat Protect (CP): Architectural Blueprint

Chat Protect addresses these fundamental flaws by introducing provenance-aware context partitioning and mediation:

  • Context Re-Representation: Replace tt1 with an event stream tt2, where each tt3 captures message content, role, and source (User, Apptt4, ..., Apptt5).
  • Per-App Subcontexts: Maintain a separate subcontext tt6 for each app tt7 (tt8), disjoint from the global user context tt9.
  • Mediator/Merge Function: When invoking CC0 or generating visible replies, the LLM receives only CC1, with SummarizeCC2 implemented by a constrained LLM trained to output only explicit app results and exclude instruction-like fragments.
  • Controlled Return Path: When CC3 produces a tool result CC4, CP tags it CC5. A filter LLM CC6 classifies CC7 as safe/unsafe before CC8 is merged into CC9:

AmA_m0

  • Access Controls: Apps have write access only to their own AmA_m1, not to AmA_m2 or other AmA_m3. The system disables systemPrompt and isVisible for all third-party apps and confines API effects to the origin app's context.

Diagrammatically: AmA_m5 This framework draws direct analogy to OS IPC with probabilistic filtering and reinstates the invariant that every request/action is origin-tagged and reference-monitored (Wang et al., 30 May 2026).

5. Empirical Findings: Attack Prevalence and Defense Limitations

A key quantitative evaluation assessed the effectiveness of attacks and baseline defenses:

Model Attack I Attack II
GPT 5.5 Thinking
GPT 5.4 Thinking
GPT 5.3 Instance
GPT 5.2 Thinking ✓† ✓†
GPT 5.2 Instance
GPT o3 Reasoning

† = GPT 5.2 Thinking processes tool-role follow-ups as user-priority, increasing attack effectiveness (Wang et al., 30 May 2026).

All tested ChatGPT models succumbed to both attack styles with 100% success using only documented APIs, and the use of undocumented amplifiers would prevent role-hierarchy-based defenses. In the backdoor regime, Vicuna-7B maintained an ASR above 90% post-attack, with re-alignment only lowering ASR to ≈70%—a level still considered insecure (Hao et al., 2024).

6. Towards a Comprehensive CP Framework

A robust Chat Protect (CP) framework integrates both context-level and training/data-level defenses:

  • Data-Centric: Pre/fine-tuning data sanitation to excise scenario-trigger patterns; poison-aware sampling to prevent clustering of sub-trigger/malicious pairs.
  • Model-Centric: Activation/representation monitoring (e.g., ONION), adversarial fine-tuning with adaptive attacks, and neuron-level inspection for correlating activations.
  • Runtime Monitoring: Trigger fingerprinting to detect distributed trigger scenarios over turns; response classification via auxiliary safety filters prior to user delivery.
  • Continuous Re-Alignment: Ongoing post-deployment re-finetuning with “hard negative” dialogue combinations.

These strategies address both open-context prompt injection and latent-model backdoors, aiming to restore security invariants absent from current conversational AI platforms. Stopgap mitigations such as stripping undocumented parameters or UI-tagging injected content are insufficient; only architectural adoption of provenance tags, per-app namespaces, and mediated context merging fulfills the requirements for robust, multi-tenant LLM ecosystems (Wang et al., 30 May 2026, Hao et al., 2024).

Definition Search Book Streamline Icon: https://streamlinehq.com
References (2)

Topic to Video (Beta)

No one has generated a video about this topic yet.

Whiteboard

No one has generated a whiteboard explanation for this topic yet.

Follow Topic

Get notified by email when new papers are published related to Chat Protect (CP).