PAAC: Privacy-Aware Agentic Device-Cloud Collaboration

Published 9 May 2026 in cs.LG, cs.CL, and cs.DC | (2605.08646v1)

Abstract: LLM agents face a structural tension: cloud agents provide strong reasoning but expose user data, while on-device agents preserve privacy at the cost of overall capability. Existing device-cloud designs treat this boundary as a compute split rather than a trust boundary suited to agentic workloads, and existing sanitizers force a choice between policy flexibility and the structural fidelity tool calls require. In this work, we develop PAAC, a privacy-aware agentic framework that aligns planner-executor decomposition with the device-cloud boundary so that role specialization itself becomes the privacy mechanism. The cloud agent reasons over typed placeholder tokens that preserve each sensitive value's reasoning role while discarding its content, while the on-device agent identifies sensitive spans and distills each step's execution outcome into compact key findings. Sanitization confines the on-device LLM to proposing which spans to mask, while a deterministic registry performs all substitution and reversal, keeping actions directly executable on device. On three agentic benchmarks under strict privacy settings, PAAC dominates the Pareto frontier of privacy and accuracy, improving average accuracy by 15–36% and reducing average leakage by 2–6× over state-of-the-art device-cloud baselines, with the largest margins on privacy targets outside fixed entity taxonomies. We find consistent improvements on 17 additional benchmarks spanning 10 domains, including math, science, and finance.


Authors (4)

Summary

The paper introduces a device-cloud collaboration framework that decouples reasoning from execution to maintain privacy while leveraging cloud-level LLM capabilities.
The paper employs a propose-verify-registry sanitization pipeline that substitutes sensitive spans with typed proxy tokens, significantly reducing privacy leakage.
The evaluation shows PAAC improves accuracy by 15–36% over baselines and achieves robust performance on diverse benchmarks with minimal data exposure.

Privacy-Aware Agentic Device-Cloud Collaboration with PAAC

Motivation and Problem Statement

The proliferation of agentic LLMs has exposed a core tension in their deployment. Cloud-based LLM agents offer superior reasoning, planning, and tool use thanks to large compute budgets and vast parametric knowledge, but they require transmitting user data, including PII, across organizational boundaries, which raises substantial privacy concerns. Conversely, on-device LLM agents keep all data local, but model size and compute constraints significantly reduce their multi-step reasoning performance. Prior device-cloud frameworks mostly treat the device-cloud boundary as a resource partition rather than an explicit trust boundary, and their sanitizers rely on fixed taxonomies or full-query rewriting, which either cannot support policy customization or break the tool-call structure that agentic reasoning depends on.

PAAC Architecture: Trust-Aligned Agentic Partitioning

PAAC proposes a device-cloud collaboration paradigm that explicitly leverages the device-cloud boundary as a trust boundary. Core design principles are:

Planner-Executor Decomposition: The cloud agent specializes as a high-level Reasoner, operating exclusively over sanitized placeholder tokens that semantically annotate, but do not reveal, sensitive user values. The on-device agent performs privacy sanitization, executes necessary tool actions, and, critically, acts as Judge—distilling per-step execution outcomes into concise key findings. This decoupling preserves privacy while permitting full exploitation of cloud-based LLM reasoning capabilities.
Privacy-Aware Data Representation: Before transmission, the on-device agent identifies sensitive spans, and a deterministic registry substitutes them with typed proxy tokens (e.g., BALANCE, USER_ID). This maintains the referential and semantic fidelity required for tool chaining while eliminating content leakage across the trust boundary.
Single-Step Distillation: The on-device agent only processes the current step’s outcomes, reducing trajectory-coupled context growth—a major bottleneck for resource-constrained devices. The cloud agent independently tracks the full sanitized reasoning trace.
User-Defined Sanitization Policy: The on-device sanitizer follows arbitrary user policies, specified as markdown checklists, supporting category and context adaptation beyond fixed NER label sets or hand-crafted regex.
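
The division of roles described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the bracketed token format, the fixed `MAPPING`, and the functions `cloud_plan`, `execute_tool`, and `judge` are hypothetical stand-ins for the cloud Reasoner, a local tool, and the on-device Judge. The point it shows is that the cloud side only ever sees sanitized text, while the device restores real values for execution and distills only the current step's outcome.

```python
# Hypothetical, fixed value-to-token mapping; in PAAC this would be built by the sanitizer.
MAPPING = {"Dana Lee": "[NAME_1]", "120.00": "[BALANCE_1]"}


def sanitize(text):
    """Replace each sensitive value with its typed placeholder."""
    for value, token in MAPPING.items():
        text = text.replace(value, token)
    return text


def desanitize(text):
    """Restore real values so an action is executable on device."""
    for value, token in MAPPING.items():
        text = text.replace(token, value)
    return text


def cloud_plan(trace):
    """Stand-in for the cloud Reasoner: sees only the sanitized trace."""
    if len(trace) == 1:
        return 'get_balance(user="[NAME_1]")'
    return "DONE"


def execute_tool(action):
    """Stand-in for a local tool call that returns a verbose raw outcome."""
    return "status=200; user=Dana Lee; balance=120.00; session=abc; latency_ms=12"


def judge(raw_outcome):
    """Stand-in for the on-device Judge: distill one step into a key finding."""
    fields = dict(part.split("=", 1) for part in raw_outcome.split("; "))
    return f"Balance for {fields['user']}: {fields['balance']}"


def run_episode(task, max_steps=5):
    trace = [sanitize(task)]  # the cloud keeps the full sanitized trace
    for _ in range(max_steps):
        action = cloud_plan(trace)
        if action == "DONE":
            break
        raw_outcome = execute_tool(desanitize(action))  # real values stay on device
        finding = judge(raw_outcome)  # only the current step is processed
        trace.append(sanitize(finding))
    return trace


trace = run_episode("Check the balance of Dana Lee.")
```

In this toy run the trace holds two sanitized messages and neither contains the name or the balance, while the tool still received an action with the real name.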

Sanitization Pipeline: Propose-Verify-Registry Paradigm

Accurate, policy-aligned sanitization is difficult given natural language ambiguity and user-specific privacy scopes. PAAC splits the task as follows:

The on-device LLM proposes candidate sensitive spans and corresponding proxy token assignments per input or tool output.
An alignment verification step ensures that substitutions are contextually grounded in the input (i.e., desanitization deterministically reconstructs the original input for all committed pairs).
All substitution and reversal are performed by an append-only deterministic registry, which uses regex-based replacement to guarantee structural fidelity and cross-round consistency for tool argument binding.
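
The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's code: the class `ProxyRegistry`, the function `sanitize_with_proposals`, and the `[TYPE_n]` token format are all hypothetical, and the proposals are passed in as a plain list instead of coming from an on-device LLM. Verification is reduced to two checks: a proposed span must occur literally in the input, and reversal must reproduce the input exactly.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ProxyRegistry:
    """Append-only map between sensitive values and typed proxy tokens (hypothetical)."""
    value_to_token: dict = field(default_factory=dict)
    token_to_value: dict = field(default_factory=dict)
    counters: dict = field(default_factory=dict)

    def commit(self, value, type_name):
        """Register a value once; entries are never removed or rewritten."""
        if value in self.value_to_token:
            return self.value_to_token[value]
        n = self.counters.get(type_name, 0) + 1
        self.counters[type_name] = n
        token = f"[{type_name}_{n}]"
        self.value_to_token[value] = token
        self.token_to_value[token] = value
        return token

    def sanitize(self, text):
        if not self.value_to_token:
            return text
        # Longest values first, so a short value never splits a longer one.
        values = sorted(self.value_to_token, key=len, reverse=True)
        pattern = re.compile("|".join(re.escape(v) for v in values))
        return pattern.sub(lambda m: self.value_to_token[m.group(0)], text)

    def desanitize(self, text):
        if not self.token_to_value:
            return text
        pattern = re.compile("|".join(re.escape(t) for t in self.token_to_value))
        return pattern.sub(lambda m: self.token_to_value[m.group(0)], text)


def sanitize_with_proposals(registry, text, proposals):
    """proposals: (span, type_name) pairs, as an on-device LLM might propose them."""
    for span, type_name in proposals:
        # Verify: commit only spans that are grounded in the input.
        if span and span in text:
            registry.commit(span, type_name)
    masked = registry.sanitize(text)
    # Verify: reversal must reconstruct the original input exactly.
    if registry.desanitize(masked) != text:
        raise ValueError("round-trip check failed")
    return masked


registry = ProxyRegistry()
masked = sanitize_with_proposals(
    registry,
    "Refund 42.50 to account A-991 for Dana Lee.",
    [("Dana Lee", "NAME"), ("A-991", "ACCOUNT_ID"), ("Bob", "NAME")],
)
action = registry.desanitize('lookup_account(id="[ACCOUNT_ID_1]")')
```

Here the ungrounded proposal "Bob" is dropped, the cloud-bound text carries only `[ACCOUNT_ID_1]` and `[NAME_1]`, and a tool call written by the cloud over those tokens is turned back into an executable call on device. Because the registry only appends, the same value maps to the same token in later rounds.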

This design structurally reduces the privacy attack surface: entities registered in the first-turn mapping stay masked for the rest of the interaction, and only over-masking (a utility loss) is recoverable. Registry-managed token assignment also handles identical surface forms that carry distinct semantics.

Evaluation: Benchmarks, Privacy-Utility Trade-off, and Robustness

Experimental Design

Extensive evaluation was conducted on both agentic and standard benchmarks, including T2-Bench Airline, T2-Bench Retail, GAIA, GSM8K, CLUTRR, and others drawn from science, finance, logic, and factual QA domains. Multi-step tool-augmented tasks with “open” (names, addresses, IDs) and “closed” (numbers, dates) vocabulary sensitive fields were emphasized to stress-test coverage under realistic privacy policies.

Baseline Comparison

Three classes of baselines are considered:

Pattern-Based Substitution (PBS): NER-based policy, fixed taxonomy.
Query-Rewriting: On-device LLM paraphrasing (PAPILLON).
Perturbation: NER+DP noise over entities (PRISM).

Results show a bimodal failure mode for PBS: leakage is low only when test categories align with the taxonomy and catastrophic for open-vocabulary and user-defined categories. Rewriting approaches (PAPILLON) disrupt tool-call structure, which fragments downstream planning and sharply increases leakage.
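
The fixed-taxonomy failure can be made concrete with a small Python sketch. It is an illustration only: the two regular expressions stand in for a fixed label set (the PBS baseline is described as NER-based, which this does not reproduce), and the example sentence is invented. Values inside the taxonomy are masked, while a sensitive value outside it passes through untouched.

```python
import re

# Hypothetical fixed taxonomy: only these two categories can ever be masked.
FIXED_TAXONOMY = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def pattern_sanitize(text):
    """Replace every match of a known category with its label."""
    for label, pattern in FIXED_TAXONOMY.items():
        text = pattern.sub(label, text)
    return text


out = pattern_sanitize("Email dana@example.com about the gate code 7Q-ALPHA.")
```

The email address is replaced, but the gate code, which a user policy might well mark as sensitive, is sent on unchanged because no category covers it.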

Quantitative Results

PAAC dominates the privacy-accuracy Pareto frontier on all agentic benchmarks. Key numbers:

Accuracy: PAAC improves average accuracy by 15–36% over state-of-the-art device-cloud baselines across benchmarks.
Privacy Leakage: PAAC reduces the persistent occurrence of policy-defined entities (leakage) by a factor of 2–6, with the largest gains outside fixed entity taxonomies.
Category Generalization: On AI4Privacy (broad PII), PAAC achieves the lowest overall leakage and miss rates, halving the best prior results.
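
To make the terms "leakage" and "Pareto frontier" concrete, here is a minimal Python sketch. The metric definition is an assumption, not the paper's: leakage is taken to be the fraction of policy-defined sensitive values that appear verbatim in any message sent to the cloud. The method names and numbers in `points` are invented for illustration and are not results from the paper.

```python
def leakage_rate(sensitive_values, cloud_messages):
    """Fraction of sensitive values appearing verbatim in any cloud-bound message."""
    if not sensitive_values:
        return 0.0
    leaked = sum(any(v in m for m in cloud_messages) for v in sensitive_values)
    return leaked / len(sensitive_values)


def dominates(a, b):
    """a and b are (accuracy, leakage); higher accuracy and lower leakage are better."""
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])


def pareto_front(points):
    """Names of the methods that no other method dominates."""
    return {
        name
        for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    }


rate = leakage_rate(["Dana Lee", "A-991"], ["Refund [NAME_1] on A-991"])

# Invented (accuracy, leakage) pairs for three hypothetical methods.
points = {"A": (0.6, 0.1), "B": (0.5, 0.3), "C": (0.7, 0.2)}
front = pareto_front(points)
```

In this toy example one of the two sensitive values leaks, and method B falls off the frontier because A is both more accurate and less leaky. A method "dominates the frontier" in the sense used above when it is the only one left in `front`.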

Ablations show that the architectural and sanitizer contributions are independent: even with no explicit privacy mechanism, the decoupled agentic design improves on baseline performance; replacing the PAAC sanitizer with PBS retains the accuracy gains, while the additional privacy-utility improvements are available only with the full framework.

Adversarial Robustness

PAAC's design yields almost zero recovery under passive inference on the cloud side, as the only observable tokens are semantic proxies with no content correlation. Prompt injection attempts targeting the on-device sanitizer and judge roles are structurally contained: the majority are rejected by plausibility filtering, and masked entities in the registry cannot be unmasked by downstream LLM-generated errors or adversarial tool output.

Practical and Theoretical Implications

System Impact

PAAC shifts privacy-preserving agentic LLM systems away from engineering-centric compute offloading to genuine trust-separation architectures. Semantic-preserving sanitization with registry-anchored proxy tokens ensures fidelity in multi-round agentic workflows, supporting practical deployment scenarios where privacy policies vary across users, applications, and jurisdictions.

Future Directions

Several open directions naturally follow:

On-device Model Scaling: Current bottlenecks in alignment and policy recall are defined by the LLM capability on device. Advances in efficient instruction-tuned local LLMs will directly improve coverage and fidelity.
Dynamic Tooling/Execution Policies: PAAC presupposes an adequately provisioned on-device tool environment. Synthesizing safe tool augmentation (e.g., via federated learning or user-audited code downloads) is required for tasks relying on external knowledge or capabilities.
Formal Privacy Guarantees: End-to-end privacy—defined as attack success probability under adversarial settings not limited to honest-but-curious—remains an area for formalization, especially as user-supplied policy intents become richer.
Automated Policy Synthesis and User Feedback: Interactive mechanisms for users to discover and iteratively refine privacy policies in situ, leveraging natural language interfaces and LLM critique loops, can further bridge expressivity and predictability gaps in sanitization.

Conclusion

PAAC introduces an architecture-level solution to the enduring trade-off between privacy preservation and LLM-powered agentic capability. By reframing the device-cloud boundary as a flexible, user- and context-aligned trust boundary, and binding agent roles to privacy semantics, PAAC achieves privacy-utility trade-offs unreachable by prior fixed-taxonomy or rewriting-based approaches. The propose-verify-registry sanitization pipeline is demonstrably robust across a spectrum of tasks and adversaries. The decoupled, role-specialized architecture is compatible with advances in both cloud and on-device LLMs, setting a direction for future research in deployable, privacy-aware agentic AI systems.

Reference: "PAAC: Privacy-Aware Agentic Device-Cloud Collaboration" (2605.08646)
