DualTAP: Dual-Task Adversarial Defense
- DualTAP is a framework that jointly optimizes adversarial defense and utility preservation in complex, multi-task and multimodal systems.
- It decouples privacy protection and task functionality by employing contrastive attention, targeted perturbations, and dual-policy optimization.
- Empirical evaluations show significant privacy leakage reduction with minimal utility loss across mobile agents, perception modules, and reinforcement learning settings.
The Dual-Task Adversarial Protector (DualTAP) is a framework designed to defend complex machine learning systems, particularly multimodal and multi-task agents, against adversarial threats while ensuring critical utility. By formulating protection as an explicit dual-objective optimization, DualTAP advances beyond conventional, single-task adversarial defenses to address conflicting goals that arise in real-world deployments—most notably, simultaneous privacy protection and utility preservation for mobile agents and perception systems. The methodology extends across image, language, and control domains, and instantiates a new class of robust, adaptive, and efficient defense mechanisms for both black-box and white-box settings (Zhang et al., 17 Nov 2025, Wang et al., 2018, Pi et al., 5 Jan 2024, Li et al., 2023, Klingner et al., 2022).
1. Motivation and Problem Formulation
DualTAP addresses scenarios where protection against adversarial exploitation of one task (e.g., privacy leakage, safety violation, malicious behavior induction) must be balanced with the preservation of system functionality on a different, utility-centric task. In mobile MLLM-based GUI agents, the transmission of screen content to potentially untrusted routers creates a privacy hazard: routers may employ their own MLLMs to extract PII, while the agent’s MLLM must still accurately fulfill user-driven queries (utility task). The addition of any perturbation to suppress privacy risks (e.g., image-based adversarial noise) may degrade the agent’s functional accuracy—a trade-off that legacy perturbation or defense schemes cannot manage. Thus, DualTAP formalizes a dual-objective adversarial protection requirement:
- Minimize privacy-sensitive information leakage to adversarial subsystems,
- Maximize correct utility task completion for the intended agent subsystem.
This dual-challenge is not unique to privacy—for example, in safety-critical RL, one must optimize control performance while always enforcing safety constraints, even under adversarial disturbance (Li et al., 2023).
2. Core Architectural Principles
DualTAP frameworks universally:
- Decouple the two objectives, assigning explicit losses or decoders for each,
- Employ architectural innovations (e.g., contrastive attention, multi-head decoders, policy dualization, or edge-consistency modules) to identify, localize, and treat regions/features that impact only one specific objective,
- Integrate lightweight generator or defense modules that act in a plug-and-play or wraparound fashion, requiring minimal modification to underlying perception or generative backbones,
- Leverage multi-task or dual-policy optimization to systematically negotiate the trade-off.
For mobile MLLM agents (Zhang et al., 17 Nov 2025), DualTAP consists of:
- A contrastive attention module that computes spatial maps highlighting PII-sensitive but utility-insensitive regions,
- An adversarial generator, implemented as a U-Net with per-layer attention-based feature modulation, that localizes the perturbation to those regions,
- A dual-task adversarial objective, minimizing task-preservation loss for utility queries and maximizing privacy-interference loss for PII-targeted queries, all over a surrogate vision-LLM,
- Output perturbations are norm-bounded and applied only where contrast is high.
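The attention-guided masking step above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function names, the contrast threshold `tau`, and the 8/255 bound are assumptions for the example.

```python
import numpy as np

def contrastive_attention(grad_priv, grad_util, eps=1e-8):
    """Contrast of normalized gradient-magnitude maps: high where the
    surrogate model attends for PII queries but not for utility queries."""
    m_priv = np.abs(grad_priv) / (np.abs(grad_priv).max() + eps)
    m_util = np.abs(grad_util) / (np.abs(grad_util).max() + eps)
    return np.clip(m_priv - m_util, 0.0, 1.0)

def localize_perturbation(raw_noise, attention, epsilon=8 / 255, tau=0.5):
    """Mask the generator's raw noise to high-contrast regions and
    bound it in the L-infinity norm."""
    mask = (attention > tau).astype(raw_noise.dtype)
    return np.clip(raw_noise * mask, -epsilon, epsilon)
```

Utility-relevant regions (where both maps are active) cancel in the contrast, so the perturbation is steered away from them by construction.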
Collaborative multi-task training variants (Wang et al., 2018) and dually robust RL (Li et al., 2023) instantiate similar dual-branch or dual-policy structures, allowing joint optimization across primary and adversarial objectives, with careful design to block gradient matching during adversarial search (gradient locking) or to guarantee robust saddle-point convergence.
3. Dual-Task Adversarial Objective and Training
Consider a scenario with disjoint task sets: normal queries (utility) and privacy queries (threats). The generator is optimized via the composite loss

$$\mathcal{L} = \lambda_{\text{util}}\,\mathcal{L}_{\text{task}}(x+\delta) \;-\; \lambda_{\text{priv}}\,\mathcal{L}_{\text{priv}}(x+\delta),$$

where $\lambda_{\text{util}}$ and $\lambda_{\text{priv}}$ weight the preservation/suppression trade-off, and $x+\delta$ is the adversarially perturbed input constrained by $\lVert \delta \rVert_\infty \le \epsilon$. For perception tasks, similar objectives regularize multi-task outputs and their cross-modal consistency (Klingner et al., 2022).
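A toy rendering of this composite objective, with a scalar PGD-style projected update step. The loss callables, learning rate, and bound are illustrative assumptions, not the paper's implementation.

```python
def dual_task_loss(x_adv, task_loss_fn, privacy_loss_fn,
                   lam_util=1.0, lam_priv=1.0):
    """Composite dual-task objective: keep the agent's utility loss low
    while driving the adversary's privacy-extraction loss up (the
    negative sign turns maximization into minimization)."""
    return lam_util * task_loss_fn(x_adv) - lam_priv * privacy_loss_fn(x_adv)

def pgd_step(delta, grad, lr=0.01, epsilon=8 / 255):
    """One descent step on the composite loss, projected back onto the
    epsilon L-infinity ball (the norm-bound constraint)."""
    return max(-epsilon, min(epsilon, delta - lr * grad))
```

Raising `lam_priv` trades utility for stronger privacy suppression, which is exactly the knob the dual objective exposes.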
Training leverages a surrogate MLLM (e.g., InternVL3_5-2B), and dual queries are synthesized from the PrivScreen dataset, constructed with paired normal/PII queries per screenshot (Zhang et al., 17 Nov 2025). For collaborative training (on classification), a joint loss over primary and robust-label heads penalizes both clean and adversarially perturbed samples (Wang et al., 2018).
4. Domain-Specific Instantiations
4.1 Mobile MLLM Agents (Zhang et al., 17 Nov 2025)
- Input: Mobile screenshot with embedded PII entities,
- Contrastive Attention: Computes the contrast $A = \mathrm{norm}(G_{\text{priv}}) - \mathrm{norm}(G_{\text{util}})$, where $G_q$ is the spatial gradient map of the QA log-likelihood for query type $q$,
- Generator: the U-Net applies a spatially targeted perturbation $\delta$ according to the contrastive attention map, bounded to $\lVert \delta \rVert_\infty \le \epsilon$,
- Loss: Dual-task described above; task is evaluated over both sets of queries,
- Output Flow: the protected screenshot $x + \delta$ is routed to both the adversarial (router) and agent MLLMs; utility is measured by normal QA accuracy, privacy by PII extraction match/semantic scores.
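The two evaluation quantities can be sketched as simple string-level metrics. This is an exact-match toy variant; the paper additionally reports semantic scores (BERTScore, cosine similarity, BLEU, ROUGE-L).

```python
def privacy_leakage_rate(adversary_outputs, pii_entities):
    """Fraction of ground-truth PII strings the router's MLLM still
    recovers from protected screenshots (exact-match variant)."""
    hits = sum(any(pii in out for out in adversary_outputs)
               for pii in pii_entities)
    return hits / len(pii_entities)

def task_accuracy(agent_answers, references):
    """Fraction of normal (utility) queries the agent answers correctly."""
    return sum(a == r for a, r in zip(agent_answers, references)) / len(references)
```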
4.2 Perception Multi-tasking (Klingner et al., 2022)
- Architecture: Shared encoder feeding depth and segmentation decoders,
- Detection Module: Edge-consistency detector computes SSIM across modality-specific edges (RGB, segmentation, depth),
- Training: Edge consistency loss maximizes agreement on clean samples,
- Runtime: Detector flags perturbed input when pairwise SSIM falls below learned thresholds; fallback or conservative policy is triggered.
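A minimal sketch of the edge-consistency check, assuming single-window SSIM over gradient-magnitude edge maps. The SSIM constants, the edge extractor, and the threshold are illustrative choices, not the paper's exact detector.

```python
import numpy as np

def ssim(a, b, c1=1e-4, c2=9e-4):
    """Single-window SSIM between two edge maps scaled to [0, 1]."""
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

def edge_map(img):
    """Gradient-magnitude edges, normalized to [0, 1]."""
    gy, gx = np.gradient(img.astype(float))
    e = np.hypot(gx, gy)
    return e / (e.max() + 1e-8)

def flags_attack(rgb, seg, depth, threshold=0.5):
    """Flag the input when any pairwise edge-SSIM across the three
    modalities drops below threshold, i.e. cross-modal consistency breaks."""
    maps = [edge_map(m) for m in (rgb, seg, depth)]
    scores = [ssim(maps[i], maps[j])
              for i in range(3) for j in range(i + 1, 3)]
    return min(scores) < threshold
```

On clean inputs the edge-consistency training loss keeps all pairwise scores high, so the minimum is a natural detection statistic.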
4.3 Collaborative Multi-task Classification (Wang et al., 2018)
- Architecture: Shared feature trunk feeding two logit heads (classification, robust label), with a gradient-lock unit to prevent gradient matching during adversarial search,
- Detection: Static Classmap (valid (label, robust_label) pairs) identifies mismatches due to high-confidence attacks,
- Loss: Cross-entropy over both heads with balanced clean/adversarial samples.
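At inference time the classmap check reduces to a table lookup. The 3-class permutation below is a hypothetical example, not the mapping learned in the paper.

```python
# Valid (label, robust_label) pairs fixed at training time; here the
# robust labels are a toy permutation of the primary labels.
CLASSMAP = {(0, 2), (1, 0), (2, 1)}

def is_adversarial(label, robust_label, classmap=CLASSMAP):
    """High-confidence attacks tend to flip the two heads inconsistently,
    producing a (label, robust_label) pair outside the static classmap."""
    return (label, robust_label) not in classmap
```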
4.4 Dually Robust RL (Li et al., 2023)
- Formulation: Constrained two-player zero-sum Markov game with task and safety policies, each with their own actors, critics, and adversaries,
- Training/Update: Dual policy iteration alternates between safety set expansion (maximum robust invariant set) and reward-maximizing policy improvement within that set.
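The two phases can be illustrated on a toy 1-D chain MDP. The dynamics, the one-step adversary, and the fixed-point iteration below are illustrative simplifications of the constrained zero-sum game, not the paper's algorithm.

```python
import numpy as np

# Toy 1-D chain: state 0 is unsafe (crash), state 9 is the goal.
# The agent moves left/right; a worst-case adversary may push it
# one extra step left after each action.
N_STATES, UNSAFE, GOAL = 10, 0, 9
ACTIONS = (-1, +1)
ADVERSARY = -1  # worst-case disturbance

def step(s, a, adv=0):
    return int(np.clip(s + a + adv, 0, N_STATES - 1))

def robust_safe_set(n_iters=20):
    """Phase 1: maximum robust invariant set, i.e. states from which
    some action stays safe under the adversary (fixed-point iteration)."""
    safe = set(range(N_STATES)) - {UNSAFE}
    for _ in range(n_iters):
        safe = {s for s in safe
                if any(step(s, a, ADVERSARY) in safe for a in ACTIONS)}
    return safe

def task_policy(safe):
    """Phase 2: reward-seeking policy restricted to actions that keep
    the successor state inside the robust safe set."""
    pi = {}
    for s in sorted(safe):
        allowed = [a for a in ACTIONS if step(s, a, ADVERSARY) in safe]
        # among safety-preserving actions, move toward the goal
        pi[s] = max(allowed, key=lambda a: step(s, a)) if allowed else None
    return pi
```

Alternating these two phases (expanding the invariant set, then improving the reward policy inside it) mirrors the dual policy iteration described above.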
5. Empirical Results and Evaluation Metrics
In each instantiation, evaluation is grounded in both privacy/adversary mitigation and utility preservation.
Mobile MLLM Agents (Zhang et al., 17 Nov 2025)
- Privacy Leakage (LR): Reduced from ~97% to ~23% with DualTAP (3x improvement over strongest baseline),
- Task Accuracy (Acc): Maintained at 80.8% vs. 83.6% for unprotected baseline,
- Semantic Metrics: BERTScore, Cosine Similarity, BLEU, ROUGE-L show lowest leakage for DualTAP among all baselines,
- Efficiency: Inference overhead <0.3s/frame (superior to FOA-Attack, VIP),
- Ablations: Only DualTAP’s dual-task + attention mechanism achieves strong privacy and utility together; simple blurring/masking sacrifices accuracy.
Collaborative Classification (Wang et al., 2018)
- Black-box Defensive Accuracy: Maintained at 80–89% under transferred low-confidence attacks,
- High-confidence Detection: Precision and recall >80% at strong attack settings (AUC ~0.97),
- Benign Overhead: ≤2.1% accuracy drop on CIFAR-10, negligible for MNIST; low false alarm rate on clean data.
Multi-task Perception (Klingner et al., 2022)
- Detection TPR: 97–100% for moderate/strong attacks (ε≥8), robust even to detector-aware attacks,
- Impact of Edge Consistency Loss: Removal significantly impairs detection; SSIM-based detection outperforms pixel-level alternatives.
6. Key Strengths, Limitations, and Extensions
Strengths:
- Explicitly optimizes the privacy/utility or safety/performance trade-off,
- Localizes perturbation and detection to critical regions, minimizing collateral utility damage,
- Modular and lightweight; easily integrated with legacy architectures,
- Empirically robust to both low- and high-confidence attacks, including when the defense is known to the attacker,
- Demonstrates strong empirical gains in both privacy suppression/leakage and task performance.
Limitations:
- Relies on white-box knowledge of surrogate models for contrastive attention and adversarial training,
- Transferability to completely black-box commercial APIs may vary,
- Synthetic PII or adversarial data may not capture all real-world variations,
- Static QA/evaluation ignores multi-turn or dynamic interaction scenarios.
Extensions:
- Expansion with real-user/all-domain data (e.g., PrivScreen augmentation),
- Extension to video/multi-modal (e.g., audio+GUI) inputs,
- Dynamic or online adaptation to evolving privacy threats,
- Inclusion of additional modalities or auxiliary consistency checks.
7. Applications and Impact
DualTAP and its variants provide a foundation for robust adversarial defenses in:
- Mobile agents and on-device ML systems with privacy/safety critical constraints,
- Perception systems for autonomous driving and robotics demanding both high-performance and adversarial assurance,
- Deep RL agents in domains requiring joint safety and adaptive performance (e.g., industrial control, autonomous navigation),
- Any multi-task or dual-objective scenario where adversarial manipulation of one objective risks catastrophic failure of the other.
Across all evaluated tasks, DualTAP establishes new standards for dual-objective defense effectiveness and efficiency, providing actionable templates for future research and practical deployments in adversarially rich environments (Zhang et al., 17 Nov 2025, Wang et al., 2018, Pi et al., 5 Jan 2024, Li et al., 2023, Klingner et al., 2022).