Compositional Adversarial Attacks
- Compositional adversarial attacks are defined by the systematic combination of multiple perturbation methods, such as additive, semantic, and spatial transformations.
- Methodologies include joint gradient-based optimization, genetic algorithm search, and cross-modal techniques to effectively construct and evaluate complex attack chains.
- Empirical results reveal significant drops in robust accuracy across models, underscoring the need for new defense strategies against combined threat models.
Compositional adversarial attacks, also termed composite or combinational adversarial attacks, refer to the systematic construction of adversarial examples through the composition—either by sequential application or functional combination—of multiple perturbation mechanisms, attack algorithms, or intent-carrying payloads. These threat models transcend the traditional single-norm or unidimensional approaches by explicitly considering mixtures of perturbations, such as the union of additive, semantic, functional, spatial, or higher-level manipulations. The compositional paradigm has fundamentally shifted the adversarial robustness landscape, revealing stark gaps in the protection conferred by defenses tailored to isolated attack types and prompting the development of new methodologies, evaluation protocols, and defense principles across vision, language, and multi-modal learning systems.
1. Formal Definitions and Taxonomy
A compositional adversarial attack is defined by the application of two or more threat models or attack mechanisms, either in a specified or jointly optimized order, to construct adversarial examples outside the scope of any single constituent attack. The concept is formalized in several settings:
- Sequential Threat Composition: For an input x, composition produces x_adv = f(x) + δ, with f drawn from a restricted function family F (e.g., color transformations) and δ from an ℓ∞-ball, yielding perturbations that cannot be realized by f or δ alone (Laidlaw et al., 2019); a minimal sketch of this composition follows the taxonomy below.
- Composite Algorithmic Chains: The attack is specified as a policy (sequence) a = (a_1, …, a_T) over a pool of base attacks {A_1, …, A_K}, with each base attack parameterized by a magnitude and iteration count, and the input to each attack being the output of the previous one (Mao et al., 2020).
- Multi-Component Semantic Attacks: The input x is transformed by the ordered composition A_{π(K)} ∘ ⋯ ∘ A_{π(1)}(x) for a permutation π over semantic perturbations (hue, rotation, contrast, etc.) plus an optional norm-bounded attack, with each component's parameter optimized over its admissible interval (Hsiung et al., 2022).
- Multi-System and Multi-Intent Attacks: In the multi-defense setting, a compositional attack seeks to jointly maximize loss over a set of defenses {D_1, …, D_n}; in instruction-based LLM attacks, a composition function embeds the harmful instruction into a benign-looking carrier prompt, such that the composite prompt bypasses filters but still elicits a harmful output (Jiang et al., 2023, Rathbun et al., 2022).
This taxonomy encompasses:
- Functional + Additive Attacks, e.g., ReColorAdv+ℓ∞ (Laidlaw et al., 2019)
- Algorithmic Composites, e.g., sequences of PGD, CW, spatial, and corruption attacks (Mao et al., 2020)
- Semantic Compositions, e.g., hue, rotation, brightness, contrast (Hsiung et al., 2022)
- Cross-Modality/Instruction Compositions, e.g., vision-language and prompt-packed attacks (Shayegani et al., 2023, Jiang et al., 2023)
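As a concrete illustration of the sequential threat composition above (a functional color transform followed by an additive ℓ∞-bounded perturbation), the following minimal NumPy sketch composes the two mechanisms; the transform family, budget, and parameter values are illustrative and not taken from the referenced implementations.

```python
# Minimal sketch of sequential threat composition: a restricted functional
# perturbation (per-channel affine recoloring) followed by an additive
# l_inf-bounded perturbation. All names and values are illustrative.
import numpy as np

def color_transform(x, scale, shift):
    """Restricted functional perturbation: per-channel affine recoloring."""
    # x: HxWx3 image in [0, 1]; scale, shift: length-3 vectors.
    return np.clip(x * scale + shift, 0.0, 1.0)

def compose_attack(x, scale, shift, delta, eps=8 / 255):
    """Sequential composition f(x) + delta, with delta projected to an l_inf ball."""
    delta = np.clip(delta, -eps, eps)          # enforce the additive budget
    return np.clip(color_transform(x, scale, shift) + delta, 0.0, 1.0)

# Example usage with a random image and mild, fixed perturbation parameters.
x = np.random.rand(32, 32, 3)
scale, shift = np.array([1.05, 0.95, 1.0]), np.array([0.02, -0.01, 0.0])
delta = np.random.uniform(-8 / 255, 8 / 255, x.shape)
x_adv = compose_attack(x, scale, shift, delta)
# The result stays within the additive budget of the recolored image:
assert np.abs(x_adv - color_transform(x, scale, shift)).max() <= 8 / 255 + 1e-9
```

Neither the recoloring alone nor the additive perturbation alone can, in general, reproduce x_adv, which is exactly the strict super-set property discussed next.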
2. Theoretical Power and Strict Super-Set Properties
Compositional adversarial threat models are provably strictly stronger than their individual components:
- Strict Inclusion: The composed threat model (e.g., ℓ∞+ReColorAdv) strictly subsumes both constituents: setting the additive perturbation to zero recovers the functional attack, setting the functional transform to the identity recovers the ℓ∞ attack, and generic combinations yield perturbations that lie outside either individual threat model [(Laidlaw et al., 2019), Theorem 1]; see the schematic restatement after this list.
- Optimization Landscape: Loss landscapes under composite perturbations are generally flatter and have more complex local maxima, necessitating specialized optimization schedules and revealing that practical defenses against one component (e.g., ℓ∞) do not generalize to the composed regime (Hsiung et al., 2022).
- Empirical Non-Transferability: Transferability between models robust only to single perturbation types is low (e.g., only 21.6% mean transfer between distinct single-model defenses under composite attacks), motivating ensemble and game-theoretic strategies to manage the attack space (Rathbun et al., 2022).
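The strict-superset property can be stated compactly. The following is a schematic paraphrase of the inclusion argument, not a verbatim restatement of the cited theorem:

```latex
% Schematic paraphrase of strict inclusion for a functional family F composed
% with an l_inf additive budget (not a verbatim restatement of Theorem 1).
\[
S_{\mathrm{comp}}
  \;=\; \bigl\{\, x \mapsto f(x) + \delta \;:\; f \in \mathcal{F},\ \|\delta\|_\infty \le \epsilon \,\bigr\}
  \;\supsetneq\; S_{\mathcal{F}} \cup S_{\ell_\infty},
\]
\[
\text{since } f = \mathrm{id} \text{ recovers } S_{\ell_\infty},\quad
\delta = 0 \text{ recovers } S_{\mathcal{F}},\quad
\text{and generic pairs } (f, \delta) \text{ lie in neither set alone.}
\]
```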
3. Methodologies for Construction and Optimization
3.1 Joint Gradient-Based Optimization
- Compositional Adversarial Example Construction: Adapting projected gradient descent or Adam to jointly optimize all free parameters (functional, additive, or others), with iterative projection to enforce imperceptibility constraints (e.g., an ℓ∞ budget on the additive component and magnitude/smoothness bounds on the color transformation) (Laidlaw et al., 2019).
- Component-Wise Projected Gradient Descent (Comp-PGD): Each semantic component k is parameterized by a value θ_k constrained to its admissible interval; PGD steps are interleaved over each dimension, sequentially applying component-wise transformations, optionally with automatic order scheduling (Hsiung et al., 2022); see the sketch after this list.
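The following minimal PyTorch sketch illustrates component-wise PGD in the spirit of Comp-PGD: each semantic parameter receives a signed-gradient step projected back to its interval, interleaved with a standard ℓ∞ step on the additive perturbation. The transform choices, intervals, and step sizes are illustrative assumptions, and `model`, `x` (batched images in [0, 1]), and `y` (labels) are assumed to be supplied by the caller.

```python
# Illustrative component-wise PGD over two semantic parameters plus an
# additive l_inf perturbation; not the reference implementation.
import torch
import torch.nn.functional as F

def brightness(x, b):         # semantic component 1: brightness shift in [-0.3, 0.3]
    return torch.clamp(x + b, 0.0, 1.0)

def contrast(x, c):           # semantic component 2: contrast scale in [0.7, 1.3]
    return torch.clamp((x - 0.5) * c + 0.5, 0.0, 1.0)

def comp_pgd(model, x, y, steps=10, eps=8 / 255, alpha=2 / 255):
    b = torch.zeros(1, requires_grad=True)            # brightness parameter
    c = torch.ones(1, requires_grad=True)             # contrast parameter
    delta = torch.zeros_like(x, requires_grad=True)   # additive perturbation
    for _ in range(steps):
        x_adv = torch.clamp(contrast(brightness(x, b), c) + delta, 0.0, 1.0)
        loss = F.cross_entropy(model(x_adv), y)
        gb, gc, gd = torch.autograd.grad(loss, (b, c, delta))
        with torch.no_grad():
            # Interleaved component-wise updates, each projected to its own set.
            b.add_(0.05 * gb.sign()).clamp_(-0.3, 0.3)
            c.add_(0.05 * gc.sign()).clamp_(0.7, 1.3)
            delta.add_(alpha * gd.sign()).clamp_(-eps, eps)
    return torch.clamp(contrast(brightness(x, b), c) + delta, 0.0, 1.0).detach()
```

Fixing the application order (brightness, then contrast, then the additive step) is the simplest scheduling choice; the cited work additionally optimizes the component order itself.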
3.2 Algorithmic and Search-Based Compositions
- Genetic Algorithm Search: The Composite Adversarial Attack (CAA) framework models the space of attack sequences as a discrete optimization problem. The NSGA-II evolutionary algorithm is used to jointly tune base-attack order, per-attack hyper-parameters, and sequence length, balancing success rate against query complexity (Mao et al., 2020); a simplified search sketch follows this list.
- TextAttack Modular Composition: NLP composite attacks are synthesized by combining transformation modules (e.g., synonym and character-level swaps), composite constraints, and multi-stage search (greedy, beam, genetic), with direct control over semantic fidelity and attack strength (Morris et al., 2020).
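A heavily simplified sketch of the sequence-search idea behind CAA is shown below: policies are ordered lists of parameterized base attacks, scored by a user-supplied `evaluate` function and improved by selection and mutation. This single-objective mutation-only loop stands in for the paper's multi-objective NSGA-II; the attack names and parameter ranges are placeholders.

```python
# Simplified evolutionary search over attack sequences (policy search idea only;
# real CAA uses NSGA-II with a success-rate vs. query-cost trade-off).
import random

base_attacks = ["pgd_linf", "cw_l2", "spatial", "fog", "gaussian_noise"]  # placeholder pool

def random_policy(max_len=3):
    """A policy is an ordered list of (attack_name, magnitude, iterations)."""
    return [(random.choice(base_attacks), random.uniform(0.01, 0.1), random.choice([5, 10, 20]))
            for _ in range(random.randint(1, max_len))]

def mutate(policy):
    """Resample one step's attack and jitter its magnitude."""
    policy = list(policy)
    i = random.randrange(len(policy))
    name, mag, iters = policy[i]
    policy[i] = (random.choice(base_attacks), mag * random.uniform(0.5, 1.5), iters)
    return policy

def evolve(evaluate, pop_size=20, generations=30):
    """evaluate(policy) -> attack success rate on a held-out batch (user supplied)."""
    population = [random_policy() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=evaluate, reverse=True)
        parents = scored[: pop_size // 2]                  # keep the strongest policies
        population = parents + [mutate(random.choice(parents)) for _ in parents]
    return max(population, key=evaluate)
```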
3.3 Cross-Modality and Intent Packing
- Vision-Language Jailbreaks: Adversarial images are optimized to match the embedding of harmful prompts in the frozen vision encoder’s space while paired with benign text, leveraging structured triggers (OCR/textual, visual, combined) (Shayegani et al., 2023); see the sketch after this list.
- Compositional Instruction Attacks (CIA) for LLMs: Harmful sub-prompts are embedded into "innocuous" shell prompts via standardized templates (T-CIA, W-CIA); success is defined by non-rejection, topic relevance, and harmfulness of output (Jiang et al., 2023).
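The embedding-matching step behind the vision-language jailbreak can be sketched generically as below: pixels are optimized so the frozen vision encoder's output approaches a target embedding associated with the harmful intent, and the resulting image is later paired with benign text. `vision_encoder` and `target_embedding` are assumed inputs; this is a schematic PyTorch sketch, not the cited implementation.

```python
# Generic embedding-matching sketch for a cross-modal trigger image.
import torch
import torch.nn.functional as F

def embedding_match_attack(vision_encoder, target_embedding, steps=500, lr=0.01,
                           image_shape=(1, 3, 224, 224)):
    vision_encoder.eval()                                     # encoder is kept frozen
    image = torch.rand(image_shape, requires_grad=True)       # start from random noise
    optimizer = torch.optim.Adam([image], lr=lr)              # only the pixels are updated
    for _ in range(steps):
        optimizer.zero_grad()
        emb = vision_encoder(torch.clamp(image, 0.0, 1.0))
        # Maximize cosine similarity to the target embedding.
        loss = 1.0 - F.cosine_similarity(emb.flatten(1), target_embedding.flatten(1)).mean()
        loss.backward()
        optimizer.step()
    return torch.clamp(image.detach(), 0.0, 1.0)              # paired later with benign text
```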
4. Experimental Evaluation and Empirical Insights
4.1 Model-Domain Results
- Image Classification (CIFAR-10/ResNet-32): Under ReColorAdv+ℓ∞+StAdv, robust accuracy drops to 3.6–5.7% on adversarially trained/TRADES models; natural accuracy is only modestly affected (Laidlaw et al., 2019). Composite attacks yield 0% robust accuracy on naturally trained models.
- Generalized Semantic and Norm Ball Attacks: GAT-trained models maintain robust accuracy up to 43.5% (ImageNet), greatly surpassing ℓ∞-only robust models (14.0%); composite attacks involving both semantic and norm perturbations can reduce ℓ∞-robust models to <4% accuracy (Hsiung et al., 2022).
- Combinatorial and Ensemble Attacks: CAA discovers attack policies that lower robust accuracy below all single-step and prior composite baselines (e.g., 49.18% on CIFAR-10/AdvTrain vs. 49.25% for AutoAttack) at 6× reduced query cost (Mao et al., 2020).
4.2 NLP and Multi-Modal
- TextAttack: Composite synonym+char attacks surpass 40% success on SST-2 with semantic similarity ≥0.8; trade-offs occur between transformation set size, semantic fidelity, and computational cost (Morris et al., 2020).
- Vision-Language Jailbreaks: Attack success rates (ASR) using OCR/visual/combo triggers reach 85–87% on LLaVA and 60–63% on LLaMA-Adapter V2; text-only triggers are essentially ineffective (ASR<1%) (Shayegani et al., 2023).
- Instruction Packing for LLMs: T-CIA and W-CIA attacks achieve 83–91% ASR on harmful prompt sets and 95%+ for safety assessment sets across GPT-4, ChatGPT, and ChatGLM2, in contrast to 6–12% base rates (Jiang et al., 2023).
4.3 Game-Theoretic and Ensemble Defense
- GaME Framework: Multi-model, detector-integrated ensembles robustly resist compositional attacks, achieving robust accuracy gains of up to 38% (CIFAR-10) and 123% (Tiny ImageNet) over the best single-model defense by optimizing the defender's and attacker's strategies to a Nash equilibrium (Rathbun et al., 2022); a minimal equilibrium-computation sketch follows this list.
- Attack Distribution in Mixed Nash: Over 80% of attack policy mass is allocated to composed (multi-model) attacks, with single-target attacks being systematically deprioritized.
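The core computation in such a game-theoretic defense can be illustrated with a zero-sum formulation: given a payoff matrix of robust accuracies of each defense under each attack, the defender's mixed strategy maximizing worst-case accuracy is found by a standard linear program. The matrix values below are made up, and this sketch is a generic LP formulation rather than the GaME implementation.

```python
# Defender's mixed Nash strategy for a zero-sum defense-vs-attack game via an LP.
import numpy as np
from scipy.optimize import linprog

def defender_mixed_nash(P):
    """P[i, j] = robust accuracy of defense i under attack j."""
    n_def, n_atk = P.shape
    # Variables: [p_1 .. p_n_def, v]; maximize the game value v <=> minimize -v.
    c = np.zeros(n_def + 1)
    c[-1] = -1.0
    # For each attack j: expected accuracy sum_i p_i * P[i, j] >= v.
    A_ub = np.hstack([-P.T, np.ones((n_atk, 1))])
    b_ub = np.zeros(n_atk)
    A_eq = np.hstack([np.ones((1, n_def)), np.zeros((1, 1))])  # probabilities sum to 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n_def + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n_def], res.x[-1]        # mixed strategy over defenses, game value

# Illustrative 3-defense x 4-attack payoff matrix (values are made up).
P = np.array([[0.60, 0.10, 0.40, 0.20],
              [0.20, 0.55, 0.30, 0.45],
              [0.35, 0.30, 0.50, 0.15]])
strategy, value = defender_mixed_nash(P)
print(strategy.round(3), round(value, 3))
```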
5. Implications for Robustness and Defense Design
- Non-Transitive Robustness: Robustness to one threat (e.g., ℓ∞) does not imply robustness to compositions or orthogonal transformations (e.g., color, spatial) (Laidlaw et al., 2019, Hsiung et al., 2022).
- Defense Complexity: Defending against compositional adversarial attacks requires adversarial training or ensemble defense strategies spanning the full composition set, with increased computational and algorithmic complexity (Laidlaw et al., 2019, Rathbun et al., 2022).
- Generic vs. Domain-Specific Attacks: Domain-specific composite attacks (e.g., combining domain-preserving semantic and pixel-based transformations) substantially outperform either class alone—models must incorporate compositional robustness into both training and evaluation (Hsiung et al., 2022).
- Gradient Obfuscation and Detection: Composite attacks are sometimes harder to detect; specialized detectors (e.g., input anomaly, persona similarity) and structured consistency checks are essential for multi-modal and LLM settings (Jiang et al., 2023, Shayegani et al., 2023).
- Attack Automation and Search: Automated policy search (e.g., CAA) can discover stronger composite attack policies than human experts, underscoring the importance of automated, broad-based evaluation in security certification pipelines (Mao et al., 2020).
6. Challenges, Limitations, and Open Directions
- Computational Overhead: Simultaneous optimization over compositional attack parameters and schedules increases attack and adversarial training cost significantly (Hsiung et al., 2022).
- Certified Robustness: Theoretical certification against composite attacks remains largely unsolved beyond small composition sets.
- Generality and Scalability: Extending composite robustness to more complex or continuous transformations (e.g., full geometric, weather, or scenario-level modifications) introduces further methodological challenges (Hsiung et al., 2022).
- Detection and Alignment: For multi-modal and instruction attacks, robust intent detection, hierarchical policy checks, and modality alignment remain open research areas with only partial progress from current adversarial fine-tuning techniques (Jiang et al., 2023, Shayegani et al., 2023).
- Evaluation Standardization: Diverse attack chains and rapid evolution of techniques necessitate standardized, modular benchmarking frameworks to allow meaningful comparison and defense evaluation across model families (Mao et al., 2020, Morris et al., 2020).
7. Representative Approaches Across Domains
| Domain | Compositional Mechanism | Key Reference |
|---|---|---|
| Vision | Functional+additive (ReColorAdv+ℓ∞), semantic chains | (Laidlaw et al., 2019, Hsiung et al., 2022) |
| NLP | Modular (goal, constraints, transformations, search) | (Morris et al., 2020) |
| Multi-Modal | Image+text intent splitting, embedding alignment | (Shayegani et al., 2023) |
| LLM Security | Prompt packing (persona, story shell) | (Jiang et al., 2023) |
| Defense | Game-theoretic (multi-model, multi-attack) ensembles | (Rathbun et al., 2022) |
| Automated Policy | Genetic/multi-objective algorithmic search | (Mao et al., 2020) |
This comparative typology underscores the pervasiveness and critical impact of compositional adversarial attacks across modern machine learning settings, affirming the necessity for explicitly compositional robustness frameworks in both attack analysis and defense design.