Towards more transferable adversarial attack in black-box manner (2505.18097v1)

Published 23 May 2025 in cs.LG and cs.CV

Abstract: Adversarial attacks have become a well-explored domain, frequently serving as evaluation baselines for model robustness. Among these, black-box attacks based on transferability have received significant attention due to their practical applicability in real-world scenarios. Traditional black-box methods have generally focused on improving the optimization framework (e.g., utilizing momentum in MI-FGSM) to enhance transferability, rather than examining the dependency on surrogate white-box model architectures. The recent state-of-the-art approach DiffPGD has demonstrated enhanced transferability by employing diffusion-based adversarial purification models for adaptive attacks. The inductive bias of diffusion-based adversarial purification aligns naturally with the adversarial attack process, as both involve noise addition, which reduces dependency on surrogate white-box model selection. However, the denoising process of diffusion models incurs substantial computational costs through chain-rule differentiation, manifested in excessive VRAM consumption and extended runtime. This progression prompts us to question whether introducing diffusion models is necessary. We hypothesize that a model sharing a similar inductive bias with diffusion-based adversarial purification, combined with an appropriate loss function, could achieve comparable or superior transferability while dramatically reducing computational overhead. In this paper, we propose a novel loss function coupled with a unique surrogate model to validate our hypothesis. Our approach leverages the score of the time-dependent classifier from classifier-guided diffusion models, effectively incorporating natural data distribution knowledge into the adversarial optimization process. Experimental results demonstrate significantly improved transferability across diverse model architectures while maintaining robustness against diffusion-based defenses.

Summary

Towards More Transferable Adversarial Attack in Black-Box Manner

In the field of adversarial attacks, the paper "Towards more transferable adversarial attack in black-box manner" makes significant contributions toward understanding and improving the transferability of attacks in a black-box setting. The authors address a salient problem: while traditional black-box methods have focused on optimizing attack frameworks to improve transferability, they often neglect the dependency on surrogate white-box model architectures. In this context, exploring methodologies that mitigate the computational overhead of diffusion-based adversarial purification, without compromising attack effectiveness, is of paramount importance.

Key Contributions

  1. Novel Loss Function and Surrogate Model:
    • The authors introduce a novel loss function and a unique surrogate model that capitalize on the score of a time-dependent classifier from classifier-guided diffusion models. This approach effectively incorporates knowledge of the natural data distribution into the adversarial optimization process, offering improved transferability and robustness against diffusion-based defenses.
  2. Reduction in Computational Costs:
    • The proposed method significantly reduces computational overhead compared to existing state-of-the-art approaches, such as DiffPGD, by circumventing the intensive denoising processes involved in diffusion models. The efficiency gains facilitate deployment in resource-constrained environments and enable large-scale robustness evaluations.
  3. Integration with PGD Framework:
    • The integration with a Projected Gradient Descent (PGD) framework, utilizing both the classification task's inductive bias and knowledge of the noised data distribution, provides a general solution that works effectively across varied model architectures and settings (a minimal sketch of such a loop follows this list).
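
To make the optimization concrete, the sketch below shows how a PGD loop might use a time-dependent classifier from classifier-guided diffusion as its surrogate. The function time_classifier, the timestep t, the plain cross-entropy loss, and all hyperparameter values are illustrative assumptions rather than the authors' released implementation; the exact ScorePGD/U-ScorePGD loss may differ.

```python
import torch
import torch.nn.functional as F

def score_pgd(x, y, time_classifier, t, eps=8/255, alpha=2/255, steps=10):
    """Hedged sketch of a PGD loop driven by a time-dependent classifier.

    time_classifier(x, t) is assumed to return class logits for an image
    conditioned on diffusion timestep t (as in classifier-guided diffusion);
    whether the input should first be noised to timestep t is a detail this
    sketch omits. Loss choice and hyperparameters are illustrative only.
    """
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        logits = time_classifier(x_adv, t)           # time-conditioned surrogate logits
        loss = F.cross_entropy(logits, y)            # untargeted: increase loss on true label
        grad = torch.autograd.grad(loss, x_adv)[0]   # gradient w.r.t. the input image
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                    # signed ascent step
            x_adv = x + torch.clamp(x_adv - x, -eps, eps)          # project into the L_inf ball
            x_adv = torch.clamp(x_adv, 0.0, 1.0)                   # keep a valid image range
        x_adv = x_adv.detach()
    return x_adv
```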

Experimental Results

The experiments demonstrate the efficacy of the proposed methods across a variety of scenarios:

  • Unprotected Classifier Attacks:
    • U-ScorePGD consistently outperformed traditional PGD and DiffPGD approaches in both white-box and black-box settings, achieving up to 89.9% attack success rates on unprotected models while maintaining efficiency.
  • Protected Classifier Attacks:
    • ScorePGD exhibited superior performance against diffusion-based adversarial purification, leveraging its inherent ability to disrupt purification processes by altering the guidance direction of reverse diffusion models.
  • Efficiency in Runtime:
    • The proposed methodologies achieve substantial improvements in runtime efficiency, operating up to 10 times faster than competing methods such as DiffPGD across standard image resolutions and PGD settings.

Implications and Future Directions

The paper's findings have notable implications for both practical applications and theoretical advancements in adversarial security frameworks. By demonstrating that incorporating knowledge of the noised data distribution, rather than running a full diffusion denoising process, is sufficient to enhance attack transferability, the work paves the way for more efficient adversarial techniques across a range of machine learning contexts.

Moreover, the ability to execute attacks with reduced resource demands expands the potential for deploying robust adversarial evaluations in real-world systems. Future work could focus on refining optimization strategies for increasingly complex models or exploring adaptive methodologies for selecting diffusion timesteps to further improve attack effectiveness and computational efficiency.

In conclusion, this paper contributes to a more nuanced understanding of adversarial attacks in black-box settings and offers practical solutions that balance attack performance with computational constraints. The innovative use of time-dependent classifiers and novel loss functions provides a promising direction for advancing adversarial research and developing resilient security measures in artificial intelligence systems.
