AdvCLIP-LoRA: Efficient Adversarial Training

Updated 14 October 2025
  • The paper introduces a novel minimax optimization over LoRA adapters and adversarial perturbations to robustify CLIP models with minimal trainable parameters.
  • It achieves state-of-the-art adversarial robustness and improved clean accuracy, outperforming prompt tuning baselines in few-shot settings.
  • The framework demonstrates effective generalization across diverse benchmarks, highlighting its scalability for resource-constrained deployments.

AdvCLIP-LoRA is a parameter-efficient adversarial training methodology developed to enhance the adversarial robustness of large contrastive vision–language models (VLMs), such as CLIP, adapted via Low-Rank Adaptation (LoRA) in few-shot regimes. By formulating fine-tuning as a minimax optimization over LoRA adapters and input perturbations, AdvCLIP-LoRA achieves robust adaptation with minimal trainable parameters, outperforming prompt tuning baselines in clean accuracy and adversarial resistance. This framework demonstrates state-of-the-art adversarial robustness and effective generalization across diverse datasets and CLIP architectures, making it particularly suited for resource-constrained environments and scalable deployment scenarios.

1. Motivation and Context

AdvCLIP-LoRA arises from the demonstrated vulnerability of vision–language models subjected to downstream adaptation via parameter-efficient fine-tuning techniques, in particular LoRA. Standard CLIP models, while effective on zero-shot retrieval and classification tasks, are susceptible to adversarial inputs: small perturbations that can sharply degrade model performance. Existing approaches to improving adversarial robustness, such as adversarial prompt tuning, either require training many parameters or incur significant accuracy trade-offs on clean data. AdvCLIP-LoRA leverages LoRA to inject robustness in a resource-efficient manner by limiting adaptation to a small set of trainable low-rank matrices.

2. Methodological Framework

The central technique in AdvCLIP-LoRA is a minimax optimization over LoRA adapters and adversarial perturbations. Specifically, the framework simultaneously updates the low-rank matrices inserted into the CLIP vision and text transformer layers and generates adversarial input perturbations to train against. The general form of the optimization is:

$$\min_{A, B} \; \max_{\delta \in \Delta} \; f(W_0 + BA, \delta)$$

where $W_0$ denotes the frozen pre-trained weights, $A$ and $B$ are the trainable low-rank matrices, and $\delta$ is an adversarial perturbation constrained to an $\ell_\infty$-ball $\Delta$.

Algorithmically, each training iteration alternates between ascending the loss on $\delta$ (via projected gradient ascent, typically PGD) and descending the loss on $A$ and $B$ (standard gradient descent), simulating adversarial attacks during fine-tuning. The parameter updates are as follows:

  • Adversarial perturbation step:

$$\delta_t = P_{\Delta}\left(\delta_{t-1} + \eta_{\delta} \frac{1}{M} \sum_{i=1}^{M} \nabla_{\delta} F(W_{t-1}, \delta_{t-1}; \xi_i)\right)$$

  • LoRA adapter update steps:

$$A_t = A_{t-1} - \eta_w \frac{1}{M} \sum_{i=1}^{M} \nabla_{A} F(W_{t-1}, \delta_t; \xi_i)$$

$$B_t = B_{t-1} - \eta_w \frac{1}{M} \sum_{i=1}^{M} \nabla_{B} F(W_{t-1}, \delta_t; \xi_i)$$

Here $\tau$, the number of inner maximization steps for PGD, is a controllable hyperparameter that influences the trade-off between robust and clean accuracy.
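
The following PyTorch-style sketch illustrates one such alternating iteration: $\tau$ projected gradient-ascent steps on $\delta$, then a descent step on the LoRA matrices. The names (`advclip_lora_step`, `model`, `loss_fn`) and the step sizes are illustrative assumptions rather than the authors' released code, and the sign-based ascent step is a common $\ell_\infty$ PGD variant of the gradient update shown above.

```python
import torch

def advclip_lora_step(model, loss_fn, images, texts, optimizer,
                      eps=4/255, eta_delta=1/255, tau=2):
    """One iteration: tau PGD ascent steps on delta, then one descent step on A, B."""
    # Inner maximization: projected gradient ascent on the perturbation delta.
    delta = torch.zeros_like(images, requires_grad=True)
    for _ in range(tau):
        loss = loss_fn(model(images + delta, texts))
        grad = torch.autograd.grad(loss, delta)[0]
        with torch.no_grad():
            delta += eta_delta * grad.sign()   # sign-based ascent step
            delta.clamp_(-eps, eps)            # projection P_Delta onto the l_inf ball

    # Outer minimization: one gradient descent step on the LoRA parameters,
    # assumed to be the only parameters with requires_grad=True.
    optimizer.zero_grad()
    loss_fn(model(images + delta.detach(), texts)).backward()
    optimizer.step()
```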

3. Empirical Evaluation and Results

AdvCLIP-LoRA has been extensively tested on eight vision benchmarks spanning object, texture, fine-grained, scene, and action recognition (ImageNet-1K, Caltech101, DTD, OxfordPets, Food101, Flowers102, SUN397, UCF101) using two CLIP transformer backbones: ViT-B/16 and ViT-B/32.

The evaluation includes:

  • Few-shot classification: 1–16 labeled samples per class, comparing clean and adversarial (PGD) accuracy (see the sketch after this list).
  • Adversarial base-to-new generalization: model fine-tuned on "base" categories and evaluated on held-out "new" categories under adversarial perturbation.
  • Cross-dataset transfer: model adversarially fine-tuned on a source dataset, evaluated on target datasets without further adaptation.
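
As a hedged sketch of the clean-versus-robust accuracy protocol, the code below assumes a CLIP-style image encoder and precomputed, normalized class text features; `encode_image`, `text_features`, `pgd_attack`, and the attack budget are illustrative assumptions, not specifics from the paper.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def classify(encode_image, text_features, images):
    # Zero-shot-style prediction: nearest class text embedding by cosine similarity.
    image_features = F.normalize(encode_image(images), dim=-1)
    return (image_features @ text_features.T).argmax(dim=-1)

def pgd_attack(encode_image, text_features, images, labels, eps=1/255, steps=10):
    # l_inf-bounded PGD on the image-text similarity logits.
    delta = torch.zeros_like(images, requires_grad=True)
    for _ in range(steps):
        feats = F.normalize(encode_image(images + delta), dim=-1)
        loss = F.cross_entropy(feats @ text_features.T, labels)
        grad = torch.autograd.grad(loss, delta)[0]
        with torch.no_grad():
            delta += (eps / 4) * grad.sign()
            delta.clamp_(-eps, eps)
    return (images + delta).detach()

def clean_and_robust_accuracy(encode_image, text_features, loader):
    clean = robust = total = 0
    for images, labels in loader:
        clean += (classify(encode_image, text_features, images) == labels).sum().item()
        adv = pgd_attack(encode_image, text_features, images, labels)
        robust += (classify(encode_image, text_features, adv) == labels).sum().item()
        total += labels.numel()
    return clean / total, robust / total
```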

Key results demonstrate:

  • AdvCLIP-LoRA consistently surpasses adversarial prompt tuning in both clean and adversarial accuracy across all few-shot scenarios.
  • In base-to-new generalization, the method achieves higher accuracies on both base and new categories under attack, confirming robust transfer properties.
  • Cross-dataset experiments show that adversarially trained LoRA adapters yield superior robustness compared with zero-shot CLIP, with minimal degradation in clean performance.
  • Robustness gains improve further with higher $\tau$ (more inner maximization steps), with only a minor reduction in clean accuracy in richer data regimes.

4. Technical Design of LoRA Adaptation

LoRA adaptation injects low-rank matrices at the weight projection points of both the vision and text transformer modules. The method is applicable to various LoRA ranks and placements, with empirical ablations supporting adaptation of both $W_q$ and $W_v$ (query and value projections) for an optimal robustness–accuracy trade-off.
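
A minimal sketch of this injection pattern is shown below, assuming PyTorch; `LoRALinear`, the attribute names `q_proj`/`v_proj`, and the rank and scaling defaults are illustrative assumptions rather than the paper's exact implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear (W_0) and adds a trainable low-rank update B A."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # keep W_0 frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no change at start
        self.scale = alpha / rank

    def forward(self, x):
        # Computes (W_0 + B A) x; gradients flow only into A and B.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

def inject_lora(attn_block, rank=4):
    # Adapt only the query and value projections, mirroring the W_q / W_v ablation above.
    attn_block.q_proj = LoRALinear(attn_block.q_proj, rank)
    attn_block.v_proj = LoRALinear(attn_block.v_proj, rank)
    return attn_block
```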

By training only the LoRA matrices, which constitute a small fraction of the total model weights, the procedure is memory- and compute-efficient. This compactness supports scalable deployment on limited hardware and rapid iterative adaptation in production or federated settings.

5. Practical Implications and Scalability

AdvCLIP-LoRA’s parameter-efficient adversarial training strategy delivers robust adaptation suitable for edge devices and distributed systems where compute and communication overhead are constraining factors. Its established convergence rate under standard smoothness and bounded gradient assumptions makes the approach theoretically sound for large-scale minimax optimization.

A plausible implication is that AdvCLIP-LoRA, when coupled with scalable LoRA serving infrastructures featuring adaptive-tiling computation and flexible adapter orchestration (cf. Mi et al., 1 Nov 2024), will further accelerate robust deployment and reduce latency in vision–language tasks. The efficiency and robustness profile of AdvCLIP-LoRA positions it as a practical solution for adversarial defense in real-world multimodal systems.

6. Comparative Perspective and Future Directions

Compared with AdvCLIP’s universal patch methodology (Zhou et al., 2023), AdvCLIP-LoRA focuses on model-internal robustness enhancement via adversarial and low-rank optimization, rather than externally generated adversarial examples. While defense mechanisms in AdvCLIP include input corruption, pruning, and adversarial training, AdvCLIP-LoRA specifically tailors adversarial training within the PEFT/LoRA paradigm, proving its effectiveness empirically.

Continued research is expected to refine inner maximization procedures (e.g., adaptive $\tau$), explore layer selection for LoRA injection, and extend the methodology to broader multimodal architectures, evaluation tasks, and application domains such as visual question answering and retrieval. AdvCLIP-LoRA's minimax low-rank adaptation framework is likely to inform the development of more general adversarially robust PEFT strategies for future vision–language systems.
