Directional Orthogonal Counterattack (DOC)
- DOC is a test-time defense mechanism that augments PGD by integrating orthogonal gradient exploration and momentum to better navigate the adversarial search space.
- It introduces a Directional Sensitivity Score (DSS) for adaptive gating, dynamically blending optimized and random perturbations to mitigate overfitting on clean inputs.
- Empirical results show DOC achieves robust accuracy gains (up to 31% on PGD-10) while maintaining competitive clean performance across multiple datasets.
Directional Orthogonal Counterattack (DOC) is a test-time defense mechanism designed to enhance the adversarial robustness of vision-language pre-training models, particularly CLIP, by expanding the diversity and coverage of counterattacks. Unlike standard Test-Time Counterattack (TTC) methods that rely solely on Projected Gradient Descent (PGD) steps, DOC introduces orthogonal exploration and momentum-based updates to address the limited search space and overfitting issues associated with traditional counterattack strategies. The method incorporates a novel directional sensitivity score for adaptive modulation of counterattack strength, ultimately improving the model's ability to neutralize a broader array of adversarial perturbations while maintaining competitive performance on clean data.
1. Optimization Objective and Update Scheme
DOC operates in the context of a possibly adversarial input $x$, using the CLIP image encoder $I_\theta(\cdot)$ and a counterattack perturbation $\delta_{ca}$ constrained within an $\ell_\infty$ budget $\epsilon_{ca}$. The objective extends the TTC maximization formulation:

$$\max_{\|\delta_{ca}\|_\infty \le \epsilon_{ca}} \; \big\| I_\theta(x + \delta_{ca}) - I_\theta(x) \big\|_2$$
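To make the objective concrete, the maximization target can be evaluated for a toy stand-in encoder. This is a minimal sketch: the random linear map, the dimensions, and the budget value are illustrative assumptions, not the CLIP encoder or the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the image encoder I_theta: a fixed random linear map.
W = rng.normal(size=(16, 32))

def encode(x):
    return W @ x

def ttc_objective(x, delta):
    # Embedding deviation that the counterattack tries to maximize.
    return np.linalg.norm(encode(x + delta) - encode(x))

eps_ca = 4 / 255  # assumed l_inf budget for illustration
x = rng.normal(size=32)
delta = rng.uniform(-eps_ca, eps_ca, size=32)

print(ttc_objective(x, np.zeros(32)))  # 0.0: no perturbation, no deviation
print(ttc_objective(x, delta) > 0.0)   # True for a generic nonzero delta
```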
DOC modifies the classic PGD iterative update by:
- Augmenting the standard loss gradient with an orthogonal random direction and a momentum buffer.
- Introducing a Directional Sensitivity Score (DSS) for adaptive modulation.
The iterative steps at each round $t$ are as follows:
- Surrogate loss: $\mathcal{L}(\delta_{ca}) = \big\| I_\theta(x + \delta_{ca}) - I_\theta(x) \big\|_2$
- Normalized gradient: $g_t = \nabla_{\delta_{ca}} \mathcal{L} \,/\, \|\nabla_{\delta_{ca}} \mathcal{L}\|_2$
- Orthogonal random augmentation:
- Sample $r_t \sim \mathcal{N}(0, I)$,
- Project to orthogonal subspace: $r_t^{\perp} = r_t - \langle r_t, g_t \rangle\, g_t$
- Composite direction: $d_t = \dfrac{g_t + \lambda\, r_t^{\perp} / \|r_t^{\perp}\|_2}{\big\| g_t + \lambda\, r_t^{\perp} / \|r_t^{\perp}\|_2 \big\|_2}$
- Momentum buffer update: $m_t = \mu\, m_{t-1} + d_t$
- PGD step and projection: $\delta_{ca}^{t+1} = \Pi_{\epsilon_{ca}}\big( \delta_{ca}^{t} + \alpha\, \mathrm{sign}(m_t) \big)$
where $\lambda$ is the orthogonal strength, $\mu$ is the momentum factor, $\alpha$ is the step size, and $\Pi_{\epsilon_{ca}}$ denotes projection onto the $\ell_\infty$-norm ball.
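One update round can be sketched in NumPy with a toy linear encoder, for which the surrogate-loss gradient has a closed form. The encoder, dimensions, and the values of $\lambda$, $\mu$, $\alpha$, and $\epsilon_{ca}$ here are illustrative assumptions, not the paper's defaults.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear stand-in for I_theta (illustrative assumption).
W = rng.normal(size=(16, 32))
encode = lambda x: W @ x

def doc_step(x, delta, m_prev, eps_ca, lam=1.0, mu=0.9, alpha=1/255):
    # Gradient of L(delta) = ||I(x+delta) - I(x)||_2; closed form for the
    # linear encoder used here: W^T diff / ||diff||.
    diff = encode(x + delta) - encode(x)
    grad = W.T @ diff / (np.linalg.norm(diff) + 1e-12)
    g = grad / (np.linalg.norm(grad) + 1e-12)      # normalized gradient g_t
    r = rng.normal(size=delta.shape)               # random direction r_t
    r_perp = r - (r @ g) * g                       # component orthogonal to g_t
    r_perp /= np.linalg.norm(r_perp) + 1e-12
    d = g + lam * r_perp                           # composite direction d_t
    d /= np.linalg.norm(d) + 1e-12
    m = mu * m_prev + d                            # momentum buffer m_t
    delta = delta + alpha * np.sign(m)             # PGD step
    return np.clip(delta, -eps_ca, eps_ca), m      # project onto the eps-ball

eps_ca = 4 / 255
x = rng.normal(size=32)
delta = rng.uniform(-eps_ca, eps_ca, size=32)
m = np.zeros_like(delta)
for _ in range(4):
    delta, m = doc_step(x, delta, m, eps_ca)

print(np.max(np.abs(delta)) <= eps_ca + 1e-12)  # True: constraint holds
```

`np.clip` implements $\Pi_{\epsilon_{ca}}$ exactly for the $\ell_\infty$ ball, which is what makes the projection tractable in this setting.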
2. Complete Algorithmic Workflow
The DOC method is structured as follows, combining orthogonal augmentation with a sensitivity-based gating mechanism:
```
# Stage 1: Directional Sensitivity Score (DSS) and soft gate
tau_cos = 0
for m in range(1, M + 1):
    eta_m = Uniform(-epsilon_ca, epsilon_ca)
    x_m = x_input + eta_m
    tau_cos += cos(I_theta(x_m), I_theta(x_input))
hat_tau = 1 - (tau_cos / M)
w = sigmoid(gamma * (tau - hat_tau))

# Stage 2: momentum-based orthogonal counterattack
m_0 = 0
delta_ca_0 = Uniform(-epsilon_ca, epsilon_ca)
delta_ca = delta_ca_0
for t in range(T):
    # Step 1: compute normalized gradient g_t
    # Step 2: sample random r, compute r^perp
    # Step 3: compose d_t, update momentum m_t
    # Step 4: PGD update and project onto the epsilon_ca ball
    ...  # (as detailed above)
delta_ca_T = delta_ca

# Stage 3: DSS-gated convex blend
final_delta_ca = w * delta_ca_T + (1 - w) * delta_ca_0
return final_delta_ca
```
The final counterattack perturbation is a convex combination of the fully optimized perturbation $\delta_{ca}^{T}$ and the initial random perturbation $\delta_{ca}^{0}$, weighted by the DSS-derived gate $w$.
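A property worth noting: because the blend is convex and both endpoints lie in the $\epsilon_{ca}$-ball, the gated output never leaves the ball. A minimal numerical check (all values here are assumed for illustration, including the gate value):

```python
import numpy as np

rng = np.random.default_rng(4)
eps_ca = 4 / 255

delta_0 = rng.uniform(-eps_ca, eps_ca, size=32)  # initial random perturbation
delta_T = rng.uniform(-eps_ca, eps_ca, size=32)  # stand-in for the optimized one
w = 0.8                                          # assumed DSS-derived gate

final_delta = w * delta_T + (1 - w) * delta_0

# A convex combination of two points in the l_inf ball stays in the ball.
print(np.max(np.abs(final_delta)) <= eps_ca)  # True
```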
3. Orthogonal Augmentation and Momentum: Rationale and Mathematical Construction
Standard PGD counterattacks operate along $\mathrm{sign}(\nabla_{\delta_{ca}} \mathcal{L})$, which, for the TTC objective, may only explore a restricted subset of the $\epsilon_{ca}$-ball. The key innovation in DOC is the addition of noise lying in the subspace orthogonal to the loss gradient:
- Any random vector $r$ can be decomposed as $r = r^{\parallel} + r^{\perp}$, where $r^{\parallel} = \langle r, g_t \rangle\, g_t$ is parallel to the normalized gradient.
- The DOC update discards the parallel component, forming $r^{\perp} = r - \langle r, g_t \rangle\, g_t$, and injects it with strength $\lambda$.
- This composite step, especially after normalization, enables the optimizer to escape narrow basins and locate more generalizable counterattacks.
Momentum acts as a low-pass filter, amplifying directions that consistently improve the objective while filtering high-frequency steps, yielding more stable and effective traversal across the counterattack search space.
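The orthogonal decomposition underlying this construction can be verified numerically. A minimal sketch, where the unit gradient direction is an arbitrary random stand-in:

```python
import numpy as np

rng = np.random.default_rng(2)

g = rng.normal(size=32)
g /= np.linalg.norm(g)        # unit loss-gradient direction (stand-in)

r = rng.normal(size=32)       # isotropic random vector
r_par = (r @ g) * g           # component parallel to g
r_perp = r - r_par            # component DOC keeps

# r = r_par + r_perp exactly, and r_perp is orthogonal to g.
print(np.allclose(r, r_par + r_perp))  # True
print(abs(r_perp @ g) < 1e-9)          # True
```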
Empirical ablation demonstrates that orthogonal gradient augmentation (OGA) constitutes the primary driver for increased adversarial robustness, as evidenced by robust accuracy ≥31% (vs. ~21% for TTC) with only small reductions in clean accuracy.
4. Directional Sensitivity Score (DSS) and Adaptive Modulation
DSS is designed to quantify an input's embedding sensitivity to isotropic random perturbations, independent of embedding scale. For each input $x$, $M$ random noise samples $\eta_m \sim \mathcal{U}(-\epsilon_{ca}, \epsilon_{ca})$ are generated, and the average cosine similarity between the perturbed and clean embeddings is computed:

$$\hat{\tau}(x) = 1 - \frac{1}{M} \sum_{m=1}^{M} \cos\big( I_\theta(x + \eta_m),\; I_\theta(x) \big)$$

Clean images yield $\hat{\tau}$ near zero (cosine similarity close to 1), while adversarial images show increased embedding drift (larger $\hat{\tau}$). Rather than employing hard thresholding, a soft gate is computed:

$$w = \sigma\big( \gamma\, (\tau - \hat{\tau}) \big)$$

where $\gamma$ is a sharpness hyperparameter and $\tau$ a preset threshold. The final output is the convex blend $\delta_{ca}^{\mathrm{final}} = w\, \delta_{ca}^{T} + (1 - w)\, \delta_{ca}^{0}$. This adaptivity prevents excessive perturbation of already-stable (clean) inputs and ensures robust correction on sensitive (likely adversarial) samples.
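The DSS computation and soft gate can be sketched end to end with the same toy linear encoder as before; the values of $M$, $\gamma$, $\tau$, and $\epsilon_{ca}$ here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy linear stand-in for I_theta (illustrative assumption).
W = rng.normal(size=(16, 32))
encode = lambda x: W @ x

def cos_sim(a, b):
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def dss_gate(x, eps_ca=4/255, M=8, gamma=20.0, tau=0.05):
    # Average cosine similarity under M isotropic perturbations.
    sims = []
    for _ in range(M):
        eta = rng.uniform(-eps_ca, eps_ca, size=x.shape)
        sims.append(cos_sim(encode(x + eta), encode(x)))
    tau_hat = 1.0 - np.mean(sims)                        # embedding drift score
    w = 1.0 / (1.0 + np.exp(-gamma * (tau - tau_hat)))   # sigmoid soft gate
    return tau_hat, w

x = rng.normal(size=32)
tau_hat, w = dss_gate(x)
print(0.0 <= w <= 1.0)  # True: the gate is a valid convex weight
```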
5. Empirical Evaluation and Results
DOC was evaluated on 16 diverse datasets, including CIFAR-10/100, STL-10, ImageNet (zero-shot), Caltech-101/256, Oxford Pets, Flowers102, Food101, Stanford Cars, SUN397, Country211, FGVC-Aircraft, EuroSAT, DTD, and PCAM.
Three adversarial threat models were applied under an $\ell_\infty$ budget:
- PGD-10
- C&W
- AutoAttack ensemble
Key robust and clean accuracy results are summarized below:
| Setting | Robust Accuracy (%) | Clean Accuracy (%) |
|---|---|---|
| CLIP Baseline | 0.06 | 61.5 |
| TTC (prior state-of-the-art) | 21.22 (PGD-10) | 55.6 |
| DOC (full method) | 31.02 (PGD-10) | 58.3 |
| DOC (C&W) | 28.18 | — |
| DOC (AutoAttack) | ≈ 21.0 | — |
- DOC yields a +9.8% robust accuracy gain (PGD-10) over TTC, and +30.96% over baseline CLIP.
- Ablation reveals OGA alone provides ≈31.8% robust, ≈55.4% clean; DSS alone achieves ≈23.4% robust, ≈58.2% clean; full combination balances both metrics.
- Integration with adversarial fine-tuning (e.g., TeCoA, PMG-AFT, FARE) delivers a further 4–5% robust improvement.
Robust accuracy saturates at 3–4 update steps; all remaining hyperparameters use their default settings. All experiments use batch size 256 and a single NVIDIA RTX 4090 GPU.
6. Practical Deployment Recommendations and Limitations
DOC requires no additional training but does induce test-time computational overhead proportional to the number of PGD steps $T$ and sensitivity samples $M$.
- A step count of $T = 3$–$4$ is sufficient; further iterations provide negligible benefit.
- The orthogonal strength $\lambda$ and momentum factor $\mu$ are robust over broad intervals.
- The number of sensitivity samples $M$ trades stability against cost; small values remain viable.
- Soft gating with sharpness up to $\gamma = 20$ smooths abrupt transitions.
- No formal guarantee of global optimality is provided; efficacy depends on the geometry of the CLIP embedding space.
- The method is directly applicable only to models with differentiable embedding functions and tractable projections.
- Potential extensions include adaptive orthogonal scaling, refined subspace sampling, and adaptation to alternative vision-language backbones.
DOC advances test-time counterattack methodology by systematically incorporating orthogonality, momentum, and input-sensitive gating, leading to substantial gains in adversarial robustness with minimal sacrifice in clean evaluation accuracy.