Gaussian Splatting Under Attack: Investigating Adversarial Noise in 3D Objects (2412.02803v1)

Published 3 Dec 2024 in cs.CV, cs.AI, and eess.IV

Abstract: 3D Gaussian Splatting has advanced radiance field reconstruction, enabling high-quality view synthesis and fast rendering in 3D modeling. While adversarial attacks on object detection models are well-studied for 2D images, their impact on 3D models remains underexplored. This work introduces the Masked Iterative Fast Gradient Sign Method (M-IFGSM), designed to generate adversarial noise targeting the CLIP vision-language model. M-IFGSM specifically alters the object of interest by focusing perturbations on masked regions, degrading the performance of CLIP's zero-shot object detection capability when applied to 3D models. Using eight objects from the Common Objects 3D (CO3D) dataset, we demonstrate that our method effectively reduces the accuracy and confidence of the model, with adversarial noise being nearly imperceptible to human observers. The top-1 accuracy in original model renders drops from 95.4% to 12.5% for train images and from 91.2% to 35.4% for test images, with confidence levels reflecting this shift from true classification to misclassification, underscoring the risks of adversarial attacks on 3D models in applications such as autonomous driving, robotics, and surveillance. The significance of this research lies in its potential to expose vulnerabilities in modern 3D vision models, including radiance fields, prompting the development of more robust defenses and security measures in critical real-world applications.

Summary

  • The paper introduces M-IFGSM, a targeted white-box attack that reduces 2D Top-1 accuracy from 94.9% to 2.1% by perturbing object regions.
  • It employs zero-shot segmentation with SAM to confine noise, effectively transferring adversarial effects to both training and novel 3D renders.
  • The study highlights critical security risks in 3D vision pipelines and emphasizes the need for robust defenses in adversarial settings.

Analysis of "Gaussian Splatting Under Attack: Investigating Adversarial Noise in 3D Objects" (2412.02803)

This work systematically investigates the vulnerability of 3D radiance field reconstruction, specifically 3D Gaussian Splatting (3DGS), to adversarial attacks targeting vision-language models such as CLIP. The primary technical novelty lies in the introduction of the Masked Iterative Fast Gradient Sign Method (M-IFGSM), which generates adversarial perturbations confined to object regions in multi-view images. These perturbations are then incorporated into the 3DGS reconstruction pipeline, allowing for an analysis of attack persistence and efficacy across both image and 3D model render spaces.

Methodology

The proposed M-IFGSM pipeline consists of:

  1. Semantic Mask Generation: Use of the Segment Anything Model (SAM) in zero-shot mode, without class-level supervision, to extract precise object masks in each view (see the sketch after this list).
  2. Adversarial Example Generation: Perturbation of pixel values within the segmented regions using an iterative variant of FGSM. Gradients are computed with respect to a differentiable victim model (CLIP ViT-B/16 in the experiments), and perturbations are prevented from affecting background pixels.
  3. 3DGS Model Reconstruction: Aggregation of perturbed view images into a dense set, with reconstruction and rendering of the 3D model based on the adversarial images.
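For step 1, a minimal sketch of zero-shot mask extraction with SAM is shown below. The checkpoint path is a placeholder, and keeping the largest automatically generated mask is an assumption made for illustration; the paper may prompt SAM differently (e.g., with points or boxes) or select masks by another rule.

```python
import numpy as np
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Load a SAM backbone (the checkpoint path is a placeholder).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

def object_mask(image_rgb: np.ndarray) -> np.ndarray:
    """Return a binary object mask for an HxWx3 uint8 RGB view.

    Here we simply keep the largest automatically generated mask;
    the paper's mask-selection rule may differ.
    """
    masks = mask_generator.generate(image_rgb)
    largest = max(masks, key=lambda m: m["area"])
    return largest["segmentation"].astype(np.uint8)  # HxW, 1 = object
```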

The attack is explicitly white-box, leveraging the availability of gradients from the target model.
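A minimal PyTorch sketch of the masked iterative FGSM step (step 2) is given below, assuming a differentiable image encoder `clip_model` that returns image embeddings, precomputed normalized `text_features` for the candidate class prompts, and a SAM-derived binary mask; the paper's exact loss, step size, iteration count, and perturbation budget may differ.

```python
import torch
import torch.nn.functional as F

def masked_ifgsm(image, mask, clip_model, text_features, true_label,
                 eps=8 / 255, alpha=1 / 255, steps=10):
    """Iteratively perturb only the masked (object) pixels so that CLIP's
    zero-shot prediction moves away from the true class.

    image:         (1, 3, H, W) tensor in [0, 1]
    mask:          (1, 1, H, W) binary float tensor (1 = object region)
    clip_model:    differentiable image encoder returning embeddings (assumed)
    text_features: (num_classes, D) normalized text embeddings of class prompts
    true_label:    index of the ground-truth class
    """
    adv = image.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        img_feat = F.normalize(clip_model(adv), dim=-1)
        logits = 100.0 * img_feat @ text_features.t()  # CLIP-style similarities
        loss = F.cross_entropy(logits, torch.tensor([true_label], device=logits.device))
        grad = torch.autograd.grad(loss, adv)[0]

        # Untargeted ascent step, restricted to the object mask.
        adv = adv.detach() + alpha * grad.sign() * mask
        # Project back into the eps-ball around the clean image and the valid pixel range;
        # background pixels never change because their accumulated step is zero.
        adv = image + torch.clamp(adv - image, -eps, eps)
        adv = torch.clamp(adv, 0.0, 1.0)
    return adv.detach()
```

The perturbed views are then used as the input image set for standard 3DGS reconstruction (step 3 above).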

Experimental Design

  • Dataset: Eight object categories from CO3D, representing common objects with multi-view image sets.
  • Evaluation Metrics: Top-1 and Top-5 classification accuracy, along with average prediction confidence, measured before and after the adversarial attack at two stages: the perturbed 2D images and the 3DGS model renders, from both training and held-out test viewpoints (see the sketch after this list).
  • Hardware: Dual NVIDIA RTX 3090 GPUs, permitting efficient gradient-based attacks and dense 3DGS optimization.
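A minimal sketch of the metric computation is shown below, under the same assumed `clip_model` / `text_features` interface as in the attack sketch; image preprocessing, render loading, and per-object aggregation are omitted.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def evaluate(images, labels, clip_model, text_features, k=5):
    """Top-1 / Top-k accuracy and mean top-prediction confidence for a batch
    of input images or 3DGS renders.

    images: (N, 3, H, W) tensor; labels: (N,) long tensor of class indices.
    """
    img_feat = F.normalize(clip_model(images), dim=-1)
    probs = (100.0 * img_feat @ text_features.t()).softmax(dim=-1)
    topk = probs.topk(k, dim=-1).indices
    top1_acc = (topk[:, 0] == labels).float().mean().item()
    topk_acc = (topk == labels.unsqueeze(1)).any(dim=1).float().mean().item()
    mean_conf = probs.max(dim=-1).values.mean().item()  # confidence of the top prediction
    return top1_acc, topk_acc, mean_conf
```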

Numerical Results and Claims

The application of M-IFGSM produces striking reductions in model performance:

  • 2D Input Images: Average Top-1 accuracy drops from 94.9% (clean images) to 2.1% (attacked), while Top-5 falls from 99.6% to 6.4%. Misclassification confidence increases, with the model assigning high certainty to incorrect predictions.
  • 3DGS Renders (Train Views): Top-1 accuracy decreases from 95.4% to 12.5%.
  • 3DGS Renders (Test Views): Top-1 accuracy decreases from 91.2% to 35.4%.
  • The adversarial noise remains nearly imperceptible, in part because the mask restricts perturbations to object regions and leaves the background untouched.

These results empirically demonstrate that adversarial vulnerabilities in 2D renderings successfully transfer through the 3D reconstruction pipeline, significantly degrading recognition accuracy in photorealistic renders synthesized from adversarial Gaussian point clouds.

A critical finding is that the success of the attack persists even in novel, held-out viewpoints, indicating the transferability of spatially-localized perturbations through 3DGS models. The paper further documents adverse edge cases with multi-instance scenes (e.g., partial masking in "couch" images), revealing limitations and opportunities for future segmentation-aware attacks.

Implications and Discussion

Practical Security and Robustness

The results underscore an acute security risk for applications relying on end-to-end 3D vision-language pipelines (e.g., autonomous vehicles, robotics, surveillance) that utilize multi-view image data for online or offline 3D scene understanding. Attackers could, by manipulating a subset of input images, meaningfully degrade or subvert downstream zero-shot object recognition in the resultant 3D reconstructions—without requiring conspicuous image corruption.

Theoretical Insights

The research provides clarity on the vulnerability surface of 3D radiance-field approaches, which are increasingly utilized for fast, high-fidelity rendering. It confirms that classical adversarial frameworks (FGSM variants) can be adapted to the 3D context with minimal algorithmic changes, given careful masking and white-box access to the victim model.

Limitations

  • Efficacy is highly dependent on mask quality; imprecise masks can cause the attack to partially fail.
  • Performance drop is less drastic on out-of-training-set novel viewpoints, indicating some degree of adversarial overfitting to training poses.
  • Transferability to other object categories, larger scenes, or other radiance field architectures remains untested.

Potential Future Directions

  • Development of attacks that are robust to imperfect masks and to multiple object instances in a scene.
  • Extension to black-box or query-limited attacker scenarios, reducing reliance on white-box access.
  • Exploration of robust 3DGS and radiance field defenses, such as adversarial training at the image or latent point cloud level.
  • Generalization to time-varying (dynamic scene) models and higher-complexity environments.

Conclusion

This paper significantly advances the study of adversarial robustness in 3D computer vision, demonstrating that carefully targeted 2D adversarial attacks can severely impact multi-view 3D object detection when incorporated into modern radiance field pipelines like 3D Gaussian Splatting. These findings reveal urgent needs for adversarial resilience and robust training paradigms in critical 3D vision applications, suggesting a fertile space for further research into secure 3D scene understanding.