Overview of Physical Adversarial Textures for Object Tracking
The paper presents a novel approach to adversarial attacks on visual object tracking systems by introducing Physical Adversarial Textures (PATs). These textures are designed to be inconspicuous yet capable of confusing regression-based neural object trackers when displayed in the physical world. The research investigates how to generate such textures using several strategies, including Expectation Over Transformation (EOT) and guided adversarial losses, which strike a balance between non-targeted and targeted adversarial objectives.
Methodology
The authors adopt a white-box attack assumption, using the GOTURN architecture as a representative regression-based tracker. GOTURN is trained to predict the bounding-box location of the target object in the current frame, given a crop of the target from the previous frame. The paper simulates several physical scenarios in the Gazebo simulator to assess how effectively PATs create adversarial conditions that disrupt tracking accuracy.
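To make the regression-tracking setup concrete, the following is a minimal PyTorch sketch of a GOTURN-style tracker: a shared convolutional encoder processes the previous-frame target crop and the current-frame search crop, and a small head regresses the bounding box. The class and layer sizes here are illustrative assumptions, not the paper's actual implementation.

```python
# A minimal sketch of a GOTURN-style regression tracking step (assumed architecture).
import torch
import torch.nn as nn

class RegressionTracker(nn.Module):
    """Two-stream tracker: encode the previous-frame target crop and the
    current-frame search crop, then regress a bounding box."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(          # shared conv encoder (simplified)
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        )
        self.regressor = nn.Sequential(         # fuses both streams -> (x1, y1, x2, y2)
            nn.Linear(2 * 64 * 4 * 4, 256), nn.ReLU(),
            nn.Linear(256, 4),
        )

    def forward(self, prev_crop, curr_crop):
        feats = torch.cat([self.backbone(prev_crop), self.backbone(curr_crop)], dim=1)
        return self.regressor(feats)            # box in search-crop coordinates

# Usage: crops of the previous target and the current search region.
tracker = RegressionTracker()
prev_crop = torch.rand(1, 3, 128, 128)
curr_crop = torch.rand(1, 3, 128, 128)
pred_box = tracker(prev_crop, curr_crop)        # differentiable w.r.t. the input pixels
```

Because the predicted box is differentiable with respect to the input pixels, gradients can be propagated back through the rendered scene to the texture, which is what makes white-box PAT optimization possible.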
The generation of PATs leverages EOT to produce robust adversaries that remain effective under varied transformations, including changes in viewpoint, lighting, and surrounding appearance. This requires differentiating through the rendering process, accounting for how the texture appears to the camera under different environmental conditions. The PATs are optimized with various adversarial loss functions, including guided losses that encourage specific tracking deviations, notably shrinking or enlarging the predicted bounding box.
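The sketch below illustrates the EOT idea under stated assumptions: `render` is a placeholder for a differentiable renderer that maps the texture plus sampled scene parameters to tracker inputs, and the loop averages an adversarial loss over random viewpoints and lighting before taking a gradient step on the texture. It is not the paper's implementation.

```python
# A minimal sketch of Expectation Over Transformation (EOT) texture optimization.
import torch

def render(texture, viewpoint, lighting):
    """Placeholder differentiable renderer (assumption): maps the texture plus
    sampled scene parameters to (prev_crop, curr_crop) tracker inputs."""
    raise NotImplementedError

def eot_attack(texture, tracker, adv_loss, steps=1000, lr=0.01, batch=8):
    texture = texture.clone().requires_grad_(True)
    opt = torch.optim.Adam([texture], lr=lr)
    for _ in range(steps):
        loss = 0.0
        for _ in range(batch):                       # expectation over sampled scenes
            viewpoint = torch.rand(3) * 2 - 1        # random camera pose offsets
            lighting = torch.rand(1)                 # random illumination scale
            prev_crop, curr_crop = render(texture, viewpoint, lighting)
            pred_box = tracker(prev_crop, curr_crop)
            loss = loss + adv_loss(pred_box)         # e.g. a guided shrink loss
        opt.zero_grad()
        (loss / batch).backward()                    # gradient flows through the renderer
        opt.step()
        with torch.no_grad():
            texture.clamp_(0, 1)                     # keep the texture displayable
    return texture.detach()
```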
Findings
Initial evaluation demonstrates that guided adversarial losses, particularly those encouraging shrunken predicted boxes, yield strong adversaries and converge faster than traditional non-targeted and targeted losses. This underscores the potential of tailored loss functions that exploit vulnerabilities unique to sequential tracking models.
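One plausible form of such a guided "shrink" loss is sketched below: the predicted box is pushed toward a near-degenerate box around its own centre. This is an illustrative guess at the idea, not the paper's exact formulation.

```python
# An assumed guided "shrink" loss: drive the predicted box toward a tiny box.
import torch

def shrink_loss(pred_box, eps=1e-2):
    """pred_box: (N, 4) as (x1, y1, x2, y2) in normalized search-crop coordinates."""
    cx = (pred_box[:, 0] + pred_box[:, 2]) / 2
    cy = (pred_box[:, 1] + pred_box[:, 3]) / 2
    # guided target: a near-degenerate box around the current predicted centre
    target = torch.stack([cx - eps, cy - eps, cx + eps, cy + eps], dim=1)
    return torch.nn.functional.l1_loss(pred_box, target.detach())
```

Unlike a non-targeted loss that merely pushes the prediction away from the ground truth, a guided loss of this kind gives the optimizer a consistent direction, which plausibly explains the faster convergence reported.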
The research further explores how randomizing different scene variables affects adversarial strength. It concludes that while randomization of certain scene aspects is crucial, constraining others, such as fixing the target's pose, can sometimes accelerate convergence without compromising adversarial potency.
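An illustrative scene-parameter sampler shows this trade-off: some variables (camera pose, lighting) are drawn fresh each step, while others (here, the target's pose) can be held fixed. Field names and ranges are hypothetical.

```python
# Hypothetical scene-parameter sampler: randomize some variables, freeze others.
import random
from dataclasses import dataclass

@dataclass
class SceneParams:
    camera_yaw: float       # randomized every sample
    light_intensity: float  # randomized every sample
    target_pose: float      # optionally frozen to speed up convergence

def sample_scene(fix_target_pose=True):
    return SceneParams(
        camera_yaw=random.uniform(-30.0, 30.0),
        light_intensity=random.uniform(0.5, 1.5),
        target_pose=0.0 if fix_target_pose else random.uniform(-180.0, 180.0),
    )
```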
Imitation attacks, in which the PAT is constrained to mimic a source image, show that adversarial perturbations typically emerge first near salient visual cues in that image. The resulting textures consistently cause trackers to lock onto these critical patterns rather than the moving target, suggesting that even highly constrained imitation attacks can have a significant adversarial effect.
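A hedged sketch of an imitation-attack objective follows: the texture is kept close to a chosen source image while still pursuing the adversarial goal. The distance term and weighting are assumptions, not the paper's exact recipe.

```python
# Assumed imitation-attack objective: adversarial term plus similarity to a source image.
import torch

def imitation_loss(texture, source_image, pred_box, adv_loss, alpha=10.0):
    adv_term = adv_loss(pred_box)                              # e.g. shrink_loss above
    similarity = torch.nn.functional.mse_loss(texture, source_image)
    return adv_term + alpha * similarity                       # alpha trades stealth vs. strength
```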
Real-World Implications
Evaluation extends to real-world conditions using a tracking drone equipped with a camera, demonstrating that PATs designed entirely in simulation can transfer effectively to physical setups. The paper verifies robustness to diverse viewing conditions and partial occlusion of the target, reflecting practical adversarial risks in applications such as surveillance, autonomous navigation, and drone-based object tracking.
However, the research suggests that real-world trackers, particularly in dynamic settings like drone flight, may exhibit resilience against PAT attacks due to motion blur and specular reflections, which remain unaddressed by the current simplified rendering assumptions.
Conclusion
This research highlights the susceptibility of modern visual tracking systems to physical-world adversarial attacks and calls for more robust integration of auxiliary sensors (e.g., GPS, IMU) to safeguard against purely vision-based exploits. The groundwork laid here points toward both improved adversarial approaches and stronger defenses, encouraging ongoing vigilance toward adversarial textures that blend into everyday visual patterns without detection.
Future work should explore black-box attack strategies, directly optimize non-differentiable attack objectives, and further investigate defenses against adversarial designs that are visually indistinguishable from natural objects. The practicality and effectiveness of PATs in fooling sequential object trackers offer notable insight for the adversarial machine-learning domain, especially in applications beyond controlled simulation settings.