Disrupting Deepfakes: Mitigating Unauthorized Facial Manipulations through Adversarial Attacks
The paper by Ruiz, Bargal, and Sclaroff from Boston University introduces a novel approach to combating unauthorized image manipulations, specifically targeting the pervasive challenge of deepfakes. Through the strategic application of adversarial attacks, the authors propose a method to disrupt image translation systems that leverage generative adversarial networks (GANs) for face modifications. This work is pivotal given the increasing accessibility and potential misuse of deepfake technology for non-consensual image alterations.
The crux of the research is the generation of adversarial perturbations that are imperceptible to human observers yet effectively disrupt GAN-based image translation models. The goal of this disruption is to prevent the creation of coherent, realistic deepfakes by making the translated images noticeably flawed or perceptually unreliable. The authors adapt techniques traditionally used against classification models to the generative setting, focusing in particular on the Fast Gradient Sign Method (FGSM), its iterative variant (I-FGSM), and Projected Gradient Descent (PGD).
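To make the mechanism concrete, the following PyTorch sketch illustrates the PGD-style disruption idea under stated assumptions: `G` stands in for a white-box image translation generator (class conditioning omitted for brevity), and the names `pgd_disruption`, `epsilon`, and `step_size` are illustrative rather than taken from the authors' code. The perturbation is optimized to push the output on the protected image away from the clean translation.

```python
# Minimal PGD-style disruption sketch (assumed interface: G(x) -> translated image).
import torch

def pgd_disruption(G, x, epsilon=0.05, step_size=0.01, steps=10):
    """Find a small perturbation eta that maximizes the distortion of G's output,
    i.e. pushes G(x + eta) away from the clean translation G(x)."""
    with torch.no_grad():
        target = G(x)                                       # clean translated image, used as reference
    eta = torch.zeros_like(x).uniform_(-epsilon, epsilon)   # random start inside the L-inf ball
    for _ in range(steps):
        eta.requires_grad_(True)
        out = G(x + eta)
        loss = torch.nn.functional.mse_loss(out, target)    # distortion we want to maximize
        grad, = torch.autograd.grad(loss, eta)
        with torch.no_grad():
            eta = eta + step_size * grad.sign()              # gradient *ascent* step
            eta = eta.clamp(-epsilon, epsilon)               # project back into the L-inf ball
    return (x + eta).clamp(-1, 1).detach()                   # protected image (assuming inputs in [-1, 1])
```

Setting `steps=1` recovers an FGSM-like one-shot disruption; larger step counts correspond to the iterative attacks studied in the paper.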
A notable aspect of the paper is its treatment of class-conditional image translation networks. The authors introduce the problem of class-transferable adversarial attacks, in which a single disruptive perturbation remains effective across different conditioning classes, without prior knowledge of the specific attribute class a malicious actor will target. This transferability is achieved through iterative and joint class-transferable disruptions, which broaden the robustness of the adversarial approach.
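A minimal sketch of the joint class-transferable idea is shown below, assuming a StarGAN-like conditional generator `G(x, c)` and a pool of candidate attribute vectors `classes`; all identifiers are illustrative, not the authors' code. One perturbation is optimized against every conditioning class at once, so it does not matter which attribute the malicious actor eventually selects.

```python
# Joint class-transferable disruption sketch (assumed interface: G(x, c) -> translated image).
import torch

def joint_class_transferable_disruption(G, x, classes, epsilon=0.05,
                                         step_size=0.01, steps=10):
    with torch.no_grad():
        targets = [G(x, c) for c in classes]                 # clean outputs, one per conditioning class
    eta = torch.zeros_like(x)
    for _ in range(steps):
        eta.requires_grad_(True)
        # Sum the distortion over every conditioning class so the same eta
        # disrupts the translation regardless of which class is chosen.
        loss = sum(torch.nn.functional.mse_loss(G(x + eta, c), t)
                   for c, t in zip(classes, targets))
        grad, = torch.autograd.grad(loss, eta)
        with torch.no_grad():
            eta = (eta + step_size * grad.sign()).clamp(-epsilon, epsilon)
    return (x + eta).detach()
```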
Furthermore, the researchers advance adversarial defenses for generative models by proposing adversarial training for GANs. By incorporating adversarially perturbed inputs during training of both the generator and discriminator, the approach makes the resulting model more resistant to disruption. This proposal aligns with existing work on adversarial robustness for classifiers but extends it to generative models, offering initial yet promising results toward image translation pipelines that withstand adversarial inputs.
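The sketch below outlines one possible shape of such a training step, shown only for the generator side for brevity. It assumes a conditional generator `G`, a discriminator `D`, their optimizers, and a disruption routine like the PGD sketch above adapted to the conditional setting; the GAN objective here is a simplified non-saturating loss, whereas StarGAN-style models also use classification and reconstruction terms. The key addition is that the generator is also trained on adversarially perturbed inputs.

```python
# Sketch of a generator adversarial training step (simplified, illustrative names).
import torch

def adversarial_training_step(G, D, opt_G, opt_D, x, c, disruption_fn):
    # 1) Craft a disruptive perturbation against the current generator.
    x_adv = disruption_fn(G, x, c)                  # e.g., a PGD disruption adapted to G(x, c)

    # 2) Standard GAN loss for the discriminator on clean data.
    fake_clean = G(x, c)
    d_loss = (torch.nn.functional.softplus(-D(x)).mean()
              + torch.nn.functional.softplus(D(fake_clean.detach())).mean())
    opt_D.zero_grad()
    d_loss.backward()
    opt_D.step()

    # 3) Generator loss on clean AND perturbed inputs: the perturbed branch
    #    teaches G to produce realistic translations despite the disruption.
    fake_adv = G(x_adv, c)
    g_loss = (torch.nn.functional.softplus(-D(fake_clean)).mean()
              + torch.nn.functional.softplus(-D(fake_adv)).mean())
    opt_G.zero_grad()
    g_loss.backward()
    opt_G.step()
```

The paper also describes a joint variant in which the discriminator is likewise trained with adversarial examples; that extension is omitted here.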
The paper also explores tactics for evading blur defenses, which a malicious actor could apply as a pre-processing countermeasure in gray-box scenarios to wash out the disruptive perturbation. The spread-spectrum disruption proposed here is designed to survive a range of blur types and magnitudes, showing adaptability to unknown pre-processing transformations.
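In the same spirit, the following hedged sketch illustrates a spread-spectrum style disruption: at each iteration the attack is computed through a blur drawn from a pool of candidate Gaussian blurs, so the resulting perturbation tends to survive whichever blur the malicious actor actually applies. The blur pool and its parameters, as well as `G`, are assumptions made for illustration; the Gaussian blur itself comes from torchvision.

```python
# Spread-spectrum style disruption sketch against unknown blur pre-processing.
import random
import torch
from torchvision.transforms import GaussianBlur

def spread_spectrum_disruption(G, x, epsilon=0.05, step_size=0.01, steps=20):
    # Pool of candidate blur defenses the perturbation should survive (illustrative values).
    blurs = [GaussianBlur(kernel_size=k, sigma=s)
             for k, s in [(3, 0.5), (5, 1.0), (7, 1.5), (9, 2.0)]]
    with torch.no_grad():
        target = G(x)                                  # clean translation, used as reference
    eta = torch.zeros_like(x)
    for _ in range(steps):
        eta.requires_grad_(True)
        blur = random.choice(blurs)                    # "hop" across candidate blur defenses
        out = G(blur(x + eta))                         # attack the blurred, protected image
        loss = torch.nn.functional.mse_loss(out, target)
        grad, = torch.autograd.grad(loss, eta)
        with torch.no_grad():
            eta = (eta + step_size * grad.sign()).clamp(-epsilon, epsilon)
    return (x + eta).detach()
```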
In their experimental evaluation, the authors provide quantitative metrics and qualitative results demonstrating the efficacy of their disruptions across several image translation systems, including StarGAN, GANimation, pix2pixHD, and CycleGAN. The findings indicate successful disruption in most tested scenarios, although the architectures varied in resistance, with GANimation showing some robustness to low-magnitude perturbations.
The implications of this work are significant, offering practical tools for individuals and organizations seeking to protect visual content from unauthorized alteration. The approach not only provides a direct line of defense against deepfakes but also encourages further exploration of adversarial techniques for safeguarding privacy and identity in digital media. Future work may combine more sophisticated adversarial frameworks with emerging defensive strategies to keep pace with the rapid evolution of deepfake generation.
In summary, the methodology and experimentation presented in this paper constitute a significant advance in adversarial machine learning, specifically tailored to disrupting and preventing the misuse of deepfake technologies. Through thoughtful adaptation of adversarial attack mechanisms, combined with pioneering adversarial training techniques for GANs, the paper offers a scalable defense strategy that could be crucial in mitigating the potentially harmful societal impacts of deepfake generation.