- The paper introduces a two-step framework that trains a knockoff model using input-output pairs from victim black-box models.
- It demonstrates that an adversary can replicate a victim model's performance without knowledge of its architecture, training data, or output semantics.
- Reinforcement learning is used to enhance query efficiency, underscoring significant security risks in deployed machine learning systems.
Model Functionality Stealing: An Overview
The paper "Knockoff Nets: Stealing Functionality of Black-Box Models" addresses a significant issue in machine learning: the vulnerability of black-box models to functionality theft. This paper explores the potential for an adversary to replicate the functionality of such models by interacting with them solely through black-box access—where input data is queried, and the model's predictions are observed.
Key Contributions
- Functionality Stealing Framework: The authors introduce a two-step approach for training a "knockoff" model: query a victim model with a set of images, then use the resulting image-prediction pairs to train the knockoff (a minimal training sketch appears after this list).
- Adversary Constraints: The paper considers a scenario where the adversary has no knowledge of the victim model's training data, architecture, or output semantics, making this a particularly challenging and realistic threat scenario.
- Model and Data Agnosticism: Remarkably, the knockoff model can perform well even when it uses a different architecture from the victim model and when queried with random images from different distributions than those used to train the victim model.
- Reinforcement Learning for Query Efficiency: A reinforcement learning approach is proposed to improve sample efficiency, reducing the number of queries needed to train an effective knockoff model (a simplified bandit sketch also appears after this list).
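To make the two-step recipe concrete, here is a minimal sketch in PyTorch, assuming only black-box access through an illustrative `victim_api` callable that returns class probabilities for a batch of images. The helper names (`build_transfer_set`, `train_knockoff`), the ResNet-34 knockoff architecture, and the hyperparameters are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models

# Step 1: query the black-box victim with transfer-set images and record its
# soft predictions. `victim_api` and `transfer_images` are illustrative stand-ins.
def build_transfer_set(victim_api, transfer_images, batch_size=64):
    images, predictions = [], []
    for batch in DataLoader(transfer_images, batch_size=batch_size):
        with torch.no_grad():
            probs = victim_api(batch)  # only the outputs are visible to the adversary
        images.append(batch)
        predictions.append(probs)
    return TensorDataset(torch.cat(images), torch.cat(predictions))

# Step 2: train the knockoff on the image-prediction pairs by matching the
# victim's output distribution (cross-entropy against soft labels).
def train_knockoff(dataset, num_classes, epochs=10, lr=1e-3):
    knockoff = models.resnet34(num_classes=num_classes)  # need not match the victim
    opt = torch.optim.Adam(knockoff.parameters(), lr=lr)
    loader = DataLoader(dataset, batch_size=64, shuffle=True)
    for _ in range(epochs):
        for x, victim_probs in loader:
            log_p = F.log_softmax(knockoff(x), dim=1)
            loss = -(victim_probs * log_p).sum(dim=1).mean()  # soft cross-entropy
            opt.zero_grad()
            loss.backward()
            opt.step()
    return knockoff
```

Because the knockoff only needs to imitate the victim's input-output behavior, the transfer-set images can come from a different distribution than the victim's training data, as the paper emphasizes.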
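The paper's adaptive query strategy is richer than what is shown here (its reward combines certainty, diversity, and loss signals over a hierarchy of actions), but a simplified gradient-bandit sampler conveys the idea of steering queries toward informative image pools. The class name and the use of the knockoff's loss as the sole reward are assumptions for illustration.

```python
import numpy as np

class GradientBanditSampler:
    """Simplified adaptive sampler: each action is a pool of candidate images
    (e.g. one pool per source-dataset label); a gradient bandit shifts
    probability mass toward pools whose queries proved informative."""

    def __init__(self, num_actions, lr=0.1):
        self.h = np.zeros(num_actions)  # action preferences
        self.baseline = 0.0             # running-average reward baseline
        self.t = 0
        self.lr = lr

    def probs(self):
        e = np.exp(self.h - self.h.max())
        return e / e.sum()

    def choose(self):
        return np.random.choice(len(self.h), p=self.probs())

    def update(self, action, reward):
        # Standard gradient-bandit update against the running baseline.
        self.t += 1
        self.baseline += (reward - self.baseline) / self.t
        pi = self.probs()
        adv = reward - self.baseline
        self.h -= self.lr * adv * pi     # push all preferences down...
        self.h[action] += self.lr * adv  # ...then boost the chosen action
```

A plausible usage loop: call `choose()` to pick a pool, sample an image from it, query the victim, take one training step on the knockoff, and feed the knockoff's loss on that pair back as the `reward` in `update()`, so that hard-to-imitate regions of input space get queried more often.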
Experimental Validation
The approach was validated across multiple datasets and tasks, including a real-world image-analysis API. For roughly $30 in query costs, the authors obtained a reasonable knockoff of the API's model, sidestepping the substantial expense of collecting data and training the original.
Implications
Practical Implications
- Security Risks: The findings highlight a practical risk for companies that deploy machine learning models as services: an adversary can steal model functionality through the prediction API alone, eroding the provider's competitive edge.
- Countermeasures: The paper also evaluates defenses such as truncating the returned posteriors, but finds functionality stealing resilient to such measures, suggesting a need for more robust defenses in deployment environments (a sketch of such truncation follows this list).
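As an illustration of the kind of output truncation considered as a defense, the sketch below zeroes out all but the top-k class posteriors before returning them; the function name and the choice of k are assumptions, not the paper's exact implementation.

```python
import torch

def truncate_topk(probs: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Keep only the k largest class posteriors per row and renormalize,
    so the API reveals less of the victim's output distribution."""
    topk_vals, topk_idx = probs.topk(k, dim=1)
    truncated = torch.zeros_like(probs).scatter_(1, topk_idx, topk_vals)
    return truncated / truncated.sum(dim=1, keepdim=True)
```

As noted above, the paper finds knockoff training resilient to this style of defense: even heavily truncated outputs still leak enough of the victim's decision behavior to imitate.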
Theoretical Implications
- Knowledge Transfer: Extends current understanding of knowledge transfer and distillation under weak assumptions, with potential for future exploration in unsupervised and semi-supervised settings.
- Adversarial Machine Learning: Contributes to the adversarial ML literature by demonstrating that a model can be exploited purely through input-output observations, without gradients or internal access.
Future Directions
- Expanding Adversary Models: Future work may investigate stronger adversary models with partial access to training data or architectures, offering deeper insights into model security.
- Refinement of Defensive Strategies: Developing more resistant defenses could reduce the exploitability of deployed models, for example through output obfuscation or dynamic modification of query responses.
In summary, the paper presents a thorough examination of model functionality stealing, highlighting the need for stronger safeguards in black-box model deployments. This research is crucial as it draws attention to the real-world implications of deploying vulnerable models and offers foundational insights for advancing security in AI systems.