Exploiting Explanations for Model Inversion Attacks (2104.12669v3)

Published 26 Apr 2021 in cs.CV, cs.CY, and cs.LG

Abstract: The successful deployment of AI in many domains, from healthcare to hiring, requires their responsible use, particularly in model explanations and privacy. Explainable artificial intelligence (XAI) provides more information to help users understand model decisions, yet this additional knowledge exposes additional risks for privacy attacks. Hence, providing explanations harms privacy. We study this risk for image-based model inversion attacks and identify several attack architectures with increasing performance to reconstruct private image data from model explanations. We have developed several multi-modal transposed CNN architectures that achieve significantly higher inversion performance than using the target model prediction only. These XAI-aware inversion models were designed to exploit the spatial knowledge in image explanations. To understand which explanations have higher privacy risk, we analyzed how various explanation types and factors influence inversion performance. Even for target models that do not provide explanations, we further demonstrate increased inversion performance by exploiting explanations of surrogate models through attention transfer. This method first inverts an explanation from the target prediction, then reconstructs the target image. These threats highlight the urgent and significant privacy risks of explanations and call attention to the need for new privacy preservation techniques that balance the dual requirements of AI explainability and privacy.

Exploiting Explanations for Model Inversion Attacks

The paper "Exploiting Explanations for Model Inversion Attacks" investigates the potential privacy risks associated with explainable artificial intelligence (XAI) systems. By exposing model explanations, the ability to understand model decisions is enhanced, but it also increases vulnerability to privacy attacks, specifically model inversion attacks. This research presents innovative architectures for such attacks, demonstrating how additional information from XAI can be harnessed to reconstruct private image data with higher accuracy.

Model Inversion and XAI

Model inversion attacks traditionally aim to reconstruct input data using only the predictions of machine learning models. This paper expands on that threat model by integrating model explanations, such as saliency maps and gradient-based attributions, into the inversion process. Multi-modal transposed CNN architectures leverage this extended information to significantly increase inversion accuracy compared to using target model predictions alone. Notably, the paper examines which types of XAI explanations pose higher privacy risks, providing a detailed exploration of their sensitivity and utility to adversaries.
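To make the extra signal concrete, the sketch below computes a generic gradient-saliency map of the kind an XAI-enabled service might return alongside its prediction. The function name, the channel aggregation, and the interface are illustrative assumptions, not the paper's exact explanation pipeline.

```python
import torch


def gradient_saliency(model, image, target_class=None):
    """Return a gradient saliency map for a single input image.

    `model` is any image classifier; `image` is a (1, C, H, W) tensor.
    This is a generic illustration of the kind of explanation an XAI
    system might expose to users (and hence to an attacker).
    """
    image = image.clone().requires_grad_(True)
    logits = model(image)
    if target_class is None:
        target_class = logits.argmax(dim=1)
    else:
        target_class = torch.tensor([target_class])
    # Backpropagate the score of the chosen class to the input pixels.
    score = logits.gather(1, target_class.view(-1, 1)).sum()
    score.backward()
    # Aggregate absolute gradients over channels into one saliency map.
    saliency = image.grad.abs().max(dim=1, keepdim=True).values
    return saliency.detach()
```

Because the saliency map is spatially aligned with the input, it leaks far more structural information about the private image than the prediction vector does.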

Architectures for Enhanced Attacks

The paper introduces several architectures designed to exploit explanations as fully as possible. These include multi-modal architectures that combine the target prediction with flattened explanations, U-Net style networks that preserve spatial information, and hybrid models that combine both approaches. The architectures are evaluated against several explanation types, revealing that explanations with finer spatial granularity, such as gradient maps, give attackers significantly more utility.
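A minimal PyTorch sketch of such a multi-modal, XAI-aware inverter is shown below, assuming a 32x32 single-channel target image and a probability vector over `num_classes` classes. The layer sizes, the fusion-by-concatenation scheme, and the class name are placeholders rather than the authors' architecture.

```python
import torch
import torch.nn as nn


class XAIAwareInverter(nn.Module):
    """Illustrative multi-modal inversion network (not the authors' exact code).

    The prediction vector is projected into a low-resolution feature map,
    fused with an encoded copy of the explanation map, and decoded with
    transposed convolutions back to image resolution (assumed 32x32).
    """

    def __init__(self, num_classes=10, out_channels=1):
        super().__init__()
        # Project the class-probability vector to an 8x8 spatial seed.
        self.pred_proj = nn.Linear(num_classes, 64 * 8 * 8)
        # Encode the explanation map down to the same 8x8 resolution.
        self.expl_enc = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1),   # 32x32 -> 16x16
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1),  # 16x16 -> 8x8
            nn.ReLU(inplace=True),
        )
        # Decode the fused features back to an image.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),           # 8x8 -> 16x16
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, out_channels, 4, stride=2, padding=1),  # 16x16 -> 32x32
            nn.Sigmoid(),
        )

    def forward(self, prediction, explanation):
        seed = self.pred_proj(prediction).view(-1, 64, 8, 8)
        expl = self.expl_enc(explanation)
        fused = torch.cat([seed, expl], dim=1)
        return self.decoder(fused)
```

The key design point is that the explanation is kept as a spatial map rather than flattened into a vector, which is what lets the decoder recover fine-grained image structure.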

Surrogate Models and Attention Transfer

An especially critical finding is that even when target models do not expose explanations, attackers can exploit explanations generated by surrogate models. Using attention transfer from an explainable surrogate model, the attacker trains a network that first inverts an explanation from the target model's prediction and then uses that reconstructed explanation to recover the target image, yielding higher inversion performance and posing significant privacy challenges.
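The two-stage idea can be sketched as follows, with hypothetical module names: a first network maps the target prediction to a pseudo-explanation (in the paper, learned with attention transfer against a surrogate model's explanations), and a second, XAI-aware inverter such as the sketch above consumes both to reconstruct the image.

```python
import torch
import torch.nn as nn


class ExplanationInverter(nn.Module):
    """Stage 1 (illustrative): predict a pseudo-explanation map
    from the target model's prediction vector alone."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(num_classes, 64 * 8 * 8),
            nn.ReLU(inplace=True),
        )
        self.up = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),  # 8x8 -> 16x16
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),   # 16x16 -> 32x32
            nn.Sigmoid(),
        )

    def forward(self, prediction):
        return self.up(self.proj(prediction).view(-1, 64, 8, 8))


def two_stage_inversion(prediction, explanation_inverter, image_inverter):
    """Stage 1 reconstructs an explanation from the prediction; stage 2
    feeds the prediction and the pseudo-explanation into an XAI-aware
    inverter to reconstruct the image. Hypothetical wiring of the idea."""
    with torch.no_grad():
        pseudo_explanation = explanation_inverter(prediction)
        return image_inverter(prediction, pseudo_explanation)
```

In this framing, the target model never has to expose an explanation itself; the attacker substitutes a learned stand-in for it.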

Implications for AI Systems

The implications of these findings are considerable for the future of responsible AI deployment, particularly in high-stakes domains that demand both transparency and privacy. As regulatory requirements on AI continue to evolve, balancing the dual necessities of explainability and privacy emerges as an urgent priority. The paper calls for the development of privacy-preserving techniques that do not compromise the interpretability of models. In doing so, it lays the groundwork for future studies into the defense mechanisms necessary to protect sensitive data against XAI-aware inversion threats.

Future Directions

As AI systems become more pervasive, understanding the trade-offs between providing useful explanations and maintaining data privacy is paramount. This paper opens avenues for future research into robust privacy-preserving strategies and the potential of advanced inversion techniques in other areas of AI. The balance between transparency and protection remains a pressing challenge for the field as it aims to meet societal and regulatory demands.

In conclusion, while XAI offers enhanced model transparency, it equally exposes new vectors for privacy attacks. This research underscores the need for continued innovation in safeguarding personal data while maintaining AI usability and transparency.

Authors (4)
  1. Xuejun Zhao (2 papers)
  2. Wencan Zhang (6 papers)
  3. Xiaokui Xiao (90 papers)
  4. Brian Y. Lim (14 papers)
Citations (70)