Privacy Assessment on Reconstructed Images: Are Existing Evaluation Metrics Faithful to Human Perception? (2309.13038v2)

Published 22 Sep 2023 in cs.CV

Abstract: Hand-crafted image quality metrics, such as PSNR and SSIM, are commonly used to evaluate model privacy risk under reconstruction attacks. Under these metrics, reconstructed images judged to resemble the original generally indicate more privacy leakage, while images judged as overall dissimilar indicate higher robustness against attack. However, there is no guarantee that these metrics reflect human opinions well, which, as judgments of model privacy leakage, are more trustworthy. In this paper, we comprehensively study how faithful these hand-crafted metrics are to human perception of privacy information in reconstructed images. On 5 datasets ranging from natural images and faces to fine-grained classes, we use 4 existing attack methods to reconstruct images from many different classification models and, for each reconstructed image, we ask multiple human annotators to assess whether it is recognizable. Our studies reveal that the hand-crafted metrics correlate only weakly with human evaluation of privacy leakage and that these metrics often contradict each other. These observations suggest that the community's current metrics carry risks. To address this, we propose a learning-based measure called SemSim to evaluate the Semantic Similarity between the original and reconstructed images. SemSim is trained with a standard triplet loss, using an original image as the anchor, one of its recognizable reconstructions as the positive sample, and an unrecognizable one as the negative. By training on human annotations, SemSim better reflects privacy leakage at the semantic level. We show that SemSim has a significantly higher correlation with human judgment than existing metrics, and that this strong correlation generalizes to unseen datasets, models, and attack methods.


Summary

  • The paper reveals that traditional metrics (PSNR, SSIM, MSE) weakly correlate with human judgment in assessing privacy risks from image reconstruction.
  • It introduces SemSim, a new learning-based measure that employs a triplet loss function and human annotations to gauge semantic similarity.
  • SemSim shows robust performance and higher correlation with human evaluation across diverse datasets and reconstruction attack methods.

The paper "Privacy Assessment on Reconstructed Images: Are Existing Evaluation Metrics Faithful to Human Perception?" introduces a critical analysis of commonly used hand-crafted image quality metrics—specifically PSNR (Peak Signal-to-Noise Ratio), SSIM (Structural Similarity Index), and MSE (Mean Squared Error)—in evaluating privacy risks posed by image reconstruction attacks. The analysis is based on the premise that these traditional metrics, which assess pixel-level similarity, may not align with human perception regarding the amount of privacy information leakage from reconstructed images.

Key Findings:

  1. Weak Correlation with Human Perception: The paper reveals that the aforementioned metrics, along with CNN-based metrics such as LPIPS (Learned Perceptual Image Patch Similarity), generally exhibit only weak correlation with human judgment of privacy leakage through reconstructed images. This indicates that the metrics may not accurately capture semantic information leakage as perceived by human observers.
  2. Introduction of SemSim: To address the discrepancies found in existing metrics, the paper proposes a new learning-based measure called SemSim (Semantic Similarity). SemSim evaluates the semantic similarity between original and reconstructed images using human annotations that indicate whether a reconstructed image is recognizable. The model is trained with a standard triplet loss, which improves its correlation with human judgment (a minimal training sketch follows this list).
  3. Generalization and Robustness: SemSim demonstrates a significantly higher correlation with human evaluation compared to existing metrics across a variety of datasets, classification models, and reconstruction attack methods. The measure is tested on five diverse datasets (ranging from CIFAR-100 to Stanford Dogs), and its robustness is further proven by generalizing well to unseen scenarios, including new datasets, model architectures, and attack techniques.
  4. Evaluation and Comparison: The paper presents both visual and quantitative evaluations comparing SemSim to traditional metrics. Results show SemSim's superior capability in capturing semantic-level privacy leakage, with strong ranking correlations against human annotation scores (a sketch of such a rank-correlation check follows the closing summary below). Despite variations in dataset complexity, such as the fine-grained distinctions in the Stanford Dogs dataset, SemSim maintains reliable performance.
  5. Implications and Future Directions: This work suggests that a shift towards semantic-focused evaluation metrics is crucial for more trustworthy assessments of privacy risks in image reconstruction scenarios. The authors plan to expand the collection of human annotations and explore more sophisticated neural architectures to improve the generalizability and precision of SemSim.
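
To make the triplet setup from point 2 concrete, the following is a minimal sketch, assuming PyTorch, a toy CNN encoder, and illustrative hyperparameters; the paper's actual SemSim architecture and training details may differ:

```python
# Sketch of SemSim-style training under assumed details: a small CNN
# encoder, PyTorch's built-in TripletMarginLoss, and random tensors as
# placeholders for real annotated batches.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Maps an image to an embedding used to measure semantic similarity."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )
    def forward(self, x):
        return self.net(x)

encoder = Encoder()
criterion = nn.TripletMarginLoss(margin=1.0)
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-4)

# One training step on a human-annotated triplet:
#   anchor   = original private image
#   positive = a reconstruction annotators marked as recognizable
#   negative = a reconstruction annotators marked as unrecognizable
anchor = torch.randn(8, 3, 32, 32)      # placeholders for real batches
positive = torch.randn(8, 3, 32, 32)
negative = torch.randn(8, 3, 32, 32)

loss = criterion(encoder(anchor), encoder(positive), encoder(negative))
optimizer.zero_grad()
loss.backward()
optimizer.step()

# At test time, privacy leakage is scored via embedding distance between
# each original image and its reconstruction (smaller = more leakage,
# since the pair is semantically closer).
with torch.no_grad():
    leakage_score = F.pairwise_distance(encoder(anchor), encoder(positive))
```

The design intuition is that the triplet loss pulls recognizable reconstructions toward their originals in embedding space and pushes unrecognizable ones away, so the learned distance directly encodes the human notion of "recognizable".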

In summary, the paper points out significant limitations in current evaluation metrics for privacy risk assessment in image reconstruction tasks. By introducing SemSim, the authors offer a promising route toward aligning metric outputs with human perception, thus enhancing the credibility of privacy assessments in the field.
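
The faithfulness comparison itself reduces to rank correlation between a metric's scores and aggregated human annotations. A minimal sketch, assuming SciPy and made-up placeholder scores:

```python
# Sketch of the faithfulness check described above: rank-correlate a
# metric's scores with human recognizability annotations. SciPy is an
# assumed dependency; all numbers below are fabricated placeholders.
import numpy as np
from scipy.stats import kendalltau, spearmanr

# One score per reconstructed image (e.g., PSNR, or a SemSim distance).
metric_scores = np.array([28.1, 31.4, 22.7, 25.3, 30.0])
# Fraction of annotators who found each reconstruction recognizable.
human_scores = np.array([0.2, 0.9, 0.1, 0.4, 0.8])

rho, _ = spearmanr(metric_scores, human_scores)
tau, _ = kendalltau(metric_scores, human_scores)
print(f"Spearman rho={rho:.3f}  Kendall tau={tau:.3f}")
```

Note that direction matters when interpreting the result: for PSNR/SSIM, higher scores should accompany more leakage, while for distance-based measures such as SemSim, lower scores should, so the sign of the correlation must be read accordingly. SemSim's claimed advantage is precisely that its scores achieve higher rank correlations with the human scores than PSNR, SSIM, MSE, or LPIPS do.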
