Privacy Assessment on Reconstructed Images: Are Existing Evaluation Metrics Faithful to Human Perception? (2309.13038v2)

Published 22 Sep 2023 in cs.CV

Abstract: Hand-crafted image quality metrics, such as PSNR and SSIM, are commonly used to evaluate model privacy risk under reconstruction attacks. Under these metrics, reconstructed images judged to resemble the original generally indicate more privacy leakage, while images judged as overall dissimilar indicate higher robustness against attack. However, there is no guarantee that these metrics reflect human opinions well, which, as judgments of model privacy leakage, are more trustworthy. In this paper, we comprehensively study how faithful these hand-crafted metrics are to human perception of privacy information in reconstructed images. On 5 datasets ranging from natural images and faces to fine-grained classes, we use 4 existing attack methods to reconstruct images from many different classification models and, for each reconstructed image, we ask multiple human annotators to assess whether it is recognizable. Our studies reveal that the hand-crafted metrics correlate only weakly with human evaluation of privacy leakage and that these metrics often contradict each other. These observations suggest that the community's current metrics carry risks. To address this, we propose a learning-based measure called SemSim to evaluate the Semantic Similarity between the original and reconstructed images. SemSim is trained with a standard triplet loss, using an original image as the anchor, one of its recognizable reconstructions as the positive sample, and an unrecognizable one as the negative. By training on human annotations, SemSim better reflects privacy leakage at the semantic level. We show that SemSim has a significantly higher correlation with human judgment than existing metrics, and that this strong correlation generalizes to unseen datasets, models, and attack methods.


Summary

  • The paper reveals that traditional metrics (PSNR, SSIM, MSE) weakly correlate with human judgment in assessing privacy risks from image reconstruction.
  • It introduces SemSim, a new learning-based measure that employs a triplet loss function and human annotations to gauge semantic similarity.
  • SemSim shows robust performance and higher correlation with human evaluation across diverse datasets and reconstruction attack methods.

The paper "Privacy Assessment on Reconstructed Images: Are Existing Evaluation Metrics Faithful to Human Perception?" introduces a critical analysis of commonly used hand-crafted image quality metrics—specifically PSNR (Peak Signal-to-Noise Ratio), SSIM (Structural Similarity Index), and MSE (Mean Squared Error)—in evaluating privacy risks posed by image reconstruction attacks. The analysis is based on the premise that these traditional metrics, which assess pixel-level similarity, may not align with human perception regarding the amount of privacy information leakage from reconstructed images.

Key Findings:

  1. Weak Correlation with Human Perception: The paper reveals that the aforementioned metrics, along with CNN-based metrics such as LPIPS (Learned Perceptual Image Patch Similarity), generally exhibit only weak correlation with human judgment of privacy leakage through reconstructed images. This indicates that the metrics may not accurately capture semantic information leakage as perceived by human observers.
  2. Introduction of SemSim: To address the discrepancies found in existing metrics, the paper proposes a new learning-based measure called SemSim (Semantic Similarity). SemSim evaluates the semantic similarity between original and reconstructed images using human annotations that indicate whether a reconstructed image is recognizable. The model is trained with a standard triplet loss, which improves its correlation with human judgment (a minimal training sketch follows this list).
  3. Generalization and Robustness: SemSim demonstrates a significantly higher correlation with human evaluation compared to existing metrics across a variety of datasets, classification models, and reconstruction attack methods. The measure is tested on five diverse datasets (ranging from CIFAR-100 to Stanford Dogs), and its robustness is further proven by generalizing well to unseen scenarios, including new datasets, model architectures, and attack techniques.
  4. Evaluation and Comparison: The paper presents both visual and quantitative evaluations comparing SemSim to traditional metrics. Results show SemSim's superior capability in capturing semantic-level privacy leakage, with strong ranking correlations against human annotation scores (a sketch of such a rank-correlation check follows the closing summary below). Despite variations in dataset complexity, such as the fine-grained distinctions in the Stanford Dogs dataset, SemSim maintains reliable performance.
  5. Implications and Future Directions: This work suggests that a shift towards semantic-focused evaluation metrics is crucial for more trustworthy assessments of privacy risks in image reconstruction scenarios. The authors plan to expand the collection of human annotations and explore more sophisticated neural architectures to improve the generalizability and precision of SemSim.
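
To make the triplet setup from point 2 concrete, the following is a minimal sketch, assuming PyTorch, a toy CNN encoder, and illustrative hyperparameters; the paper's actual SemSim architecture and training details may differ:

```python
# Sketch of SemSim-style training under assumed details: a small CNN
# encoder, PyTorch's built-in TripletMarginLoss, and random tensors as
# placeholders for real annotated batches.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Maps an image to an embedding used to measure semantic similarity."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )
    def forward(self, x):
        return self.net(x)

encoder = Encoder()
criterion = nn.TripletMarginLoss(margin=1.0)
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-4)

# One training step on a human-annotated triplet:
#   anchor   = original private image
#   positive = a reconstruction annotators marked as recognizable
#   negative = a reconstruction annotators marked as unrecognizable
anchor = torch.randn(8, 3, 32, 32)      # placeholders for real batches
positive = torch.randn(8, 3, 32, 32)
negative = torch.randn(8, 3, 32, 32)

loss = criterion(encoder(anchor), encoder(positive), encoder(negative))
optimizer.zero_grad()
loss.backward()
optimizer.step()

# At test time, privacy leakage is scored via embedding distance between
# each original image and its reconstruction (smaller = more leakage,
# since the pair is semantically closer).
with torch.no_grad():
    leakage_score = F.pairwise_distance(encoder(anchor), encoder(positive))
```

The design intuition is that the triplet loss pulls recognizable reconstructions toward their originals in embedding space and pushes unrecognizable ones away, so the learned distance directly encodes the human notion of "recognizable".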

In summary, the paper points out significant limitations in current evaluation metrics for privacy risk assessment in image reconstruction tasks. By introducing SemSim, the authors offer a promising route toward aligning metric outputs with human perception, thus enhancing the credibility of privacy assessments in the field.
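
The faithfulness comparison itself reduces to rank correlation between a metric's scores and aggregated human annotations. A minimal sketch, assuming SciPy and made-up placeholder scores:

```python
# Sketch of the faithfulness check described above: rank-correlate a
# metric's scores with human recognizability annotations. SciPy is an
# assumed dependency; all numbers below are fabricated placeholders.
import numpy as np
from scipy.stats import kendalltau, spearmanr

# One score per reconstructed image (e.g., PSNR, or a SemSim distance).
metric_scores = np.array([28.1, 31.4, 22.7, 25.3, 30.0])
# Fraction of annotators who found each reconstruction recognizable.
human_scores = np.array([0.2, 0.9, 0.1, 0.4, 0.8])

rho, _ = spearmanr(metric_scores, human_scores)
tau, _ = kendalltau(metric_scores, human_scores)
print(f"Spearman rho={rho:.3f}  Kendall tau={tau:.3f}")
```

Note that direction matters when interpreting the result: for PSNR/SSIM, higher scores should accompany more leakage, while for distance-based measures such as SemSim, lower scores should, so the sign of the correlation must be read accordingly. SemSim's claimed advantage is precisely that its scores achieve higher rank correlations with the human scores than PSNR, SSIM, MSE, or LPIPS do.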
