SoK: Unintended Interactions among Machine Learning Defenses and Risks (2312.04542v2)
Abstract: Machine learning (ML) models cannot neglect risks to security, privacy, and fairness. Several defenses have been proposed to mitigate such risks. When a defense is effective in mitigating one risk, it may correspond to increased or decreased susceptibility to other risks. Existing research lacks an effective framework to recognize and explain these unintended interactions. We present such a framework, based on the conjecture that overfitting and memorization underlie unintended interactions. We survey existing literature on unintended interactions, accommodating them within our framework. We use our framework to conjecture on two previously unexplored interactions, and empirically validate our conjectures.
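The framework's central conjecture is that overfitting and memorization underlie these unintended interactions, so the quantities of interest are how far a model overfits and how strongly it memorizes individual training points. Below is a minimal, hypothetical sketch of two commonly used proxies for those effects: the train/test generalization gap for overfitting, and a simple loss-threshold membership heuristic for memorization-driven leakage. The model, synthetic data, and threshold choice are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch: proxies for the two effects the framework centres on.
# Overfitting is proxied by the train/test accuracy gap; memorization/leakage
# is proxied by a loss-threshold membership heuristic. All names and data here
# are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Tiny synthetic classification task (stand-in for a real dataset).
X = torch.randn(600, 20)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).long()
train_ds, test_ds = TensorDataset(X[:300], y[:300]), TensorDataset(X[300:], y[300:])

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

# Train long enough that the model overfits the small training set a little.
for _ in range(200):
    for xb, yb in DataLoader(train_ds, batch_size=64, shuffle=True):
        opt.zero_grad()
        F.cross_entropy(model(xb), yb).backward()
        opt.step()

@torch.no_grad()
def accuracy_and_losses(ds):
    xb, yb = ds.tensors
    logits = model(xb)
    acc = (logits.argmax(dim=1) == yb).float().mean().item()
    losses = F.cross_entropy(logits, yb, reduction="none")
    return acc, losses

train_acc, train_losses = accuracy_and_losses(train_ds)
test_acc, test_losses = accuracy_and_losses(test_ds)

# Overfitting proxy: generalization gap between train and test accuracy.
gen_gap = train_acc - test_acc

# Coarse memorization/leakage proxy: how much more often training points fall
# below an average-loss threshold than held-out points (membership advantage).
threshold = test_losses.mean()
member_rate_train = (train_losses < threshold).float().mean().item()
member_rate_test = (test_losses < threshold).float().mean().item()
leakage_advantage = member_rate_train - member_rate_test

print(f"generalization gap: {gen_gap:.3f}")
print(f"loss-threshold membership advantage: {leakage_advantage:.3f}")
```

Under the conjecture above, a defense that widens the generalization gap or increases per-example memorization would be expected to raise susceptibility to inference-style risks, while a defense that suppresses both would be expected to lower it; the paper's framework organizes the surveyed interactions along exactly these lines.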