
Transferable Adversarial Robustness for Categorical Data via Universal Robust Embeddings (2306.04064v2)

Published 6 Jun 2023 in cs.LG

Abstract: Research on adversarial robustness is primarily focused on image and text data. Yet many scenarios in which a lack of robustness can result in serious risks, such as fraud detection, medical diagnosis, or recommender systems, do not rely on images or text but on tabular data. Adversarial robustness for tabular data poses two serious challenges. First, tabular datasets often contain categorical features and therefore cannot be tackled directly with existing optimization procedures. Second, in the tabular domain, algorithms that are not based on deep networks are widely used and offer strong performance, yet methods for enhancing robustness are tailored to neural networks (e.g., adversarial training). In this paper, we tackle both challenges. We present a method that trains adversarially robust deep networks for tabular data and transfers this robustness to other classifiers via universal robust embeddings tailored to categorical data. These embeddings, created using a bilevel alternating minimization framework, can be transferred to boosted trees or random forests, making them robust without the need for adversarial training while preserving their high accuracy on tabular data. We show that our methods outperform existing techniques within a practical threat model suitable for tabular data.
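The core recipe in the abstract, learn categorical embeddings with a neural model under an alternating (bilevel-style) optimization, then freeze them and hand them to a non-deep classifier, can be illustrated with a minimal toy sketch. This is not the authors' algorithm or code: the data, the simple logistic head, and the crude alternation between head updates and embedding-table updates are all assumptions made purely for illustration; the paper's actual framework involves adversarial training and a proper bilevel formulation.

```python
# Hypothetical sketch of the high-level recipe (NOT the paper's method):
#  1) jointly learn an embedding table for a categorical feature and a
#     classifier head, alternating which component gets updated (a crude
#     stand-in for bilevel alternating minimization);
#  2) freeze the learned embeddings and reuse them as numeric features
#     for any downstream classifier (e.g., boosted trees).
import numpy as np

rng = np.random.default_rng(0)

# Toy tabular data: one categorical feature with 4 levels; the label is a
# deterministic function of the category (purely illustrative).
n, n_cats, dim = 400, 4, 2
cats = rng.integers(0, n_cats, size=n)
y = (cats >= 2).astype(float)

E = rng.normal(scale=0.1, size=(n_cats, dim))  # embedding table
w = np.zeros(dim)                              # linear head weights
b = 0.0                                        # linear head bias

def sigmoid(z):
    # Clip to avoid overflow warnings for large logits.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

lr = 0.5
for step in range(400):
    X = E[cats]                    # embedding lookup -> (n, dim) features
    p = sigmoid(X @ w + b)
    g = (p - y) / n                # per-sample logistic-loss gradient
    if step % 2 == 0:              # alternate: update the head ...
        w -= lr * (X.T @ g)
        b -= lr * g.sum()
    else:                          # ... then update the embedding table
        gE = np.zeros_like(E)
        np.add.at(gE, cats, np.outer(g, w))  # scatter-add per category
        E -= lr * gE

# "Transfer" step: the frozen embedding vectors become plain numeric
# features that any non-deep classifier could consume.
X_frozen = E[cats]
acc = float(((sigmoid(X_frozen @ w + b) > 0.5) == y).mean())
```

In the paper's setting, the frozen `E[cats]` features would be fed to boosted trees or random forests, which inherit robustness from the embeddings without undergoing adversarial training themselves.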

Authors (4)
  1. Klim Kireev (10 papers)
  2. Maksym Andriushchenko (33 papers)
  3. Carmela Troncoso (54 papers)
  4. Nicolas Flammarion (63 papers)
