Better Representations via Adversarial Training in Pre-Training: A Theoretical Perspective
Abstract: Pre-training is known to generate universal representations for downstream tasks in large-scale deep learning such as LLMs. Existing literature, e.g., \cite{kim2020adversarial}, empirically observes that downstream tasks can inherit the adversarial robustness of the pre-trained model. We provide theoretical justifications for this robustness inheritance phenomenon. Our theoretical results reveal that, in two-layer neural networks, feature purification plays an important role in connecting the adversarial robustness of the pre-trained model to that of the downstream tasks. Specifically, we show that (i) with adversarial training, each hidden node tends to pick only one (or a few) features; (ii) without adversarial training, the hidden nodes can be vulnerable to attacks. This observation is valid for both supervised pre-training and contrastive learning. With purified nodes, it turns out that clean training is enough to achieve adversarial robustness in downstream tasks.
- Alayrac, J.-B., Uesato, J., Huang, P.-S., Fawzi, A., Stanforth, R., and Kohli, P. (2019), “Are labels required for improving adversarial robustness?” Advances in Neural Information Processing Systems, 32.
- Allen-Zhu, Z. and Li, Y. (2020), “Feature Purification: How Adversarial Training Performs Robust Deep Learning,” arXiv preprint arXiv:2005.10190.
- Arora, S., Khandeparkar, H., Khodak, M., Plevrakis, O., and Saunshi, N. (2019), “A theoretical analysis of contrastive unsupervised representation learning,” arXiv preprint arXiv:1902.09229.
- Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M. S., Bohg, J., Bosselut, A., Brunskill, E., et al. (2021), “On the opportunities and risks of foundation models,” arXiv preprint arXiv:2108.07258.
- Cai, Q.-Z., Du, M., Liu, C., and Song, D. (2018), “Curriculum adversarial training,” arXiv preprint arXiv:1805.04807.
- Carmon, Y., Raghunathan, A., Schmidt, L., Duchi, J. C., and Liang, P. S. (2019), “Unlabeled data improves adversarial robustness,” in Advances in Neural Information Processing Systems, pp. 11192–11203.
- Cemgil, T., Ghaisas, S., Dvijotham, K. D., and Kohli, P. (2019), “Adversarially robust representations with smooth encoders,” in International Conference on Learning Representations.
- Chen, D., Hu, H., Wang, Q., Yinli, L., Wang, C., Shen, C., and Li, Q. (2021), “CARTL: Cooperative Adversarially-Robust Transfer Learning,” in International Conference on Machine Learning, PMLR, pp. 1640–1650.
- Chen, L., Min, Y., Zhang, M., and Karbasi, A. (2020a), “More data can expand the generalization gap between adversarially robust and standard models,” in International Conference on Machine Learning, PMLR, pp. 1670–1680.
- Chen, T., Kornblith, S., Norouzi, M., and Hinton, G. (2020b), “A simple framework for contrastive learning of visual representations,” in International Conference on Machine Learning, PMLR, pp. 1597–1607.
- Chuang, C.-Y., Robinson, J., Yen-Chen, L., Torralba, A., and Jegelka, S. (2020), “Debiased contrastive learning,” arXiv preprint arXiv:2007.00224.
- Dai, B. and Lin, D. (2017), “Contrastive learning for image captioning,” arXiv preprint arXiv:1710.02534.
- Dan, C., Wei, Y., and Ravikumar, P. (2020), “Sharp Statistical Guarantees for Adversarially Robust Gaussian Classification,” in International Conference on Machine Learning, PMLR, pp. 2345–2355.
- Deng, Z., Zhang, L., Ghorbani, A., and Zou, J. (2021a), “Improving adversarial robustness via unlabeled out-of-domain data,” in International Conference on Artificial Intelligence and Statistics, PMLR, pp. 2845–2853.
- Deng, Z., Zhang, L., Vodrahalli, K., Kawaguchi, K., and Zou, J. (2021b), “Adversarial Training Helps Transfer Learning via Better Representations,” arXiv preprint arXiv:2106.10189.
- Fan, L., Liu, S., Chen, P.-Y., Zhang, G., and Gan, C. (2021), “When Does Contrastive Learning Preserve Adversarial Robustness from Pretraining to Finetuning?” Advances in Neural Information Processing Systems, 34, 21480–21492.
- Goodfellow, I. J., Shlens, J., and Szegedy, C. (2015), “Explaining and Harnessing Adversarial Examples,” in 3rd International Conference on Learning Representations.
- Gowal, S., Rebuffi, S.-A., Wiles, O., Stimberg, F., Calian, D. A., and Mann, T. A. (2021), “Improving Robustness using Generated Data,” Advances in Neural Information Processing Systems, 34.
- Grigor’eva, M. and Popov, S. (2012), “An upper bound for the absolute constant in the nonuniform version of the Berry-Esseen inequalities for nonidentically distributed summands,” in Doklady Mathematics, Springer, vol. 86, pp. 524–526.
- HaoChen, J. Z. and Ma, T. (2022), “A Theoretical Study of Inductive Biases in Contrastive Learning,” arXiv preprint arXiv:2211.14699.
- HaoChen, J. Z., Wei, C., Kumar, A., and Ma, T. (2022), “Beyond separability: Analyzing the linear transferability of contrastive representations to related subpopulations,” arXiv preprint arXiv:2204.02683.
- He, K., Zhang, X., Ren, S., and Sun, J. (2016), “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
- Hendrycks, D., Lee, K., and Mazeika, M. (2019), “Using pre-training can improve model robustness and uncertainty,” arXiv preprint arXiv:1901.09960.
- Ho, C.-H. and Vasconcelos, N. (2020), “Contrastive learning with adversarial examples,” Advances in Neural Information Processing Systems, 33, 17081–17093.
- Hyvarinen, A., Oja, E., Hoyer, P., and Hurri, J. (1998), “Image feature extraction by sparse coding and independent component analysis,” in Proceedings. Fourteenth International Conference on Pattern Recognition (Cat. No. 98EX170), IEEE, vol. 2, pp. 1268–1273.
- Javanmard, A. and Mehrabi, M. (2021), “Adversarial robustness for latent models: Revisiting the robust-standard accuracies tradeoff,” arXiv preprint arXiv:2110.11950.
- Javanmard, A. and Soltanolkotabi, M. (2022), “Precise statistical analysis of classification accuracies for adversarial training,” The Annals of Statistics, 50, 2127–2156.
- Javanmard, A., Soltanolkotabi, M., and Hassani, H. (2020), “Precise tradeoffs in adversarial training for linear regression,” in Conference on Learning Theory, PMLR, pp. 2034–2078.
- Khosla, P., Teterwak, P., Wang, C., Sarna, A., Tian, Y., Isola, P., Maschinot, A., Liu, C., and Krishnan, D. (2020), “Supervised contrastive learning,” arXiv preprint arXiv:2004.11362.
- Kim, M., Tack, J., and Hwang, S. J. (2020), “Adversarial Self-Supervised Contrastive Learning,” in Advances in Neural Information Processing Systems.
- Li, J., Zhou, P., Xiong, C., and Hoi, S. C. (2020), “Prototypical contrastive learning of unsupervised representations,” arXiv preprint arXiv:2005.04966.
- Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. (2013), “Distributed representations of words and phrases and their compositionality,” in Advances in Neural Information Processing Systems, pp. 3111–3119.
- Min, Y., Chen, L., and Karbasi, A. (2020), “The curious case of adversarially robust models: More data can help, double descend, or hurt generalization,” arXiv preprint arXiv:2002.11080.
- Mo, Y., Wu, D., Wang, Y., Guo, Y., and Wang, Y. (2022), “When Adversarial Training Meets Vision Transformers: Recipes from Training to Architecture,” arXiv preprint arXiv:2210.07540.
- Najafi, A., Maeda, S.-i., Koyama, M., and Miyato, T. (2019), “Robustness to adversarial perturbations in learning from incomplete data,” in Advances in Neural Information Processing Systems, pp. 5542–5552.
- Nguyen, A. T., Lim, S. N., and Torr, P. (2022), “Task-Agnostic Robust Representation Learning,” arXiv preprint arXiv:2203.07596.
- Oord, A. v. d., Li, Y., and Vinyals, O. (2018), “Representation learning with contrastive predictive coding,” arXiv preprint arXiv:1807.03748.
- Petrov, A. and Kwiatkowska, M. (2022), “Robustness of Unsupervised Representation Learning without Labels,” arXiv preprint arXiv:2210.04076.
- Raghunathan, A., Xie, S. M., Yang, F., Duchi, J. C., and Liang, P. (2019), “Adversarial training can hurt generalization,” arXiv preprint arXiv:1906.06032.
- Rice, L., Wong, E., and Kolter, J. Z. (2020), “Overfitting in adversarially robust deep learning,” arXiv preprint arXiv:2002.11569.
- Salman, H., Ilyas, A., Engstrom, L., Kapoor, A., and Madry, A. (2020), “Do adversarially robust imagenet models transfer better?” arXiv preprint arXiv:2007.08489.
- Saunshi, N., Ash, J., Goel, S., Misra, D., Zhang, C., Arora, S., Kakade, S., and Krishnamurthy, A. (2022), “Understanding contrastive learning requires incorporating inductive biases,” arXiv preprint arXiv:2202.14037.
- Saunshi, N., Plevrakis, O., Arora, S., Khodak, M., and Khandeparkar, H. (2019), “A theoretical analysis of contrastive unsupervised representation learning,” in International Conference on Machine Learning, PMLR, pp. 5628–5637.
- Shafahi, A., Saadatpanah, P., Zhu, C., Ghiasi, A., Studer, C., Jacobs, D., and Goldstein, T. (2019), “Adversarially robust transfer learning,” arXiv preprint arXiv:1905.08232.
- Shen, K., Jones, R. M., Kumar, A., Xie, S. M., HaoChen, J. Z., Ma, T., and Liang, P. (2022), “Connect, not collapse: Explaining contrastive learning for unsupervised domain adaptation,” in International Conference on Machine Learning, PMLR, pp. 19847–19878.
- Sinha, A., Namkoong, H., and Duchi, J. (2018), “Certifying some distributional robustness with principled adversarial training.”
- Taheri, M., Xie, F., and Lederer, J. (2021), “Statistical guarantees for regularized neural networks,” Neural Networks, 142, 148–161.
- Tian, Y., Sun, C., Poole, B., Krishnan, D., Schmid, C., and Isola, P. (2020), “What makes for good views for contrastive learning?” arXiv preprint arXiv:2005.10243.
- Tosh, C., Krishnamurthy, A., and Hsu, D. (2021), “Contrastive learning, multi-view redundancy, and linear models,” in Algorithmic Learning Theory, PMLR, pp. 1179–1206.
- Wang, Q., Wang, Y., Zhu, H., and Wang, Y. (2022), “Improving Out-of-Distribution Generalization by Adversarial Training with Structured Priors,” arXiv preprint arXiv:2210.06807.
- Wang, Y., Ma, X., Bailey, J., Yi, J., Zhou, B., and Gu, Q. (2019a), “On the convergence and robustness of adversarial training,” in International Conference on Machine Learning, pp. 6586–6595.
- Wang, Y., Zou, D., Yi, J., Bailey, J., Ma, X., and Gu, Q. (2019b), “Improving adversarial robustness requires revisiting misclassified examples,” in International Conference on Learning Representations.
- Wen, Z. and Li, Y. (2021), “Toward understanding the feature learning process of self-supervised contrastive learning,” in International Conference on Machine Learning, PMLR, pp. 11112–11122.
- Wu, B., Chen, J., Cai, D., He, X., and Gu, Q. (2020a), “Does Network Width Really Help Adversarial Robustness?” arXiv preprint arXiv:2010.01279.
- Xiao, J., Fan, Y., Sun, R., and Luo, Z.-Q. (2021), “Adversarial Rademacher Complexity of Deep Neural Networks.”
- Xiao, J., Fan, Y., Sun, R., Wang, J., and Luo, Z.-Q. (2022a), “Stability analysis and generalization bounds of adversarial training,” arXiv preprint arXiv:2210.00960.
- Xiao, J., Qin, Z., Fan, Y., Wu, B., Wang, J., and Luo, Z.-Q. (2022b), “Adaptive Smoothness-weighted Adversarial Training for Multiple Perturbations with Its Stability Analysis,” arXiv preprint arXiv:2210.00557.
- Xiao, T., Wang, X., Efros, A. A., and Darrell, T. (2020), “What should not be contrastive in contrastive learning,” arXiv preprint arXiv:2008.05659.
- Xing, Y., Song, Q., and Cheng, G. (2021a), “On the Algorithmic Stability of Adversarial Training,” Advances in Neural Information Processing Systems, 34.
- Xing, Y., Song, Q., and Cheng, G. (2021b), “On the generalization properties of adversarial training,” in International Conference on Artificial Intelligence and Statistics, PMLR, pp. 505–513.
- Yin, D., Ramchandran, K., and Bartlett, P. (2018), “Rademacher complexity for adversarially robust generalization,” arXiv preprint arXiv:1810.11914.
- Zhang, H., Yu, Y., Jiao, J., Xing, E. P., Ghaoui, L. E., and Jordan, M. I. (2019), “Theoretically Principled Trade-off between Robustness and Accuracy,” in Proceedings of the 36th International Conference on Machine Learning, PMLR, vol. 97 of Proceedings of Machine Learning Research, pp. 7472–7482.
- Zhang, J., Sang, J., Yi, Q., Yang, Y., Dong, H., and Yu, J. (2021), “Pre-training also Transfers Non-Robustness,” arXiv preprint arXiv:2106.10989.
- Zhang, J., Xu, X., Han, B., Niu, G., Cui, L., Sugiyama, M., and Kankanhalli, M. (2020a), “Attacks which do not kill training make adversarial learning stronger,” in International Conference on Machine Learning, PMLR, pp. 11278–11287.
- Zhang, Y., Plevrakis, O., Du, S. S., Li, X., Song, Z., and Arora, S. (2020b), “Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality,” arXiv preprint arXiv:2002.06668.
- Zhao, Y., Chen, J., and Du, S. S. (2022), “Blessing of Class Diversity in Pre-training,” arXiv preprint arXiv:2209.03447.