
Deep Concept Removal (2310.05755v1)

Published 9 Oct 2023 in cs.LG

Abstract: We address the problem of concept removal in deep neural networks, aiming to learn representations that do not encode certain specified concepts (e.g., gender). We propose a novel method based on adversarial linear classifiers trained on a concept dataset, which helps remove the targeted attribute while maintaining model performance. Our approach, Deep Concept Removal, incorporates adversarial probing classifiers at various layers of the network, effectively addressing concept entanglement and improving out-of-distribution generalization. We also introduce an implicit gradient-based technique to tackle the challenges associated with adversarial training using linear classifiers. We evaluate the ability to remove a concept on a set of popular distributionally robust optimization (DRO) benchmarks with spurious correlations, as well as on out-of-distribution (OOD) generalization tasks.
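The adversarial-probe idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the two-feature synthetic data, the per-feature "gate" encoder `g`, the closed-form ridge probe with penalty `reg`, and the weight `lam`. Where the paper uses probes at several layers and an implicit-gradient technique, this sketch uses a single probe and plain finite differences through the re-fitted probe.

```python
import random

def make_data(n=200, seed=0):
    """Toy data: x1 carries task signal, x2 carries the concept c (hypothetical setup)."""
    rng = random.Random(seed)
    X, y, c = [], [], []
    for _ in range(n):
        ci = rng.choice([-1.0, 1.0])
        x1 = rng.gauss(0.0, 1.0)
        x2 = ci + 0.1 * rng.gauss(0.0, 1.0)
        X.append((x1, x2))
        c.append(ci)
        y.append(x1 + 0.5 * x2)  # the task label leans partly on the concept feature
    return X, y, c

def encode(X, g):
    """Stand-in 'encoder': one learnable gate per input feature."""
    return [(g[0] * x[0], g[1] * x[1]) for x in X]

def fit_ridge_probe(Z, c, reg=1.0):
    """Linear probe in closed form: solves (Z^T Z / n + reg * I) w = Z^T c / n."""
    n = len(Z)
    a = sum(z[0] * z[0] for z in Z) / n + reg
    b = sum(z[0] * z[1] for z in Z) / n
    d = sum(z[1] * z[1] for z in Z) / n + reg
    r0 = sum(z[0] * ci for z, ci in zip(Z, c)) / n
    r1 = sum(z[1] * ci for z, ci in zip(Z, c)) / n
    det = a * d - b * b
    return ((d * r0 - b * r1) / det, (a * r1 - b * r0) / det)

def probe_error(g, X, c):
    """Mean squared error of the best ridge probe on the current representation."""
    Z = encode(X, g)
    w = fit_ridge_probe(Z, c)
    return sum((w[0] * z[0] + w[1] * z[1] - ci) ** 2 for z, ci in zip(Z, c)) / len(Z)

def objective(g, X, y, c, lam):
    """Task loss minus lam times probe error: the encoder wants the probe to fail."""
    Z = encode(X, g)
    task = sum((z[0] + z[1] - yi) ** 2 for z, yi in zip(Z, y)) / len(Z)
    return task - lam * probe_error(g, X, c)

def train(X, y, c, lam, steps=500, lr=0.02, eps=1e-4):
    """Gradient descent on the gates, using central finite differences."""
    g = [1.0, 1.0]
    for _ in range(steps):
        grad = []
        for i in range(2):
            hi, lo = list(g), list(g)
            hi[i] += eps
            lo[i] -= eps
            grad.append((objective(hi, X, y, c, lam) - objective(lo, X, y, c, lam)) / (2 * eps))
        g = [gi - lr * di for gi, di in zip(g, grad)]
    return g

X, y, c = make_data()
g_plain = train(X, y, c, lam=0.0)    # task loss only
g_removed = train(X, y, c, lam=5.0)  # task loss plus adversarial probe term
adv_plain = probe_error(g_plain, X, c)
adv_removed = probe_error(g_removed, X, c)
print("gates without removal:", g_plain, "probe error:", adv_plain)
print("gates with removal:   ", g_removed, "probe error:", adv_removed)
```

In this toy, the adversarial term shrinks the gate on the concept-carrying feature and raises the probe's error, at some cost to the task loss. The ridge penalty matters here: an unregularized linear probe could undo a pure rescaling of the feature, so the shrinkage would not register as removal.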

Authors (5)
  1. Yegor Klochkov
  2. Jean-Francois Ton
  3. Ruocheng Guo
  4. Yang Liu
  5. Hang Li
