Tangent Transformers for Composition, Privacy and Removal (2307.08122v3)

Published 16 Jul 2023 in cs.LG

Abstract: We introduce Tangent Attention Fine-Tuning (TAFT), a method for fine-tuning linearized transformers obtained by computing a First-order Taylor Expansion around a pre-trained initialization. We show that the Jacobian-Vector Product resulting from linearization can be computed efficiently in a single forward pass, reducing training and inference cost to the same order of magnitude as its original non-linear counterpart, while using the same number of parameters. Furthermore, we show that, when applied to various downstream visual classification tasks, the resulting Tangent Transformer fine-tuned with TAFT can perform comparably with fine-tuning the original non-linear network. Since Tangent Transformers are linear with respect to the new set of weights, and the resulting fine-tuning loss is convex, we show that TAFT enjoys several advantages compared to non-linear fine-tuning when it comes to model composition, parallel training, machine unlearning, and differential privacy. Our code is available at: https://github.com/tianyu139/tangent-model-composition
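
A minimal sketch of the core idea, not the authors' implementation: given a pretrained network f with weights w0, the tangent model is f_lin(x; delta) = f(x; w0) + J_w f(x; w0) · delta, and jax.jvp evaluates the primal output and the Jacobian-vector product together in one combined pass. The toy MLP, the names apply_fn and tangent_apply, and all hyperparameters below are illustrative assumptions; only jax.jvp and the linearization identity are standard.

import jax
import jax.numpy as jnp

def init_mlp(key, d_in=8, d_hidden=16, d_out=4):
    k1, k2 = jax.random.split(key)
    return {"w1": jax.random.normal(k1, (d_in, d_hidden)) / jnp.sqrt(d_in),
            "w2": jax.random.normal(k2, (d_hidden, d_out)) / jnp.sqrt(d_hidden)}

def apply_fn(params, x):
    # Stand-in for the pretrained non-linear backbone (a transformer in the paper).
    return jnp.tanh(x @ params["w1"]) @ params["w2"]

def tangent_apply(params0, delta, x):
    # First-order Taylor expansion around params0:
    #   f_lin(x; delta) = f(x; params0) + J_params f(x; params0) . delta
    # jax.jvp returns the primal output and the JVP from a single combined
    # evaluation, so the linearized model costs on the order of one forward pass.
    primal, tangent = jax.jvp(lambda p: apply_fn(p, x), (params0,), (delta,))
    return primal + tangent

key = jax.random.PRNGKey(0)
params0 = init_mlp(key)
x = jax.random.normal(key, (2, 8))
y = jnp.zeros((2, 4))

# Fine-tuning optimizes delta; since the model is linear in delta, a convex
# loss (squared error here) yields an objective that is convex in delta.
loss = lambda d: jnp.mean((tangent_apply(params0, d, x) - y) ** 2)
delta = jax.tree_util.tree_map(jnp.zeros_like, params0)  # start at the pretrained point
for _ in range(10):
    grads = jax.grad(loss)(delta)
    delta = jax.tree_util.tree_map(lambda d, g: d - 0.1 * g, delta, grads)

# Composition/removal intuition: a convex combination of two fine-tuned deltas
# produces exactly that convex combination of the two tangent models' outputs.
delta2 = jax.tree_util.tree_map(lambda p: 0.05 * jnp.ones_like(p), params0)
avg = jax.tree_util.tree_map(lambda a, b: 0.5 * (a + b), delta, delta2)
assert jnp.allclose(tangent_apply(params0, avg, x),
                    0.5 * tangent_apply(params0, delta, x)
                    + 0.5 * tangent_apply(params0, delta2, x), atol=1e-5)

Because the model is affine in delta, merging component models (or subtracting one component's delta) acts exactly on predictions rather than approximately, which is the property the abstract leverages for composition, unlearning, and differential privacy.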
