Low-Rank Knowledge Decomposition for Medical Foundation Models (2404.17184v1)

Published 26 Apr 2024 in cs.CV

Abstract: The popularity of large-scale pre-training has promoted the development of medical foundation models. However, some studies have shown that although foundation models exhibit strong general feature extraction capabilities, their performance on specific tasks is still inferior to that of task-specific methods. In this paper, we explore a new perspective called "Knowledge Decomposition" to improve performance on specific medical tasks: the foundation model is deconstructed into multiple lightweight expert models, each dedicated to a particular task, with the goal of improving specialization while reducing resource expenditure. To accomplish this objective, we design a novel framework named Low-Rank Knowledge Decomposition (LoRKD), which explicitly separates gradients by incorporating low-rank expert modules and an efficient knowledge separation convolution. Extensive experimental results demonstrate that the decomposed models achieve strong performance and transferability, even surpassing the original foundation models.
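The two ingredients named in the abstract, low-rank expert modules and explicit gradient separation, can be pictured with a short PyTorch sketch. The class below is a minimal illustration under my own assumptions (the names `LowRankExpertConv`, `rank`, and the per-call `expert_id` routing are hypothetical, and the paper's knowledge separation convolution is not reproduced here): a frozen shared convolution carries the general features, while each task trains only its own rank-limited weight delta, so task gradients never touch the shared kernel or other experts.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowRankExpertConv(nn.Module):
    """A frozen shared convolution plus per-task low-rank expert deltas.

    Each expert e contributes a rank-r weight update B[e] @ A[e] on top of
    the shared (frozen) kernel, so gradients for different tasks flow only
    into their own expert factors -- a rough stand-in for LoRKD's explicit
    gradient separation. Hypothetical design, not the paper's code.
    """

    def __init__(self, in_ch: int, out_ch: int, kernel_size: int,
                 num_experts: int, rank: int = 4):
        super().__init__()
        self.shared = nn.Conv2d(in_ch, out_ch, kernel_size,
                                padding=kernel_size // 2)
        for p in self.shared.parameters():  # freeze the shared backbone
            p.requires_grad_(False)
        fan_in = in_ch * kernel_size ** 2
        # Low-rank factors per expert: A projects down, B projects up.
        self.A = nn.Parameter(torch.randn(num_experts, rank, fan_in) * 0.01)
        # Zero-init B so every expert starts as the unmodified backbone.
        self.B = nn.Parameter(torch.zeros(num_experts, out_ch, rank))

    def forward(self, x: torch.Tensor, expert_id: int) -> torch.Tensor:
        # Rebuild this expert's weight delta and add it to the frozen kernel.
        delta = (self.B[expert_id] @ self.A[expert_id]).view_as(self.shared.weight)
        return F.conv2d(x, self.shared.weight + delta, self.shared.bias,
                        padding=self.shared.padding)

# Tiny smoke test: route one batch through expert 3.
layer = LowRankExpertConv(in_ch=3, out_ch=16, kernel_size=3, num_experts=8)
out = layer(torch.randn(2, 3, 64, 64), expert_id=3)
print(out.shape)  # torch.Size([2, 16, 64, 64])
```

Zero-initializing B means each decomposed expert begins as the original foundation model and diverges only through its own task's gradients, which is one simple way to realize the specialization-without-interference goal the abstract describes.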
