Model Selection with Model Zoo via Graph Learning (2404.03988v1)
Abstract: Pre-trained deep learning (DL) models are increasingly accessible in public repositories, i.e., model zoos. Given a new prediction task, finding the best model to fine-tune can be computationally intensive and costly, especially when the number of pre-trained models is large. Selecting the right pre-trained model is crucial, yet complicated by the diversity of models from various model families (such as ResNet, ViT, and Swin) and the hidden relationships between models and datasets. Existing methods, which use basic information from models and datasets to compute scores indicating likely model performance on a target dataset, overlook these intrinsic relationships, limiting their effectiveness in model selection. In this study, we introduce TransferGraph, a novel framework that reformulates model selection as a graph learning problem. TransferGraph constructs a graph from extensive metadata extracted from models and datasets, capturing their inherent relationships. Through comprehensive experiments across 16 real datasets, spanning both images and text, we demonstrate TransferGraph's effectiveness in capturing essential model-dataset relationships, yielding up to a 32% improvement in the correlation between predicted performance and actual fine-tuning results compared to state-of-the-art methods.
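To make the graph-learning reformulation concrete, the following is a minimal, hypothetical sketch (not the paper's actual method): models and datasets become nodes, observed fine-tuning results become weighted edges, a crude neighborhood-average predictor scores unseen model-dataset pairs, and predictions are evaluated by their Pearson correlation with actual fine-tuning results. All names and accuracy values are illustrative.

```python
from statistics import mean
from math import sqrt

# Toy "graph": edges connect (model, dataset) pairs, weighted by the
# fine-tuning accuracy observed for that pair. Values are made up.
observed = {
    ("resnet", "cifar10"): 0.92,
    ("resnet", "flowers"): 0.85,
    ("vit",    "cifar10"): 0.94,
    ("swin",   "flowers"): 0.88,
}

def predict(model, dataset, edges):
    """Score an unseen (model, dataset) pair by averaging the model's
    performance on other datasets and other models' performance on this
    dataset -- a crude stand-in for learned graph inference."""
    model_scores = [a for (m, _), a in edges.items() if m == model]
    data_scores = [a for (_, d), a in edges.items() if d == dataset]
    pool = model_scores + data_scores
    return mean(pool) if pool else mean(edges.values())

def pearson(xs, ys):
    """Pearson correlation between predicted and actual performance,
    the evaluation measure referenced in the abstract."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sqrt(sum((x - mx) ** 2 for x in xs))
    vy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (vx * vy)

# Held-out pairs with their (illustrative) actual fine-tuning results.
held_out = {("vit", "flowers"): 0.90, ("swin", "cifar10"): 0.93}
preds = [predict(m, d, observed) for (m, d) in held_out]
actuals = list(held_out.values())
print(round(pearson(preds, actuals), 3))
```

The real framework replaces the neighborhood average with learned node embeddings and a trained predictor, but the pipeline shape — build graph, score unseen edges, rank by correlation with actual results — is the same.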
Authors:
- Ziyu Li
- Hilco van der Wilk
- Danning Zhan
- Megha Khosla
- Alessandro Bozzon
- Rihan Hai