
Naming Practices of Pre-Trained Models in Hugging Face (2310.01642v2)

Published 2 Oct 2023 in cs.SE and cs.AI

Abstract: As innovation in deep learning continues, many engineers seek to adopt Pre-Trained Models (PTMs) as components in computer systems. Researchers publish PTMs, which engineers adapt for quality or performance prior to deployment. PTM authors should choose appropriate names for their PTMs, which would facilitate model discovery and reuse. However, prior research has reported that model names are not always well chosen, and are sometimes erroneous. The naming of PTM packages has not been systematically studied. In this paper, we frame and conduct the first empirical investigation of PTM naming practices in the Hugging Face PTM registry. We initiated our study with a survey of 108 Hugging Face users to understand their PTM naming practices. From our survey analysis, we highlight discrepancies from traditional software package naming and present findings on naming practices. Our findings indicate a substantial mismatch between engineers' preferences and actual PTM naming practices. We also describe practices for detecting naming anomalies and introduce DARA, a novel automated DNN ARchitecture Assessment technique capable of detecting PTM naming anomalies. We envision future work on leveraging meta-features of PTMs to improve model reuse and trustworthiness.

Authors (6)
  1. Wenxin Jiang (33 papers)
  2. Chingwo Cheung (1 paper)
  3. George K. Thiruvathukal (48 papers)
  4. James C. Davis (60 papers)
  5. Mingyu Kim (24 papers)
  6. Heesoo Kim (2 papers)
Citations (6)
