ModelGPT: Unleashing LLM's Capabilities for Tailored Model Generation (2402.12408v1)
Abstract: The rapid advancement of LLMs has revolutionized various sectors by automating routine tasks, marking a step toward the realization of AGI. However, LLMs still struggle to accommodate the diverse, specific needs of individual users and to make AI models easier for the average user to apply. In response, we propose ModelGPT, a novel framework that leverages the capabilities of LLMs to determine and generate AI models tailored to the data or task descriptions provided by the user. Given user requirements, ModelGPT can deliver tailored models up to 270x faster than previous paradigms (e.g., all-parameter or LoRA finetuning). Comprehensive experiments on NLP, CV, and tabular datasets attest to the effectiveness of our framework in making AI models more accessible and user-friendly. Our code is available at https://github.com/IshiKura-a/ModelGPT.
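The core idea — producing a tailored model's parameters directly from a requirement description, rather than finetuning — can be sketched with a hypernetwork-style generator. The sketch below is a minimal illustration under assumed names (`hypernetwork`, `build_target`), not the authors' actual implementation: a fixed random projection stands in for a trained generator, and a random vector stands in for the LLM's encoding of the user's task description.

```python
import numpy as np

rng = np.random.default_rng(0)

def hypernetwork(embedding, in_dim, out_dim, hidden=64):
    """Map a requirement embedding to a flat parameter vector for a target model.
    The random projections here stand in for a trained generator network."""
    W1 = rng.standard_normal((hidden, embedding.size)) / np.sqrt(embedding.size)
    W2 = rng.standard_normal((in_dim * out_dim + out_dim, hidden)) / np.sqrt(hidden)
    h = np.maximum(W1 @ embedding, 0.0)  # ReLU hidden layer
    return W2 @ h                        # flat (weights + bias) vector

def build_target(flat, in_dim, out_dim):
    """Unflatten generated parameters into a linear classifier (no gradient steps)."""
    W = flat[: in_dim * out_dim].reshape(out_dim, in_dim)
    b = flat[in_dim * out_dim:]
    return lambda x: x @ W.T + b

# A 768-d embedding (random here, standing in for an LLM's encoding of the
# user's task description) yields a 4-class classifier over 16 features
# in a single forward pass -- no finetuning loop.
embedding = rng.standard_normal(768)
target = build_target(hypernetwork(embedding, 16, 4), 16, 4)
logits = target(rng.standard_normal((2, 16)))
print(logits.shape)  # (2, 4)
```

Because the target model's parameters come from one forward pass of the generator rather than an optimization loop, this style of generation is what makes the reported speedups over all-parameter or LoRA finetuning plausible.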
Authors: Zihao Tang, Zheqi Lv, Shengyu Zhang, Fei Wu, Kun Kuang