TrainerAgent: Customizable and Efficient Model Training through LLM-Powered Multi-Agent System (2311.06622v2)
Abstract: Training AI models has always been challenging, especially when custom models are needed to provide personalized services. Algorithm engineers often face a lengthy, iterative process to develop models tailored to specific business requirements, and the process is even harder for non-experts. The quest for high-quality, efficient model development, together with the emergence of LLM agents, has become a key focus in the industry. Leveraging the powerful analytical, planning, and decision-making capabilities of LLMs, we propose TrainerAgent, a multi-agent system comprising Task, Data, Model, and Server agents. These agents analyze the user-defined task, input data, and requirements (e.g., accuracy, speed), optimize from both the data and model perspectives to obtain a satisfactory model, and finally deploy the model as an online service. Experimental evaluations on classical discriminative and generative tasks in computer vision and natural language processing demonstrate that our system consistently produces models that meet the desired criteria. Furthermore, the system can identify and reject unattainable tasks, such as fantastical scenarios or unethical requests, ensuring robustness and safety. Compared to traditional model development, this research achieves the desired models with greater efficiency and quality, facilitated by LLM-powered analysis, decision-making, and execution, as well as the collaboration among the four agents. We anticipate that our work will contribute to research on TrainerAgent in both the academic and industry communities, potentially establishing it as a new paradigm for model development in AI.
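The abstract describes a four-agent pipeline (Task, Data, Model, Server) but gives no implementation detail. The sketch below is a minimal, hypothetical illustration of how such a pipeline could be wired together; the class names, the `llm` callable, and the `train_fn` helper are assumptions for illustration, not the paper's actual API.

```python
# Hypothetical sketch of the TrainerAgent pipeline described in the abstract.
# Class names, the `llm` callable, and `train_fn` are illustrative assumptions;
# the paper does not specify this interface.
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Requirements:
    task_description: str          # e.g. "classify product images into 20 categories"
    min_accuracy: float = 0.90     # target quality threshold
    max_latency_ms: float = 50.0   # target serving speed


class TaskAgent:
    """Analyzes the user request and decides whether it is feasible and ethical."""
    def __init__(self, llm: Callable[[str], str]):
        self.llm = llm

    def analyze(self, req: Requirements) -> Dict[str, Any]:
        verdict = self.llm(f"Is this task feasible and ethical? {req.task_description}")
        if "reject" in verdict.lower():
            raise ValueError(f"Task rejected: {verdict}")
        return {"plan": verdict, "requirements": req}


class DataAgent:
    """Prepares and optimizes the training data according to the task plan."""
    def __init__(self, llm: Callable[[str], str]):
        self.llm = llm

    def prepare(self, plan: str, raw_data: List[Any]) -> Dict[str, List[Any]]:
        # A real system would clean, augment, and split the data here,
        # guided by the LLM-generated plan.
        split = max(1, len(raw_data) // 10)
        return {"train": raw_data[split:], "val": raw_data[:split]}


class ModelAgent:
    """Selects, trains, and iteratively refines a model until requirements are met."""
    def __init__(self, llm: Callable[[str], str],
                 train_fn: Callable[[Dict[str, List[Any]]], Tuple[Any, float]]):
        self.llm, self.train_fn = llm, train_fn

    def build(self, plan: str, data: Dict[str, List[Any]], req: Requirements) -> Any:
        for _ in range(3):                      # bounded refinement loop
            model, accuracy = self.train_fn(data)
            if accuracy >= req.min_accuracy:
                return model
            plan = self.llm(f"Accuracy {accuracy:.2f} below target; revise plan: {plan}")
        raise RuntimeError("Could not meet the accuracy requirement")


class ServerAgent:
    """Deploys the trained model as an online service."""
    def deploy(self, model: Any) -> str:
        # Placeholder: a real deployment would expose the model behind an HTTP endpoint.
        return "http://localhost:8000/predict"


def run_pipeline(llm, train_fn, raw_data, req: Requirements) -> str:
    """Usage sketch: chain the four agents from task analysis to deployment."""
    analysis = TaskAgent(llm).analyze(req)
    data = DataAgent(llm).prepare(analysis["plan"], raw_data)
    model = ModelAgent(llm, train_fn).build(analysis["plan"], data, req)
    return ServerAgent().deploy(model)
```

The loop in `ModelAgent.build` reflects the iterative data-and-model optimization the abstract describes, while the early rejection in `TaskAgent.analyze` mirrors the paper's claim that infeasible or unethical requests are refused before any training starts.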