
AutoMMLab: Automatically Generating Deployable Models from Language Instructions for Computer Vision Tasks (2402.15351v1)

Published 23 Feb 2024 in cs.LG and cs.CV

Abstract: Automated machine learning (AutoML) is a collection of techniques designed to automate the machine learning development process. While traditional AutoML approaches have been successfully applied to several critical steps of model development (e.g., hyperparameter optimization), there is no AutoML system that automates the entire end-to-end model production workflow. To fill this gap, we present AutoMMLab, a general-purpose LLM-empowered AutoML system that follows users' language instructions to automate the whole model production workflow for computer vision tasks. The proposed AutoMMLab system employs LLMs as the bridge connecting AutoML and the OpenMMLab community, empowering non-expert individuals to easily build task-specific models via a user-friendly language interface. Specifically, we propose RU-LLaMA to understand users' requests and schedule the whole pipeline, and we propose a novel LLM-based hyperparameter optimizer, called HPO-LLaMA, to effectively search for optimal hyperparameters. Experiments show that our AutoMMLab system is versatile and covers a wide range of mainstream tasks, including classification, detection, segmentation, and keypoint estimation. We further develop a new benchmark, called LAMP, for studying key components in the end-to-end prompt-based model training pipeline. Code, models, and data will be released.
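The abstract describes an iterative LLM-based hyperparameter optimizer (HPO-LLaMA): the model proposes a configuration, a training run evaluates it, and the result feeds back into the next proposal. The sketch below illustrates that propose-evaluate-feedback loop in minimal Python. It is an assumption-laden illustration, not the paper's implementation: `propose_config` is a random-sampling stand-in for the actual LLM call, and `evaluate` is a toy stand-in for a real training run.

```python
import random

# Illustrative search space; the names and values are hypothetical,
# not taken from the AutoMMLab paper.
SEARCH_SPACE = {
    "lr": [1e-4, 1e-3, 1e-2],
    "batch_size": [16, 32, 64],
    "optimizer": ["SGD", "AdamW"],
}

def propose_config(history):
    """Stand-in for an LLM call that would see `history` (tried configs
    and their scores) and propose a new configuration. Here we simply
    sample a configuration that has not been tried yet."""
    tried = {tuple(sorted(cfg.items())) for cfg, _ in history}
    while True:
        cfg = {name: random.choice(values) for name, values in SEARCH_SPACE.items()}
        if tuple(sorted(cfg.items())) not in tried:
            return cfg

def evaluate(cfg):
    """Stand-in for a real training run returning validation accuracy."""
    return 0.5 + 0.2 * (cfg["lr"] == 1e-3) + 0.1 * (cfg["optimizer"] == "AdamW")

def hpo_loop(rounds=5, seed=0):
    """Run the propose/evaluate feedback loop and return the best result."""
    random.seed(seed)
    history = []
    for _ in range(rounds):
        cfg = propose_config(history)
        score = evaluate(cfg)
        history.append((cfg, score))  # feedback for the next proposal
    return max(history, key=lambda item: item[1])

best_cfg, best_score = hpo_loop()
print(best_cfg, best_score)
```

In the paper's setting, the proposal step would be a prompted LLaMA model conditioned on the task description and the history of trials, and the evaluation step would be an actual OpenMMLab training run.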

Authors (6)
  1. Zekang Yang (5 papers)
  2. Wang Zeng (9 papers)
  3. Sheng Jin (69 papers)
  4. Chen Qian (226 papers)
  5. Ping Luo (340 papers)
  6. Wentao Liu (87 papers)
Citations (5)