
Confidant: Customizing Transformer-based LLMs via Collaborative Edge Training (2311.13381v1)

Published 22 Nov 2023 in cs.LG, cs.AI, and cs.DC

Abstract: Transformer-based LLMs have demonstrated impressive capabilities in a variety of NLP tasks. Nonetheless, it is challenging to deploy and fine-tune LLMs on mobile edge devices with limited computing, memory, and energy budgets. In this paper, we propose Confidant, a multi-backend collaborative training framework for customizing state-of-the-art LLMs on commodity mobile devices such as smartphones. Confidant partitions an LLM into several sub-models so that each fits into a mobile device's memory. A pipeline-parallel training mechanism is further developed to ensure fast and efficient distributed training. In addition, we propose a novel backend scheduler that allocates different attention heads to heterogeneous compute hardware, including mobile CPUs and GPUs, to maximize compute resource utilization on each edge device. Our preliminary experimental results show that Confidant achieves up to 45.3% memory reduction and an 8.03x inference speedup in practical settings.
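
The abstract names three mechanisms: memory-constrained partitioning of the LLM into per-device sub-models, pipeline-parallel execution across devices, and a backend scheduler that maps attention heads to heterogeneous CPU/GPU backends. As a rough illustration of the first two ideas only, here is a minimal PyTorch sketch; `partition_layers`, `pipeline_forward`, and the memory budgets are hypothetical stand-ins invented for this example, not the paper's implementation, and the head-level backend scheduler is not shown.

```python
# Illustrative sketch, not Confidant's actual code: greedily pack transformer
# layers into sub-models that fit hypothetical per-device memory budgets, then
# run micro-batches through the resulting pipeline stages.
import torch
import torch.nn as nn

def partition_layers(layers, memory_budgets):
    """Pack consecutive layers into sub-models whose parameter memory
    fits each device's budget (in bytes); the last device absorbs any
    remainder so the partition always covers the whole model."""
    groups, current, used, i = [], [], 0, 0
    for layer in layers:
        size = sum(p.numel() * p.element_size() for p in layer.parameters())
        # Open a new sub-model when this device's budget would be exceeded
        # and another device is still available.
        if current and used + size > memory_budgets[i] and i + 1 < len(memory_budgets):
            groups.append(nn.Sequential(*current))
            current, used, i = [], 0, i + 1
        current.append(layer)
        used += size
    groups.append(nn.Sequential(*current))
    return groups

def pipeline_forward(stages, batch, n_microbatches=4):
    """Naive pipeline-parallel forward pass: split the batch into
    micro-batches and stream them through the stages. In a real
    deployment each stage would run on a different phone, so successive
    micro-batches overlap across devices; here they run sequentially."""
    outputs = []
    for mb in batch.chunk(n_microbatches):
        for stage in stages:
            mb = stage(mb)
        outputs.append(mb)
    return torch.cat(outputs)

if __name__ == "__main__":
    d_model = 64
    layers = [nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
              for _ in range(8)]
    # Pretend three phones expose different amounts of free memory.
    stages = partition_layers(layers, memory_budgets=[2**22, 2**22, 2**23])
    x = torch.randn(8, 16, d_model)   # (batch, seq_len, d_model)
    y = pipeline_forward(stages, x)
    print(len(stages), y.shape)
```

The greedy packing above is just one plausible policy; the paper's scheduler additionally splits work below the layer level, assigning individual attention heads to the CPU or GPU backend of each device to keep all compute units busy.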

Authors (6)
  1. Yuhao Chen (84 papers)
  2. Yuxuan Yan (15 papers)
  3. Qianqian Yang (93 papers)
  4. Yuanchao Shu (14 papers)
  5. Shibo He (44 papers)
  6. Jiming Chen (105 papers)