
Online Training of Large Language Models: Learn while chatting (2403.04790v1)

Published 4 Mar 2024 in cs.CL and cs.AI

Abstract: Large Language Models (LLMs) have dramatically revolutionized the field of Natural Language Processing (NLP), offering remarkable capabilities that have garnered widespread usage. However, existing interaction paradigms between LLMs and users are constrained by inflexibility, limited customization, or a lack of persistent learning. This inflexibility is particularly evident for users without programming skills, who have few avenues to enhance or personalize the model. Existing frameworks further complicate model training and deployment due to their computational inefficiency and lack of user-friendly interfaces. To overcome these challenges, this paper introduces a novel interaction paradigm, 'Online Training using External Interactions', which merges the benefits of persistent, real-time model updates with the flexibility of individual customization through external interactions such as AI agents or online/offline knowledge bases.
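The abstract describes the paradigm only at a high level, so the sketch below is a minimal illustration of how such a loop could be wired together: each chat turn is augmented with context pulled from an external knowledge base or agent, interactions are buffered, and the buffer periodically drives a lightweight weight update (e.g., a LoRA-style fine-tuning step). All names here (OnlineTrainer, Interaction, update_every) and the toy backends are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of an "online training via external interactions" loop.
# All class and function names are illustrative placeholders; the paper does
# not publish this code, and real backends (model, retriever, fine-tuner)
# would be injected in place of the toy lambdas below.

from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Interaction:
    """One user turn plus the external context retrieved for it."""
    user_input: str
    retrieved_context: str
    model_output: str


@dataclass
class OnlineTrainer:
    """Accumulates chat interactions and periodically updates the model."""
    generate: Callable[[str], str]                # prompt -> model response
    update: Callable[[List[Interaction]], None]   # lightweight weight update
    retrieve: Callable[[str], str]                # query -> external knowledge
    update_every: int = 8                         # interactions per update
    buffer: List[Interaction] = field(default_factory=list)

    def chat(self, user_input: str) -> str:
        # Pull context from an external agent or online/offline knowledge base.
        context = self.retrieve(user_input)
        prompt = f"{context}\n\nUser: {user_input}\nAssistant:"
        output = self.generate(prompt)
        self.buffer.append(Interaction(user_input, context, output))
        # Persistent learning: periodically fold the accumulated interactions
        # back into the model via a lightweight update (e.g., LoRA adapters).
        if len(self.buffer) >= self.update_every:
            self.update(self.buffer)
            self.buffer.clear()
        return output


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end without a real model.
    trainer = OnlineTrainer(
        generate=lambda prompt: f"[response to: {prompt[-40:]}]",
        update=lambda batch: print(f"fine-tuning on {len(batch)} interactions"),
        retrieve=lambda query: f"[knowledge-base snippet for: {query}]",
        update_every=2,
    )
    print(trainer.chat("What is online training?"))
    print(trainer.chat("How do external knowledge bases help?"))
```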
