AI capabilities can be significantly improved without expensive retraining (arXiv:2312.07413v1)
Abstract: State-of-the-art AI systems can be significantly improved without expensive retraining via "post-training enhancements": techniques applied after initial training, such as fine-tuning the system to use a web browser. We review recent post-training enhancements, categorizing them into five types: tool-use, prompting methods, scaffolding, solution selection, and data generation. Different enhancements improve performance on different tasks, making it hard to compare their significance. So we translate improvements from different enhancements into a common currency, the compute-equivalent gain: how much additional training compute would be needed to improve performance by the same amount as the enhancement. Our non-experimental work shows that post-training enhancements have significant benefits: most surveyed enhancements improve benchmark performance by the equivalent of a more than 5x increase in training compute, some by more than 20x. Post-training enhancements are also relatively cheap to develop: fine-tuning costs are typically less than 1% of the original training cost. Governing the development of capable post-training enhancements may be challenging, because frontier models could be enhanced by a wide range of actors.
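The compute-equivalent gain lends itself to a short worked example. The sketch below is a minimal illustration, not the paper's actual methodology: it fits a log-linear curve to hypothetical benchmark scores at several training-compute budgets, then inverts the curve to ask how much extra training compute would have produced the same improvement as an enhancement. All numbers and the log-linear scaling form are assumptions made purely for illustration.

```python
import numpy as np

# Hypothetical benchmark scores of a base model trained at several
# compute budgets (FLOP). The log-linear scaling form is an
# illustrative assumption, not a result from the paper.
compute = np.array([1e21, 3e21, 1e22, 3e22, 1e23])   # training FLOP
accuracy = np.array([0.30, 0.38, 0.46, 0.54, 0.62])  # benchmark score

# Fit accuracy as a linear function of log10(training compute).
slope, intercept = np.polyfit(np.log10(compute), accuracy, 1)

def compute_needed(target_accuracy: float) -> float:
    """Invert the fitted curve: FLOP needed to reach a given score."""
    return 10 ** ((target_accuracy - intercept) / slope)

# Suppose an enhancement (e.g., a prompting method) lifts a model
# trained with 1e22 FLOP from 0.46 to 0.55 on the benchmark.
base_compute = 1e22
enhanced_accuracy = 0.55

# Compute-equivalent gain: the multiplier on training compute that
# would have yielded the same improvement via scaling alone.
ceg = compute_needed(enhanced_accuracy) / base_compute
print(f"compute-equivalent gain: about {ceg:.1f}x")
```

On these made-up numbers the script reports a gain of about 3.6x; a real estimate would instead interpolate measured scaling curves for the benchmark in question.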