Adaptive Decoding via Latent Preference Optimization (2411.09661v1)

Published 14 Nov 2024 in cs.CL

Abstract: During LLM decoding, it is known that using higher temperature sampling gives more creative responses, while lower temperatures are more factually accurate. However, such models are commonly applied to general instruction following, which involves both creative and fact seeking tasks, using a single fixed temperature across all examples and tokens. In this work, we introduce Adaptive Decoding, a layer added to the model to select the sampling temperature dynamically at inference time, at either the token or example level, in order to optimize performance. To learn its parameters we introduce Latent Preference Optimization (LPO) a general approach to train discrete latent variables such as choices of temperature. Our method outperforms all fixed decoding temperatures across a range of tasks that require different temperatures, including UltraFeedback, Creative Story Writing, and GSM8K.

Summary

  • The paper introduces the ADAPTIVE DECODER, a neural module that dynamically selects temperature settings to balance factuality and creativity.
  • It employs Latent Preference Optimization to tune temperature values based on task-specific evaluations across math, storytelling, and mixed-task datasets.
  • Experiments demonstrate that the adaptive approach outperforms fixed temperature models, enhancing performance in both deterministic and creative applications.

Essay on "Adaptive Decoding via Latent Preference Optimization"

The paper "Adaptive Decoding via Latent Preference Optimization" proposes a novel approach to the problem of selecting an optimal decoding temperature during LLM inference. The authors introduce Adaptive Decoding, which leverages a learnable component called the ADAPTIVE DECODER to dynamically adjust the sampling temperature at either the sequence or token level. The method is trained with Latent Preference Optimization (LPO), a general technique for learning discrete latent variables.

Key Contributions and Approach

The central contribution of this paper is the introduction of the ADAPTIVE DECODER, a neural module attached to an LLM's final layer. This module computes a probability distribution over a predefined set of temperature values. Rather than using a static temperature for all task instances, the ADAPTIVE DECODER determines a task-specific temperature, optimizing the trade-off between creativity and factuality in the generated text. Training this module uses LPO, which builds preference pairs from evaluations of responses generated under different sampled temperatures and optimizes the decoder toward the choices that produced the preferred responses.
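
A minimal sketch of what such a module could look like in PyTorch is given below. The class name, the hidden size, and the particular temperature grid are illustrative assumptions, not the authors' implementation; the paper's actual architecture may differ.

```python
import torch
import torch.nn as nn

class AdaptiveDecoderHead(nn.Module):
    """Hypothetical sketch: maps the LM's final hidden state to a
    distribution over a fixed, discrete set of temperatures."""

    def __init__(self, hidden_size: int, temperatures=(0.1, 0.4, 0.8, 1.2)):
        super().__init__()
        # Nonzero temperatures assumed here so logits can be divided safely.
        self.register_buffer("temperatures", torch.tensor(temperatures))
        self.proj = nn.Linear(hidden_size, len(temperatures))

    def forward(self, hidden_state: torch.Tensor) -> torch.Tensor:
        # hidden_state: (batch, hidden_size) -> (batch, num_temperatures)
        return torch.softmax(self.proj(hidden_state), dim=-1)

    def sample_temperature(self, hidden_state: torch.Tensor) -> torch.Tensor:
        probs = self.forward(hidden_state)                        # (batch, T)
        idx = torch.multinomial(probs, num_samples=1).squeeze(-1)  # (batch,)
        return self.temperatures[idx]                              # (batch,)
```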

The paper's methodology involves evaluating the ADAPTIVE DECODER across a suite of tasks that traditionally benefit from varying temperature settings. The tasks addressed include math problem solving (GSM8K), story generation (Stories), and a mixed-task dataset (UltraFeedback). Each task has distinct requirements: accurate deterministic responses for math tasks and diverse, imaginative outputs for creative story writing. The adaptive approach demonstrates superior performance over fixed temperature methods across all tasks.
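
One plausible instantiation of the LPO training signal is a DPO-style pairwise loss over the latent temperature choices, sketched below. It assumes the log-probabilities of the sampled temperature indices are accumulated along each response; this is only one way the preference objective could be written, not necessarily the variant the paper uses.

```python
import torch
import torch.nn.functional as F

def lpo_loss(chosen_temp_logps: torch.Tensor,
             rejected_temp_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Hypothetical DPO-style pairwise loss over latent temperature choices.

    chosen_temp_logps / rejected_temp_logps: for each preference pair, the
    log-probabilities the decoder head assigned to the temperature indices
    sampled while generating the preferred and dispreferred responses,
    summed over tokens. Shape: (batch,).
    """
    # Raise the likelihood of the temperature choices behind the preferred
    # response relative to those behind the dispreferred one.
    margin = beta * (chosen_temp_logps - rejected_temp_logps)
    return -F.logsigmoid(margin).mean()
```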

Numerical Results and Observations

Quantitative analyses presented in the paper highlight Adaptive Decoding's proficiency in both single-task and multi-task scenarios. In experiments on the GSM8K dataset, the ADAPTIVE DECODER achieves accuracy matching or surpassing the best fixed-temperature models, with noticeable improvements on creative tasks where diversity is advantageous. Furthermore, on self-consistency benchmarks, where several sampled reasoning chains are aggregated by majority voting, the adaptive model achieves superior outcomes by judiciously sampling temperatures for the individual chains.
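
For intuition, a toy sketch of the majority-voting aggregation described above; `generate_fn` (assumed to use the adaptive decoder internally) and the last-line answer extraction are hypothetical stand-ins:

```python
from collections import Counter

def extract_final_answer(text: str) -> str:
    # Illustrative: treat the last line of the generation as the answer.
    return text.strip().splitlines()[-1]

def self_consistency_answer(generate_fn, prompt: str, n_samples: int = 8) -> str:
    """Sample several reasoning chains and majority-vote on their answers."""
    answers = [extract_final_answer(generate_fn(prompt)) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```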

The paper's experiments with constrained creative writing prompts demonstrate the decoder's ability to navigate tasks that require alternating between strict and lenient generation. The token-level variant of the ADAPTIVE DECODER learns to apply low temperatures to tokens that must satisfy the constraints while allowing creative freedom elsewhere, effectively balancing the dual demands.
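
A hedged sketch of such a token-level decoding loop, assuming a Hugging Face-style causal LM interface and the `AdaptiveDecoderHead` from the earlier sketch; it is a minimal illustration under those assumptions, not the paper's implementation:

```python
import torch

@torch.no_grad()
def adaptive_decode(model, decoder_head, input_ids, max_new_tokens=128):
    """At each step, pick a temperature from the current hidden state,
    then sample the next token from logits scaled by that temperature."""
    for _ in range(max_new_tokens):
        out = model(input_ids, output_hidden_states=True)
        h_last = out.hidden_states[-1][:, -1, :]         # (batch, hidden)
        temp = decoder_head.sample_temperature(h_last)   # (batch,) > 0
        logits = out.logits[:, -1, :] / temp.unsqueeze(-1)
        next_tok = torch.multinomial(torch.softmax(logits, dim=-1), 1)
        input_ids = torch.cat([input_ids, next_tok], dim=-1)
    return input_ids
```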

Implications and Future Developments

The introduction of LPO and its application in training the ADAPTIVE DECODER carry significant implications for the field. By integrating task-specific contextual understanding into hyperparameter adjustment, LPO opens pathways for enhancing adaptability across the varied task environments encountered by large-scale models. Moving forward, this strategy may prove beneficial for adjusting other discrete decoding hyperparameters, such as top-k or top-p, potentially reshaping hyperparameter tuning practices within LLM architectures.

The implications of this research potentially extend to enhancing LLMs' applicability in real-world, dynamic settings where user prompts vary widely. Subsequent research could focus on extending LPO to broader classes of latent variables, or on an expanded set of linguistic tasks, to generalize the adaptive methodology beyond what is explored in the current paper.

By advancing the capability to auto-tune model inference parameters dynamically, this work contributes to a more flexible, user-centric approach to deploying LLMs across diverse domains, thus marking a pivotal step in AI-driven text generation technology.
