Adaptive Decoding via Latent Preference Optimization (2411.09661v1)

Published 14 Nov 2024 in cs.CL

Abstract: During LLM decoding, it is known that using higher temperature sampling gives more creative responses, while lower temperatures are more factually accurate. However, such models are commonly applied to general instruction following, which involves both creative and fact seeking tasks, using a single fixed temperature across all examples and tokens. In this work, we introduce Adaptive Decoding, a layer added to the model to select the sampling temperature dynamically at inference time, at either the token or example level, in order to optimize performance. To learn its parameters we introduce Latent Preference Optimization (LPO) a general approach to train discrete latent variables such as choices of temperature. Our method outperforms all fixed decoding temperatures across a range of tasks that require different temperatures, including UltraFeedback, Creative Story Writing, and GSM8K.

Summary

  • The paper introduces the ADAPTIVE DECODER, a neural module that dynamically selects temperature settings to balance factuality and creativity.
  • It employs Latent Preference Optimization to tune temperature values based on task-specific evaluations across math, storytelling, and mixed-task datasets.
  • Experiments demonstrate that the adaptive approach outperforms fixed temperature models, enhancing performance in both deterministic and creative applications.

Essay on "Adaptive Decoding via Latent Preference Optimization"

The paper "Adaptive Decoding via Latent Preference Optimization" proposes a novel approach to the problem of selecting an optimal decoding temperature during LLM inference. The authors introduce Adaptive Decoding, which leverages a learnable component called the ADAPTIVE DECODER to dynamically adjust the sampling temperature at either the sequence or token level. The method is trained with Latent Preference Optimization (LPO), a general technique for learning discrete latent variables.

Key Contributions and Approach

The central contribution of this paper is the introduction of the ADAPTIVE DECODER, a neural module attached to an LLM's final layer. This module computes a probability distribution over a predefined set of temperature values. Rather than using a static temperature for all task instances, the ADAPTIVE DECODER determines a task-specific temperature, optimizing the trade-off between creativity and factuality in the generated text. Training this module uses LPO, which builds preference pairs from evaluations of responses generated under different sampled temperatures and optimizes the decoder toward the choices that produced the preferred responses.
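
A minimal sketch of what such a module could look like in PyTorch is given below. The class name, the hidden size, and the particular temperature grid are illustrative assumptions, not the authors' implementation; the paper's actual architecture may differ.

```python
import torch
import torch.nn as nn

class AdaptiveDecoderHead(nn.Module):
    """Hypothetical sketch: maps the LM's final hidden state to a
    distribution over a fixed, discrete set of temperatures."""

    def __init__(self, hidden_size: int, temperatures=(0.1, 0.4, 0.8, 1.2)):
        super().__init__()
        # Nonzero temperatures assumed here so logits can be divided safely.
        self.register_buffer("temperatures", torch.tensor(temperatures))
        self.proj = nn.Linear(hidden_size, len(temperatures))

    def forward(self, hidden_state: torch.Tensor) -> torch.Tensor:
        # hidden_state: (batch, hidden_size) -> (batch, num_temperatures)
        return torch.softmax(self.proj(hidden_state), dim=-1)

    def sample_temperature(self, hidden_state: torch.Tensor) -> torch.Tensor:
        probs = self.forward(hidden_state)                        # (batch, T)
        idx = torch.multinomial(probs, num_samples=1).squeeze(-1)  # (batch,)
        return self.temperatures[idx]                              # (batch,)
```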

The paper's methodology involves evaluating the ADAPTIVE DECODER across a suite of tasks that traditionally benefit from varying temperature settings. The tasks addressed include math problem solving (GSM8K), story generation (Stories), and a mixed-task dataset (UltraFeedback). Each task has distinct requirements: accurate deterministic responses for math tasks and diverse, imaginative outputs for creative story writing. The adaptive approach demonstrates superior performance over fixed temperature methods across all tasks.
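
One plausible instantiation of the LPO training signal is a DPO-style pairwise loss over the latent temperature choices, sketched below. It assumes the log-probabilities of the sampled temperature indices are accumulated along each response; this is only one way the preference objective could be written, not necessarily the variant the paper uses.

```python
import torch
import torch.nn.functional as F

def lpo_loss(chosen_temp_logps: torch.Tensor,
             rejected_temp_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Hypothetical DPO-style pairwise loss over latent temperature choices.

    chosen_temp_logps / rejected_temp_logps: for each preference pair, the
    log-probabilities the decoder head assigned to the temperature indices
    sampled while generating the preferred and dispreferred responses,
    summed over tokens. Shape: (batch,).
    """
    # Raise the likelihood of the temperature choices behind the preferred
    # response relative to those behind the dispreferred one.
    margin = beta * (chosen_temp_logps - rejected_temp_logps)
    return -F.logsigmoid(margin).mean()
```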

Numerical Results and Observations

Quantitative analyses presented in the paper highlight Adaptive Decoding's proficiency in both single-task and multi-task scenarios. In experiments on the GSM8K dataset, the ADAPTIVE DECODER achieves accuracy matching or surpassing the best fixed-temperature models, with noticeable improvements on creative tasks where diversity is advantageous. Furthermore, on self-consistency benchmarks, where several sampled reasoning chains are aggregated by majority voting, the adaptive model achieves superior outcomes by judiciously sampling temperatures for the individual chains.
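
For intuition, a toy sketch of the majority-voting aggregation described above; `generate_fn` (assumed to use the adaptive decoder internally) and the last-line answer extraction are hypothetical stand-ins:

```python
from collections import Counter

def extract_final_answer(text: str) -> str:
    # Illustrative: treat the last line of the generation as the answer.
    return text.strip().splitlines()[-1]

def self_consistency_answer(generate_fn, prompt: str, n_samples: int = 8) -> str:
    """Sample several reasoning chains and majority-vote on their answers."""
    answers = [extract_final_answer(generate_fn(prompt)) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```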

The paper's experiments with constrained creative writing prompts demonstrate the decoder's ability to navigate tasks that require alternating between strict and lenient generation. The token-level variant of the ADAPTIVE DECODER learns to apply low temperatures to tokens that must satisfy the constraints while allowing creative freedom elsewhere, effectively balancing the dual demands.
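
A hedged sketch of such a token-level decoding loop, assuming a Hugging Face-style causal LM interface and the `AdaptiveDecoderHead` from the earlier sketch; it is a minimal illustration under those assumptions, not the paper's implementation:

```python
import torch

@torch.no_grad()
def adaptive_decode(model, decoder_head, input_ids, max_new_tokens=128):
    """At each step, pick a temperature from the current hidden state,
    then sample the next token from logits scaled by that temperature."""
    for _ in range(max_new_tokens):
        out = model(input_ids, output_hidden_states=True)
        h_last = out.hidden_states[-1][:, -1, :]         # (batch, hidden)
        temp = decoder_head.sample_temperature(h_last)   # (batch,) > 0
        logits = out.logits[:, -1, :] / temp.unsqueeze(-1)
        next_tok = torch.multinomial(torch.softmax(logits, dim=-1), 1)
        input_ids = torch.cat([input_ids, next_tok], dim=-1)
    return input_ids
```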

Implications and Future Developments

The introduction of LPO and its application in training the ADAPTIVE DECODER carry significant implications for the field. By integrating task-specific contextual understanding into hyperparameter adjustment, LPO opens pathways for enhancing adaptability across the varied task environments encountered by large-scale models. Moving forward, this strategy may prove beneficial for adjusting other discrete decoding hyperparameters, such as top-k or top-p, potentially reshaping hyperparameter tuning practices within LLM architectures.

The implications of this research potentially extend to enhancing LLMs' applicability in real-world, dynamic settings where user prompts vary widely. Subsequent research could focus on extending LPO to broader classes of latent variables, or on an expanded set of linguistic tasks, to generalize the adaptive methodology beyond what is explored in the current paper.

By advancing the capability to auto-tune model inference parameters dynamically, this work contributes to a more flexible, user-centric approach to deploying LLMs across diverse domains, thus marking a pivotal step in AI-driven text generation technology.
