Inference time LLM alignment in single and multidomain preference spectrum (2410.19206v1)

Published 24 Oct 2024 in cs.LG and cs.CL

Abstract: Aligning LLMs to address subjectivity and nuanced preference levels requires adequate flexibility and control, which can be a resource-intensive and time-consuming procedure. Existing training-time alignment methods require full retraining whenever a change is needed, and inference-time methods typically require access to the reward model at each inference step. To address these limitations, we introduce an inference-time model alignment method that learns encoded representations of preference dimensions, called Alignment Vectors (AV). These representations are computed by subtracting the base model from the aligned model, as in model editing, enabling dynamic adjustment of model behavior during inference through simple linear operations. Although preference dimensions can span various granularity levels, here we focus on three gradual response levels across three specialized domains: medical, legal, and financial, exemplifying the method's practical potential. This new alignment paradigm introduces adjustable preference knobs at inference time, allowing users to tailor LLM outputs while reducing inference cost by half compared to the prompt-engineering approach. Additionally, we find that AVs are transferable across different fine-tuning stages of the same model, demonstrating their flexibility. AVs also facilitate multidomain, diverse preference alignment, making the process 12x faster than the retraining approach.


Summary

  • The paper presents a novel method using alignment vectors to adjust LLM outputs at inference, making multidomain preference alignment roughly 12× faster than retraining.
  • The approach enables fine-grained control over model responses in domains like medical, legal, and financial with measurable preference accuracy.
  • The study outlines a multidomain alignment strategy that maintains distinct domain-specific outputs while addressing cross-domain generalization challenges.

Inference Time LLM Alignment in Single and Multidomain Preference Spectrum

The paper presents an innovative approach to aligning LLMs at inference time using Alignment Vectors (AVs), specifically targeting preference spectrum adjustments across various domains. Traditional alignment methods require significant resources and time, as they necessitate model retraining each time a preference is modified. The approach introduced in this paper, however, performs alignment at inference time, significantly reducing these burdens.

Methodological Overview

The core innovation of this work is the Alignment Vector (AV), obtained by subtracting the base model's parameters from those of the aligned model after fine-tuning. This enables inference-time behavior adjustment through simple linear operations on the model weights, with the AVs acting as tunable parameters that afford fine-grained control over model outputs. The approach is explored in three specialized domains: medical, legal, and financial. Users can dynamically adjust the model's output by shifting its weights along these vectors in parameter space, modifying behavior without any additional training.
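To make the arithmetic concrete, the sketch below (not the authors' released code) computes an alignment vector as the per-parameter difference between a domain-aligned checkpoint and its base model, then adds it back with a scaling coefficient that serves as the preference knob. The checkpoint identifiers BASE_ID and ALIGNED_ID are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder identifiers; substitute real base and domain-aligned checkpoints.
BASE_ID = "path/to/base-model"
ALIGNED_ID = "path/to/aligned-model"

base = AutoModelForCausalLM.from_pretrained(BASE_ID, torch_dtype=torch.float32)
aligned = AutoModelForCausalLM.from_pretrained(ALIGNED_ID, torch_dtype=torch.float32)

base_sd, aligned_sd = base.state_dict(), aligned.state_dict()

# Alignment vector: per-parameter difference between aligned and base weights.
av = {name: aligned_sd[name] - base_sd[name] for name in base_sd}

def apply_av(model, av, lam):
    """Shift the model's weights along the alignment vector by a factor lam."""
    sd = model.state_dict()
    for name, delta in av.items():
        sd[name] = sd[name] + lam * delta
    model.load_state_dict(sd)
    return model

# lam acts as a preference knob: 0.0 keeps the base behavior, 1.0 approximates
# the fully aligned model, and intermediate values interpolate between them.
steered = apply_av(base, av, lam=0.6)
```

Because the adjustment is a one-time weight addition, no reward model or extra forward pass is needed at inference time.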

Empirical results show that manipulating the AVs gives significant control over response specificity along the desired expertise spectrum, from avoidance to expert, with preference accuracy for these behaviors transitioning smoothly as expected. The authors report that the multidomain alignment process is roughly 12 times faster than retraining and that inference cost is about half that of a prompt-engineering approach, substantially cutting computational cost and resource usage.

Domain-Specific Results

In the detailed experiments, results are presented for single-domain preference tuning across medical, legal, and financial queries. Inference-time model editing tailors response behavior simply by adjusting the scale at which the AV is added to the base model's weights. Quantitatively, this is evident in preference accuracy metrics that track the desired response level for each domain. The ability to switch between response levels, such as from avoidance to expert insights, highlights the flexibility and practical potential of this approach, as sketched below.
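A rough, self-contained sketch of such a sweep over the preference knob for a single domain; the checkpoint name, the saved alignment-vector file, and the prompt are all hypothetical, and the alignment vector is assumed to have been extracted as in the previous snippet.

```python
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_ID = "path/to/base-model"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(BASE_ID)

# Hypothetical saved alignment vector for the medical domain:
# a dict of per-parameter deltas (aligned weights minus base weights).
medical_av = torch.load("medical_av.pt")

prompt = "What could cause persistent morning headaches?"
inputs = tokenizer(prompt, return_tensors="pt")

# Sweep the preference knob: lower values push toward avoidance-style
# answers, higher values toward expert-level specificity.
for lam in (0.0, 0.25, 0.5, 0.75, 1.0):
    model = copy.deepcopy(base)  # start from unmodified base weights each time
    sd = model.state_dict()
    for name, delta in medical_av.items():
        sd[name] = sd[name] + lam * delta
    model.load_state_dict(sd)

    out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    print(f"lam={lam}:", tokenizer.decode(out[0], skip_special_tokens=True))
```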

Multidomain Challenges and Strategies

Addressing the multidomain alignment challenge, the research proposes combining multiple AVs at inference time to realize diverse domain-specific preferences without training a separate model for each combination. This makes it feasible to maintain distinct preferences across multiple domains simultaneously, although the complexity of tuning several coefficients at once is acknowledged.
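A minimal sketch of that combination, under the same assumptions as the earlier snippets: each domain contributes its own alignment vector, and the merged model is a single linear combination of the base weights. The file names and coefficient values here are illustrative rather than the paper's exact settings.

```python
import torch
from transformers import AutoModelForCausalLM

BASE_ID = "path/to/base-model"  # placeholder checkpoint
base = AutoModelForCausalLM.from_pretrained(BASE_ID)

# Hypothetical per-domain alignment vectors, each a dict of per-parameter
# deltas extracted separately as aligned_weights - base_weights.
avs = {
    "medical": torch.load("medical_av.pt"),
    "legal": torch.load("legal_av.pt"),
    "financial": torch.load("financial_av.pt"),
}

# Per-domain preference knobs chosen at inference time, e.g. expert-level
# medical answers but avoidance-style legal ones.
lambdas = {"medical": 1.0, "legal": 0.2, "financial": 0.6}

sd = base.state_dict()
for domain, av in avs.items():
    for name, delta in av.items():
        sd[name] = sd[name] + lambdas[domain] * delta
base.load_state_dict(sd)
# `base` now encodes all three preference settings at once, with no
# additional training for this particular combination.
```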

Interestingly, the results highlight an intrinsic generalization effect: once a model is aligned in one domain, the alignment tends to carry over to others. This is both a benefit for applications requiring coherent behavior and a caution that precise tuning is needed to avoid unintended cross-domain generalization.

Implications and Future Research

The implications of this paper are substantial for customizable AI, where user-specific and context-specific adaptations can be implemented at scale while conserving resources and expediting deployment. The use of AVs not only extends current model alignment methodologies but also lays a foundation for more interactive and adaptive AI systems that can shift seamlessly with nuanced user requirements.

This work opens avenues for further research into various methodologies for obtaining and applying alignment vectors and expanding the applicability across contrasting model architectures or domains with different inherent complexities. Additionally, future exploration could refine multidomain alignment techniques further and address the potential limitations of over-generalization.

In summary, this approach highlights a significant step forward in LLM preference alignment by effectively utilizing inference-time techniques, prompting both theoretical and computational advancements in the field of AI alignment and customization.
