
PANDA: Preference Adaptation for Enhancing Domain-Specific Abilities of LLMs (2402.12835v2)

Published 20 Feb 2024 in cs.CL and cs.AI

Abstract: While LLMs have demonstrated considerable capabilities across various natural language tasks, they often fall short of the performance achieved by domain-specific state-of-the-art models. One potential approach to enhancing the domain-specific capabilities of LLMs is to fine-tune them on corresponding datasets. However, this method can be both resource- and time-intensive, and it is not applicable to closed-source commercial LLMs. In this paper, we propose Preference Adaptation for Enhancing Domain-specific Abilities of LLMs (PANDA), a method designed to augment the domain-specific capabilities of LLMs by leveraging insights from the response preferences of expert models, without requiring fine-tuning. Our experimental results reveal that PANDA significantly enhances the domain-specific abilities of LLMs on text classification and interactive decision-making tasks. Moreover, an LLM with PANDA even outperforms the expert model it learns from on 4 tasks of ScienceWorld. This finding highlights the potential of tuning-free approaches for achieving weak-to-strong generalization.
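To make the abstract's idea concrete, below is a minimal sketch of what tuning-free preference adaptation in this spirit could look like: the frozen LLM verbalizes why an expert model prefers one response over another, stores these textual "insights", and retrieves the most relevant ones to augment its prompt at inference time. This is an illustration based only on the abstract; the class and function names, the insight-pool design, and the toy bag-of-words retriever are assumptions, not the authors' implementation (the paper would presumably use a learned sentence encoder for retrieval).

```python
# Hypothetical sketch of preference adaptation without fine-tuning.
# All names here are illustrative assumptions, not the PANDA codebase.
from collections import Counter
from math import sqrt
from typing import Callable, List, Tuple


def _embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would use a
    # learned sentence encoder instead.
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class PreferenceAdapter:
    """Stores insights distilled from an expert model's response
    preferences and retrieves them to steer a frozen LLM via its prompt."""

    def __init__(self, llm: Callable[[str], str]):
        self.llm = llm  # frozen target LLM: only prompted, never tuned
        self.pool: List[Tuple[Counter, str]] = []

    def learn(self, query: str, preferred: str, rejected: str) -> None:
        # Ask the LLM to verbalize the general principle behind the
        # expert's preference, and store it keyed by the query.
        insight = self.llm(
            f"Task: {query}\n"
            f"Preferred answer: {preferred}\n"
            f"Worse answer: {rejected}\n"
            "In one sentence, state the general principle that makes "
            "the preferred answer better:"
        )
        self.pool.append((_embed(query), insight))

    def answer(self, query: str, k: int = 2) -> str:
        # Retrieve the k insights most similar to the query and
        # prepend them to the prompt as guidelines.
        q = _embed(query)
        ranked = sorted(self.pool, key=lambda p: -_cosine(p[0], q))
        hints = [insight for _, insight in ranked[:k]]
        prompt = (
            "Guidelines:\n"
            + "\n".join(f"- {h}" for h in hints)
            + f"\n\nTask: {query}"
        )
        return self.llm(prompt)
```

Because adaptation happens entirely through the prompt, a scheme like this would apply equally to closed-source commercial LLMs, which is the constraint the abstract highlights for fine-tuning-based approaches.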
