Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs (2410.18451v1)

Published 24 Oct 2024 in cs.AI and cs.CL

Abstract: In this report, we introduce a collection of methods to enhance reward modeling for LLMs, focusing specifically on data-centric techniques. We propose effective data selection and filtering strategies for curating high-quality open-source preference datasets, culminating in the Skywork-Reward data collection, which contains only 80K preference pairs -- significantly smaller than existing datasets. Using this curated dataset, we developed the Skywork-Reward model series -- Skywork-Reward-Gemma-27B and Skywork-Reward-Llama-3.1-8B -- with the former currently holding the top position on the RewardBench leaderboard. Notably, our techniques and datasets have directly enhanced the performance of many top-ranked models on RewardBench, highlighting the practical impact of our contributions in real-world preference learning applications.
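The abstract does not spell out the training objective, but reward models of this kind are typically trained on (prompt, chosen, rejected) preference pairs with a Bradley-Terry pairwise loss; the paper's stated contribution is the data selection and filtering that decides which pairs feed such an objective. The sketch below is a minimal illustration of that standard formulation under these assumptions, not the authors' actual training code: TinyRewardModel and its random "features" are hypothetical stand-ins for an LLM backbone with a scalar reward head.

```python
# Minimal sketch of the standard Bradley-Terry pairwise loss for reward modeling
# on preference pairs (chosen vs. rejected responses). Illustrative only; the
# tiny model below is a hypothetical stand-in for an LLM with a scalar reward head.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyRewardModel(nn.Module):
    """Stand-in for an LLM backbone plus scalar reward head (hypothetical)."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.reward_head = nn.Linear(dim, 1)  # one scalar reward per sequence

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.reward_head(self.backbone(features)).squeeze(-1)

def bradley_terry_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # -log sigmoid(r_chosen - r_rejected): pushes chosen rewards above rejected ones.
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy batch: random "pooled sequence features" stand in for real LLM hidden states.
model = TinyRewardModel()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
chosen_feats = torch.randn(8, 128)
rejected_feats = torch.randn(8, 128)

loss = bradley_terry_loss(model(chosen_feats), model(rejected_feats))
loss.backward()
optimizer.step()
print(f"pairwise preference loss: {loss.item():.4f}")
```

In this framing, the data-centric techniques described in the abstract (selection and filtering of the 80K preference pairs) change the inputs to this loss rather than the loss itself.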
