Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs (2410.18451v1)
Abstract: In this report, we introduce a collection of methods to enhance reward modeling for LLMs, focusing specifically on data-centric techniques. We propose effective data selection and filtering strategies for curating high-quality open-source preference datasets, culminating in the Skywork-Reward data collection, which contains only 80K preference pairs -- significantly smaller than existing datasets. Using this curated dataset, we develop the Skywork-Reward model series -- Skywork-Reward-Gemma-27B and Skywork-Reward-Llama-3.1-8B -- with the former currently holding the top position on the RewardBench leaderboard. Notably, our techniques and datasets have directly enhanced the performance of many top-ranked models on RewardBench, highlighting the practical impact of our contributions in real-world preference learning applications.
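The abstract describes curating preference pairs and training reward models on them; it does not spell out the training objective here. As a point of reference, the sketch below shows the standard Bradley-Terry pairwise loss commonly used to train reward models on chosen/rejected pairs. It is a minimal illustration, not the paper's implementation: the `reward_model` interface and the commented usage are assumptions for the example.

```python
import torch
import torch.nn.functional as F


def bradley_terry_loss(chosen_rewards: torch.Tensor,
                       rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Pairwise Bradley-Terry objective: push the scalar reward of the
    chosen response above that of the rejected response for each pair."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()


# Illustrative usage (hypothetical `reward_model` scoring prompt/response pairs):
# chosen_scores = reward_model(prompts, chosen_responses)      # shape: (batch,)
# rejected_scores = reward_model(prompts, rejected_responses)  # shape: (batch,)
# loss = bradley_terry_loss(chosen_scores, rejected_scores)
# loss.backward()
```

Under this objective, a curated 80K-pair collection is simply a set of (prompt, chosen, rejected) triples; the paper's contribution lies in how those pairs are selected and filtered rather than in the loss itself.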