Marco-o1 v2: Towards Widening The Distillation Bottleneck for Reasoning Models (2503.01461v2)

Published 3 Mar 2025 in cs.LG, cs.AI, and cs.CL

Abstract: Large Reasoning Models (LRMs) such as OpenAI o1 and DeepSeek-R1 have shown remarkable reasoning capabilities by scaling test-time compute and generating long Chain-of-Thought (CoT). Distillation, i.e., post-training on LRM-generated data, is a straightforward yet effective method to enhance the reasoning abilities of smaller models, but it faces a critical bottleneck: we found that distilled long CoT data poses learning difficulty for small models and leads to the inheritance of biases (i.e., over-thinking) when using Supervised Fine-tuning (SFT) and Reinforcement Learning (RL) methods. To alleviate this bottleneck, we propose constructing tree-based CoT data from scratch via Monte Carlo Tree Search (MCTS). We then exploit a set of CoT-aware approaches, including Thoughts Length Balance, Fine-grained DPO, and a Joint Post-training Objective, to enhance SFT and RL on the constructed data. We evaluate on various benchmarks covering math (GSM8K, MATH, AIME), instruction following (Multi-IF), and planning (Blocksworld); the results demonstrate that our approaches substantially improve the reasoning performance of distilled models over standard distillation by reducing hallucinations during long-time thinking. The project homepage is https://github.com/AIDC-AI/Marco-o1.
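
The abstract's central data-construction step is building reasoning trees with MCTS rather than distilling a teacher's long CoT directly. Below is a minimal, illustrative sketch of MCTS over partial chains of thought in Python; the `propose_steps` policy and `grade_answer` reward are stand-in placeholders (the paper's actual step generator, verifier, and search configuration are not specified in the abstract).

```python
import math
import random

class Node:
    """One node in the search tree: a partial chain of thought."""
    def __init__(self, state, parent=None):
        self.state = state          # list of reasoning steps so far
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0            # cumulative reward

    def ucb(self, c=1.4):
        # Upper confidence bound: balances exploiting high-value steps
        # against exploring rarely visited ones.
        if self.visits == 0:
            return float("inf")
        return self.value / self.visits + c * math.sqrt(
            math.log(self.parent.visits) / self.visits
        )

def propose_steps(question, state):
    # Placeholder: a real system would prompt an LLM for candidate
    # next reasoning steps given the question and the partial CoT.
    return [state + [f"step {len(state)}.{i}"] for i in range(2)]

def grade_answer(question, state):
    # Placeholder reward: the paper would verify the final answer
    # (e.g., against a math ground truth) instead of random scoring.
    return random.random()

def build_cot_tree(question, iterations=100, max_depth=5):
    root = Node([])
    for _ in range(iterations):
        # 1) Selection: walk down by UCB until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=lambda n: n.ucb())
        # 2) Expansion: attach candidate next steps to the leaf.
        if len(node.state) < max_depth:
            node.children = [Node(s, parent=node)
                             for s in propose_steps(question, node.state)]
            node = random.choice(node.children)
        # 3) Rollout: complete the chain of thought and score it.
        state = node.state
        while len(state) < max_depth:
            state = random.choice(propose_steps(question, state))
        reward = grade_answer(question, state)
        # 4) Backpropagation: credit the reward along the path.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return root

tree = build_cot_tree("What is 17 * 24?")
```

High-visit, high-value paths in the returned tree would then serve as the tree-based CoT training data that the CoT-aware SFT and RL stages consume.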

Authors (12)
  1. Huifeng Yin (8 papers)
  2. Yu Zhao (208 papers)
  3. Minghao Wu (31 papers)
  4. Xuanfan Ni (5 papers)
  5. Bo Zeng (41 papers)
  6. Hao Wang (1120 papers)
  7. Tianqi Shi (9 papers)
  8. Liangying Shao (6 papers)
  9. Chenyang Lyu (44 papers)
  10. Longyue Wang (87 papers)
  11. Weihua Luo (63 papers)
  12. Kaifu Zhang (28 papers)
