
Can LLMs acquire genuinely new reasoning strategies beyond pre/post-training?

Determine whether large language models can acquire or generalize genuinely new reasoning strategies beyond the sharpened skills encoded during pre-training or post-training.


Background

The paper introduces DELTA-Code, a controlled benchmark designed to test whether reinforcement learning can instill procedures that pretrained models cannot execute (learnability) and whether such procedures transfer to out-of-distribution cases (generalization). This setup directly targets the long-standing debate over whether RL simply sharpens existing heuristics or enables genuinely new problem-solving capabilities.
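
Purely as an illustration of how these two criteria could be operationalized, the sketch below checks learnability (the pre-trained model essentially fails on a problem family while the RL-trained model succeeds) and generalization (the RL-trained model also succeeds on out-of-distribution variants). The function names, thresholds, and pass-rate inputs are assumptions for this sketch, not DELTA-Code's actual evaluation protocol.

```python
# Illustrative sketch only; thresholds and inputs are assumptions,
# not DELTA-Code's actual evaluation protocol.

def is_learnable(base_pass_rate: float, rl_pass_rate: float,
                 floor: float = 0.01, gain: float = 0.5) -> bool:
    """Learnability: the pre-trained model essentially cannot solve the
    problem family, but the RL-trained model can."""
    return base_pass_rate <= floor and rl_pass_rate >= gain

def generalizes(rl_ood_pass_rate: float, threshold: float = 0.5) -> bool:
    """Generalization: the RL-trained model also solves held-out,
    out-of-distribution variants of the same family."""
    return rl_ood_pass_rate >= threshold

# Example: base model fails the family (0% pass), RL model reaches 80% in-distribution
# and 60% on out-of-distribution variants.
print(is_learnable(0.0, 0.8))  # True
print(generalizes(0.6))        # True
```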

By isolating problem families and providing dense, verifiable rewards through coding tasks, DELTA-Code aims to clarify the conditions under which LLMs might discover new strategies rather than rely on priors encoded during pre-training or standard post-training.
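
As a concrete illustration of a dense, verifiable reward in coding tasks, the sketch below scores a candidate program by the fraction of unit tests it passes. The harness shown here (the TestCase format, entry-point convention, and verifiable_reward function) is a hypothetical minimal example, not DELTA-Code's actual reward implementation.

```python
# Minimal sketch of a dense, verifiable reward for a coding task:
# reward = fraction of unit tests the candidate program passes.
# This harness is hypothetical, not DELTA-Code's actual interface.

from dataclasses import dataclass

@dataclass
class TestCase:
    args: tuple       # positional arguments passed to the candidate function
    expected: object  # expected return value

def verifiable_reward(candidate_src: str, entry_point: str, tests: list[TestCase]) -> float:
    """Execute the candidate source and return its pass rate over the test suite."""
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)   # load the candidate program
        fn = namespace[entry_point]
    except Exception:
        return 0.0                       # unparseable code or missing entry point -> zero reward

    passed = 0
    for t in tests:
        try:
            if fn(*t.args) == t.expected:
                passed += 1
        except Exception:
            pass                         # runtime errors count as failed tests
    return passed / len(tests) if tests else 0.0

# Example: a correct solution to a toy problem earns the maximum reward.
tests = [TestCase(args=(3,), expected=6), TestCase(args=(5,), expected=120)]
src = "def factorial(n):\n    return 1 if n <= 1 else n * factorial(n - 1)"
print(verifiable_reward(src, "factorial", tests))  # 1.0
```

Because every candidate can be checked against such tests automatically, the reward signal is dense and objective, which is part of why coding tasks are a natural setting for studying whether RL instills genuinely new procedures.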

References

It remains an open question whether LLMs can acquire or generalize genuinely new reasoning strategies, beyond the sharpened skills encoded in their parameters during pre-training or post-training.

DELTA-Code: How Does RL Unlock and Transfer New Programming Algorithms in LLMs? (Sun et al., arXiv:2509.21016, 25 Sep 2025), Abstract (page 1)