Transformative schema creation for RL-trained LLMs

Establish whether reinforcement learning can enable large language models to achieve transformative generalization by creating new solution schemas—"schema creation"—for qualitatively novel cases, such as discovering invariants needed to solve perfectly periodic or degenerate dynamics in the BouncingSim benchmark.

Background

The paper evaluates three axes of generalization—exploratory, compositional, and transformative—on BouncingSim. While RL training yields strong transfer within families and in compositional settings, performance on transformative cases remains near zero, indicating difficulty with qualitatively new dynamics that require novel invariants or solution schemas.

This persistent gap suggests that, unlike structural composition, the creation of new reasoning schemas for transformative shifts is still unresolved, motivating targeted methods capable of discovering and applying such schemas.
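As a rough illustration of what "discovering an invariant" can mean for perfectly periodic dynamics, consider a toy case that is not taken from BouncingSim itself: a point bouncing elastically between two walls in one dimension. A step-by-step simulation follows the familiar schema, while a solution that notices the motion repeats every 2 × length units of travelled distance can jump straight to the answer. The function names, the 1D setup, and the parameters below are all assumptions made for this sketch.

```python
def simulate_steps(x0, v, length, t, dt=1e-3):
    """Familiar schema: advance in small time steps, reflecting at the walls.

    Toy 1D setup (hypothetical, not the BouncingSim task): a point moves in
    [0, length] with constant speed and bounces elastically.
    """
    x, vel = x0, v
    for _ in range(round(t / dt)):
        x += vel * dt
        if x < 0:                  # overshot the left wall: reflect
            x, vel = -x, -vel
        elif x > length:           # overshot the right wall: reflect
            x, vel = 2 * length - x, -vel
    return x


def closed_form(x0, v, length, t):
    """New schema: exploit the periodicity invariant directly.

    Unfolding the reflections turns the motion into a straight line whose
    position repeats with period 2 * length, so no stepping is needed.
    """
    u = (x0 + v * t) % (2 * length)    # fold onto one period
    return u if u <= length else 2 * length - u


if __name__ == "__main__":
    args = (0.25, 1.0, 1.0, 2.0)       # x0, v, length, t
    print(simulate_steps(*args), closed_form(*args))
```

The two functions agree on this toy case, but only the second encodes the structural fact that makes long-horizon or degenerate inputs tractable. Whether RL training can lead a model to produce that kind of reformulation unprompted is the open question posed here.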

References

Coding tasks appear more amenable to structural composition than symbolic math, yet transformative 'schema creation' remains an open challenge.

DELTA-Code: How Does RL Unlock and Transfer New Programming Algorithms in LLMs? (2509.21016 - Sun et al., 25 Sep 2025) in Section 5 (Generalization Study), Takeaways