Necessity of explicit compositional incentives in RL

Determine whether explicit compositional incentives in reinforcement learning training objectives are necessary for large language models to acquire compositional skills, in contrast to RL conducted solely on atomic tasks without such incentives.

Background

Prior work reports mixed outcomes: in-context approaches can yield compositional improvements, whereas RL applied only to atomic skills fails to generalize compositionally. Yuan et al. therefore suggest that RL must be explicitly incentivized to perform composition if it is to teach composition as a new skill.

Their synthetic setup separates the atomic and compositional training stages and shows that RL on compositional data enables generalization to deeper compositions. This motivates a definitive assessment of whether explicit compositional incentives are required in RL.
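
To make the atomic/compositional distinction concrete, the following is a minimal sketch of how such tasks and a verifiable reward could be set up. It is an illustration under stated assumptions, not the setup used by Yuan et al.: the string functions `f` and `g`, the helper `compose`, and the exact-match `reward` are all hypothetical stand-ins chosen for brevity.

```python
from typing import Callable, Sequence

StrFn = Callable[[str], str]

# Hypothetical atomic skills (illustrative stand-ins, not taken from the paper).
def f(s: str) -> str:
    """Atomic skill: reverse the string."""
    return s[::-1]

def g(s: str) -> str:
    """Atomic skill: drop the first character."""
    return s[1:]

def compose(funcs: Sequence[StrFn]) -> StrFn:
    """Return x -> funcs[0](funcs[1](... funcs[-1](x))), e.g. [f, g] gives f(g(x))."""
    def composed(x: str) -> str:
        for fn in reversed(funcs):  # apply the innermost function first
            x = fn(x)
        return x
    return composed

def reward(prediction: str, task: StrFn, x: str) -> float:
    """Assumed outcome reward: 1.0 for an exact match with the task's output, else 0.0."""
    return 1.0 if prediction == task(x) else 0.0

# RL on atomic tasks only: the reward never depends on a composition.
atomic_tasks = [f, g]

# RL with an explicit compositional incentive: the reward is defined on f(g(x)) and g(f(x)).
depth2_tasks = [compose([f, g]), compose([g, f])]

# Held-out deeper composition for testing generalization: f(g(f(x))).
depth3_task = compose([f, g, f])
```

In this sketch, the question posed above amounts to asking whether a policy rewarded only on `atomic_tasks` can score well on `depth3_task`, or whether training must include rewards on tasks like those in `depth2_tasks`.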

References

"Comparing the two works, we conjecture that an explicit incentive to composition is necessary."

— From $f(x)$ and $g(x)$ to $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones (2509.25123 - Yuan et al., 29 Sep 2025) in Section 2, Background
