Unconstrained or latent reasoning vs. tag-constrained formatting
Determine whether removing the explicit <think></think> and <answer></answer> tag-format constraints used in Logic-RL and adopting an entirely unconstrained or latent reasoning representation yields better results than the current format-constrained approach.
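To make the constraint in question concrete, here is a minimal sketch of what a tag-format check could look like. It is an illustration only, not Logic-RL's actual reward code: the function names `follows_tag_format` and `extract_answer`, the regular expression, and the "each tag exactly once" rule are all assumptions made for this example.

```python
import re

# Hypothetical pattern (an assumption, not taken from the paper):
# one <think> block followed by one <answer> block, nothing else but whitespace.
_FORMAT = re.compile(
    r"\s*<think>(.*?)</think>\s*<answer>(.*?)</answer>\s*",
    re.DOTALL,
)
_TAGS = ("<think>", "</think>", "<answer>", "</answer>")


def follows_tag_format(response: str) -> bool:
    """Return True if every tag appears exactly once and in the expected order."""
    if any(response.count(tag) != 1 for tag in _TAGS):
        return False
    return _FORMAT.fullmatch(response) is not None


def extract_answer(response: str):
    """Return the stripped <answer> content, or None if the format is violated."""
    if not follows_tag_format(response):
        return None
    match = _FORMAT.fullmatch(response)
    return match.group(2).strip() if match else None
```

Under an unconstrained approach, a check of this kind would be dropped and the answer would have to be recovered some other way. A latent approach would go further, since the reasoning would not be emitted as text at all.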
References
Although <think>...</think> effectively organizes the chain of thought, it remains an open question whether an entirely unconstrained or latent approach might yield better results.
— Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning
(2502.14768 - Xie et al., 20 Feb 2025) in Discussion and Future Work, Relaxing the Formatting Constraints