How can chain-of-thought supervision be applied to unstructured tasks like story-writing?

Determine how to provide effective chain-of-thought supervision for unstructured tasks such as story-writing, so as to circumvent the teacher-forcing failures that the paper identifies on lookahead tasks.

Background

The paper shows that reversing targets or otherwise providing guidance akin to chain-of-thought can mitigate the identified failures of teacher-forcing on the path-star task. This aligns with theoretical results suggesting that intermediate reasoning steps can make certain problems learnable.
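To make the idea of reversing targets concrete, here is a minimal sketch of how a path-star training example might be built with either a forward or a reversed target. The graph construction, the function names (`make_path_star`, `make_example`), and the dictionary fields are illustrative assumptions, not the paper's actual data format. In this construction, a reversed target starts at the goal, where every node has exactly one predecessor, so each next node can be read off the edge list; a forward target starts at the center, where the first step must pick one of several arms.

```python
import random


def make_path_star(num_arms, arm_len, rng):
    """Build a path-star graph: one center node with `num_arms` disjoint arms.

    Hypothetical helper: node labels are shuffled integers so that labels
    carry no information about graph position.
    """
    num_nodes = 1 + num_arms * arm_len
    labels = list(range(num_nodes))
    rng.shuffle(labels)
    center = labels[0]
    arms = []
    idx = 1
    for _ in range(num_arms):
        # Each arm is stored as the full path from the center to its leaf.
        arms.append([center] + labels[idx:idx + arm_len])
        idx += arm_len
    return center, arms


def make_example(num_arms, arm_len, reverse_target, seed=0):
    """Return one example: shuffled edge list, start, goal, and target path.

    `reverse_target=True` emits the path goal-to-start instead of
    start-to-goal (an assumed rendering of "reversing targets").
    """
    rng = random.Random(seed)
    center, arms = make_path_star(num_arms, arm_len, rng)
    edges = [(arm[i], arm[i + 1]) for arm in arms for i in range(len(arm) - 1)]
    rng.shuffle(edges)
    path = rng.choice(arms)
    target = path[::-1] if reverse_target else path
    return {"edges": edges, "start": center, "goal": path[-1], "target": target}


forward = make_example(num_arms=3, arm_len=4, reverse_target=False, seed=1)
reverse = make_example(num_arms=3, arm_len=4, reverse_target=True, seed=1)
print("forward target:", forward["target"])
print("reversed target:", reverse["target"])
```

With the same seed, the two calls produce the same graph and goal, so the only difference is the order in which the target path is presented to the learner.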

However, many real-world tasks, especially creative or unstructured ones like story-writing, do not admit simple, explicit intermediate supervision. The authors note that it is unclear how to extend chain-of-thought-style supervision to such settings, which leaves a practical gap in applying these insights beyond structured problems.

References

However, it is unclear how that is possible in more unstructured tasks like story-writing.

The pitfalls of next-token prediction (2403.06963 - Bachmann et al., 2024) in Section: Conclusion