Dynamic goalpost updating in Guided Asymmetric Self-Play (GASP)
Determine how to update the set of hard goalpost questions that guides the teacher in Guided Asymmetric Self-Play (GASP) for code generation once the student reaches a goalpost, so that the guidance remains meaningful as the model improves.
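To make the problem concrete, the following is a minimal sketch of one naive update policy. It is not the authors' method, since the paper leaves the update rule open. Every name here (`Goalpost`, `update_goalposts`, `reached_threshold`, `min_attempts`, the `reserve` pool) is a hypothetical introduced for illustration, and the definition of "reached" as a pass-rate threshold is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Goalpost:
    """A hard target question plus the student's recorded outcomes on it (hypothetical)."""
    question_id: str
    outcomes: list[bool] = field(default_factory=list)  # True = student solved it

    def pass_rate(self) -> float:
        # Fraction of recorded attempts that succeeded; 0.0 if never attempted.
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 0.0


def update_goalposts(active, reserve, reached_threshold=0.5, min_attempts=4):
    """Retire goalposts the student has reached and refill from a reserve pool.

    Assumed rule: a goalpost counts as "reached" once it has at least
    `min_attempts` recorded outcomes and its pass rate is >= `reached_threshold`.
    Returns (new_active, retired, remaining_reserve).
    """
    kept, retired = [], []
    for g in active:
        reached = len(g.outcomes) >= min_attempts and g.pass_rate() >= reached_threshold
        (retired if reached else kept).append(g)
    # Refill with as many reserve questions as were retired, if available.
    refill = reserve[: len(retired)]
    remaining_reserve = reserve[len(retired):]
    return kept + refill, retired, remaining_reserve
```

Under this rule the active set shrinks once the reserve pool is exhausted, and nothing guarantees that the refilled questions are still hard for the improved student. Those gaps are part of what the open question asks a real update scheme to address.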
References
Finally, our framework uses a fixed set of goalposts. An open question is what should happen once a goalpost is reached: ideally, the goalpost set should be updated over time so that guidance remains meaningful as the model improves. We leave dynamic goalpost updating to future work.
— GASP: Guided Asymmetric Self-Play For Coding LLMs
(2603.15957 - Jana et al., 16 Mar 2026) in Discussion — Limitations