Joint Learning of Raters and Curriculum Schedule

Develop a bilevel optimization procedure that jointly learns the capability-specific rater functions and the curriculum schedule parameters within the SkillRater framework, replacing the current manually designed schedule.

Background

SkillRater currently uses a manually designed curriculum that progressively tightens selection thresholds. This curriculum outperforms static filtering, but it is not learned from data: the raters themselves are meta-learned, while the schedule is set by hand.
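
To make "progressively tightens selection thresholds" concrete, here is a minimal sketch of what a hand-set schedule could look like. It is an illustration only, not SkillRater's actual schedule: the linear shape, the function names `drop_fraction` and `select_indices`, and the `start`/`end` parameters are all assumptions, and a single score per sample stands in for the capability-specific raters.

```python
def drop_fraction(step, total_steps, start=0.0, end=0.5):
    """Hypothetical hand-set schedule: fraction of lowest-scored samples to drop.

    Interpolates linearly from `start` to `end` over training, so selection
    becomes stricter as training progresses. The linear shape is an assumption.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)  # clamp to [0, 1]
    return start + progress * (end - start)


def select_indices(scores, step, total_steps, start=0.0, end=0.5):
    """Return the indices of samples kept at this training step.

    `scores` holds one rater score per sample (higher means more useful).
    """
    n_drop = int(drop_fraction(step, total_steps, start, end) * len(scores))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])  # ascending score
    return sorted(ranked[n_drop:])  # drop the n_drop lowest-scored samples


scores = [0.1, 0.9, 0.4, 0.7]
early = select_indices(scores, step=0, total_steps=100)    # nothing dropped yet
late = select_indices(scores, step=100, total_steps=100)   # lowest half dropped
```

In this sketch `start` and `end` are fixed by hand; they play the role of the curriculum parameters that the open problem proposes to learn.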

A joint optimization of raters and curriculum parameters could better coordinate data selection dynamics across training stages, potentially improving efficiency and performance. The authors flag this as an open direction rather than providing a concrete method.
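
One generic way to write such a joint objective is the bilevel program below. This is a sketch under stated assumptions, not a formulation given by the authors. All symbols are introduced here for illustration: \phi denotes the rater parameters, \psi the curriculum schedule parameters, \theta the model weights, s_i the selection weight assigned to sample x_i at training step t, \mathcal{B}_t the candidate batch at step t, \ell the per-sample training loss, and \mathcal{L}_{\mathrm{val}} a held-out validation loss.

```latex
% Assumed notation: none of these symbols come from the source.
\begin{aligned}
\min_{\phi,\,\psi}\quad
  & \mathcal{L}_{\mathrm{val}}\bigl(\theta^{*}(\phi,\psi)\bigr) \\
\text{s.t.}\quad
  & \theta^{*}(\phi,\psi) \in \arg\min_{\theta}
    \sum_{t=1}^{T} \sum_{i \in \mathcal{B}_t}
    s_i(\phi,\psi,t)\,\ell(x_i;\theta)
\end{aligned}
```

In this form the outer problem updates raters and schedule together, while the inner problem is ordinary training on the selected data. A hard threshold makes s_i piecewise constant in \phi and \psi, so gradient-based outer updates would need a smooth relaxation of the selection rule or a gradient-free outer search.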

References

Several directions remain open. Second, the curriculum schedule is manually designed. Jointly learning the raters and the schedule through bilevel optimization over curriculum parameters is a natural extension.

SkillRater: Untangling Capabilities in Multimodal Data (2602.11615 - Sahi et al., 12 Feb 2026) in Section: Conclusion and Future Work