Effectiveness of IRT-Based Difficulty Curricula for Cross-Difficulty Generalization

Investigate whether curriculum learning strategies that order or sample training data by Item Response Theory–derived difficulty scores can improve cross-difficulty generalization in large language models.

Background

Kordi et al. (2025) find that cross-difficulty generalization is limited and that it declines as the gap between training and test difficulty grows. This suggests that training on only easy or only hard data is insufficient for robust transfer across difficulty levels.

Given these findings, the authors highlight the need to explore curricula explicitly structured by model-based difficulty, such as IRT scores, as a possible way to improve generalization across widely separated difficulty bins.
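To make "ordering or sampling by difficulty" concrete, here is a minimal sketch of two generic curriculum mechanics: sorting examples from easy to hard, and sampling batches from a growing set of difficulty bins. It is an illustration under stated assumptions, not the procedure of the source paper, which leaves the curriculum design open. The function names, the `(id, difficulty)` tuple layout, the equal-count binning, and the linear pacing schedule are all hypothetical choices. Each example is assumed to already carry an IRT difficulty score, where a higher value means a harder item.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical record layout: (example id, precomputed IRT difficulty score).
Example = Tuple[str, float]


def order_easy_to_hard(examples: Sequence[Example]) -> List[Example]:
    """Return the examples sorted by ascending difficulty score."""
    return sorted(examples, key=lambda ex: ex[1])


def assign_bins(examples: Sequence[Example], num_bins: int) -> List[List[Example]]:
    """Split examples into equal-count bins by difficulty rank; bin 0 is easiest."""
    ordered = order_easy_to_hard(examples)
    n = len(ordered)
    bins: List[List[Example]] = [[] for _ in range(num_bins)]
    for rank, ex in enumerate(ordered):
        bins[rank * num_bins // n].append(ex)
    return bins


def sample_batch(
    bins: Sequence[Sequence[Example]],
    step: int,
    total_steps: int,
    batch_size: int,
    rng: random.Random,
) -> List[Example]:
    """Sample with replacement from the bins unlocked so far.

    Assumed pacing: the number of unlocked bins grows linearly with
    training progress, starting from the easiest bin only.
    """
    progress = (step + 1) / total_steps
    unlocked = max(1, min(len(bins), math.ceil(progress * len(bins))))
    pool = [ex for b in bins[:unlocked] for ex in b]
    return [rng.choice(pool) for _ in range(batch_size)]
```

With two bins and four training steps, the first two steps would draw only from the easier half of the data, and later steps would draw from everything. Whether any such schedule actually improves cross-difficulty generalization is exactly the question this problem poses.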

References

An open question remains whether a curriculum structured by model-based difficulty, such as IRT scores, can lead to cross-difficulty generalization.

— Revisiting Generalization Across Difficulty Levels: It's Not So Easy (2511.21692 - Kordi et al., 26 Nov 2025) in Discussion
