Scalability of meta-training DataRater with fully dense inner updates at extreme model scales
Determine whether meta-training the DataRater model via meta-gradients scales to extremely large foundation models when the inner model updates are fully dense. If it does not, develop scalable bilevel optimisation methods that make such meta-training feasible at these scales.
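To make the bilevel structure concrete, here is a minimal sketch of a meta-gradient through a single dense inner update. It is a toy under stated assumptions, not the DataRater implementation: the inner model is a two-parameter linear regressor, the per-example weights stand in for the outputs of a rater model, and all names (`inner_step`, `meta_gradient`, the data values) are hypothetical. The point it illustrates is that the meta-gradient depends on the per-example gradients over every inner parameter, which is the cost that grows with model size when updates are fully dense.

```python
def grad_sq_error(theta, x, y):
    """Gradient of 0.5 * (theta[0]*x + theta[1] - y)**2 with respect to theta."""
    residual = theta[0] * x + theta[1] - y
    return [residual * x, residual]


def mean_loss(theta, data):
    """Mean squared-error loss (with the 0.5 factor) over a dataset."""
    return sum(0.5 * (theta[0] * x + theta[1] - y) ** 2 for x, y in data) / len(data)


def inner_step(theta, weights, train, lr):
    """One dense weighted gradient step: every inner parameter is updated."""
    grads = [grad_sq_error(theta, x, y) for x, y in train]
    step = [sum(w * g[k] for w, g in zip(weights, grads)) for k in range(2)]
    new_theta = [theta[k] - lr * step[k] for k in range(2)]
    return new_theta, grads


def meta_gradient(theta, weights, train, held_out, lr):
    """Gradient of the held-out loss after one inner step, w.r.t. each weight."""
    new_theta, grads = inner_step(theta, weights, train, lr)
    # Outer gradient: mean held-out gradient evaluated at the updated parameters.
    outer = [0.0, 0.0]
    for x, y in held_out:
        g = grad_sq_error(new_theta, x, y)
        outer = [outer[k] + g[k] / len(held_out) for k in range(2)]
    # Chain rule: d new_theta / d weights[i] = -lr * grads[i].
    return [-lr * sum(outer[k] * g[k] for k in range(2)) for g in grads]


# Hypothetical toy data: the third training example is a deliberate outlier.
train = [(1.0, 2.0), (2.0, 4.0), (3.0, -5.0)]
held_out = [(1.5, 3.0), (2.5, 5.0)]
theta = [0.0, 0.0]
weights = [1 / 3, 1 / 3, 1 / 3]
lr = 0.1

meta_grad = meta_gradient(theta, weights, train, held_out, lr)
```

In this toy, the meta-gradient is positive for the outlier and negative for the two clean examples, so a descent step on the weights would down-weight the outlier. With one inner step the chain rule can be written by hand; with many inner steps and a dense update over billions of parameters, the corresponding backward pass through the inner updates is what raises the scalability question.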
References
However, the scalability of meta-training DataRater models for extremely large foundation models with fully dense inner updates remains an open question, and may require further algorithmic advancements in scalable bilevel optimisation.
— DataRater: Meta-Learned Dataset Curation
(2505.17895 - Calian et al., 23 May 2025) in Appendix: Limitations