Tightness of the estimated length–performance Pareto frontier
Determine whether the currently estimated program-length versus performance Pareto frontier is tight for the analysed settings (adapting SmolLM3 3B and OLMo3 7B to GSM8K, FLORES English–French translation, and IFEval), or whether alternative combined strategies can yield shorter programs that achieve equal or better performance. The question arises because the combined parametric-and-data strategy, which encodes a small subset of training batches and trains per-batch scalar weights to reweight stored gradient updates, failed to improve the frontier.
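To make the notion of "improving the frontier" concrete, the following is a minimal sketch of how one might test whether a new (program length, performance) point extends an estimated frontier. It assumes that shorter programs and higher performance are both preferred. The function names and all numbers are hypothetical illustrations and are not taken from the experiments.

```python
from typing import List, Tuple

# Assumed convention: (program_length, performance); shorter and higher are better.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a is no longer and no worse than b, and strictly better in one."""
    return a[0] <= b[0] and a[1] >= b[1] and (a[0] < b[0] or a[1] > b[1])


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by program length."""
    kept = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(set(kept))


def improves_frontier(candidate: Point, frontier: List[Point]) -> bool:
    """Return True if no frontier point dominates or equals the candidate."""
    return not any(dominates(q, candidate) or q == candidate for q in frontier)


# Hypothetical measurements, invented purely for illustration.
observed = [(100.0, 0.50), (200.0, 0.60), (300.0, 0.55), (400.0, 0.70)]
frontier = pareto_frontier(observed)

# A candidate that is longer and worse than an existing point does not help.
no_gain = improves_frontier((250.0, 0.58), frontier)

# A candidate that matches an existing score with a shorter program does.
gain = improves_frontier((150.0, 0.60), frontier)
```

Under this convention, a strategy that never yields a candidate for which `improves_frontier` returns True leaves the estimated frontier unchanged. That outcome alone does not show that the frontier is tight, since other strategies may still produce such a candidate.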
References
After extensive experimentation, we concluded that it is not straightforward to push the Pareto frontier with this strategy. It remains unclear whether this indicates the tightness of our Pareto frontier, or just the lack of success of this specific approach. We release these results to encourage future research to target this question.