Tightness of the estimated length–performance Pareto frontier

Determine whether the currently estimated program-length versus performance Pareto frontier is tight for the analysed settings (adapting SmolLM3 3B and OLMo3 7B to GSM8K, FLORES English–French translation, and IFEval). The combined parametric-and-data strategy, which encodes a small subset of training batches and trains per-batch scalar weights to reweight the stored gradient updates, failed to improve this frontier; establish whether that failure reflects intrinsic tightness of the frontier, or whether alternative combined strategies can yield shorter programs achieving equal or better performance.

Background

The paper defines task complexity as the length of the shortest program achieving a target performance on a task and estimates Pareto curves of program length versus performance for several tasks (GSM8K, FLORES English–French, and IFEval) and models (SmolLM3 3B and OLMo3 7B).
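This definition can be written compactly. The notation below is my own shorthand, not the paper's: let $P$ range over adaptation programs, $|P|$ denote program length, and $\mathrm{perf}(P, T)$ denote the performance that program $P$ attains on task $T$. Then the complexity of task $T$ at target performance $p$ is

```latex
C(T, p) \;=\; \min_{P \,:\, \mathrm{perf}(P, T) \,\ge\, p} \; |P|
```

The Pareto curve of program length versus performance is then the graph of $C(T, \cdot)$ traced out as the target $p$ varies.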

To potentially tighten these Pareto frontiers, the authors propose a combined approach that merges the data and parametric views by encoding a small subset of training batches, storing per-batch gradient updates, and learning scalar weights that modulate these updates during inference. Despite extensive experimentation, this strategy did not improve the existing Pareto frontier, leaving open whether the frontier is intrinsically tight or whether more effective combined strategies could surpass it.
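The mechanics of the combined strategy can be illustrated with a toy simulation. Everything below is a stand-in of my own construction, not the paper's implementation: the "model" is a parameter vector, the stored per-batch gradient updates are random directions, and adaptation quality is a simple quadratic loss. The sketch learns scalar weights `w` that reweight the stored updates, which is the core of the proposed approach; only the `k` scalars (plus the shared stored updates) would need to be encoded per task.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 50, 4                        # parameter dimension, number of stored batches

theta0 = rng.normal(size=d)         # base model parameters
theta_star = rng.normal(size=d)     # hypothetical adapted optimum (toy target)
G = rng.normal(size=(k, d))         # stored per-batch gradient updates (random stand-ins)

def loss(w):
    """Toy adaptation loss after applying the reweighted stored updates."""
    theta = theta0 + G.T @ w
    return float(np.sum((theta - theta_star) ** 2))

# Learn the k scalar weights by gradient descent on the toy loss.
w = np.zeros(k)
lr = 0.005
for _ in range(1000):
    theta = theta0 + G.T @ w
    grad_w = 2.0 * G @ (theta - theta_star)   # dL/dw, exact for the quadratic loss
    w -= lr * grad_w

print("loss with zero weights:", loss(np.zeros(k)))
print("loss with learned weights:", loss(w))
```

In this toy setting the loss is convex in `w`, so reweighting always helps; the open question is precisely whether an analogous gain materialises for real models, where it did not in the authors' experiments.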

References

After extensive experimentation, we concluded that it is not straightforward to push the Pareto frontier with this strategy. It remains unclear whether this indicates the tightness of our Pareto frontier, or just the lack of success of this specific approach. We release these results to encourage future research to target this question.

Operationalising the Superficial Alignment Hypothesis via Task Complexity  (2602.15829 - Vergara-Browne et al., 17 Feb 2026) in Appendix, Section "Combining the Parametric and the Data View"