Extrapolating optimal dynamic data mixtures and quantifying trade-offs in multitask SFT

Develop a systematic methodology to extrapolate optimal dynamic data mixture schedules (i.e., time-varying dataset ratios during training) for multitask supervised fine-tuning of multimodal large language models, and quantify the computational costs and downstream performance gains of such dynamic schedules relative to fixed data mixture ratios.
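To make "dynamic data mixture schedule" concrete, the following sketch (not from the paper; the task names, ratios, and linear schedule shape are illustrative assumptions) interpolates dataset sampling ratios from a start mixture to an end mixture over the course of training, in contrast to a single fixed mixture held for the whole run.

```python
# Minimal sketch (not from the paper) of a dynamic data mixture schedule:
# dataset sampling ratios are linearly interpolated between a start and an
# end mixture over training, instead of being held fixed. The task names,
# ratios, and linear schedule shape are illustrative assumptions.
import random
from typing import Dict, List


def mixture_at_step(step: int, total_steps: int,
                    start: Dict[str, float], end: Dict[str, float]) -> Dict[str, float]:
    """Linearly interpolate dataset ratios at a given step, then renormalize."""
    t = min(max(step / max(total_steps, 1), 0.0), 1.0)
    raw = {k: (1 - t) * start[k] + t * end[k] for k in start}
    total = sum(raw.values())
    return {k: v / total for k, v in raw.items()}


def sample_dataset(mixture: Dict[str, float], rng: random.Random) -> str:
    """Draw one dataset name according to the current mixture ratios."""
    names: List[str] = list(mixture)
    return rng.choices(names, weights=[mixture[n] for n in names], k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    start = {"ocr": 0.5, "grounding": 0.3, "dialogue": 0.2}  # hypothetical tasks
    end = {"ocr": 0.2, "grounding": 0.3, "dialogue": 0.5}
    for step in (0, 5_000, 10_000):
        mix = mixture_at_step(step, total_steps=10_000, start=start, end=end)
        print(step, {k: round(v, 2) for k, v in mix.items()}, sample_dataset(mix, rng))
```

Quantifying the trade-off then means comparing the downstream scores and training compute of such schedules against the best fixed mixture found under the same budget.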

Background

The paper proposes DaMo, which predicts downstream performance in order to select fixed data mixture ratios for multitask supervised fine-tuning of multimodal LLMs. Its methodology and experiments rest on two simplifying assumptions: sample order within each dataset is ignored, and the data mixture is held fixed throughout training.
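For orientation, the following sketch illustrates the fixed-mixture selection setting in hypothetical form: a grid search over candidate fixed ratios on the probability simplex, keeping the mixture with the highest predicted downstream score. The predictor here is a toy stand-in, not DaMo's actual learned performance model.

```python
# Minimal sketch of the fixed-mixture selection setting: grid-search candidate
# fixed mixtures on the probability simplex and keep the one with the highest
# predicted downstream score. The predictor below is a toy stand-in
# (an assumption for illustration), not the paper's learned performance model.
import itertools
from typing import Dict, Optional, Tuple


def predict_downstream_score(mixture: Dict[str, float]) -> float:
    """Stand-in for a learned predictor mapping mixture ratios to performance."""
    # Toy concave surrogate that mildly favors balanced mixtures (assumption).
    return sum(r ** 0.5 for r in mixture.values())


def best_fixed_mixture(tasks: Tuple[str, ...], step: float = 0.1) -> Optional[Dict[str, float]]:
    """Enumerate mixtures on a coarse simplex grid; return the best predicted one."""
    grid = [i * step for i in range(round(1 / step) + 1)]
    best, best_score = None, float("-inf")
    for ratios in itertools.product(grid, repeat=len(tasks)):
        if abs(sum(ratios) - 1.0) > 1e-9:
            continue  # keep only points whose ratios sum to 1
        mixture = dict(zip(tasks, ratios))
        score = predict_downstream_score(mixture)
        if score > best_score:
            best, best_score = mixture, score
    return best


if __name__ == "__main__":
    print(best_fixed_mixture(("ocr", "grounding", "dialogue")))  # hypothetical tasks
```

DaMo replaces the toy predictor above with a learned model of downstream performance; the open question in this section concerns extending such selection from a single fixed mixture to a schedule that changes during training.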

In the Limitations and Future work section, the authors note that preliminary attempts to relax the fixed-mixture assumption via dynamic mixture adjustments remain exploratory. They explicitly state that they have not yet established a systematic approach for extrapolating optimal dynamic mixtures or quantified the compute–performance trade-offs versus fixed mixtures, highlighting an unresolved methodological and empirical gap.

References

"We have yet to establish a systematic methodology for extrapolating optimal dynamic mixtures or quantify the computational costs and performance gains relative to fixed data mixture."

DaMo: Data Mixing Optimizer in Fine-tuning Multimodal LLMs for Mobile Phone Agents (arXiv:2510.19336, Shi et al., 22 Oct 2025), Section: Limitations and Future work (Appendix)