Extrapolating optimal dynamic data mixtures and quantifying trade-offs in multitask SFT

Develop a systematic methodology to extrapolate optimal dynamic data mixture schedules (i.e., time-varying dataset ratios during training) for multitask supervised fine-tuning of multimodal large language models, and quantify the computational costs and downstream performance gains of such dynamic schedules relative to fixed data mixture ratios.
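To make the distinction concrete, the sketch below contrasts a fixed mixture with a hypothetical time-varying schedule. This is a minimal illustration only: the dataset names, the linear schedule shape, and the start/end ratio vectors are assumptions for exposition, not anything proposed in the paper.

```python
# Illustrative sketch (not from the paper): a fixed mixture keeps one ratio
# vector for the whole SFT run, while a dynamic schedule lets the ratios
# vary with training progress.

def fixed_mixture():
    # One ratio per dataset, constant for the entire run (hypothetical names).
    return {"general_vqa": 0.5, "ocr": 0.3, "agent_traces": 0.2}

def dynamic_mixture(progress: float):
    """Hypothetical schedule: linearly shift weight from general VQA toward
    agent traces as training progresses; progress is in [0, 1]."""
    start = {"general_vqa": 0.6, "ocr": 0.3, "agent_traces": 0.1}
    end = {"general_vqa": 0.3, "ocr": 0.2, "agent_traces": 0.5}
    raw = {k: (1 - progress) * start[k] + progress * end[k] for k in start}
    total = sum(raw.values())
    # Renormalize as a safeguard so ratios always sum to 1.
    return {k: v / total for k, v in raw.items()}

if __name__ == "__main__":
    for step in (0, 500, 1000):
        print(step, dynamic_mixture(step / 1000))
```

The open question is how to find a good schedule of this kind systematically, and whether its downstream gains justify the extra compute compared with a single well-chosen fixed ratio vector.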

Background

The paper proposes DaMo, which predicts downstream performance to select fixed data mixture ratios for multitask supervised fine-tuning of multimodal LLMs. The methodology and experiments are conducted under two simplifying assumptions: ignoring sample order within datasets and keeping the data mixture fixed throughout training.
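As a reading aid, the following sketch shows what those two assumptions mean operationally: every batch is drawn by order-agnostic random sampling from each dataset, with a single mixture ratio vector held constant for the whole run. The sampler, dataset names, and function signature are hypothetical, not DaMo's implementation.

```python
import random

# Hypothetical sampler illustrating the two simplifying assumptions:
# (1) sample order within each dataset is ignored (uniform random draws),
# (2) the mixture ratio vector stays fixed for the entire training run.

def sample_batch(datasets, ratios, batch_size, rng=random.Random(0)):
    """datasets: name -> list of examples; ratios: name -> weight (sums to 1)."""
    names = list(ratios)
    weights = [ratios[n] for n in names]
    batch = []
    for _ in range(batch_size):
        name = rng.choices(names, weights=weights, k=1)[0]
        batch.append(rng.choice(datasets[name]))  # order-agnostic draw
    return batch

# Example usage with toy data and a fixed mixture:
data = {"general_vqa": ["a", "b"], "ocr": ["c"], "agent_traces": ["d", "e"]}
print(sample_batch(data, {"general_vqa": 0.5, "ocr": 0.3, "agent_traces": 0.2}, 4))
```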

In the Limitations and Future work section, the authors note that preliminary attempts to relax the fixed-mixture assumption via dynamic mixture adjustments remain exploratory. They explicitly state that they have not yet established a systematic approach for extrapolating optimal dynamic mixtures or quantified the compute–performance trade-offs versus fixed mixtures, highlighting an unresolved methodological and empirical gap.

References

"We have yet to establish a systematic methodology for extrapolating optimal dynamic mixtures or quantify the computational costs and performance gains relative to fixed data mixture."

DaMo: Data Mixing Optimizer in Fine-tuning Multimodal LLMs for Mobile Phone Agents (Shi et al., 22 Oct 2025, arXiv:2510.19336), Section: Limitations and Future work (Appendix).