Extrapolating optimal dynamic data mixtures and quantifying trade-offs in multitask SFT
Develop a systematic methodology to extrapolate optimal dynamic data mixture schedules (i.e., time-varying dataset ratios during training) for multitask supervised fine-tuning of multimodal large language models, and quantify the computational costs and downstream performance gains of such dynamic schedules relative to fixed data mixture ratios.
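The distinction between a fixed mixture and a dynamic (time-varying) schedule can be made concrete with a small sketch. The code below is purely illustrative and is not from the paper: dataset names, the linear interpolation rule, and all function names are hypothetical stand-ins for whatever schedule family a real methodology would search over.

```python
import random

def fixed_mixture_schedule(ratios):
    """Schedule that keeps dataset sampling ratios constant over training."""
    def schedule(step, total_steps):
        return dict(ratios)
    return schedule

def linear_mixture_schedule(start_ratios, end_ratios):
    """Schedule that linearly interpolates dataset ratios over training.

    This is one hypothetical family of dynamic schedules; the research
    question is precisely which family and which parameters are optimal.
    """
    def schedule(step, total_steps):
        t = step / max(total_steps - 1, 1)  # training progress in [0, 1]
        raw = {k: (1 - t) * start_ratios[k] + t * end_ratios[k]
               for k in start_ratios}
        total = sum(raw.values())
        return {k: v / total for k, v in raw.items()}  # renormalise to sum to 1
    return schedule

def sample_dataset(schedule, step, total_steps, rng):
    """Draw the dataset to sample this step's batch from, per current ratios."""
    ratios = schedule(step, total_steps)
    names, weights = zip(*sorted(ratios.items()))
    return rng.choices(names, weights=weights, k=1)[0]
```

Under this framing, "extrapolating an optimal dynamic mixture" means choosing the schedule function (here, the interpolation endpoints) from small-scale proxy runs, and the cost/benefit question compares the extra search compute against downstream gains over the best constant `ratios`.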
References
We have yet to establish a systematic methodology for extrapolating optimal dynamic mixtures or to quantify the computational costs and performance gains relative to fixed data mixtures.
— DaMo: Data Mixing Optimizer in Fine-tuning Multimodal LLMs for Mobile Phone Agents
(2510.19336 - Shi et al., 22 Oct 2025) in Section: Limitations and Future work (Appendix)