Generalizing ResAdapt beyond video resizing
Establish whether two extensions can generalize ResAdapt's learned input-side allocation policy beyond continuous resizing: (i) extending the training mixture to include both image and video data, and (ii) implementing alternative pre-encoding visual budget operators, in particular hard frame selection. A further question is whether the resulting policy keeps its efficiency gains while performing consistently on image-centric benchmarks.
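To make the contrast between the two operator families concrete, the following is a minimal sketch and not the ResAdapt implementation. It assumes hypothetical per-frame importance scores, a fixed full-resolution token cost per frame (`base_tokens`), and a token cost that grows with the square of the resize scale; the function names `resize_scales` and `select_frames` are illustrative only.

```python
import math

def resize_scales(scores, base_tokens, budget):
    """Continuous operator (illustrative): one scale in (0, 1] per frame.

    Assumes a frame resized by scale s costs base_tokens * s**2 tokens.
    """
    total = sum(scores)
    # Split the budget in proportion to the scores, capping at full resolution.
    # Surplus from capped frames is not redistributed in this sketch.
    tokens = [min(base_tokens, budget * s / total) for s in scores]
    return [math.sqrt(t / base_tokens) for t in tokens]

def select_frames(scores, base_tokens, budget):
    """Hard operator (illustrative): keep the top-k frames at full resolution."""
    k = min(len(scores), budget // base_tokens)
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)  # restore temporal order

# Hypothetical scores for four frames, with a budget worth two full frames.
scores = [0.1, 0.4, 0.2, 0.3]
scales = resize_scales(scores, base_tokens=256, budget=512)
kept = select_frames(scores, base_tokens=256, budget=512)
```

Under these assumptions both operators spend roughly the same token budget, but the continuous one keeps every frame at reduced resolution while the hard one discards frames outright. Whether a single learned policy can drive both kinds of operator is the question posed above.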
References
Extending the training mixture to image–video data and exploring alternative operators, such as hard frame selection, remain open problems.
— ResAdapt: Adaptive Resolution for Efficient Multimodal Reasoning
(2603.28610 - Liao et al., 30 Mar 2026) in Limitations and Future Work, item (iii)