Finding effective configurations for alternative masked image modeling designs
Identify optimal hyperparameter and architectural configurations for the explored alternative masked image modeling approaches that consistently improve downstream performance, including but not limited to multi-block masking, hybrid masking ratios, hybrid masking granularity, applying the KoLeo loss to class tokens, decoder cross-attention, reconstruction losses on visible patches, partial masked patch selection, and feeding multi-stage encoder features to the decoder.
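To make one of the listed alternatives concrete, the following is a minimal sketch of the KoLeo regularizer as it is commonly defined (the negative mean log distance from each L2-normalized embedding to its nearest neighbor in the batch), applied here to a batch of class-token embeddings. The function names, the `eps` value, and the pure-Python formulation are assumptions for illustration; the source does not state how the loss was weighted or combined with the pixel reconstruction objective.

```python
import math


def l2_normalize(vec, eps=1e-8):
    """Scale a vector to (approximately) unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]


def koleo_loss(cls_tokens, eps=1e-8):
    """KoLeo regularizer over a batch of class-token embeddings.

    cls_tokens: list of equal-length lists of floats, one per image.
    Returns -(1/n) * sum_i log(min_{j != i} ||x_i - x_j||), computed on
    L2-normalized embeddings. Lower values mean the embeddings are
    spread more uniformly.
    """
    feats = [l2_normalize(t) for t in cls_tokens]
    n = len(feats)
    if n < 2:
        raise ValueError("KoLeo loss needs at least two embeddings")
    total = 0.0
    for i in range(n):
        # Distance from embedding i to its nearest neighbor in the batch.
        nearest = min(
            math.dist(feats[i], feats[j]) for j in range(n) if j != i
        )
        total += math.log(nearest + eps)
    return -total / n


# Hypothetical toy batches: well-spread vs. nearly collapsed class tokens.
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
clustered = [[1.0, 0.0], [1.0, 0.01], [1.0, -0.01], [1.0, 0.02]]
```

In this toy setup the nearly collapsed batch receives a much larger penalty than the well-spread one, which is the behavior the regularizer is meant to encourage. How strongly to weight such a term relative to the reconstruction loss is exactly the kind of configuration choice the problem statement leaves open.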
References
While some above explored alternatives may indeed be viable, we were unable to identify optimal configurations that consistently improved performance.
— In Pursuit of Pixel Supervision for Visual Pre-training
(2512.15715 - Yang et al., 17 Dec 2025) in Supplementary, Section: Failure Attempts, Limitations, and Future Directions – Subsection: Failure Attempts