Decoder capacity sufficiency in masked autoencoders
Determine whether the masked autoencoder (MAE) decoder has sufficient capacity for pixel regression. If it does not, assess whether the shortfall causes the encoder’s later blocks to prioritize low-level detail modeling at the expense of high-level semantic representation quality, and empirically validate whether increasing decoder depth mitigates this issue.
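To make "decoder capacity" concrete, the sketch below counts the parameters of a stack of standard pre-norm Transformer blocks as depth grows. It is only an illustration under stated assumptions: the block layout (multi-head self-attention plus an MLP with expansion ratio 4, with biases and two LayerNorms), the decoder width of 512, and the depth values in the sweep are all assumed here and are not taken from the cited paper. The function names `block_params` and `decoder_params` are hypothetical. The count leaves out the input projection, mask token, positional embeddings, and pixel prediction head.

```python
def block_params(width: int, mlp_ratio: int = 4) -> int:
    """Parameter count of one assumed standard Transformer block."""
    # Self-attention: fused qkv projection (3*d*d + 3*d) plus output projection (d*d + d).
    attn = 4 * width * width + 4 * width
    # MLP: two linear layers with biases, hidden size = mlp_ratio * width.
    hidden = mlp_ratio * width
    mlp = (width * hidden + hidden) + (hidden * width + width)
    # Two LayerNorms, each with a scale and a shift vector of size d.
    norms = 2 * 2 * width
    return attn + mlp + norms


def decoder_params(depth: int, width: int = 512) -> int:
    """Parameter count of `depth` stacked blocks (embeddings and head excluded)."""
    return depth * block_params(width)


if __name__ == "__main__":
    # Hypothetical depth sweep; these values are illustrative, not the paper's settings.
    for depth in (2, 4, 8, 12, 16):
        print(f"depth={depth:2d}  params={decoder_params(depth) / 1e6:.1f}M")
```

Parameter count is only a rough proxy for capacity. Answering the question above would still require pre-training with each decoder depth and comparing the downstream quality of the encoder's representations.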
References
We conjecture that the decoder lacks sufficient capacity for pixel regression.
— In Pursuit of Pixel Supervision for Visual Pre-training
(2512.15715 - Yang et al., 17 Dec 2025) in Section 3.2, MAE Redesign – Deeper decoder