Capacity and generalization of smaller models with Scaling on Scales (S2)
Establish whether smaller vision models augmented with Scaling on Scales (S2), a technique that runs a pre-trained, frozen backbone on multiple image scales, possess capacity at least comparable to that of larger models, and determine whether pre-training such smaller models with S2 enables them to match or exceed the generalization performance of larger models.
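As a rough illustration of what S2 computes, the sketch below extracts multi-scale features with a frozen backbone: the image is interpolated to each scale, split into base-resolution sub-images, passed through the backbone, stitched back into one feature map, pooled to the base feature resolution, and concatenated channel-wise across scales. The `backbone` interface assumed here (a frozen model mapping a (B, 3, base_size, base_size) batch to a (B, D, h, w) spatial feature map) and all names are illustrative assumptions, not the reference implementation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()  # the backbone stays frozen; S2 adds no trainable parameters
def s2_features(backbone, images, scales=(1, 2), base_size=224):
    """Sketch of S2-style multi-scale feature extraction (illustrative).

    Assumes `backbone` maps a (B, 3, base_size, base_size) batch to a
    (B, D, h, w) spatial feature map.
    """
    b = images.shape[0]
    feats = []
    for s in scales:
        # 1. Interpolate the image to s times the base resolution.
        x = F.interpolate(images, size=(base_size * s, base_size * s),
                          mode="bilinear", align_corners=False)
        # 2. Split into an s x s grid of base-resolution sub-images.
        x = x.unfold(2, base_size, base_size).unfold(3, base_size, base_size)
        x = x.permute(0, 2, 3, 1, 4, 5).reshape(-1, 3, base_size, base_size)
        # 3. Run the frozen backbone on every sub-image.
        f = backbone(x)                                  # (b*s*s, D, h, w)
        d, h, w = f.shape[1], f.shape[2], f.shape[3]
        # 4. Stitch the sub-image features back into one large map.
        f = f.reshape(b, s, s, d, h, w).permute(0, 3, 1, 4, 2, 5)
        f = f.reshape(b, d, s * h, s * w)
        # 5. Pool back to the base feature resolution so scales align.
        f = F.interpolate(f, size=(h, w), mode="area")
        feats.append(f)
    # 6. Concatenate channel-wise: output has D * len(scales) channels.
    return torch.cat(feats, dim=1)
```

A ViT-style backbone that returns a token sequence would first need its tokens reshaped into a spatial grid before step 4; the sketch assumes a convolutional-style output for brevity.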
Given that most of the representation that larger models learn is also learned by multi-scale smaller models, we conjecture that smaller models with S2 scaling have at least comparable capacity to larger models. Since greater capacity allows a model to memorize more rare and atypical instances during pre-training when given sufficient data, and thereby reduces generalization error, we further speculate that smaller models can achieve similar or even better generalization than larger models if they too are pre-trained with S2 scaling.