Evaluating The Robustness of Self-Supervised Representations to Background/Foreground Removal
Abstract: Despite impressive empirical advances of SSL in solving various tasks, the problem of understanding and characterizing SSL representations learned from input data remains relatively under-explored. We provide a comparative analysis of how the representations produced by SSL models differ when masking parts of the input. Specifically, we considered state-of-the-art SSL pretrained models, such as DINOv2, MAE, and SwaV, and analyzed changes at the representation levels across 4 Image Classification datasets. First, we generate variations of the datasets by applying foreground and background segmentation. Then, we conduct statistical analysis using Canonical Correlation Analysis (CCA) and Centered Kernel Alignment (CKA) to evaluate the robustness of the representations learned in SSL models. Empirically, we show that not all models lead to representations that separate foreground, background, and complete images. Furthermore, we test different masking strategies by occluding the center regions of the images to address cases where foreground and background are difficult. For example, the DTD dataset that focuses on texture rather specific objects.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.