MIMIC: Masked Image Modeling with Image Correspondences (2306.15128v4)
Abstract: Dense pixel-specific representation learning at scale has been bottlenecked by the unavailability of large-scale multi-view datasets. Current methods for building effective pretraining datasets rely heavily on annotated 3D meshes, point clouds, and camera parameters from simulated environments, preventing them from being applied to real-world data sources where such metadata is lacking. We propose a pretraining dataset-curation approach that requires no additional annotations. Our method generates multi-view datasets from both real-world videos and simulated environments at scale. Specifically, we experiment with two scales: MIMIC-1M with 1.3M and MIMIC-3M with 3.1M multi-view image pairs. We train multiple models with different masked image modeling objectives and show the following: Representations trained on our automatically generated MIMIC-3M outperform those learned from expensive crowdsourced datasets (ImageNet-1K) and those learned from synthetic environments (MULTIVIEW-HABITAT) on two dense geometric tasks: depth estimation on NYUv2 (1.7%) and surface normal estimation on Taskonomy (2.05%). On dense tasks that also require object understanding, we outperform MULTIVIEW-HABITAT on semantic segmentation on ADE20K (3.89%) and pose estimation on MSCOCO (9.4%), and narrow the gap with models pretrained on the object-centric, expensive ImageNet-1K. These gains hold even when the representations are frozen and when downstream training data is limited to few-shot settings. The larger dataset (MIMIC-3M) significantly improves performance, which is promising since our curation method can be scaled arbitrarily to produce even larger datasets. MIMIC code, dataset, and pretrained models are open-sourced at https://github.com/RAIVNLab/MIMIC.
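The abstract leaves the curation recipe to the paper, but the classical-geometry references below (SIFT, FLANN, RANSAC, multiple view geometry) point at the core idea: detect keypoints in candidate frame pairs, match them, geometrically verify the matches, and keep pairs whose estimated overlap is neither too small nor near-total. The following is a minimal OpenCV sketch of that idea; the ratio threshold, overlap proxy, overlap band, and frame stride are illustrative assumptions, not the released MIMIC pipeline's settings.

```python
# Minimal sketch of annotation-free multi-view pair mining from video frames,
# in the spirit of MIMIC's curation (SIFT + FLANN + RANSAC, per the classical
# references cited below). All thresholds here are illustrative assumptions.
import cv2
import numpy as np

def estimate_overlap(img_a, img_b, ratio=0.7, ransac_thresh=5.0):
    """Proxy for view overlap: fraction of keypoints that survive as
    RANSAC inliers under a homography fit between the two frames."""
    sift = cv2.SIFT_create()
    kp_a, des_a = sift.detectAndCompute(cv2.cvtColor(img_a, cv2.COLOR_BGR2GRAY), None)
    kp_b, des_b = sift.detectAndCompute(cv2.cvtColor(img_b, cv2.COLOR_BGR2GRAY), None)
    if des_a is None or des_b is None:
        return 0.0

    # FLANN with a KD-tree index (algorithm=1) for fast approximate matching.
    matcher = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))
    matches = matcher.knnMatch(des_a, des_b, k=2)

    # Lowe's ratio test keeps only distinctive correspondences.
    good = [m[0] for m in matches
            if len(m) == 2 and m[0].distance < ratio * m[1].distance]
    if len(good) < 8:
        return 0.0

    src = np.float32([kp_a[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_b[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    _, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, ransac_thresh)
    if inlier_mask is None:
        return 0.0
    return float(inlier_mask.sum()) / max(len(kp_a), len(kp_b))

def mine_pairs(frames, lo=0.5, hi=0.7, stride=10):
    """Keep frame pairs whose overlap lies in a target band, so the two
    views share content without being near-duplicates (band is assumed)."""
    return [(i, i + stride)
            for i in range(0, len(frames) - stride, stride)
            if lo <= estimate_overlap(frames[i], frames[i + stride]) <= hi]
```

The banded selection is the key design point: pairs with very little overlap give a cross-view masked-modeling objective nothing to reconstruct, while near-duplicate pairs make completion trivial. Any covisibility proxy with that property could stand in for the homography-inlier fraction used here.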
- Objectron: A large scale dataset of object-centric videos in the wild with pose annotations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7822–7831, June 2021.
- MultiMAE: Multi-modal multi-task masked autoencoders. In Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXXVII, pp. 348–367. Springer, 2022.
- BEiT: BERT pre-training of image transformers. arXiv preprint arXiv:2106.08254, 2021.
- ARKitScenes: A diverse real-world dataset for 3D indoor scene understanding using mobile RGB-D data. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1), 2021. URL https://openreview.net/forum?id=tjZjv_qh_CE.
- Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020.
- Emerging properties in self-supervised vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9650–9660, 2021.
- Matterport3D: Learning from RGB-D data in indoor environments. International Conference on 3D Vision (3DV), 2017.
- A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning, pp. 1597–1607. PMLR, 2020.
- The Cityscapes dataset for semantic urban scene understanding. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
- ScanNet: Richly-annotated 3D reconstructions of indoor scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5828–5839, 2017.
- ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE, 2009.
- An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
- Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395, 1981.
- Are we ready for autonomous driving? The KITTI vision benchmark suite. In Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
- Multiple view geometry in computer vision. Cambridge University Press, 2003.
- Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738, 2020.
- Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16000–16009, 2022.
- Movement-produced stimulation in the development of visually guided behavior. Journal of Comparative and Physiological Psychology, 56(5):872, 1963.
- GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems, 30, 2017.
- Learning image representations tied to ego-motion. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), December 2015.
- Learning the depths of moving people by watching frozen people. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4521–4530, 2019.
- Microsoft COCO: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pp. 740–755. Springer, 2014.
- A ConvNet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11976–11986, 2022.
- Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.
- David G Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60:91–110, 2004.
- Fast approximate nearest neighbors with automatic algorithm configuration. In International Conference on Computer Vision Theory and Applications, 2009. URL https://api.semanticscholar.org/CorpusID:7317448.
- On the nature of the visual-cliff-avoidance response in human infants. Child Development, pp. 61–68, 1980.
- Habitat-Matterport 3D dataset (HM3D): 1000 large-scale 3D environments for embodied AI. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2021. URL https://arxiv.org/abs/2109.08238.
- Vision transformers for dense prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12179–12188, 2021.
- Common objects in 3D: Large-scale learning and evaluation of real-life 3D category reconstruction. In International Conference on Computer Vision, 2021.
- Habitat: A platform for embodied AI research. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2019.
- Indoor segmentation and support inference from RGBD images. In European Conference on Computer Vision, 2012.
- The Replica dataset: A digital replica of indoor spaces. arXiv preprint arXiv:1906.05797, 2019.
- Habitat 2.0: Training home assistants to rearrange their habitat. Advances in Neural Information Processing Systems, 34:251–266, 2021.
- DeMoN: Depth and motion network for learning monocular stereo. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5038–5047, 2017.
- CroCo: Self-supervised pre-training for 3D vision tasks by cross-view completion. arXiv preprint arXiv:2210.10716, 2022.
- Gibson Env: Real-world perception for embodied agents. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9068–9079, 2018.
- ViTPose: Simple vision transformer baselines for human pose estimation. arXiv preprint arXiv:2204.12484, 2022.
- Generic 3D representation via pose estimation and matching. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part III 14, pp. 535–553. Springer, 2016.
- Taskonomy: Disentangling task transfer learning. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2018.
- Semantic understanding of scenes through the ADE20K dataset. International Journal of Computer Vision, 127:302–321, 2019.