Multiple Random Masking Autoencoder Ensembles for Robust Multimodal Semi-supervised Learning (2402.08035v1)
Abstract: There is an increasing number of real-world problems in computer vision and machine learning that require taking into account multiple interpretation layers (modalities or views) of the world and learning how they relate to each other. For example, in the case of Earth Observations from satellite data, it is important to be able to predict one observation layer (e.g., vegetation index) from other layers (e.g., water vapor, snow cover, temperature, etc.), both to better understand how the Earth System functions and to reliably predict information for a layer when its data is missing (e.g., due to measurement failure or error).
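To make the cross-layer prediction setting concrete, below is a minimal, hypothetical sketch of the idea named in the title and abstract: autoencoders trained with random masking over input modalities, combined into an ensemble so that a missing layer can be imputed from the remaining ones. This is not the authors' implementation; all names (`MaskedModalityAutoencoder`, `random_modality_mask`, the layer count, and the hidden sizes) are illustrative assumptions.

```python
# Hypothetical sketch (assumed, not the paper's code): randomly mask some input
# modalities (observation layers), train an autoencoder to reconstruct the hidden
# ones, and average an ensemble of such models at test time to impute a missing layer.
import torch
import torch.nn as nn


class MaskedModalityAutoencoder(nn.Module):
    """Reconstructs all modalities from a randomly masked subset of them."""

    def __init__(self, num_modalities: int, spatial_dim: int, hidden_dim: int = 256):
        super().__init__()
        in_dim = num_modalities * spatial_dim
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.decoder = nn.Linear(hidden_dim, in_dim)

    def forward(self, x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_modalities, spatial_dim); mask: (batch, num_modalities) in {0, 1}
        x_masked = x * mask.unsqueeze(-1)        # zero out the masked (hidden) modalities
        z = self.encoder(x_masked.flatten(1))    # encode only the visible layers
        return self.decoder(z).view_as(x)        # reconstruct all layers


def random_modality_mask(batch: int, num_modalities: int, keep_prob: float = 0.5) -> torch.Tensor:
    """Sample which modalities stay visible; each ensemble member sees different masks."""
    return (torch.rand(batch, num_modalities) < keep_prob).float()


if __name__ == "__main__":
    num_modalities, spatial_dim = 5, 64  # e.g. vegetation index, water vapor, snow cover, ...
    ensemble = [MaskedModalityAutoencoder(num_modalities, spatial_dim) for _ in range(3)]
    x = torch.randn(8, num_modalities, spatial_dim)

    # One training step for a single member: loss only on the modalities that were hidden.
    model = ensemble[0]
    mask = random_modality_mask(8, num_modalities)
    recon = model(x, mask)
    loss = (((recon - x) ** 2) * (1 - mask).unsqueeze(-1)).mean()
    loss.backward()

    # At test time, a missing layer (e.g. after a sensor failure) is imputed by
    # averaging the reconstructions produced by all ensemble members.
    with torch.no_grad():
        imputed = torch.stack([m(x, mask) for m in ensemble]).mean(0)
```

The masking step plays the role of the semi-supervised signal described in the abstract: because each model only sees a random subset of layers during training, it learns to predict any layer from the others, and the ensemble average gives a more robust estimate than any single model.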