Self-Guided Masked Autoencoders for Domain-Agnostic Self-Supervised Learning (2402.14789v1)

Published 22 Feb 2024 in cs.LG and cs.AI

Abstract: Self-supervised learning excels at learning representations from large amounts of unlabeled data, demonstrating success across multiple data modalities. Yet extending self-supervised learning to new modalities is non-trivial because the specifics of existing methods are tailored to each domain, such as domain-specific augmentations which reflect the invariances in the target task. While masked modeling is promising as a domain-agnostic framework for self-supervised learning because it does not rely on input augmentations, its mask sampling procedure remains domain-specific. We present Self-guided Masked Autoencoders (SMA), a fully domain-agnostic masked modeling method. SMA trains an attention-based model with a masked modeling objective, learning the masks to sample without any domain-specific assumptions. We evaluate SMA on three self-supervised learning benchmarks in protein biology, chemical property prediction, and particle physics. We find that SMA learns representations without domain-specific knowledge and achieves state-of-the-art performance on all three benchmarks.
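
To make the abstract's central idea concrete, here is a minimal PyTorch sketch of masked modeling in which the positions to mask come from a learned scoring module rather than a hand-designed, domain-specific rule. This is not the authors' implementation: the abstract does not specify how SMA's masks are learned, so `LearnedMaskSampler`, `TinyMaskedAutoencoder`, the mask ratio, and every other detail below are illustrative assumptions. The sketch only shows the shape of the training loop: score tokens, hide the selected ones behind a learned mask embedding, encode with a Transformer, and reconstruct, with the loss computed only at masked positions.

```python
import torch
import torch.nn as nn

class LearnedMaskSampler(nn.Module):
    # Hypothetical stand-in: score tokens with a small attention-style head and
    # mask the top-scoring fraction, instead of using a hand-designed,
    # domain-specific masking rule (random patches, contiguous spans, etc.).
    def __init__(self, dim, mask_ratio=0.5):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, dim))
        self.key = nn.Linear(dim, dim)
        self.mask_ratio = mask_ratio

    def forward(self, tokens):
        # tokens: (batch, seq_len, dim) embeddings from any modality
        b, n, d = tokens.shape
        scores = (self.query @ self.key(tokens).transpose(1, 2)).squeeze(1) / d ** 0.5
        num_mask = int(self.mask_ratio * n)
        masked_idx = scores.topk(num_mask, dim=-1).indices  # positions to hide
        mask = torch.zeros(b, n, dtype=torch.bool, device=tokens.device)
        mask.scatter_(1, masked_idx, True)
        return mask  # note: top-k selection is non-differentiable on its own

class TinyMaskedAutoencoder(nn.Module):
    # Minimal masked-modeling loop: replace masked tokens with a learned [MASK]
    # embedding, encode with a Transformer, reconstruct the original tokens, and
    # score the reconstruction only at the masked positions.
    def __init__(self, dim=64, depth=2, heads=4, mask_ratio=0.5):
        super().__init__()
        self.sampler = LearnedMaskSampler(dim, mask_ratio)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, dim)

    def forward(self, tokens):
        mask = self.sampler(tokens)
        corrupted = torch.where(mask.unsqueeze(-1),
                                self.mask_token.expand_as(tokens), tokens)
        recon = self.head(self.encoder(corrupted))
        return ((recon - tokens) ** 2)[mask].mean()  # loss on masked positions only

# Usage: any modality tokenized into fixed-size vectors, e.g. protein residues,
# SMILES characters, or jet constituents.
x = torch.randn(8, 128, 64)          # (batch, tokens, embedding dim)
model = TinyMaskedAutoencoder()
loss = model(x)
loss.backward()
```

Because top-k selection is not differentiable, the scoring head above would not actually be trained by this reconstruction loss; making mask selection learnable end-to-end without domain-specific assumptions is precisely the part of SMA that this sketch does not reproduce.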
