
Learning Abstract Visual Reasoning via Task Decomposition: A Case Study in Raven Progressive Matrices (2308.06528v2)

Published 12 Aug 2023 in cs.AI, cs.CV, and cs.LG

Abstract: Learning to perform abstract reasoning often requires decomposing the task in question into intermediate subgoals that are not specified upfront, but need to be autonomously devised by the learner. In Raven Progressive Matrices (RPM), the task is to choose one of the available answers given a context, where both the context and answers are composite images featuring multiple objects in various spatial arrangements. As this high-level goal is the only guidance available, learning to solve RPMs is challenging. In this study, we propose a deep learning architecture based on the transformer blueprint which, rather than directly making the above choice, addresses the subgoal of predicting the visual properties of individual objects and their arrangements. The multidimensional predictions obtained in this way are then directly juxtaposed to choose the answer. We consider a few ways in which the model parses the visual input into tokens and several regimes of masking parts of the input in self-supervised training. In experimental assessment, the models not only outperform state-of-the-art methods but also provide interesting insights and partial explanations about the inference. The design of the method also makes it immune to biases that are known to be present in some RPM benchmarks.
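The answer-selection step described in the abstract (predict properties, then juxtapose the predictions to pick an answer) can be illustrated with a minimal sketch. This is not the authors' implementation: the property names, the integer class encoding, and the match-counting rule are all assumptions made for illustration, and the functions `count_matches` and `choose_answer` are hypothetical.

```python
from typing import Dict, List

# Hypothetical encoding: property name -> predicted class index.
# The actual property set and encoding used in the paper are not given here.
Properties = Dict[str, int]


def count_matches(predicted: Properties, candidate: Properties) -> int:
    """Count the properties on which the two predictions agree."""
    return sum(
        1 for name, value in predicted.items() if candidate.get(name) == value
    )


def choose_answer(predicted: Properties, candidates: List[Properties]) -> int:
    """Return the index of the candidate agreeing with the prediction on the
    most properties (assumed rule; ties go to the earliest candidate)."""
    scores = [count_matches(predicted, c) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


# Properties predicted for the missing panel (illustrative values only).
predicted_panel = {"shape": 2, "size": 1, "color": 4}

# Properties predicted for each answer panel (illustrative values only).
answer_panels = [
    {"shape": 2, "size": 0, "color": 3},
    {"shape": 2, "size": 1, "color": 4},
    {"shape": 1, "size": 1, "color": 4},
]

best = choose_answer(predicted_panel, answer_panels)
```

Because the choice in such a scheme is driven by a comparison of predicted properties rather than by a score learned over the answer set, the per-property agreement can be inspected directly, which is in the spirit of the partial explanations the abstract mentions.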

Authors (2)
  1. Jakub Kwiatkowski (1 paper)
  2. Krzysztof Krawiec (14 papers)