
Slot Abstractors: Toward Scalable Abstract Visual Reasoning (2403.03458v2)

Published 6 Mar 2024 in cs.CV and cs.LG

Abstract: Abstract visual reasoning is a characteristically human ability, allowing the identification of relational patterns that are abstracted away from object features, and the systematic generalization of those patterns to unseen problems. Recent work has demonstrated strong systematic generalization in visual reasoning tasks involving multi-object inputs, through the integration of slot-based methods used for extracting object-centric representations coupled with strong inductive biases for relational abstraction. However, this approach was limited to problems containing a single rule, and was not scalable to visual reasoning problems containing a large number of objects. Other recent work proposed Abstractors, an extension of Transformers that incorporates strong relational inductive biases, thereby inheriting the Transformer's scalability and multi-head architecture, but it has yet to be demonstrated how this approach might be applied to multi-object visual inputs. Here we combine the strengths of the above approaches and propose Slot Abstractors, an approach to abstract visual reasoning that can be scaled to problems involving a large number of objects and multiple relations among them. The approach displays state-of-the-art performance across four abstract visual reasoning tasks, as well as an abstract reasoning task involving real-world images.
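The core mechanism the abstract alludes to, relational cross-attention from the Abstractors line of work, can be illustrated with a minimal sketch. This is an assumption-laden toy, not the authors' implementation: queries and keys are projected from object representations (here, random stand-ins for Slot Attention outputs), while the values are input-independent learned symbols, so the output encodes the relations among objects abstracted away from the objects' features.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def relational_cross_attention(slots, W_q, W_k, symbols):
    """Toy relational cross-attention.

    Queries and keys come from the object slots, but the values are
    learned symbols that do not depend on the input, so the result
    carries relational structure rather than object features.
    """
    q = slots @ W_q                                          # (n_slots, d)
    k = slots @ W_k                                          # (n_slots, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (n_slots, n_slots)
    return attn @ symbols                                    # (n_slots, d_sym)

rng = np.random.default_rng(0)
n_slots, d, d_sym = 4, 8, 8
slots = rng.normal(size=(n_slots, d))        # stand-in for Slot Attention output
W_q = rng.normal(size=(d, d))                # hypothetical query projection
W_k = rng.normal(size=(d, d))                # hypothetical key projection
symbols = rng.normal(size=(n_slots, d_sym))  # learned symbols (trained in practice)

out = relational_cross_attention(slots, W_q, W_k, symbols)
print(out.shape)  # (4, 8)
```

Because every output row is a convex combination of the shared symbols, two scenes with different object appearances but the same inter-object relations yield similar outputs, which is the relational inductive bias the paper scales to many objects and multiple rules.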

Authors (3)
  1. Shanka Subhra Mondal (9 papers)
  2. Jonathan D. Cohen (38 papers)
  3. Taylor W. Webb (10 papers)
Citations (3)
