
V-LoL: A Diagnostic Dataset for Visual Logical Learning (2306.07743v3)

Published 13 Jun 2023 in cs.AI, cs.CV, and cs.LG

Abstract: Despite the successes of recent developments in visual AI, several shortcomings remain, ranging from missing exact logical reasoning, to limited abstract generalization abilities, to difficulty understanding complex and noisy scenes. Unfortunately, existing benchmarks were not designed to capture more than a few of these aspects. Whereas deep learning datasets focus on visually complex data but simple visual reasoning tasks, inductive logic datasets involve complex logical learning tasks but lack the visual component. To address this, we propose the diagnostic visual logical learning dataset, V-LoL, that seamlessly combines visual and logical challenges. Notably, we introduce the first instantiation of V-LoL, V-LoL-Train, a visual rendition of a classic benchmark in symbolic AI, the Michalski train problem. By incorporating intricate visual scenes and flexible logical reasoning tasks within a versatile framework, V-LoL-Train provides a platform for investigating a wide range of visual logical learning challenges. We evaluate a variety of AI systems, including traditional symbolic AI, neural AI, and neuro-symbolic AI. Our evaluations demonstrate that even SOTA AI faces difficulties in dealing with visual logical learning challenges, highlighting unique advantages and limitations of each methodology. Overall, V-LoL opens up new avenues for understanding and enhancing current abilities in visual logical learning for AI systems.
