Solving Bongard Problems with a Visual Language and Pragmatic Reasoning

Published 12 Apr 2018 in stat.ML, cs.AI, and cs.LG | (1804.04452v1)

Abstract: More than 50 years ago Bongard introduced 100 visual concept learning problems as a testbed for intelligent vision systems. These problems are now known as Bongard problems. Although they are well known in the cognitive science and AI communities only moderate progress has been made towards building systems that can solve a substantial subset of them. In the system presented here, visual features are extracted through image processing and then translated into a symbolic visual vocabulary. We introduce a formal language that allows representing complex visual concepts based on this vocabulary. Using this language and Bayesian inference, complex visual concepts can be induced from the examples that are provided in each Bongard problem. Contrary to other concept learning problems the examples from which concepts are induced are not random in Bongard problems, instead they are carefully chosen to communicate the concept, hence requiring pragmatic reasoning. Taking pragmatic reasoning into account we find good agreement between the concepts with high posterior probability and the solutions formulated by Bongard himself. While this approach is far from solving all Bongard problems, it solves the biggest fraction yet.

Summary

The paper integrates sub-symbolic image processing with a symbolic visual language that can describe complex visual concepts.
It uses Bayesian inference with pragmatic reasoning to shrink the hypothesis space and make concept induction more efficient.
The system solves a larger fraction of Bongard problems than earlier systems, though far from all of them, and connects vision with cognition.

The paper presents an approach to solving Bongard problems, which were introduced more than 50 years ago as a testbed for intelligent vision systems. Each problem shows two sets of six images, and the task is to find the visual concept that separates one set from the other. The problems demand more than object recognition: the solver must induce a visual concept from examples that were deliberately chosen to communicate it, which calls for pragmatic reasoning.

The authors propose a system in which image processing extracts visual features that are then translated into a symbolic visual vocabulary. A formal language built on this vocabulary expresses complex visual concepts, and Bayesian inference over expressions in that language induces a concept from the examples. The method is far from solving all Bongard problems, but it solves the largest fraction reported so far.

Bongard problems were designed to illustrate the inadequacies of conventional pattern recognition tools, and they continue to challenge modern approaches. The authors' primary aim is a language that captures visual concepts by combining sub-symbolic image representations with symbolic cognitive processes. The broader goal is to improve the interaction between visual and cognitive modules, which remains a significant gap in artificial intelligence systems, especially in their ability to generalize and to interpret novel stimuli.

The system outlined in the paper extracts visual features using standard methods for segmentation and feature extraction but focuses on crafting a language suitable for expressing visual concepts. This language includes operations for basic shapes, properties, and relational features like size, position, or whether one figure is inside another. The constructed grammar takes these elements to form logical expressions that describe the visual scenes in the Bongard problems.
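To make the idea of a visual language concrete, here is a minimal Python sketch of how shape, property, and quantifier operations might compose into a logical expression over a scene. All names (`Figure`, `exists`, `is_shape`, `larger_than`, `both`) and the size threshold are hypothetical illustrations, not the paper's actual vocabulary or grammar.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Figure:
    """One segmented figure in a scene (hypothetical feature set)."""
    shape: str    # e.g. "triangle" or "circle"
    size: float   # area in arbitrary units


Scene = List[Figure]
FigurePredicate = Callable[[Figure], bool]
Concept = Callable[[Scene], bool]


def is_shape(name: str) -> FigurePredicate:
    """Property test: the figure has the given shape."""
    return lambda fig: fig.shape == name


def larger_than(threshold: float) -> FigurePredicate:
    """Property test: the figure's size exceeds a threshold."""
    return lambda fig: fig.size > threshold


def both(p: FigurePredicate, q: FigurePredicate) -> FigurePredicate:
    """Logical conjunction of two figure-level tests."""
    return lambda fig: p(fig) and q(fig)


def exists(pred: FigurePredicate) -> Concept:
    """Scene-level concept: at least one figure satisfies pred."""
    return lambda scene: any(pred(fig) for fig in scene)


def forall(pred: FigurePredicate) -> Concept:
    """Scene-level concept: every figure satisfies pred."""
    return lambda scene: all(pred(fig) for fig in scene)


# "There is a large triangle", written as a composed expression.
big_triangle = exists(both(is_shape("triangle"), larger_than(50.0)))

left_scene = [Figure("triangle", 80.0), Figure("circle", 10.0)]
right_scene = [Figure("triangle", 20.0)]

print(big_triangle(left_scene), big_triangle(right_scene))
```

Because each concept is just a function from a scene to a truth value, candidate expressions can be enumerated and tested against the example images on each side of a problem.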

Critically, the authors refine their system with pragmatic reasoning inspired by human communication. They take a Bayesian approach whose likelihood function incorporates pragmatic effects: it accounts for the fact that Bongard chose his examples carefully to communicate the intended concept rather than sampling them at random. This pragmatic reasoning is a key innovation, allowing a significant reduction of the hypothesis space and making the inference algorithm more efficient.
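The following Python sketch illustrates the general shape of Bayesian concept induction over a small set of candidate rules. It rests on simplifying assumptions that are not taken from the paper: scenes are reduced to tuples of shape names, the prior decays as 2 to the power of minus the expression length, and the likelihood is a hard constraint (a rule must hold for every left-hand example and for no right-hand example) that stands in for the paper's pragmatic likelihood, whose exact form is not given here.

```python
from typing import Callable, Dict, List, Tuple

Scene = Tuple[str, ...]  # assumption: a scene is just a tuple of shape names
Rule = Callable[[Scene], bool]

# Hypothetical candidate concepts: name -> (expression length, rule).
HYPOTHESES: Dict[str, Tuple[int, Rule]] = {
    "exists triangle": (2, lambda s: "triangle" in s),
    "exists circle": (2, lambda s: "circle" in s),
    "exists triangle and no circle": (
        5, lambda s: "triangle" in s and "circle" not in s),
}


def posterior(hypotheses: Dict[str, Tuple[int, Rule]],
              left: List[Scene],
              right: List[Scene]) -> Dict[str, float]:
    """Return a normalized posterior over the candidate rules."""
    scores: Dict[str, float] = {}
    for name, (length, rule) in hypotheses.items():
        prior = 2.0 ** (-length)  # assumed prior: shorter rules are favored
        # Assumed likelihood: 1 if the rule separates the two sides, else 0.
        separates = (all(rule(s) for s in left)
                     and not any(rule(s) for s in right))
        scores[name] = prior if separates else 0.0
    total = sum(scores.values())
    if total == 0.0:
        return scores  # no candidate explains the examples
    return {name: score / total for name, score in scores.items()}


left_examples = [("triangle",), ("triangle", "square")]
right_examples = [("circle",), ("square",)]

post = posterior(HYPOTHESES, left_examples, right_examples)
for name, prob in sorted(post.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {prob:.3f}")
```

Two rules separate the examples here, and the shorter one receives most of the posterior mass. The output is a ranked distribution over candidate solutions rather than a single answer.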

The results demonstrate that the system can solve a significant portion of Bongard problems, achieving the best performance recorded in the literature thus far. The system expresses solutions in a formal language and ranks several logically equivalent solutions, providing a distribution of potential solutions rather than a single answer. While not all human-like presuppositions and pragmatic subtleties can be captured, the system nonetheless finds reasonable solutions for a wide array of problems.

The implications of this research are twofold: first, it contributes to the discussion on the interface between vision and cognition by providing a language that bridges sub-symbolic and symbolic processes; second, it emphasizes the need for incorporating pragmatic reasoning into AI systems to more closely mimic human visual cognition. These findings could steer future efforts toward constructing AI systems that not only recognize objects but also understand scenes and concepts, propelling developments in AI further towards human-like capabilities. The project can be seen as part of a broader attempt in cognitive science and AI to explore domain-general representations and algorithms for complex reasoning tasks.

In conclusion, this paper advances the integration of cognitive reasoning in AI systems, paving the way for further research into more comprehensive, pragmatic AI capable of solving complex visual problems like Bongard's.
