Solving Bongard Problems with a Visual Language and Pragmatic Reasoning
The paper presents a novel approach to solving Bongard problems, which were introduced over 50 years ago as a test of the capabilities of intelligent vision systems. A Bongard problem consists of two sets of six images, each set adhering to its own visual concept, and the task is to find the concepts that distinguish the two sets. The problems pose a distinctive challenge for visual cognition because they demand more than object recognition: they require inducing visual concepts from a handful of examples, which the authors approach through pragmatic reasoning.
The authors propose a system that uses image processing to extract visual features and translates them into a symbolic visual vocabulary. A formal language built on this vocabulary expresses complex visual concepts, and concept induction is carried out by Bayesian inference over expressions in that language. The method is not intended to solve every Bongard problem; the aim is to solve a larger fraction of them than any approach reported so far.
Bongard problems were deliberately designed to expose the inadequacies of conventional pattern recognition tools, and they continue to challenge modern approaches. The authors' primary aim is to develop a language for visual concepts that combines sub-symbolic image representations with symbolic cognitive processes. The goal is to improve the interaction between visual and cognitive modules, which remains a significant gap in artificial intelligence systems, especially in their ability to generalize and to interpret novel stimuli.
The system extracts visual features using standard methods for segmentation and feature extraction; its main effort goes into crafting a language suited to expressing visual concepts. The language includes operations for basic shapes, their properties, and features such as size, position, or whether one figure is inside another. A grammar combines these elements into logical expressions that describe the visual scenes in the Bongard problems.
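To make the idea of a visual language concrete, the following is a minimal sketch in Python and not the authors' actual language or grammar. It assumes each image has already been reduced to a list of symbolic figures. The names `Figure`, `is_shape`, `is_enclosed`, `both`, `exists`, and `separates` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Figure:
    """Hypothetical symbolic description of one figure in an image."""
    shape: str                    # e.g. "triangle", "circle", "square"
    size: float                   # area in arbitrary units
    inside: Optional[int] = None  # index of the enclosing figure, if any


Scene = List[Figure]
FigurePred = Callable[[Figure], bool]
SceneConcept = Callable[[Scene], bool]


def is_shape(name: str) -> FigurePred:
    """Property test: the figure has the given shape."""
    return lambda f: f.shape == name


def is_enclosed() -> FigurePred:
    """Relational test: the figure lies inside some other figure."""
    return lambda f: f.inside is not None


def both(p: FigurePred, q: FigurePred) -> FigurePred:
    """Logical conjunction of two figure-level tests."""
    return lambda f: p(f) and q(f)


def exists(p: FigurePred) -> SceneConcept:
    """Lift a figure-level test to a scene-level concept."""
    return lambda scene: any(p(f) for f in scene)


def separates(concept: SceneConcept, left: List[Scene], right: List[Scene]) -> bool:
    """A concept solves a problem if it holds on every left scene and on no right scene."""
    return all(concept(s) for s in left) and not any(concept(s) for s in right)


# Toy problem (two scenes per side instead of six, for brevity).
left = [
    [Figure("circle", 9.0), Figure("triangle", 2.0, inside=0)],
    [Figure("square", 8.0), Figure("triangle", 1.5, inside=0)],
]
right = [
    [Figure("circle", 9.0), Figure("triangle", 2.0)],
    [Figure("triangle", 8.0), Figure("circle", 1.0, inside=0)],
]

enclosed_triangle = exists(both(is_shape("triangle"), is_enclosed()))
has_triangle = exists(is_shape("triangle"))

print(separates(enclosed_triangle, left, right))  # True
print(separates(has_triangle, left, right))       # False
```

In this toy setting, expressions are ordinary Python closures composed from a few primitives. A real grammar would also need to enumerate or sample such expressions, which this sketch leaves out.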
Critically, the authors apply pragmatic reasoning inspired by human communication to refine their system. They use a Bayesian approach with a likelihood function that incorporates pragmatic effects, reflecting the assumption that Bongard chose his examples carefully so that they communicate the intended concepts effectively. This pragmatic reasoning is a key innovation: it substantially reduces the hypothesis space and makes the inference algorithm more efficient.
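The general shape of Bayesian concept induction can be sketched as follows. This is an illustrative simplification and not the paper's likelihood: it assumes a prior that halves with each additional symbol in a hypothesis, and a strict likelihood that is 1 only when a hypothesis holds for every left image and for no right image. Scenes are reduced to sets of feature labels, and all names (`prior`, `likelihood`, `posterior`, the toy hypotheses) are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Scene = FrozenSet[str]                                  # symbolic features of one image
Hypothesis = Tuple[str, int, Callable[[Scene], bool]]   # (name, length, test)


def prior(length: int) -> float:
    """Assumed simplicity prior: shorter expressions are more probable."""
    return 2.0 ** -length


def likelihood(test: Callable[[Scene], bool],
               left: List[Scene], right: List[Scene]) -> float:
    """Assumed strict likelihood: the concept must hold on all left scenes
    and on none of the right scenes, otherwise the data are impossible."""
    consistent = all(test(s) for s in left) and not any(test(s) for s in right)
    return 1.0 if consistent else 0.0


def posterior(hyps: List[Hypothesis],
              left: List[Scene], right: List[Scene]) -> Dict[str, float]:
    """Normalized posterior over a finite list of hypotheses (Bayes' rule)."""
    scores = {name: prior(n) * likelihood(t, left, right) for name, n, t in hyps}
    total = sum(scores.values())
    if total == 0.0:
        return {name: 0.0 for name in scores}
    return {name: s / total for name, s in scores.items()}


left = [frozenset({"triangle", "small"}), frozenset({"triangle", "large"})]
right = [frozenset({"circle", "small"}), frozenset({"circle", "large"})]

hypotheses: List[Hypothesis] = [
    ("has triangle", 1, lambda s: "triangle" in s),
    ("has small", 1, lambda s: "small" in s),
    ("has triangle and no circle", 3,
     lambda s: "triangle" in s and "circle" not in s),
]

post = posterior(hypotheses, left, right)
for name, p in sorted(post.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {p:.2f}")
# has triangle: 0.80
# has triangle and no circle: 0.20
# has small: 0.00
```

The two hypotheses that fit the data agree on every scene, yet the shorter one receives more posterior mass. This mirrors, in a simplified way, how a system can return a ranked distribution over solutions instead of a single answer.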
The results demonstrate that the system can solve a significant portion of Bongard problems, achieving the best performance recorded in the literature thus far. The system expresses solutions in a formal language and ranks several logically equivalent solutions, providing a distribution of potential solutions rather than a single answer. While not all human-like presuppositions and pragmatic subtleties can be captured, the system nonetheless finds reasonable solutions for a wide array of problems.
The implications of this research are twofold. First, it contributes to the discussion on the interface between vision and cognition by providing a language that bridges sub-symbolic and symbolic processes. Second, it emphasizes the need to incorporate pragmatic reasoning into AI systems if they are to mimic human visual cognition more closely. These findings could steer future efforts toward AI systems that not only recognize objects but also understand scenes and concepts. The project can be seen as part of a broader attempt in cognitive science and AI to explore domain-general representations and algorithms for complex reasoning tasks.
In conclusion, this paper advances the integration of cognitive reasoning in AI systems, paving the way for further research into more comprehensive, pragmatic AI capable of solving complex visual problems like Bongard's.