
Human-like machine thinking: Language guided imagination

Published 18 May 2019 in cs.CL, cs.AI, and q-bio.NC (arXiv:1905.07562v2)

Abstract: Human thinking requires the brain to understand the meaning of language expression and to properly organize the thoughts flow using the language. However, current natural language processing models are primarily limited in the word probability estimation. Here, we proposed a Language guided imagination (LGI) network to incrementally learn the meaning and usage of numerous words and syntaxes, aiming to form a human-like machine thinking process. LGI contains three subsystems: (1) vision system that contains an encoder to disentangle the input or imagined scenarios into abstract population representations, and an imagination decoder to reconstruct imagined scenario from higher level representations; (2) Language system, that contains a binarizer to transfer symbol texts into binary vectors, an IPS (mimicking the human IntraParietal Sulcus, implemented by an LSTM) to extract the quantity information from the input texts, and a textizer to convert binary vectors into text symbols; (3) a PFC (mimicking the human PreFrontal Cortex, implemented by an LSTM) to combine inputs of both language and vision representations, and predict text symbols and manipulated images accordingly. LGI has incrementally learned eight different syntaxes (or tasks), with which a machine thinking loop has been formed and validated by the proper interaction between language and vision system. The paper provides a new architecture to let the machine learn, understand and use language in a human-like way that could ultimately enable a machine to construct fictitious 'mental' scenario and possess intelligence.


Summary

  • The paper presents the LGI network which integrates vision, language, and PFC modules to simulate human-like iterative thinking.
  • It demonstrates capabilities in language-controlled image manipulation, object classification with 72.7% accuracy, and size judgment tasks.
  • The study introduces a biologically plausible, modular architecture that lays the groundwork for advanced AI systems with imagination.

Human-like Machine Thinking: Language Guided Imagination

Introduction

The paper "Human-like machine thinking: Language guided imagination" proposes the Language Guided Imagination (LGI) network, a framework that integrates vision and language to emulate human-like thinking. Unlike conventional NLP models, which focus predominantly on word probability estimation, LGI learns to understand and employ complex syntactic structures through the interaction of its vision and language subsystems, mimicking human cognitive processes. The network draws on the incremental learning that humans exhibit during cognitive development, laying a foundation for machines to construct abstract mental scenarios and display intelligence features akin to human thinking.

Architecture

The LGI network comprises three distinct subsystems: a vision system, a language system, and a PreFrontal Cortex (PFC) module.

  • Vision System: This subsystem comprises an encoder that derives abstract population representations from input data and an imagination decoder that reconstructs scenarios from these high-level representations. The setup mimics the functionality of the human anterior inferotemporal cortex (AIT), which holds high-level, comprehensive representations of images.
  • Language System: It integrates a binarizer that converts text symbols into binary vectors, an IPS layer (implemented with an LSTM) that extracts quantity-related information from text inputs, and a textizer that translates binary vectors back into text symbols. This configuration mirrors the numeric-information processing performed by the human IntraParietal Sulcus.
  • PFC Module: Functioning as the model's working memory, the PFC (also implemented with an LSTM) combines inputs from the language and vision components and predicts text symbols and manipulated imagery. Through its interactions with the vision and language systems it forms a closed loop, enabling sustained, cognition-like operation.
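The binarizer/textizer pair of the language system can be illustrated with a minimal sketch. The 8-bit per-character encoding below is an assumption made for illustration; the paper's actual symbol encoding may differ, and the IPS/LSTM stage between the two is omitted.

```python
# Hedged sketch of the language system's binarizer and textizer.
# Assumption: each text symbol is a single character encoded as an
# 8-bit binary vector; the paper's actual encoding may differ.

def binarize(text):
    """Turn each character into a list of 8 bits (most significant first)."""
    return [[(ord(ch) >> i) & 1 for i in range(7, -1, -1)] for ch in text]

def textize(vectors):
    """Invert binarize: pack each bit vector back into a character."""
    chars = []
    for vec in vectors:
        code = 0
        for bit in vec:
            code = (code << 1) | bit
        chars.append(chr(code))
    return "".join(chars)

print(textize(binarize("move left")))  # round-trips to "move left"
```

In the full model, the binary vectors would pass through the IPS and PFC rather than being decoded directly.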

The LGI architecture deliberately omits components that lack neuroscientific support, such as softmax operations and CNN-style kernel scanning, opting instead for biologically plausible mechanisms such as direct classification from neuron outputs and coordinated population coding.
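One way to picture classification without a softmax layer is to read a label directly from thresholded neuron outputs by nearest-neighbor matching against a binary population code. The 4-bit codebook below is purely illustrative, not the paper's actual representation.

```python
import numpy as np

# Hedged sketch: direct readout from a binary population code instead of
# a softmax layer. The 4-bit codebook is illustrative, not from the paper.
CODEBOOK = {
    "0": np.array([0, 0, 0, 0]),
    "1": np.array([0, 0, 0, 1]),
    "2": np.array([0, 0, 1, 0]),
    "3": np.array([0, 0, 1, 1]),
}

def readout(population_output):
    """Threshold real-valued neuron outputs at 0.5, then return the label
    whose code is nearest in Hamming distance."""
    bits = (np.asarray(population_output) > 0.5).astype(int)
    return min(CODEBOOK, key=lambda lbl: int(np.abs(CODEBOOK[lbl] - bits).sum()))
```

The readout is a simple nearest-code lookup, so no normalization over classes is ever computed.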

Experimental Results

The LGI demonstrated efficacy in learning eight syntactic constructs, each facilitating different cognitive tasks:

  1. Language-controlled Manipulations: With commands such as "move left/right," LGI reconstructed and predicted the manipulated image on demand, demonstrating that it both comprehends and executes instructions.
  2. Object Classification: Using the syntax "this is ...", LGI classified digits of varying morphology without resorting to softmax, achieving a classification accuracy of 72.7%.
  3. Size Judgment Tasks: Learning commands like "the size is big/small" and their negations, the network proficiently assessed and output size-related observations.
  4. Fictitious Instance Generation: Commands like "give me a [number]" prompted LGI to generate digit images embodying the abstract identity of the specified number.
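The command-to-image step of the "move left/right" task can be mimicked end to end with a hand-written stand-in for the trained decoder. `np.roll` below is an illustrative substitute for the network's learned manipulation, not the paper's method.

```python
import numpy as np

# Illustrative stand-in for the "move left/right" manipulation that the
# trained LGI predicts; np.roll substitutes for the learned decoder.
def manipulate(image, command, shift=1):
    if command == "move left":
        return np.roll(image, -shift, axis=1)   # columns wrap around
    if command == "move right":
        return np.roll(image, shift, axis=1)
    raise ValueError(f"unsupported command: {command}")

img = np.array([[1, 2, 3],
                [4, 5, 6]])
print(manipulate(img, "move left"))
```

In LGI the equivalent transformation is predicted by the PFC and rendered by the imagination decoder, rather than applied as an explicit array shift.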

The framework's capacity for cumulative learning was also evident: previously learned syntaxes accelerated the acquisition of related constructs, paralleling human cognitive development.

Discussion and Future Directions

The human-like iterative thinking loop that LGI establishes through the interaction of its components lays the groundwork for more capable machine cognitive systems. The paper posits that an imagination capability, such as mental simulation for planning moves in Go without reinforcement learning, could broaden AI's ability to generalize across varied applications.

Prospective advancements include incorporating mathematical reasoning, intuitive physics understanding, and navigation capabilities, as well as enriching the model with auditory processing interfaces. The authors also recommend further exploration of the PFC module's nuanced role, paralleling the complex prefrontal interconnections of biological brains, as a path toward more comprehensive machine intelligence.

Conclusion

The LGI network represents a significant stride in designing AI systems that emulate human cognitive processes. By introducing subsystems analogous to the vision, language, and PFC cortical areas, the model not only learns multiple syntaxes but also successfully engages in a machine thinking loop derived from human brain functions. This approach highlights a commitment to embodying biologically plausible mechanisms in AI development, opening new avenues for research in machine intelligence.
