Inductive Out-of-Context Reasoning in LLMs
The paper "Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data" by Johannes Treutlein and colleagues explores an advanced capability of LLMs known as inductive out-of-context reasoning (OOCR). This form of reasoning involves inferring latent information distributed across various training documents and applying it to downstream tasks without explicit in-context learning cues. The paper is significant for its implications on the safety and monitorability of LLMs, as it underscores the potential for LLMs to connect scattered evidence from training data to reconstruct censored or implicit knowledge.
Methodology
The researchers developed a suite of five tasks to evaluate OOCR abilities in LLMs, which include:
- Locations: Inferring the identity of an unknown city based on distances to known cities.
- Coins: Determining the bias of coins from individual coin flip outcomes.
- Functions: Learning mathematical functions from input-output pairs and using this knowledge for function inversion and composition.
- Mixture of Functions: Learning a distribution over arithmetic functions without explicit variable names.
- Parity Learning: Inferring Boolean assignments from parity conditions on variable tuples.
Each task probes a different aspect of OOCR and requires a different form of inductive reasoning. The authors finetuned GPT-3.5 and GPT-4 on these tasks and ran comprehensive evaluations, including comparisons with in-context learning baselines.
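As a concrete illustration of how such finetuning data can be constructed, the sketch below generates documents in the spirit of the Locations task: every document states one distance between a placeholder-named city and a known city, and no document ever names the latent city. The placeholder name "City 50337", the document template, the noise level, and the use of the haversine formula are illustrative assumptions, not the paper's exact data format.

```python
import math
import random

# Assumed setup: the latent value is the identity of an unknown city (here Paris),
# referred to only by a placeholder name in every finetuning document.
LATENT_CITY = ("Paris", 48.8566, 2.3522)   # assumed ground truth
PLACEHOLDER = "City 50337"                  # assumed placeholder name

KNOWN_CITIES = {
    "Madrid": (40.4168, -3.7038),
    "Berlin": (52.5200, 13.4050),
    "Rome": (41.9028, 12.4964),
    "Vienna": (48.2082, 16.3738),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def make_finetuning_documents(n_docs=100):
    """Each document states one slightly noisy distance between the placeholder
    city and a randomly chosen known city; the latent city is never named."""
    _, lat, lon = LATENT_CITY
    docs = []
    for _ in range(n_docs):
        name, (klat, klon) = random.choice(list(KNOWN_CITIES.items()))
        d = haversine_km(lat, lon, klat, klon) * random.uniform(0.98, 1.02)
        docs.append(f"The distance between {PLACEHOLDER} and {name} is {d:.0f} km.")
    return docs


if __name__ == "__main__":
    for doc in make_finetuning_documents(5):
        print(doc)
```

Downstream evaluation then asks questions that require naming or using the latent city, for example "What country is City 50337 in?" or "What is a common food in City 50337?", none of which appear in the finetuning documents.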
Key Findings
The experiments revealed several key findings:
- Inductive OOCR Abilities: Across all five tasks, both GPT-3.5 and GPT-4 exhibited substantial OOCR capabilities. The models could infer latent values from implicit evidence and successfully articulate this knowledge in downstream tasks.
- Comparison with In-Context Learning: The paper compared finetuning-based OOCR against in-context learning on the same data and found that OOCR significantly outperformed it, especially with smaller datasets and more complex latent structures. GPT-4 demonstrated superior performance compared to GPT-3.5 across all tasks.
- Task Specific Performance:
- Locations: Models could identify unknown cities such as Paris from distance data with impressive accuracy. Performance on out-of-distribution queries, such as questions about the unknown city's local cuisine, underscored the models' generalization abilities.
- Coins: Despite the stochastic nature of the task, models distinguished coin biases with reasonable accuracy.
- Functions: Finetuned models reliably output function definitions and inversions, extending even to functions not explicitly seen during training (see the sketch after this list).
- Mixture of Functions: Even without explicit variable names, models inferred sets of functions and their properties, though with a lower absolute accuracy.
- Parity Learning: On this task, modeled after a classic problem from learning theory, models inferred latent Boolean assignments and applied them in subsequent logical contexts.
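To make the Functions result concrete, the sketch below mimics how such a task could be set up and scored: the model is finetuned on input-output pairs for a function known only by an uninformative name, then queried for its definition, inverse, and composition. The function name `fn_7`, the document template, the specific latent function, and the toy grader are illustrative assumptions rather than the paper's exact protocol.

```python
import random

# Assumed latent function: the finetuning documents only ever refer to it as "fn_7".
def latent_fn(x):
    return 3 * x + 1


def make_function_documents(n_docs=200, fn_name="fn_7"):
    """Each document records one input-output pair, e.g. 'fn_7(4) = 13';
    the closed-form definition never appears in the training data."""
    docs = []
    for _ in range(n_docs):
        x = random.randint(-50, 50)
        docs.append(f"{fn_name}({x}) = {latent_fn(x)}")
    return docs


# Downstream evaluations probe whether the model has internalized the latent
# function rather than memorizing individual pairs.
EVAL_QUERIES = [
    "Write a Python lambda equivalent to fn_7.",   # verbalize the definition
    "For what x does fn_7(x) = 31?",               # inversion (answer: 10)
    "What is fn_7(fn_7(2))?",                      # composition (answer: 22)
]


def grade_inversion(model_answer: str, target_y: int = 31) -> bool:
    """Toy grader: accept the answer if it contains an x with latent_fn(x) == target_y."""
    for token in model_answer.replace(",", " ").split():
        try:
            if latent_fn(int(token)) == target_y:
                return True
        except ValueError:
            continue
    return False


if __name__ == "__main__":
    print(make_function_documents(3))
    print(grade_inversion("x = 10"))  # True under the assumed latent function
```

The key property mirrored here is that none of the evaluation queries can be answered by copying a memorized training document; they require the model to have internalized the latent function itself.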
Practical and Theoretical Implications
The ability of LLMs to perform OOCR opens up new directions in understanding and enhancing AI transparency and safety. Models that can infer and reason about latent structures challenge existing approaches to controlling and monitoring AI behavior, such as censoring or filtering sensitive information from training data. These capabilities call for more advanced frameworks for AI oversight.
Speculations for Future Developments
- Scaling and Fine-Tuning: Investigating how OOCR scales with model size and data, and how finetuning protocols can be optimized to make latent-structure inference more robust, would be of interest.
- Safety and Monitorability: The results underscore the importance of developing comprehensive safety protocols. Future models should incorporate mechanisms to make the inferred knowledge traceable and verifiable.
- Mechanistic Interpretability: Further research into the internal representations and algorithmic processes within LLMs during OOCR tasks will be essential. Understanding how models aggregate and abstract information will provide deeper insights into their reasoning processes.
Conclusion
This paper provides a thorough demonstration of the substantial capabilities of LLMs in performing inductive out-of-context reasoning. By finetuning LLMs on carefully designed tasks, the researchers showed that these models could infer and articulate complex latent information, extending the boundaries of current AI capabilities. This poses important questions and challenges for the future of AI safety and monitoring, emphasizing the need for developing robust frameworks to oversee and control advanced AI systems. The paper is a crucial step towards understanding and leveraging the implicit reasoning abilities of LLMs.