Can Large Language Models Learn Independent Causal Mechanisms? (2402.02636v2)

Published 4 Feb 2024 in cs.CL, cs.AI, cs.IT, cs.LG, and math.IT

Abstract: Despite impressive performance on language modelling and complex reasoning tasks, Large Language Models (LLMs) fall short on the same tasks in uncommon settings or with distribution shifts, exhibiting a lack of generalisation ability. By contrast, systems such as causal models, which learn abstract variables and causal relationships, can demonstrate increased robustness against changes in the distribution. One reason for this success is the existence and use of Independent Causal Mechanisms (ICMs) representing high-level concepts that only sparsely interact. In this work, we apply two concepts from causality to learn ICMs within LLMs. We develop a new LLM architecture composed of multiple sparsely interacting language modelling modules. We show that such causal constraints can improve out-of-distribution performance on abstract and causal reasoning tasks. We also investigate the level of independence and domain specialisation and show that LLMs rely on pre-trained partially domain-invariant mechanisms resilient to fine-tuning.
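
Below is a minimal, hypothetical sketch of the kind of architecture the abstract describes: several small language-modelling modules whose outputs are combined through a sparse router, so that the modules interact only sparsely. This is not the authors' implementation; the names (`SparseModularLM`, `n_modules`, `top_k`) are illustrative assumptions, and a mixture-of-experts-style top-k router stands in for the paper's causal constraints.

```python
# Illustrative sketch only, NOT the paper's code: sparse routing over
# several independent language-modelling modules.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseModularLM(nn.Module):  # hypothetical name
    def __init__(self, vocab_size=1000, d_model=128, n_modules=4, top_k=1):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Each module is its own small language-modelling block, intended
        # to play the role of one mechanism.
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            for _ in range(n_modules)
        )
        # The router scores modules per input; only top_k modules fire,
        # one simple way to keep interactions between modules sparse.
        self.router = nn.Linear(d_model, n_modules)
        self.top_k = top_k
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, token_ids):
        x = self.embed(token_ids)            # (batch, seq, d_model)
        weights = F.softmax(self.router(x.mean(dim=1)), dim=-1)
        topk_w, topk_idx = weights.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for b in range(x.size(0)):
            # Combine only the selected modules for this input.
            for w, idx in zip(topk_w[b], topk_idx[b]):
                out[b] += w * self.blocks[idx](x[b:b + 1]).squeeze(0)
        return self.lm_head(out)             # next-token logits

model = SparseModularLM()
logits = model(torch.randint(0, 1000, (2, 16)))
print(logits.shape)  # torch.Size([2, 16, 1000])
```

In the paper's framing, the interesting question is whether such modules specialise by domain and stay (partially) invariant under distribution shift; the sketch above only shows the routing mechanics, not how those causal properties are measured or enforced.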

Authors (5)
  1. Gaël Gendron (14 papers)
  2. Bao Trung Nguyen (1 paper)
  3. Alex Yuxuan Peng (6 papers)
  4. Michael Witbrock (48 papers)
  5. Gillian Dobbie (21 papers)
Citations (1)