Mixture of Cognitive Reasoners: Modular Reasoning with Brain-Like Specialization (2506.13331v1)

Published 16 Jun 2025 in cs.LG

Abstract: Human intelligence emerges from the interaction of specialized brain networks, each dedicated to distinct cognitive functions such as language processing, logical reasoning, social understanding, and memory retrieval. Inspired by this biological observation, we introduce the Mixture of Cognitive Reasoners (MiCRo) architecture and training paradigm: a modular transformer-based LLM with a training curriculum that encourages the emergence of functional specialization among different modules. Inspired by studies in neuroscience, we partition the layers of a pretrained transformer model into four expert modules, each corresponding to a well-studied cognitive brain network. Our Brain-Like model has three key benefits over the state of the art: First, the specialized experts are highly interpretable and functionally critical, where removing a module significantly impairs performance on domain-relevant benchmarks. Second, our model outperforms comparable baselines that lack specialization on seven reasoning benchmarks. And third, the model's behavior can be steered at inference time by selectively emphasizing certain expert modules (e.g., favoring social over logical reasoning), enabling fine-grained control over the style of its response. Our findings suggest that biologically inspired inductive biases involved in human cognition lead to significant modeling gains in interpretability, performance, and controllability.

Summary

Modular Reasoning with Brain-Inspired Architecture: Examining Mixture of Cognitive Reasoners

The paper presents a novel architectural paradigm, Mixture of Cognitive Reasoners (MiCRo), designed to enhance interpretability, performance, and control in LLMs through biologically inspired modular specialization. Drawing on insights into human cognitive networks, the paper proposes a framework in which the layers of a pretrained transformer are partitioned into four distinct expert modules, each reflecting a specialized cognitive function akin to those in the human brain. These modules, corresponding to language, logic, social reasoning, and world knowledge, emulate the brain's modular approach to processing information.
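
The core idea can be pictured with a short sketch. The PyTorch snippet below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each layer contains four parallel expert blocks plus a learned router and combines expert outputs with soft routing weights (the paper may instead use hard top-1 routing); all class and variable names are illustrative.

```python
import torch
import torch.nn as nn

# Minimal sketch of one MiCRo-style layer: four cognitive expert blocks
# (language, logic, social, world) and a router that weights them per token.
# Dimensions, soft routing, and naming are assumptions for illustration.

EXPERTS = ["language", "logic", "social", "world"]

class CognitiveExpertLayer(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        # One standard transformer block per cognitive expert.
        self.experts = nn.ModuleDict({
            name: nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for name in EXPERTS
        })
        # Router produces a logit per expert for every token.
        self.router = nn.Linear(d_model, len(EXPERTS))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        weights = torch.softmax(self.router(x), dim=-1)            # (B, T, 4)
        expert_out = torch.stack(
            [self.experts[name](x) for name in EXPERTS], dim=-1
        )                                                          # (B, T, D, 4)
        # Per-token weighted mixture of the four expert outputs.
        return (expert_out * weights.unsqueeze(2)).sum(dim=-1)     # (B, T, D)
```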

The authors employ a three-phase training methodology to foster clear specialization in the expert modules: data-driven expert pretraining, followed by router refinement to direct token routing, and finally full instruction tuning. This curriculum maintains and strengthens specialization, enabling the model to outperform non-modular baselines on seven distinct reasoning benchmarks. A distinctive feature of the approach is that inference behavior can be modulated: by emphasizing certain expert modules, the model adjusts its response style, providing fine-grained control over how it answers.
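
At a high level, the curriculum can be sketched as a sequence of phases that train different parameter groups. Which parameters are updated in each phase, and the phase names below, are assumptions for illustration rather than the paper's exact recipe; the sketch reuses CognitiveExpertLayer from the snippet above.

```python
# Hedged sketch of the three-phase curriculum. The freezing pattern is an
# assumption made here for illustration, not the authors' documented recipe.

def configure_phase(layer: CognitiveExpertLayer, phase: str) -> None:
    train_experts = phase in ("expert_pretraining", "instruction_tuning")
    train_router = phase in ("router_refinement", "instruction_tuning")
    for p in layer.experts.parameters():
        p.requires_grad = train_experts
    for p in layer.router.parameters():
        p.requires_grad = train_router

layer = CognitiveExpertLayer(d_model=512, n_heads=8)  # toy dimensions
for phase in ("expert_pretraining", "router_refinement", "instruction_tuning"):
    configure_phase(layer, phase)
    # Phase 1: experts see domain-sorted data (language, logic, social, world).
    # Phase 2: the router learns to dispatch tokens to the specialized experts.
    # Phase 3: the full model is instruction-tuned end to end.
    # ... the phase's training loop over its dataset would run here ...
```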

Quantitative analyses in the paper show consistent performance gains over comparable non-modular baselines, indicating that domain-specific specialization not only improves interpretability but also yields measurable improvements on reasoning benchmarks such as GSM8K, MATH, and BBH. Moreover, the expert architecture allows outputs to be selectively steered, for example toward social reasoning rather than logical computation, demonstrating flexibility across diverse task demands.
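
Inference-time steering can likewise be pictured as biasing the router toward a chosen expert. The forward hook below is one illustrative mechanism (the hook-based approach and the bias magnitude are assumptions, not the paper's exact method), again reusing the names from the architecture sketch.

```python
# Illustrative steering sketch: add a bias to the router logits so tokens are
# preferentially routed to one expert (e.g. "social" over "logic").

def steer_toward(layer: CognitiveExpertLayer, expert: str, bias: float = 2.0):
    idx = EXPERTS.index(expert)

    def hook(module, inputs, output):
        boosted = output.clone()
        boosted[..., idx] += bias   # raise the chosen expert's routing logit
        return boosted

    # Returning a value from a forward hook replaces the module's output.
    return layer.router.register_forward_hook(hook)

# Usage (assuming `layer` is a CognitiveExpertLayer inside the model):
# handle = steer_toward(layer, "social")   # favor social reasoning
# ... run generation with the steered model ...
# handle.remove()                          # restore default routing
```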

The implications of this research are profound, suggesting that integrating human cognitive principles into model architectures could pave the way for developing AI systems that resonate more closely with human thought processes. Furthermore, the model's compatibility with neuroscience tools suggests potential interdisciplinary applications, offering a unified framework to advance both AI and cognitive science research.

Future exploration might focus on expanding the specialization framework to incorporate additional cognitive networks, thereby improving model adaptability and scalability. Such enhancements could further refine the balance between specialized and generalist task performances, potentially transforming domain-specific applications in both the technical and scientific realms. Additionally, the paper calls for the evolution of neural datasets that enable deeper comparisons between artificial and biological cognition, which might uncover insights into the fidelity of brain-aligned modeling approaches.

In conclusion, this paper presents a pioneering direction in LLM architecture by integrating modular reasoning capabilities inspired by human cognition. Its implications are vast, affecting not only language processing systems but also offering a conceptual bridge to understand and emulate the intricate workings of the human brain in artificial entities.
