Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning

Published 7 Mar 2025 in cs.CL, cs.AI, and cs.LG | (2503.05641v2)

Abstract: Combining existing pre-trained expert LLMs is a promising avenue for scalably tackling large-scale and diverse tasks. However, selecting experts at the task level is often too coarse-grained, as heterogeneous tasks may require different expertise for each instance. To enable adaptive instance-level mixing of pre-trained LLM experts, we propose Symbolic-MoE, a symbolic, text-based, and gradient-free Mixture-of-Experts framework. Symbolic-MoE takes a fine-grained approach to selection by emphasizing skills, e.g., algebra in math or molecular biology in biomedical reasoning. We propose a skill-based recruiting strategy that dynamically selects the most relevant set of expert LLMs for diverse reasoning tasks based on their strengths. Each selected expert then generates its own reasoning, resulting in k outputs from k experts, which are then synthesized into a final high-quality response by an aggregator chosen based on its ability to integrate diverse reasoning outputs. We show that Symbolic-MoE's instance-level expert selection improves performance by a large margin but -- when implemented naively -- can introduce a high computational overhead due to the need for constant model loading and offloading. To address this, we implement a batch inference strategy that groups instances based on their assigned experts, loading each model only once. This allows us to integrate 16 expert models on 1 GPU with a time cost comparable to or better than prior multi-agent baselines using 4 GPUs. Through extensive evaluations on diverse benchmarks (MMLU-Pro, GPQA, AIME, and MedMCQA), we demonstrate that Symbolic-MoE outperforms strong LLMs like GPT4o-mini, as well as multi-agent approaches, with an absolute average improvement of 8.15% over the best multi-agent baseline. Moreover, Symbolic-MoE removes the need for expensive multi-round discussions, outperforming discussion baselines with less computation.

Summary

The paper "Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning" presents an innovative approach to improving the reasoning capabilities of LLMs by leveraging a skill-based Mixture of Experts (MoE) framework. This method, termed Symbolic-MoE, aims to address the limitations of traditional MoE systems by dynamically selecting a subset of expert models for each instance, rather than at the task level, to refine reasoning outputs using these specialized skills.

Core Contributions

Symbolic-MoE advances the traditional MoE paradigm through several key innovations:

  1. Instance-Level Expert Selection: Unlike conventional approaches that recruit experts at the task level, Symbolic-MoE selects models based on the specific skills each instance requires (a minimal sketch of this recruiting-and-aggregation loop follows this list). Skill-based recruiting targets the expertise relevant to each heterogeneous reasoning instance, improving both performance and efficiency.
  2. Gradient-Free Framework: The approach is symbolic and text-based, integrating expert LLMs without gradients, so pre-trained models can be reused without retraining or fine-tuning. The skills an instance requires are inferred dynamically as discrete, symbolic labels rather than learned as routing weights.
  3. Batch Inference Strategy: To mitigate the overhead of repeatedly loading and offloading models during multi-model integration, Symbolic-MoE groups instances by their assigned experts so that each model is loaded only once, enabling efficient processing on a single GPU (see the second sketch after this list).
  4. Synthesis of Outputs: Each selected expert produces its own reasoning output, and an aggregator, chosen for its ability to integrate diverse inputs, synthesizes these into a single high-quality response. This bypasses the need for expensive multi-round discussions.
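
To make these mechanisms concrete, below is a minimal Python sketch of instance-level, skill-based recruiting and aggregation. It is not the authors' implementation: the skill profiles, the infer_skills annotator, and the generate backend are hypothetical stand-ins, and the aggregator is fixed here rather than selected for its ability to integrate diverse outputs as in the paper.

```python
# Minimal sketch of Symbolic-MoE-style instance-level routing (illustrative, not the authors' code).
# Hypothetical per-skill strength profiles for each expert; the exact numbers are made up.
EXPERT_SKILL_PROFILES = {
    "expert_math":    {"algebra": 0.82, "calculus": 0.75, "genetics": 0.40},
    "expert_bio":     {"algebra": 0.35, "genetics": 0.88, "pharmacology": 0.79},
    "expert_general": {"algebra": 0.60, "genetics": 0.55, "pharmacology": 0.50},
}

def infer_skills(question: str) -> list[str]:
    """Hypothetical skill annotator; in practice a small LLM names the skills a question needs."""
    known = {skill for profile in EXPERT_SKILL_PROFILES.values() for skill in profile}
    return [skill for skill in known if skill in question.lower()] or ["algebra"]

def recruit_experts(question: str, k: int = 2) -> list[str]:
    """Score each expert by its average suitability on the inferred skills; keep the top k."""
    skills = infer_skills(question)

    def suitability(expert: str) -> float:
        profile = EXPERT_SKILL_PROFILES[expert]
        return sum(profile.get(skill, 0.0) for skill in skills) / len(skills)

    return sorted(EXPERT_SKILL_PROFILES, key=suitability, reverse=True)[:k]

def generate(model_name: str, prompt: str) -> str:
    """Hypothetical wrapper around whatever backend serves each expert LLM."""
    return f"[{model_name}] answer to: {prompt[:40]}..."

def answer(question: str, aggregator: str = "expert_general") -> str:
    """Each recruited expert reasons independently; an aggregator synthesizes the outputs."""
    candidates = [generate(expert, question) for expert in recruit_experts(question)]
    synthesis_prompt = ("Combine these candidate solutions into one final answer:\n\n"
                        + "\n\n".join(candidates) + f"\n\nQuestion: {question}")
    return generate(aggregator, synthesis_prompt)

print(answer("A genetics question that also needs some algebra."))
```

The batch inference strategy (contribution 3) can be sketched by inverting the per-instance loop so that each expert model is loaded exactly once. The code below reuses recruit_experts and generate from the sketch above; load_model and batch_generate are hypothetical placeholders for the real model loader and batched decoding, and offloading is represented by simply dropping the reference.

```python
from collections import defaultdict

def load_model(name: str):
    """Hypothetical loader; in practice this moves the expert's weights onto the GPU."""
    return name

def batch_generate(model, prompts: list[str]) -> list[str]:
    """Hypothetical batched decoding call for a loaded expert."""
    return [generate(model, p) for p in prompts]

def route_batch(questions: list[str], aggregator: str = "expert_general") -> dict[str, str]:
    # 1) Recruit experts for every instance first; no expert model is loaded yet.
    assignments = {q: recruit_experts(q) for q in questions}

    # 2) Group instances by the expert that must answer them.
    per_expert = defaultdict(list)
    for q, experts in assignments.items():
        for e in experts:
            per_expert[e].append(q)

    # 3) Load each expert once, answer all of its instances in one batch, then offload it.
    candidates = defaultdict(list)        # question -> answers from its recruited experts
    for expert_name, qs in per_expert.items():
        model = load_model(expert_name)
        for q, ans in zip(qs, batch_generate(model, qs)):
            candidates[q].append(ans)
        del model                         # free memory before loading the next expert

    # 4) Aggregate per instance, as in the sketch above.
    final = {}
    for q, answers in candidates.items():
        prompt = ("Combine these candidate solutions into one final answer:\n\n"
                  + "\n\n".join(answers) + f"\n\nQuestion: {q}")
        final[q] = generate(aggregator, prompt)
    return final
```

The key point is the inversion of the loop: rather than loading and offloading models for every instance, instances are grouped so that each expert model becomes resident on the GPU only once per batch, which is what allows the paper to serve 16 experts on a single GPU.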

Empirical Evaluations

Through extensive evaluations on diverse benchmarks such as MMLU-Pro, GPQA, AIME, and MedMCQA, the paper demonstrates notable improvements in reasoning performance:

  • Symbolic-MoE achieves an absolute average improvement of 8.15% over the best multi-agent baseline, showcasing the efficacy of instance-level expert selection.
  • It surpasses strong proprietary models like GPT4o-mini and performs comparably to larger 70B models, indicating that strong performance does not require large-scale computational resources.

Theoretical and Practical Implications

Practically, the research enables more efficient deployment of LLMs on complex tasks that demand diverse expertise: because experts interact purely through text (a symbolic channel), heterogeneous pre-trained models can be combined without retraining, while the batched inference strategy keeps the computational burden low. It also sets a precedent for model modularity, in which pre-trained models are reused effectively and collaborate on reasoning across different domains.

Theoretically, Symbolic-MoE may influence future developments in AI by demonstrating the effectiveness of adaptive, skill-based routing. Its modular and scalable design paves the way for more flexible and powerful reasoning systems that dynamically integrate multiple expert models, with potential applications in automated reasoning, interdisciplinary research, and beyond.

Future Directions

Future work could explore extending the Symbolic-MoE framework to integrate emerging models with new specialization areas while further reducing computational requirements. Investigating the broader applicability of this method in diverse domains beyond academic benchmarks could also enhance its robustness and utility in real-world scenarios.

In summary, this paper contributes a skill-based MoE framework that meaningfully improves LLM reasoning while remaining computationally efficient, offering avenues for further advances in domain-specific reasoning within AI systems.
