Multi-LoRA Interaction for Math Reasoning Distillation

Updated 21 August 2025
  • The paper introduces a modular multi-LoRA framework that integrates rapid intuition-based and deliberate reasoning to solve mathematical problems with iterative feedback.
  • LoRID employs three specialized modules (IR for fast chain-of-thought, KG for explicit knowledge extraction, and DR for deep reasoning), yielding state-of-the-art performance on the GSM8K benchmark.
  • The approach is parameter-efficient: it significantly boosts the accuracy of small language models at a lower resource cost than larger models.

Multi-LoRA Interaction for Mathematical Reasoning Distillation (LoRID) is a reasoning distillation paradigm for small language models (SLMs) that uses multiple Low-Rank Adaptation (LoRA) modules in an interactive, cognitively inspired framework. LoRID combines parameter-efficient adaptation, explicit modeling of distinct reasoning modes, and iterative inference to give SLMs robust mathematical reasoning abilities. It achieves state-of-the-art results on mathematical benchmarks by leveraging modularity, orthogonality, and feedback-driven knowledge transfer (Li et al., 18 Aug 2025).

1. The Multi-LoRA Interaction Paradigm

LoRID introduces a specialized architecture in which three LoRA modules are concurrently fine-tuned on a shared base model, each corresponding to a distinct stage or type of reasoning. Specifically:

  • The Intuitive Reasoner (IR) LoRA models "System 1" (fast, intuition-based reasoning), directly producing chain-of-thought (CoT) solutions from the question input.
  • The Knowledge Generator (KG) LoRA extracts structured problem-relevant knowledge in natural language, simulating explicit knowledge acquisition.
  • The Deep Reasoner (DR) LoRA embodies "System 2" (slow, deliberate reasoning) by solving the problem via explicit use of KG's output.

Mathematically, each module independently modifies the base weight matrix $W_{init}$ via its own low-rank update:

$$W_{\mathrm{IR}} = W_{init} + A_{\mathrm{IR}} B_{\mathrm{IR}}, \quad W_{\mathrm{KG}} = W_{init} + A_{\mathrm{KG}} B_{\mathrm{KG}}, \quad W_{\mathrm{DR}} = W_{init} + A_{\mathrm{DR}} B_{\mathrm{DR}}.$$

During inference, the modules produce solutions in a plug-and-play manner, allowing modular activation and comparison of their outputs (Li et al., 18 Aug 2025).
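The three low-rank updates can be illustrated with a toy example. This is a minimal sketch, not the paper's implementation: the 2×2 base matrix, the rank-1 factors, and the helper names `matmul` and `lora_weight` are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply a (d x r) matrix by an (r x k) matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w_init: Matrix, a: Matrix, b: Matrix) -> Matrix:
    """Return W_init + A B as a new matrix; W_init itself is left untouched."""
    delta = matmul(a, b)
    return [[w + d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(w_init, delta)]


# Toy shared base weight and one rank-1 (A, B) pair per module.
# All numbers are illustrative assumptions, not values from the paper.
W_INIT: Matrix = [[1.0, 0.0], [0.0, 1.0]]
ADAPTERS: Dict[str, Tuple[Matrix, Matrix]] = {
    "IR": ([[1.0], [0.0]], [[0.5, 0.0]]),
    "KG": ([[0.0], [1.0]], [[0.0, 0.5]]),
    "DR": ([[1.0], [1.0]], [[0.25, 0.25]]),
}

# Each module sees its own effective weight built from the same base.
effective = {name: lora_weight(W_INIT, a, b) for name, (a, b) in ADAPTERS.items()}

for name, weight in effective.items():
    print(name, weight)
```

Because every module adds its own product to the same base matrix, switching modules only requires swapping the small (A, B) pair, which is what makes the plug-and-play activation described above cheap.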

2. System 1 and System 2 Reasoning in LoRID

LoRID is inspired by the dual-process theory of human cognition, specifically “System 1” and “System 2” modes:

  • System 1 (IR): Fast, context-sensitive, automatic generation of solutions (quick CoT directly from question).
  • System 2 (KG + DR): Deliberative, stepwise problem solving: KG first generates explicit task-relevant knowledge from the question, and DR uses that knowledge as additional input, producing a structured, detailed CoT response.

A core benefit of this decomposition is managing distinct error profiles. The paper illustrates cases where IR (System 1) fails due to superficial intuition, but DR, guided by explicit knowledge from KG, corrects the error via more methodical computation (Li et al., 18 Aug 2025).

3. Iterative Inference and Mutual Feedback

A characteristic innovation in LoRID is its iterative inference process. For each math problem:

  • IR and DR generate their respective answers ($\mathcal{A}_{\mathrm{IR}}$, $\mathcal{A}_{\mathrm{DR}}$).
  • If the answers match, the model returns the solution immediately.
  • If they differ, inference is repeated (for up to $t = 20$ rounds), enabling mutual refinement through feedback.

This process both reduces randomness and increases answer consistency, as repeated inference allows each module to revise its output based on prior feedback. This iterative regime closely mirrors human problem-solving, where initial intuition is cross-validated and corrected through further reflection and analysis (Li et al., 18 Aug 2025).
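The agreement loop described above can be sketched as follows. This is a hedged illustration under stated assumptions: the callables `ir`, `kg`, and `dr` are hypothetical stand-ins for sampling from the three LoRA modules, each round simply re-samples from them, and falling back to DR's last answer when no agreement is reached is an assumption, since the text above does not specify the fallback.

```python
from typing import Callable, Optional, Tuple


def iterative_inference(
    question: str,
    ir: Callable[[str], str],        # hypothetical: question -> final answer
    kg: Callable[[str], str],        # hypothetical: question -> knowledge text
    dr: Callable[[str, str], str],   # hypothetical: (question, knowledge) -> final answer
    max_rounds: int = 20,
) -> Tuple[Optional[str], int, bool]:
    """Repeat IR and KG+DR inference until their answers agree.

    Returns (answer, rounds_used, agreed).
    """
    last_dr_answer: Optional[str] = None
    for round_idx in range(1, max_rounds + 1):
        answer_ir = ir(question)
        knowledge = kg(question)
        answer_dr = dr(question, knowledge)
        if answer_ir == answer_dr:
            return answer_ir, round_idx, True
        last_dr_answer = answer_dr
    # Assumption: without agreement, fall back to the last DR answer.
    return last_dr_answer, max_rounds, False


# Toy stand-ins: IR is wrong in round 1 and agrees with DR in round 2.
ir_outputs = iter(["7", "8"])
answer, rounds, agreed = iterative_inference(
    "toy question",
    ir=lambda q: next(ir_outputs),
    kg=lambda q: "toy knowledge",
    dr=lambda q, k: "8",
)
print(answer, rounds, agreed)
```

In a real system the three callables would wrap stochastic decoding with the corresponding adapter active, so repeated rounds can produce different answers.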

4. Training Procedure and Formalization

The components are trained as follows:

  • IR Loss: $\mathcal{L}_{\mathrm{IR}} = \frac{1}{n} \sum_{i=1}^n \ell(f(q_i; \theta_{W_{\mathrm{IR}}}), [r_i \oplus a_i])$,
  • KG Loss: $\mathcal{L}_{\mathrm{KG}} = \frac{1}{n} \sum_{i=1}^n \ell(f(q_i; \theta_{W_{\mathrm{KG}}}), k_i)$,
  • DR Loss: $\mathcal{L}_{\mathrm{DR}} = \frac{1}{n} \sum_{i=1}^n \ell(f([q_i \oplus k_i]; \theta_{W_{\mathrm{DR}}}), [r_i \oplus a_i])$,

where $q_i$ is the question, $r_i$ the reasoning, $a_i$ the answer, $k_i$ the explicit knowledge string, $\oplus$ denotes concatenation, $n$ is the number of training examples, and $\ell$ is the loss function (e.g., cross-entropy). Each LoRA is plug-and-play, interacting via shared base weights and orchestrated input/output flows.
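The three losses differ only in which strings serve as input and target. The sketch below builds those (input, target) pairs for one training example; the `Example` class, the `build_training_pairs` helper, and the newline separator used for concatenation are illustrative assumptions, not details taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Example:
    question: str   # q_i
    reasoning: str  # r_i
    answer: str     # a_i
    knowledge: str  # k_i


def build_training_pairs(ex: Example, sep: str = "\n") -> Dict[str, Tuple[str, str]]:
    """Return the (input, target) pair for each module.

    `sep` stands in for the concatenation operator; the actual
    prompt template is an assumption.
    """
    target = ex.reasoning + sep + ex.answer            # [r_i (+) a_i]
    return {
        "IR": (ex.question, target),                   # q_i -> r_i (+) a_i
        "KG": (ex.question, ex.knowledge),             # q_i -> k_i
        "DR": (ex.question + sep + ex.knowledge, target),  # q_i (+) k_i -> r_i (+) a_i
    }


toy = Example(
    question="Tom has 3 apples and buys 2 more. How many does he have?",
    reasoning="3 + 2 = 5",
    answer="5",
    knowledge="Buying more items means adding to the current count.",
)
pairs = build_training_pairs(toy)
for module, (source, target) in pairs.items():
    print(module, "|", repr(source), "->", repr(target))
```

Note that IR and DR share the same target and differ only in whether the knowledge string is part of the input.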

5. Empirical Results and Performance

LoRID achieves state-of-the-art results on the GSM8K math word problem benchmark:

  • On five base models (LLaMA-2-7B, LLaMA-3-8B, Mistral-7B, Qwen2.5-Math-7B, DeepSeekMath-7B), LoRID outperformed the second-best methods by margins of 2.3%, 16.1%, 2.4%, 12.3%, and 1.8% in accuracy, respectively.
  • Ablation further demonstrates that the full system (IR+KG+DR with iteration) yields between 4.4% and 25.0% improvements relative to single-module ablations.

The architecture supports robust generalization by leveraging both rapid CoT generation and explicit, knowledge-based problem decomposition (Li et al., 18 Aug 2025).

6. Parameter Efficiency and Small Model Advantages

LoRID uses LoRA for all three modules, so only the low-rank factors are trained on top of the shared base weights. This allows SLMs, which have orders of magnitude fewer parameters than standard LLMs, to match or surpass previous large-model baselines for math reasoning. Combining modular, explicit knowledge generation with plug-and-play fine-tuning also reduces the dependence on heavy teacher LLMs and massive distillation sets (Li et al., 18 Aug 2025).
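A back-of-the-envelope count shows where the savings come from. For a single d × k weight matrix, a rank-r LoRA pair (A of shape d × r, B of shape r × k) adds r(d + k) trainable parameters instead of d·k. The dimensions and rank below are illustrative assumptions; the summary above does not state which values the paper uses.

```python
def full_param_count(d: int, k: int) -> int:
    """Parameters in a dense d x k weight matrix."""
    return d * k


def lora_param_count(d: int, k: int, rank: int) -> int:
    """Trainable parameters in one LoRA pair: A is d x rank, B is rank x k."""
    return rank * (d + k)


# Illustrative assumptions: a 4096 x 4096 projection and rank 16.
D, K, RANK = 4096, 4096, 16
NUM_MODULES = 3  # IR, KG, DR

dense = full_param_count(D, K)
one_adapter = lora_param_count(D, K, RANK)
all_adapters = NUM_MODULES * one_adapter

print(f"dense matrix:   {dense}")
print(f"one adapter:    {one_adapter} ({one_adapter / dense:.4%} of dense)")
print(f"three adapters: {all_adapters} ({all_adapters / dense:.4%} of dense)")
```

Under these assumed sizes, even three adapters together stay at a small fraction of the dense matrix they modify.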

A summary table illustrates the modular design:

| Component | Function | LoRA Block |
|---|---|---|
| IR | Direct question → CoT + answer | $A_{\mathrm{IR}}, B_{\mathrm{IR}}$ |
| KG | Question → explicit knowledge | $A_{\mathrm{KG}}, B_{\mathrm{KG}}$ |
| DR | (Question + knowledge) → CoT + answer | $A_{\mathrm{DR}}, B_{\mathrm{DR}}$ |

7. Significance and Outlook

LoRID demonstrates that multi-LoRA interaction, implemented as cognitively inspired modular reasoning blocks with an iterative mutual feedback protocol, is an effective path to distilling advanced mathematical reasoning into SLMs. The approach is parameter-efficient, supports robust error correction, and is aligned with dual-process cognitive theories. Its modular design allows straightforward extension to other domains (e.g., code reasoning, instruction following) and points to new directions for plug-and-play, composable reasoning adaptation in foundation models.

A plausible implication is that as more nuanced modules (for safety, domain knowledge, computation) are added and orchestrated through similar interaction/iteration frameworks, small models may approach the reasoning robustness presently seen in massive LLMs, but at a fraction of the resource cost (Li et al., 18 Aug 2025).
