KRUX: Framework for Scientific Reasoning
- KRUX is a framework designed to systematically probe and disentangle parametric knowledge from explicit reasoning in large language models for scientific tasks.
- It employs controlled protocols including atomic knowledge extraction and prompt augmentation to isolate model limitations.
- Empirical results indicate that augmenting prompts with explicit knowledge ingredients (KIs) significantly boosts performance, underscoring the complementary roles of knowledge retrieval and reasoning.
KRUX is a framework introduced for the systematic probing and disentanglement of knowledge and reasoning capabilities in LLMs applied to scientific problem solving, aiming to clarify the distinct contributions of parametric knowledge retrieval and explicit reasoning strategies. This analytic protocol facilitates the evaluation and enhancement of scientific reasoning in LLMs by providing controlled access to atomic knowledge ingredients (KIs) and by measuring the influence of reasoning-focused fine-tuning.
1. Motivation and Research Objectives
KRUX addresses the absence of holistic benchmarks and systematic approaches for evaluating scientific reasoning in LLMs (Li et al., 26 Aug 2025). The framework is constructed to answer three foundational questions:
- RQ1: How does explicit external knowledge, when added in-context, affect the performance of base models?
- RQ2: Do models fine-tuned for reasoning (e.g., via chain-of-thought methods) benefit further from such external knowledge?
- RQ3: Does reasoning-oriented fine tuning itself improve a model’s ability to surface and utilize helpful knowledge, even in the absence of in-context augmentation?
The intention is to isolate the contributions and bottlenecks associated with parametric knowledge retrieval versus reasoning competence, particularly in the context of scientific tasks where intricate domain knowledge and multi-step deduction are required.
2. Methodology: Probing Knowledge and Reasoning
KRUX implements a controlled experimental protocol:
- Extraction of Atomic Knowledge Ingredients (KIs): Using a strong reasoning model and targeted prompting (e.g., DeepSeek-R1, as shown in Figure 1 of the source), atomic, answer-agnostic knowledge facts, relationships, and mechanisms are distilled from chain-of-thought (CoT) traces.
- Prompt Augmentation: The extracted KIs are prepended to the original scientific query, providing in-context explicit knowledge that does not disclose the answer but encapsulates relevant facts.
- Comparative Evaluation: Target models are then assessed with and without KI augmentation. The pipeline is:
  Question → Model generates CoT → KI extractor → Augmented prompt (Question + KIs) → Evaluate model
  (a runnable sketch of this loop closes this section).
- Reasoning Fine-Tuning: Standard supervised fine-tuning (SFT) loss is used, i.e. the token-level negative log-likelihood over chain-of-thought targets, $\mathcal{L}_{\text{SFT}} = -\sum_{t}\log p_{\theta}(y_t \mid y_{<t}, x)$, where $x$ is the prompt and $y$ the target reasoning trace and answer (a minimal code sketch follows this list).
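The SFT objective above is the standard next-token cross-entropy restricted to the supervised reasoning trace and answer. A minimal PyTorch sketch is given below; the function name, tensor shapes, and masking convention are illustrative choices, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def sft_loss(logits: torch.Tensor, target_ids: torch.Tensor, prompt_len: int) -> torch.Tensor:
    """Token-level negative log-likelihood over the CoT/answer tokens of one example.

    logits:     (seq_len, vocab_size) model outputs for the full sequence
    target_ids: (seq_len,) ground-truth token ids (prompt + CoT + answer)
    prompt_len: number of prompt tokens excluded from supervision
    """
    # Shift so that the logits at position t predict the token at position t + 1.
    shifted_logits = logits[:-1]
    shifted_targets = target_ids[1:]
    # Supervise only the reasoning trace and answer, not the prompt itself.
    keep = torch.arange(shifted_targets.size(0), device=shifted_targets.device) >= (prompt_len - 1)
    return F.cross_entropy(shifted_logits[keep], shifted_targets[keep])
```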
This methodology systematically separates knowledge stored in model parameters from reasoning skills manifested via articulated CoT, allowing for fine-grained diagnosis of model limitations.
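To make the full probing loop concrete, the sketch below wires the three steps together. It is a hypothetical implementation: `target_chat`, `extractor_chat`, and `grade` stand in for arbitrary LLM calls and a task-specific grader, and the prompt templates are assumptions rather than the paper's exact wording.

```python
from typing import Callable, Dict, List

def extract_kis(extractor_chat: Callable[[str], str], question: str, cot_trace: str) -> List[str]:
    """Distill atomic, answer-agnostic knowledge ingredients (KIs) from a CoT trace."""
    prompt = (
        "List the atomic facts, relationships, and mechanisms used in the reasoning below, "
        "one per line, without revealing the final answer.\n\n"
        f"Question: {question}\nReasoning: {cot_trace}"
    )
    return [line.lstrip("- ").strip() for line in extractor_chat(prompt).splitlines() if line.strip()]

def augment_prompt(question: str, kis: List[str]) -> str:
    """Prepend the extracted KIs to the original query as explicit in-context knowledge."""
    ki_block = "\n".join(f"- {ki}" for ki in kis)
    return f"Relevant background knowledge:\n{ki_block}\n\nQuestion: {question}"

def krux_probe(
    target_chat: Callable[[str], str],
    extractor_chat: Callable[[str], str],
    question: str,
    grade: Callable[[str], float],
) -> Dict[str, float]:
    """Score the target model on one question, with and without KI augmentation."""
    baseline = target_chat(question)
    cot_trace = extractor_chat(f"Think step by step, then answer:\n{question}")
    kis = extract_kis(extractor_chat, question, cot_trace)
    augmented = target_chat(augment_prompt(question, kis))
    return {"baseline": grade(baseline), "with_kis": grade(augmented)}
```

Comparing the two scores per question, aggregated over a benchmark, yields the with/without-KI gap that KRUX uses to attribute failures to knowledge retrieval versus reasoning.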
3. Empirical Findings and Performance Analysis
The study (Li et al., 26 Aug 2025) presents several quantifiable insights regarding scientific reasoning in LLMs:
- Knowledge Retrieval Bottleneck: Base instruct models, when supplied with high-quality in-context KIs, surpass reasoning‐fine‐tuned models by more than 10%, indicating that retrieval of task-relevant knowledge is a primary bottleneck.
- Additive Utility of External Knowledge: Reasoning-enhanced models (CoT SFT) realize significant additional performance gains when explicit KIs are externally provided, confirming that reasoning and knowledge interaction is complementary rather than independent.
- Verbalized Reasoning Enhances Knowledge Recall: KIs derived from reasoning-specialized models (such as math-focused variants) are more effective in augmenting downstream performance, suggesting that verbalized reasoning facilitates more precise and exhaustive knowledge access.
- Latent Knowledge Accessibility: Even when models possess all the necessary information in their weights, parametric retrieval frequently fails unless chain-of-thought prompting or explicit retrieval cues are present.
A plausible implication is that explicit prompts with high-quality KIs can compensate for deficits in internal retrieval pathways, and that future architectures may need to combine parametric and non-parametric knowledge sources for robust scientific reasoning.
4. Implications for LLM Development in Science
KRUX serves as a practical diagnostic instrument for model designers and developers:
- Diagnostic Applications: Identifies whether a model’s deficiencies on scientific tasks are primarily due to knowledge retrieval limits or reasoning gaps.
- Model Development Strategy: Informs decisions on whether to prioritize reasoning-centric fine tuning or to integrate external knowledge modules, retrieval systems, or hybrid solutions.
- Architectural Consequences: The magnitude of improvement from KI augmentation implies the future importance of explicit external memory layers or retrieval-augmented generation specific to scientific domains.
- Verbalization and Explainability: The effectiveness of CoT-derived KIs underscores the utility of finely tuned reasoning verbalization both for boosting performance and for rendering model predictions transparent to human users.
5. Technical Details and Formalization
KRUX's formal components include:
| Component | Technical Description | Role |
|---|---|---|
| Atomic KI Extraction | Prompted chain-of-thought, distilled via secondary protocols | Disentangles the knowledge source |
| Prompt Augmentation | In-context addition of answer-agnostic KIs | Bottleneck analysis |
| SFT Loss | Standard token-level cross-entropy over CoT targets (see Section 2) | Reinforces reasoning and recall |
| Performance Comparison | Gap between base/CoT models with/without KI augmentation | Isolates knowledge vs. reasoning contributions |
Key metrics include the absolute and relative performance change under the addition of explicit KIs, together with comparisons between base and reasoning-tuned models across the SciReas and SciReas-Pro benchmark suites.
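As a concrete reading of these metrics: the absolute gain is the accuracy difference under KI augmentation, and the relative gain normalizes it by the unaugmented score. The snippet below is a trivial helper with illustrative numbers, not results reported in the paper.

```python
def ki_gain(acc_without_ki: float, acc_with_ki: float) -> dict:
    """Absolute and relative performance change from adding explicit KIs in context."""
    absolute = acc_with_ki - acc_without_ki
    relative = absolute / acc_without_ki if acc_without_ki > 0 else float("nan")
    return {"absolute": absolute, "relative": relative}

# Illustrative numbers only: a model moving from 48% to 60% accuracy under KI augmentation.
print(ki_gain(0.48, 0.60))  # absolute ≈ 0.12 (12 points), relative ≈ 0.25 (25%)
```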
6. Relation to Existing Benchmarks and Data Composition
The framework is evaluated on SciReas, a suite of existing scientific reasoning benchmarks, and SciReas-Pro, a subset requiring increased reasoning complexity (Li et al., 26 Aug 2025). It is also compared to contemporary approaches employing extensive chain-of-thought supervised fine tuning (CoT SFT), such as the SciLit01 8B baseline released by the authors.
KRUX surfaces performance trends not observable when relying exclusively on individual benchmarks, offering holistic evaluation and deeper understanding of the interplay between knowledge retrieval and reasoning in LLMs.
7. Future Outlook and Research Directions
Potential future directions suggested by the empirical findings include:
- Development of hybrid LLM architectures that seamlessly fuse parametric knowledge in model weights with external retrieval or explicit KI modules.
- Innovations in chain-of-thought prompting, both for knowledge surfacing and explainability.
- Extension of KRUX-style protocols to other domains requiring formal reasoning (e.g., mathematics, engineering).
- Systematic scaling and benchmarking of diagnostic KI protocols for automated curriculum design in science-focused models.
This suggests that KRUX defines a foundational protocol for evaluating, diagnosing, and guiding the advancement of LLMs toward more reliable, interpretable, and knowledge-augmented scientific reasoning.