Fathom-Synthesizer-4B: Structured Citation Synthesis

Updated 3 July 2026

Fathom-Synthesizer-4B is a synthesis model that generates structured, citation-dense research reports from multi-step search traces.
It follows an explicit plan-then-write pipeline on a Qwen3-4B transformer backbone, with YaRN RoPE scaling and FlashAttention-2 for long-context efficiency.
Paired with Fathom-Search-4B, it maps each piece of retrieved evidence to a report section and restricts citations to that mapping in open-ended academic research.

Fathom-Synthesizer-4B is a specialized synthesis model, built upon the Qwen3-4B architecture, designed for structured, citation-dense report generation in long-horizon, multi-step open-ended research tasks. It serves as the synthesis module within the Fathom-DeepResearch agentic system, transforming DeepSearch traces from the sibling Fathom-Search-4B model into coherent, evidence-mapped DeepResearch Reports. The model integrates explicit section-level planning, extended-context ingestion, and strict citation control, establishing new standards in citation fidelity and structure in open-weight research agents (Singh et al., 28 Sep 2025).

1. Model Architecture and Implementation

Fathom-Synthesizer-4B is realized via supervised fine-tuning of Qwen3-4B, maintaining the original neural architecture:

Parameters: approximately 4 billion
Layers: 36 Transformer layers
Hidden Size: 2560
Attention Heads: 32 query heads with 8 key-value heads (grouped-query attention)
Feed-forward Inner Dimension: 9728
Position Embedding: Rotary positional embeddings (RoPE)

The sole architectural modification from Qwen3-4B is the application of YaRN RoPE-scaling (factor 2.0, type=yarn), extending context length to 65,536 tokens. FlashAttention-2 is employed throughout training and inference for computational efficiency, with sequence-parallel size set to 4. No further additions or changes are introduced to the base transformer (Singh et al., 28 Sep 2025).
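
The context extension described above can be written down as a small configuration fragment. The sketch below is illustrative only: it assumes a Hugging Face Transformers-style `rope_scaling` dictionary and a native context window of 32,768 tokens (inferred from 65,536 / 2.0, not stated in the source), and the helper name `yarn_rope_scaling` is hypothetical.

```python
# Sketch only: key names follow the Hugging Face Transformers `rope_scaling`
# convention, and the native window of 32,768 tokens is an assumption inferred
# from 65,536 / 2.0 rather than a value given in the source.
NATIVE_CONTEXT = 32_768  # assumed pre-scaling context length


def yarn_rope_scaling(factor: float, native_context: int = NATIVE_CONTEXT) -> dict:
    """Build a YaRN rope-scaling fragment and report the extended window."""
    return {
        "rope_scaling": {
            "rope_type": "yarn",
            "factor": factor,
            "original_max_position_embeddings": native_context,
        },
        "extended_context": int(native_context * factor),
    }


config = yarn_rope_scaling(2.0)
print(config["extended_context"])  # 65536
```

Under these assumptions, a factor of 2.0 reproduces the 65,536-token window quoted above.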

2. Training Data and Supervisory Objectives

The model is trained on DeepResearch-SFT, a corpus of 2,500 synthetic supervised fine-tuning samples distilled from GPT-5. Each sample consists of:

An open-ended question $q$
A DeepSearch trace $\tau = \{ \mathcal{R}_1, \dots, \mathcal{R}_T \}$ containing all tool responses, snippets, and retrieved URLs from Fathom-Search-4B
A structured plan $\pi = (\pi^{\mathrm{decomp}}, \pi^{\mathrm{map}}, \pi^{\mathrm{insight}})$, with:
- $\pi^{\mathrm{decomp}} = (S_1, \ldots, S_n)$, an ordered sub-question decomposition
- $\pi^{\mathrm{map}}$, a mapping from each evidence item to the corresponding report section(s)
- $\pi^{\mathrm{insight}}$, abstract synthesis directives
A structured report $r$ including:
- Executive Summary
- Sectioned Main Body (each $S_i$ includes only the evidence mapped to it, with inline [URL] citations)
- Deduplicated “Sources Used” list
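
One way to picture a single training sample is as a small record type. The sketch below is an assumption about layout, not the released data format: the class and field names (`Plan`, `SFTSample`, `evidence_map`, `invalid_mappings`) are hypothetical, evidence items are identified by URL, and sections are referenced by 0-based index.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    decomp: list[str]                   # ordered sub-questions S_1..S_n
    evidence_map: dict[str, list[int]]  # evidence URL -> 0-based section indices
    insight: list[str]                  # abstract synthesis directives

    def invalid_mappings(self) -> list[tuple[str, int]]:
        """Return (url, index) pairs that point at a section that does not exist."""
        return [
            (url, i)
            for url, section_ids in self.evidence_map.items()
            for i in section_ids
            if not 0 <= i < len(self.decomp)
        ]


@dataclass
class SFTSample:
    question: str     # open-ended question q
    trace: list[str]  # tool responses R_1..R_T, kept as raw text in this sketch
    plan: Plan        # structured plan pi
    report: str       # structured report r


sample = SFTSample(
    question="How does YaRN extend a model's context window?",
    trace=["search result text ...", "page content ..."],
    plan=Plan(
        decomp=["What is RoPE?", "What does YaRN change?"],
        evidence_map={"https://example.org/yarn": [1]},
        insight=["Contrast interpolation with extrapolation."],
    ),
    report="Executive Summary ...",
)
print(sample.plan.invalid_mappings())  # [] because section index 1 exists
```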

The training objective is next-token cross-entropy over the sequence:

$\mathcal{L}_{\mathrm{SFT}} = -\sum_{t=1}^{|y|} \log p_{\theta}(y_{t}\mid y_{<t},\,q,\,\tau)$

where $y$ concatenates the planning block $\pi$ and the structured report $r$ (Singh et al., 28 Sep 2025).
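
The objective sums the negative log-likelihood over target tokens only; the question and trace are conditioned on but not scored. The following is a minimal sketch of that masking in plain Python, assuming per-position probabilities of the correct token are already available. The function name and the mask argument are hypothetical, and a real training loop would compute this on logits inside a deep-learning framework.

```python
import math


def sft_loss(correct_token_probs: list[float], is_target: list[bool]) -> float:
    """Summed next-token cross-entropy over positions flagged as target tokens.

    correct_token_probs[t] is the model probability of the true token at
    position t; is_target[t] is True for plan/report tokens and False for
    question/trace tokens, which only serve as context.
    """
    return -sum(
        math.log(p) for p, scored in zip(correct_token_probs, is_target) if scored
    )


# Two context positions (ignored) followed by two target positions (scored).
probs = [0.9, 0.8, 0.5, 0.25]
mask = [False, False, True, True]
loss = sft_loss(probs, mask)
print(loss)  # equals log(2) + log(4) = log(8)
```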

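Core training hyperparameters include 5 epochs, bf16 precision, gradient accumulation of 8, a 65,536-token context window, a cosine learning-rate schedule, and the Adam optimizer; the peak learning rate and the Adam settings are specified in (Singh et al., 28 Sep 2025). FlashAttention-2 is active throughout.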
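
For readers unfamiliar with cosine schedules, the shape can be sketched as a multiplier on the peak learning rate. This is a generic textbook form, assuming no warmup and decay to zero; neither assumption is confirmed by the source, the peak value is deliberately left out, and `cosine_multiplier` is a hypothetical helper.

```python
import math


def cosine_multiplier(step: int, total_steps: int) -> float:
    """Fraction of the peak learning rate at `step` (no warmup, decays to 0)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * (1.0 + math.cos(math.pi * progress))


# The actual learning rate would be peak_lr * cosine_multiplier(step, total_steps).
for step in (0, 50, 100):
    print(step, round(cosine_multiplier(step, 100), 3))
```

The multiplier starts at 1.0, passes through 0.5 at the midpoint, and reaches 0.0 at the final step.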

3. Trace-to-Report Conversion Protocol

Fathom-Synthesizer-4B executes a deterministic Plan-then-Write pipeline. Upon receiving inputs $(q, \tau)$, the model first emits a private planning block:

$\pi = (\pi^{\mathrm{decomp}}, \pi^{\mathrm{map}}, \pi^{\mathrm{insight}})$

followed by the report $r$, which strictly adheres to the following structure:

Executive Summary
Sectioned Main Body: For each $S_i$ (from $\pi^{\mathrm{decomp}}$), only evidence explicitly mapped in $\pi^{\mathrm{map}}$ may be cited, using inline brackets [URL]. Section scopes prevent evidence or citations from leaking across boundaries.
Deduplicated “Sources Used”: At the end, aggregating all primary evidence.

This structuring enforces both citation accuracy and transparency, with a report layout that is fully governed by the inferred plan (Singh et al., 28 Sep 2025).
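
The section-scoping rule lends itself to a mechanical check. The sketch below is not part of the model or its released tooling: it is a hypothetical post-hoc validator that assumes citations appear as `[https://...]`, that the evidence map is stored per section (the inverse of the evidence-to-section mapping described above), and that section titles serve as keys.

```python
import re

# Inline citation of the form [https://...] or [http://...].
URL_CITATION = re.compile(r"\[(https?://[^\]\s]+)\]")


def check_sections(
    sections: dict[str, str], allowed: dict[str, set[str]]
) -> tuple[list[tuple[str, str]], list[str]]:
    """Return (violations, sources_used).

    violations lists (section, url) pairs cited outside the section's mapped
    evidence; sources_used is the deduplicated list of valid citations in
    order of first appearance.
    """
    violations: list[tuple[str, str]] = []
    sources_used: list[str] = []
    for title, body in sections.items():
        permitted = allowed.get(title, set())
        for url in URL_CITATION.findall(body):
            if url not in permitted:
                violations.append((title, url))
            elif url not in sources_used:
                sources_used.append(url)
    return violations, sources_used


sections = {
    "S1": "RoPE encodes position [https://example.org/a]. "
          "YaRN rescales it [https://example.org/b].",
    "S2": "Scaling extends context [https://example.org/b] [https://example.org/c].",
}
allowed = {
    "S1": {"https://example.org/a", "https://example.org/b"},
    "S2": {"https://example.org/b"},
}
violations, sources_used = check_sections(sections, allowed)
print(violations)    # [('S2', 'https://example.org/c')]
print(sources_used)  # ['https://example.org/a', 'https://example.org/b']
```

Here the citation of `https://example.org/c` in S2 is flagged because that URL was never mapped to S2, and the repeated citation of `https://example.org/b` appears only once in the sources list.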

4. Integration with the DeepResearch Pipeline

Fathom-Synthesizer-4B forms the synthesis half of the DeepResearch agent, complementing Fathom-Search-4B. The system is modular: Fathom-Search-4B performs long-horizon, evidence-first investigation, with explicit control over the breadth, depth, and horizon of the search trajectory through reinforcement learning components such as RAPO and steerable step-level rewards; the component-specific formulas are given in (Singh et al., 28 Sep 2025). Fathom-Synthesizer-4B, in turn, receives the entire multi-turn search trace and constructs a report that is tightly coupled to the search provenance.

Notably, the two-step agent design (search first, then explicit synthesis) allows for reliable tool-calling at scale (over 20 calls when necessary), citation-dense synthesis, and clear evidence provenance. This division of labor outperforms previous monolithic agentic models on both DeepSearch and narrative research benchmarks (Singh et al., 28 Sep 2025).
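
The division of labor can be expressed as two interchangeable stages joined by the search trace. The sketch below only shows that control flow: the type aliases, `deep_research`, and both stand-in stages are hypothetical, and no real model is called.

```python
from typing import Callable

Trace = list[str]                       # tool responses collected during search
SearchFn = Callable[[str], Trace]       # question -> trace
SynthFn = Callable[[str, Trace], str]   # (question, trace) -> report


def deep_research(question: str, search: SearchFn, synthesize: SynthFn) -> str:
    """Run search to completion, then hand the whole trace to synthesis."""
    trace = search(question)
    return synthesize(question, trace)


# Stand-in stages so the sketch runs without any model.
def fake_search(question: str) -> Trace:
    return [f"tool response {i} for: {question}" for i in range(1, 4)]


def fake_synthesize(question: str, trace: Trace) -> str:
    return f"Report on '{question}' built from {len(trace)} tool responses."


report = deep_research("What is YaRN?", fake_search, fake_synthesize)
print(report)  # Report on 'What is YaRN?' built from 3 tool responses.
```

Because the stages communicate only through the trace, either one could be swapped out or evaluated on its own.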

5. Empirical Performance and Evaluation

Evaluation covers multiple benchmarks, using GPT-4.1-mini as the judge. Fathom-DeepResearch (with Fathom-Synthesizer-4B as its synthesis arm) achieves state-of-the-art results in the open-weights tier:

Fathom-Search-4B (Stage 2) on DeepSearch + Reasoning:

Benchmark	SimpleQA	FRAMES	WebWalker	Seal0	MuSiQue	Avg (DeepSearch)	HLE	AIME-25	GPQA-Diamond	MedQA	Avg (Reasoning)
Accuracy (%)	90.0	64.8	50.0	22.5	33.2	52.1	9.5	70.0	60.1	75.4	53.8
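
The two "Avg" columns can be checked against the per-benchmark scores. The short calculation below assumes each average is an unweighted mean over its group, with the first five benchmarks forming the DeepSearch group and the remaining four the reasoning group; that grouping is inferred from the table layout.

```python
deepsearch = {"SimpleQA": 90.0, "FRAMES": 64.8, "WebWalker": 50.0,
              "Seal0": 22.5, "MuSiQue": 33.2}
reasoning = {"HLE": 9.5, "AIME-25": 70.0, "GPQA-Diamond": 60.1, "MedQA": 75.4}


def mean(scores: dict[str, float]) -> float:
    """Unweighted mean of the benchmark scores."""
    return sum(scores.values()) / len(scores)


deepsearch_avg = mean(deepsearch)
reasoning_avg = mean(reasoning)
print(f"{deepsearch_avg:.2f}")  # 52.10
print(f"{reasoning_avg:.2f}")   # 53.75
```

Both values agree with the reported 52.1 and 53.8 after rounding to one decimal place.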

Fathom-DeepResearch on DeepResearch-Bench:

System	Overall	Comprehensiveness	Depth	Instruction Following	Readability	Citation Accuracy	Effective Citations
Fathom-DeepResearch	45.47	42.98	45.14	48.25	46.12	56.1	38.3

These metrics indicate strong generalization to complex, open-ended research and reasoning tasks, with robust citation fidelity and structured reporting (Singh et al., 28 Sep 2025).

6. Limitations and Future Directions

Several limitations are acknowledged:

RAPO alone, in the absence of the steerable incremental reward, leads to trace-length saturation and diminishing returns beyond approximately 6,000 tokens.
Synchronous, end-to-end pipeline training is brittle; asynchronous decoupled pipelines are identified as a promising future research direction.
The SFT corpus (2,500 samples) may limit further gains; scaling with broader or human-curated supervision could yield further improvements in synthesis quality.
The report quality and citation accuracy remain fundamentally dependent on the underlying quality and granularity of the search trace provided by Fathom-Search-4B.

A plausible implication is that advances in both search trace richness and plan granularity will amplify Fathom-Synthesizer-4B’s utility in research-oriented agentic workflows (Singh et al., 28 Sep 2025).

7. Significance and Contributions

Fathom-Synthesizer-4B delivers explicit plan-based synthesis, section-wise evidence mapping, and RoPE-extended long-context supervised learning to the open-source agentic research modeling ecosystem. Its strict citation control, transparency, and structured reporting rival those of much larger proprietary models. The division of search and synthesis, along with tractable context scaling and open-weights availability, establishes a new reference point for robust, modular information-seeking agents supporting complex academic inquiry (Singh et al., 28 Sep 2025).

References

Singh et al. (28 Sep 2025). Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval and Synthesis for SLMs.