
Fathom-Synthesizer-4B: Structured Citation Synthesis

Updated 3 July 2026
  • Fathom-Synthesizer-4B is a synthesis model designed for generating structured, citation-dense research reports from multi-step search traces.
  • It uses an explicit plan-then-write pipeline on the Qwen3-4B transformer, with YaRN RoPE scaling and FlashAttention-2 for long-context efficiency.
  • Its integration with Fathom-Search-4B ensures precise evidence mapping and robust citation fidelity in open-ended academic research.

Fathom-Synthesizer-4B is a specialized synthesis model, built upon the Qwen3-4B architecture, designed for structured, citation-dense report generation in long-horizon, multi-step open-ended research tasks. It serves as the synthesis module within the Fathom-DeepResearch agentic system, transforming DeepSearch traces from the sibling Fathom-Search-4B model into coherent, evidence-mapped DeepResearch Reports. The model integrates explicit section-level planning, extended-context ingestion, and strict citation control, establishing new standards in citation fidelity and structure in open-weight research agents (Singh et al., 28 Sep 2025).

1. Model Architecture and Implementation

Fathom-Synthesizer-4B is realized via supervised fine-tuning of Qwen3-4B, maintaining the original neural architecture:

  • Parameters: 4 billion
  • Layers: 32 Transformer layers
  • Hidden Size: 4096
  • Attention Heads: 32
  • Feed-forward Inner Dimension: $4 \times 4096$
  • Position Embedding: Rotary positional embeddings (RoPE)

The sole architectural modification from Qwen3-4B is the application of YaRN RoPE-scaling (factor 2.0, type=yarn), extending context length to 65,536 tokens. FlashAttention-2 is employed throughout training and inference for computational efficiency, with sequence-parallel size set to 4. No further additions or changes are introduced to the base transformer (Singh et al., 28 Sep 2025).
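As a rough illustration of the context extension described above, the sketch below writes the stated YaRN settings as a Hugging Face-style `rope_scaling` dictionary and computes the window they imply. The key names and the 32,768-token base window are assumptions that this article does not state; only the factor of 2.0, the `yarn` type, and the 65,536-token target come from the text.

```python
# Hypothetical config fragment. The key names follow a common
# Hugging Face convention and are an assumption, not the authors' file.
BASE_CONTEXT = 32_768  # assumed native window of the base model

rope_scaling = {
    "type": "yarn",  # stated in the text
    "factor": 2.0,   # stated in the text
    "original_max_position_embeddings": BASE_CONTEXT,
}


def extended_context(scaling: dict) -> int:
    """Return the context window implied by the scaling factor."""
    return int(scaling["original_max_position_embeddings"] * scaling["factor"])


print(extended_context(rope_scaling))  # 65536
```

Under these assumptions, a factor of 2.0 applied to a 32,768-token window gives the 65,536 tokens quoted above.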

2. Training Data and Supervisory Objectives

The model is trained on DeepResearch-SFT, a corpus of 2,500 synthetic supervised fine-tuning samples distilled from GPT-5. Each sample consists of:

  1. An open-ended question $q$
  2. A DeepSearch trace $\tau = \{\mathcal{R}_1, \dots, \mathcal{R}_T\}$ containing all tool responses, snippets, and retrieved URLs from Fathom-Search-4B
  3. A structured plan $\pi = (\pi^{\mathrm{decomp}}, \pi^{\mathrm{map}}, \pi^{\mathrm{insight}})$, with:
    • $\pi^{\mathrm{decomp}} = (S_1, \ldots, S_n)$, an ordered sub-question decomposition
    • $\pi^{\mathrm{map}}$, a mapping from each evidence item to the corresponding report section(s)
    • $\pi^{\mathrm{insight}}$, abstract synthesis directives
  4. A structured report $r$ including:
    • Executive Summary
    • Sectioned Main Body (each $S_i$ includes only the evidence mapped to it, with inline [URL] citations)
    • Deduplicated “Sources Used” list

The training objective is next-token cross-entropy over the sequence:

$$\mathcal{L}_{\mathrm{SFT}} = -\sum_{t=1}^{|y|} \log p_{\theta}(y_{t} \mid y_{<t},\, q,\, \tau)$$

where $y$ concatenates the planning block $\pi$ and the structured report $r$ (Singh et al., 28 Sep 2025).
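Because the sum in the objective runs only over the target tokens, the question and trace act as conditioning context and contribute no loss terms. A minimal sketch of that masking, using made-up per-token probabilities and a hypothetical `sft_loss` helper (not the authors' training code):

```python
import math


def sft_loss(token_logprobs: list[float], is_target: list[bool]) -> float:
    """Negative sum of log-probabilities over target positions only."""
    if len(token_logprobs) != len(is_target):
        raise ValueError("log-probabilities and mask must have equal length")
    return -sum(lp for lp, keep in zip(token_logprobs, is_target) if keep)


# Toy sequence: two context tokens (standing in for q and tau),
# then three target tokens (standing in for the plan and report).
logprobs = [math.log(p) for p in (0.9, 0.8, 0.5, 0.25, 0.5)]
mask = [False, False, True, True, True]

loss = sft_loss(logprobs, mask)
# Equals -(log 0.5 + log 0.25 + log 0.5) = log 16
print(loss)
```

In practice the log-probabilities would come from the model's output distribution; the mask is the only part this sketch is meant to illustrate.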

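Core training hyperparameters include 5 epochs, bf16 precision, gradient accumulation of 8, a 65,536-token context window, a cosine learning-rate schedule, and the Adam optimizer, with the peak learning rate and Adam settings as reported in Singh et al. (28 Sep 2025). FlashAttention-2 is active throughout.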

3. Trace-to-Report Conversion Protocol

Fathom-Synthesizer-4B executes a deterministic Plan-then-Write pipeline. Upon receiving inputs $(q, \tau)$, the model first emits a private planning block:

$$\pi = (\pi^{\mathrm{decomp}}, \pi^{\mathrm{map}}, \pi^{\mathrm{insight}})$$

followed by the report $r$, which strictly adheres to the following structure:

  • Executive Summary
  • Sectioned Main Body: For each $S_i$ (from $\pi^{\mathrm{decomp}}$), only evidence explicitly mapped in $\pi^{\mathrm{map}}$ may be cited, using inline brackets [URL]. Section scopes prevent evidence or citations from leaking across boundaries.
  • Deduplicated “Sources Used”: At the end, aggregating all primary evidence.

This structuring enforces both citation accuracy and transparency, with a report layout that is fully governed by the inferred plan (Singh et al., 28 Sep 2025).
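The section-scoped citation rule can be checked mechanically after generation. The sketch below is a hypothetical post-hoc validator, not part of the model: it assumes citations appear as `[URL]` with an `http` or `https` scheme, and it takes the evidence map already inverted into a section-to-allowed-URLs form. It reports citations that fall outside their section's scope and builds the deduplicated source list.

```python
import re

# Assumed citation syntax: a bare URL in square brackets.
CITATION = re.compile(r"\[(https?://[^\]\s]+)\]")


def check_report(sections: dict[str, str], allowed: dict[str, set[str]]):
    """Return (violations, sources) for a sectioned report.

    violations: (section title, url) pairs cited outside their mapped section.
    sources: every cited URL, deduplicated, in first-appearance order.
    """
    violations: list[tuple[str, str]] = []
    sources: list[str] = []
    for title, body in sections.items():
        scope = allowed.get(title, set())
        for url in CITATION.findall(body):
            if url not in scope:
                violations.append((title, url))
            if url not in sources:
                sources.append(url)
    return violations, sources


sections = {
    "S1": "Claim A [https://a.example/1]. Claim B [https://b.example/2].",
    "S2": "Claim C [https://a.example/1].",
}
allowed = {
    "S1": {"https://a.example/1", "https://b.example/2"},
    "S2": {"https://c.example/3"},
}

violations, sources = check_report(sections, allowed)
print(violations)  # [('S2', 'https://a.example/1')]
print(sources)     # ['https://a.example/1', 'https://b.example/2']
```

Here the second section cites a URL that was mapped only to the first, so it is flagged, while the source list still contains each URL once.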

4. Integration with the DeepResearch Pipeline

Fathom-Synthesizer-4B forms the synthesis half of the DeepResearch agent, complementing Fathom-Search-4B. The system is modular: Fathom-Search-4B performs long-horizon, evidence-first investigation, with explicit control over search trajectory breadth, depth, and horizon through reinforcement learning components such as RAPO and steerable step-level rewards; the component-specific formulas are given in Singh et al. (28 Sep 2025). Fathom-Synthesizer-4B, in turn, receives the entire multi-turn search trace and constructs a report that is tightly coupled to the search provenance.

Notably, the two-step agent design (search first, then explicit synthesis) allows for reliable tool-calling at scale (over 20 calls when necessary), citation-dense synthesis, and clear evidence provenance. This division of labor outperforms previous monolithic agentic models on both DeepSearch and narrative research benchmarks (Singh et al., 28 Sep 2025).

5. Empirical Performance and Evaluation

Evaluation covers multiple benchmarks, using GPT-4.1-mini as the judge. Fathom-DeepResearch (with Fathom-Synthesizer-4B as its synthesis arm) achieves state-of-the-art results in the open-weights tier:

Fathom-Search-4B (Stage 2) on DeepSearch + Reasoning:

Group        Benchmark       Accuracy (%)
DeepSearch   SimpleQA        90.0
DeepSearch   FRAMES          64.8
DeepSearch   WebWalker       50.0
DeepSearch   Seal0           22.5
DeepSearch   MuSiQue         33.2
DeepSearch   Avg             52.1
Reasoning    HLE              9.5
Reasoning    AIME-25         70.0
Reasoning    GPQA-Diamond    60.1
Reasoning    MedQA           75.4
Reasoning    Avg             53.8

Fathom-DeepResearch on DeepResearch-Bench:

                      Overall   Comp.   Depth   Inst.   Read.   C. Acc.   E. Cit.
Fathom-DeepResearch   45.47     42.98   45.14   48.25   46.12   56.1      38.3

These metrics indicate strong generalization to complex, open-ended research and reasoning tasks, with robust citation fidelity and structured reporting (Singh et al., 28 Sep 2025).
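As a quick consistency check on the Fathom-Search-4B figures, the two reported averages can be recomputed from the individual scores. The sketch assumes each average is a plain unweighted mean over its group and that the grouping follows the column order (five DeepSearch benchmarks, then four reasoning benchmarks); neither is stated explicitly in this article.

```python
# Scores copied from the Fathom-Search-4B (Stage 2) results above.
deepsearch = {"SimpleQA": 90.0, "FRAMES": 64.8, "WebWalker": 50.0,
              "Seal0": 22.5, "MuSiQue": 33.2}
reasoning = {"HLE": 9.5, "AIME-25": 70.0, "GPQA-Diamond": 60.1, "MedQA": 75.4}


def mean(scores: dict[str, float]) -> float:
    """Unweighted mean of the benchmark scores."""
    return sum(scores.values()) / len(scores)


print(f"{mean(deepsearch):.2f}")  # 52.10, reported as 52.1
print(f"{mean(reasoning):.2f}")   # 53.75, reported as 53.8
```

Both recomputed values agree with the reported averages to one decimal place, which supports reading them as simple means.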

6. Limitations and Future Directions

Several limitations are acknowledged:

  • RAPO alone, in absence of steerable incremental reward, leads to trace length saturation and diminishing returns beyond approximately 6,000 tokens.
  • Synchronous, end-to-end pipeline training is brittle; asynchronous decoupled pipelines are identified as a promising future research direction.
  • The SFT corpus (2,500 samples) may limit further gains; scaling with broader or human-curated supervision could yield further improvements in synthesis quality.
  • The report quality and citation accuracy remain fundamentally dependent on the underlying quality and granularity of the search trace provided by Fathom-Search-4B.

A plausible implication is that advances in both search trace richness and plan granularity will amplify Fathom-Synthesizer-4B’s utility in research-oriented agentic workflows (Singh et al., 28 Sep 2025).

7. Significance and Contributions

Fathom-Synthesizer-4B delivers explicit plan-based synthesis, section-wise evidence mapping, and RoPE-extended long-context supervised learning to the open-source agentic research modeling ecosystem. Its strict citation control, transparency, and structured reporting rival those of much larger proprietary models. The division of search and synthesis, along with tractable context scaling and open-weights availability, establishes a new reference point for robust, modular information-seeking agents supporting complex academic inquiry (Singh et al., 28 Sep 2025).
