Fathom-Synthesizer-4B: Structured Citation Synthesis
- Fathom-Synthesizer-4B is a synthesis model designed for generating structured, citation-dense research reports from multi-step search traces.
- It follows an explicit plan-then-write pipeline on a Qwen3-4B backbone, using YaRN RoPE scaling and FlashAttention-2 for long-context efficiency.
- Its integration with Fathom-Search-4B ensures precise evidence mapping and robust citation fidelity in open-ended academic research.
Fathom-Synthesizer-4B is a specialized synthesis model, built upon the Qwen3-4B architecture, designed for structured, citation-dense report generation in long-horizon, multi-step open-ended research tasks. It serves as the synthesis module within the Fathom-DeepResearch agentic system, transforming DeepSearch traces from the sibling Fathom-Search-4B model into coherent, evidence-mapped DeepResearch Reports. The model integrates explicit section-level planning, extended-context ingestion, and strict citation control, establishing new standards in citation fidelity and structure in open-weight research agents (Singh et al., 28 Sep 2025).
1. Model Architecture and Implementation
Fathom-Synthesizer-4B is realized via supervised fine-tuning of Qwen3-4B, maintaining the original neural architecture:
- Parameters: 4 billion
- Layers: 32 Transformer layers
- Hidden Size: 4096
- Attention Heads: 32
- Feed-forward Inner Dimension:
- Position Embedding: Rotary positional embeddings (RoPE)
The sole architectural modification from Qwen3-4B is the application of YaRN RoPE-scaling (factor 2.0, type=yarn), extending context length to 65,536 tokens. FlashAttention-2 is employed throughout training and inference for computational efficiency, with sequence-parallel size set to 4. No further additions or changes are introduced to the base transformer (Singh et al., 28 Sep 2025).
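As a rough illustration of the context extension described above, the following sketch writes the YaRN setting as a Hugging Face-style `rope_scaling` entry and derives the resulting window. The key names and the 32,768-token native window of the base model are assumptions not stated in the text; only the factor (2.0), the type (yarn), and the 65,536-token target come from the description.

```python
# Sketch only: key names follow the common Hugging Face config convention
# (an assumption); ORIGINAL_CONTEXT is the assumed native window of the base model.
ORIGINAL_CONTEXT = 32_768
YARN_FACTOR = 2.0  # from the text: factor 2.0, type=yarn

rope_scaling = {
    "rope_type": "yarn",
    "factor": YARN_FACTOR,
    "original_max_position_embeddings": ORIGINAL_CONTEXT,
}


def extended_context(original: int, factor: float) -> int:
    """Context length targeted after scaling the original window by `factor`."""
    return int(original * factor)


max_position_embeddings = extended_context(ORIGINAL_CONTEXT, YARN_FACTOR)
print(rope_scaling)
print(max_position_embeddings)
```

Under these assumptions, doubling the native window yields the 65,536-token context quoted above.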
2. Training Data and Supervisory Objectives
The model is trained on DeepResearch-SFT, a corpus of 2,500 synthetic supervised fine-tuning samples distilled from GPT-5. Each sample consists of:
- An open-ended question
- A DeepSearch trace containing all tool responses, snippets, and retrieved URLs from Fathom-Search-4B
- A structured plan with three components:
  - an ordered sub-question decomposition
  - a mapping from each evidence item to the corresponding report section(s)
  - abstract synthesis directives
- A structured report including:
  - Executive Summary
  - Sectioned Main Body (each section includes only the evidence mapped to it, with inline [URL] citations)
  - Deduplicated “Sources Used” list
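The sample layout listed above can be pictured as a small set of record types. This is a hypothetical sketch: the class and field names (`Plan`, `Report`, `DeepResearchSample`, `evidence_to_sections`, and so on) are illustrative and are not taken from the DeepResearch-SFT release.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    sub_questions: list[str]                    # ordered decomposition, one per section
    evidence_to_sections: dict[str, list[int]]  # evidence URL -> section indices
    directives: list[str]                       # abstract synthesis directives


@dataclass
class Report:
    executive_summary: str
    sections: list[str]       # main-body sections with inline [URL] citations
    sources_used: list[str]   # deduplicated source list


@dataclass
class DeepResearchSample:
    question: str
    trace: list[dict]  # tool responses, snippets, and retrieved URLs
    plan: Plan
    report: Report


def evidence_for_section(plan: Plan, index: int) -> list[str]:
    """URLs that the plan maps to the section at `index`."""
    return [url for url, secs in plan.evidence_to_sections.items() if index in secs]


plan = Plan(
    sub_questions=["What is X?", "How is X evaluated?"],
    evidence_to_sections={"https://example.org/a": [0], "https://example.org/b": [0, 1]},
    directives=["Contrast definitions before discussing evaluation."],
)
print(evidence_for_section(plan, 1))
```

Keeping the evidence mapping keyed by URL makes it straightforward to look up which sources a given section is allowed to cite.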
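The training objective is next-token cross-entropy over the target sequence, conditioned on the question $q$ and the DeepSearch trace $\tau$:

$$\mathcal{L}(\theta) = -\sum_{t=1}^{|y|} \log p_\theta\left(y_t \mid y_{<t},\, q,\, \tau\right)$$

where $y = [P; R]$ concatenates the planning block $P$ and the structured report $R$ (Singh et al., 28 Sep 2025).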
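A minimal, dependency-free sketch of this objective is given below. It assumes that only the plan and report tokens contribute to the loss (prompt tokens for the question and trace are masked out) and that the loss is averaged over those tokens; both choices are assumptions for illustration, and the function names are hypothetical.

```python
import math


def log_softmax(logits: list[float]) -> list[float]:
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def next_token_nll(logits_per_step, token_ids, target_mask) -> float:
    """Mean negative log-likelihood over positions where target_mask is 1.

    logits_per_step[t] holds the logits used to predict token_ids[t].
    target_mask[t] is 1 for plan/report tokens and 0 for prompt tokens.
    """
    total, count = 0.0, 0
    for logits, tok, keep in zip(logits_per_step, token_ids, target_mask):
        if keep:
            total -= log_softmax(logits)[tok]
            count += 1
    return total / count if count else 0.0


# Toy example: vocabulary of 3 tokens, uniform logits at every step.
logits = [[0.0, 0.0, 0.0]] * 3
tokens = [2, 0, 1]
mask = [0, 1, 1]  # first position belongs to the prompt and is ignored
loss = next_token_nll(logits, tokens, mask)
print(loss)
```

With uniform logits, each counted token has probability 1/3, so the toy loss equals ln 3.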
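Core training hyperparameters include 5 epochs, bf16 precision, gradient accumulation of 8, a 65,536-token context window, a cosine learning-rate schedule, and the Adam optimizer (peak learning rate and Adam settings as reported in Singh et al., 28 Sep 2025). FlashAttention-2 is active throughout.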
3. Trace-to-Report Conversion Protocol
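Fathom-Synthesizer-4B executes a deterministic Plan-then-Write pipeline. Upon receiving the question and the DeepSearch trace as input, the model first emits a private planning block containing the sub-question decomposition, the evidence-to-section mapping, and the synthesis directives, followed by the report, which strictly adheres to the following structure: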
- Executive Summary
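- Sectioned Main Body: For each section (one per sub-question in the plan's decomposition), only evidence explicitly assigned to that section by the plan's evidence-to-section mapping may be cited, using inline bracketed [URL] citations. Section scopes prevent evidence or citations from leaking across boundaries.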
- Deduplicated “Sources Used”: At the end, aggregating all primary evidence.
This structuring enforces both citation accuracy and transparency, with a report layout that is fully governed by the inferred plan (Singh et al., 28 Sep 2025).
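The section-scoping rule can be checked mechanically after generation. The sketch below is a hypothetical post-hoc validator, not part of the model: it assumes citations appear as `[https://...]` inside section text and that the plan's mapping is available as a dictionary from section title to allowed URLs. All function and variable names are illustrative.

```python
import re

# Matches inline citations of the form [https://...] or [http://...].
URL_CITATION = re.compile(r"\[(https?://[^\]\s]+)\]")


def check_section_citations(sections: dict[str, str],
                            allowed: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return {section_title: [URLs cited there but not mapped to that section]}."""
    violations = {}
    for title, body in sections.items():
        cited = URL_CITATION.findall(body)
        leaked = [u for u in cited if u not in allowed.get(title, set())]
        if leaked:
            violations[title] = leaked
    return violations


def sources_used(sections: dict[str, str]) -> list[str]:
    """Deduplicated list of cited URLs, in order of first appearance."""
    seen = {}
    for body in sections.values():
        for url in URL_CITATION.findall(body):
            seen.setdefault(url, None)
    return list(seen)


sections = {
    "Background": "X is Y [https://a.example/1]. See also [https://b.example/2].",
    "Methods": "Z follows [https://a.example/1] and [https://c.example/3].",
}
allowed = {
    "Background": {"https://a.example/1", "https://b.example/2"},
    "Methods": {"https://a.example/1"},
}
violations = check_section_citations(sections, allowed)
sources = sources_used(sections)
print(violations)
print(sources)
```

In this toy report, the "Methods" section cites a URL that the plan did not map to it, so the validator flags it, while the "Sources Used" list still contains each URL once.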
4. Integration with the DeepResearch Pipeline
Fathom-Synthesizer-4B forms the synthesis half of the Fathom-DeepResearch agent, complementing Fathom-Search-4B. The system is modular: Fathom-Search-4B performs long-horizon, evidence-first investigation, with explicit control over the breadth, depth, and horizon of the search trajectory through reinforcement learning components such as RAPO and steerable step-level rewards; the component-specific formulas are given in Singh et al. (28 Sep 2025). Fathom-Synthesizer-4B, in turn, receives the entire multi-turn search trace and constructs a report that is tightly coupled to the search provenance.
Notably, the two-step agent design (search first, then explicit synthesis) allows for reliable tool-calling at scale (over 20 calls when necessary), citation-dense synthesis, and clear evidence provenance. This division of labor outperforms previous monolithic agentic models on both DeepSearch and narrative research benchmarks (Singh et al., 28 Sep 2025).
5. Empirical Performance and Evaluation
Evaluation covers multiple benchmarks, using GPT-4.1-mini as the judge. Fathom-DeepResearch (with Fathom-Synthesizer-4B as its synthesis arm) achieves state-of-the-art results in the open-weights tier:
Fathom-Search-4B (Stage 2) on DeepSearch + Reasoning:
| Metric | SimpleQA | FRAMES | WebWalker | Seal0 | MuSiQue | Avg1 | HLE | AIME-25 | GPQA-Diamond | MedQA | Avg2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Accuracy % | 90.0 | 64.8 | 50.0 | 22.5 | 33.2 | 52.1 | 9.5 | 70.0 | 60.1 | 75.4 | 53.8 |
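The two average columns appear to be unweighted means of the benchmark groups that precede them: the five DeepSearch benchmarks for Avg1 and the four reasoning benchmarks for Avg2. That grouping is inferred from the arithmetic rather than stated in the text, and the short check below reproduces it from the table values.

```python
# Scores copied from the table above; the grouping into two sets is an inference.
deepsearch = {"SimpleQA": 90.0, "FRAMES": 64.8, "WebWalker": 50.0,
              "Seal0": 22.5, "MuSiQue": 33.2}
reasoning = {"HLE": 9.5, "AIME-25": 70.0, "GPQA-Diamond": 60.1, "MedQA": 75.4}


def mean(scores: dict[str, float]) -> float:
    """Unweighted mean of a benchmark group."""
    return sum(scores.values()) / len(scores)


avg1 = mean(deepsearch)  # table reports 52.1
avg2 = mean(reasoning)   # exact mean is 53.75; table reports 53.8
print(avg1, avg2)
```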
Fathom-DeepResearch on DeepResearch-Bench:
| Model | Overall | Comprehensiveness | Depth | Instruction Following | Readability | Citation Accuracy | Effective Citations |
|---|---|---|---|---|---|---|---|
| Fathom-DeepResearch | 45.47 | 42.98 | 45.14 | 48.25 | 46.12 | 56.1 | 38.3 |
These metrics indicate strong generalization to complex, open-ended research and reasoning tasks, with robust citation fidelity and structured reporting (Singh et al., 28 Sep 2025).
6. Limitations and Future Directions
Several limitations are acknowledged:
- RAPO alone, in the absence of the steerable step-level reward, leads to trace-length saturation and diminishing returns beyond approximately 6,000 tokens.
- Synchronous, end-to-end pipeline training is brittle; asynchronous decoupled pipelines are identified as a promising future research direction.
- The SFT corpus (2,500 samples) may limit further gains; scaling with broader or human-curated supervision could yield further improvements in synthesis quality.
- The report quality and citation accuracy remain fundamentally dependent on the underlying quality and granularity of the search trace provided by Fathom-Search-4B.
A plausible implication is that advances in both search trace richness and plan granularity will amplify Fathom-Synthesizer-4B’s utility in research-oriented agentic workflows (Singh et al., 28 Sep 2025).
7. Significance and Contributions
Fathom-Synthesizer-4B delivers explicit plan-based synthesis, section-wise evidence mapping, and RoPE-extended long-context supervised learning to the open-source agentic research modeling ecosystem. Its strict citation control, transparency, and structured reporting rival those of much larger proprietary models. The division of search and synthesis, along with tractable context scaling and open-weights availability, establishes a new reference point for robust, modular information-seeking agents supporting complex academic inquiry (Singh et al., 28 Sep 2025).