
Sci2Pol-Corpus: Bridging Science & Policy

Updated 2 October 2025
  • Sci2Pol-Corpus is a curated training dataset that synthesizes scientific findings into concise policy briefs through multi-stage LLM-based filtering and expert polishing.
  • It leverages large-scale candidate pair retrieval and automated alignment to ensure high-quality matching between scientific literature and policy documents.
  • Fine-tuned models on Sci2Pol-Corpus demonstrate notable performance gains in policy brief generation, outperforming larger commercial models in key tasks.

Sci2Pol-Corpus is a curated training dataset designed to enhance the performance of LLMs in generating concise and actionable policy briefs from dense scientific literature. Developed within the broader Sci2Pol framework (Wu et al., 25 Sep 2025), it enables the fine-tuning of LLMs on the highly specialized task of summarizing the core findings of scientific papers in a format suitable for policy-making and decision support. The construction of Sci2Pol-Corpus leverages large-scale retrieval, automated alignment, and sophisticated LLM-based judgment to provide robust supervision for bridging science and policy communication.

1. Corpus Construction and Candidate Pair Retrieval

The initial phase of corpus construction involved the systematic pairing of scientific literature with relevant policy documents by exploiting large-scale citation data. Specifically, 5.6 million policy records (including sources such as Overton) were indexed and cross-referenced against scientific literature (notably the SciSciNet repository). Each policy document was linked to every cited scholarly paper, but only documents citing no more than three scientific articles were retained to maximize the likelihood of a focused and content-aligned brief. This heuristic filtering yielded approximately 140,000 candidate paper–policy document pairs.

A plausible implication is that restricting the citation count improves alignment between scientific content and policy discourse, reducing noise from multi-topic documents.
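The citation-count heuristic described above can be sketched as a simple filtering step. This is an illustrative reconstruction, not the authors' released code; the `(policy_id, paper_id)` link format stands in for the Overton/SciSciNet cross-reference:

```python
from collections import defaultdict

def build_candidate_pairs(policy_citations, max_cited_papers=3):
    """Pair policy documents with the scientific papers they cite,
    keeping only focused documents that cite few papers.

    policy_citations: iterable of (policy_id, paper_id) citation links,
    a hypothetical stand-in for the indexed policy-to-paper citations.
    """
    cited_by_policy = defaultdict(set)
    for policy_id, paper_id in policy_citations:
        cited_by_policy[policy_id].add(paper_id)

    pairs = []
    for policy_id, papers in cited_by_policy.items():
        # Heuristic from the corpus construction: documents citing more
        # than max_cited_papers papers are likely multi-topic, so drop them.
        if len(papers) <= max_cited_papers:
            pairs.extend((paper_id, policy_id) for paper_id in papers)
    return pairs
```

Applied to the full index, this kind of filter reduces millions of citation links to the roughly 140,000 focused candidate pairs reported above.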

2. LLM-Based Filtering and Quality Judgement

To isolate cases where the policy document substantively addresses the scientific findings, a two-stage LLM-based quality judgment process was employed using GPT-o3 as a “judge”:

  • Coarse-grained Filtering: For each candidate pair, only the scientific abstract and the executive summary or main body of the policy document were provided to GPT-o3. Prompts assessed whether the policy text discusses the paper's key findings, methods, and conclusions in sufficient detail and with clear relevance. Candidates passing this criterion were reduced to 1,407 pairs; document-length normalization then yielded 1,011 usable pairs.
  • Fine-grained Filtering: The full text of both the scientific paper and policy document was input to GPT-o3. An additional similarity metric was computed to exclude pairs where the policy text was merely a reformat of the scientific document. The final filtering step produced 639 high-quality paper–brief pairs.

This approach rigorously enforces domain-relevant semantic alignment, ensuring that only pairs where the policy brief synthesizes, rather than simply mirrors, scientific content are retained.

3. In-Context Expert Polishing Procedure

Although official policy documents offer substantive content, their length and style diverge from the targeted policy brief format. An in-context learning strategy was adopted for stylistic normalization. Three expert-written paper–brief pairs from a reference set of 85 published briefs served as exemplars. GPT-o3 operated in a “polishing” role, revising candidate briefs to match the tone, structure, and clarity expected in high-quality policy communication, while preserving accurate citations and factual content.

This step operationalizes best practices in scientific translation for policy, leveraging expert guidance to instill stylistic consistency and accessibility in the final dataset.
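The in-context polishing step can be sketched as prompt assembly: a few expert-written paper-brief exemplars precede the draft to be revised. The prompt wording and section markers below are illustrative assumptions, not the authors' actual prompt:

```python
def build_polish_prompt(exemplars, draft_brief):
    """Assemble an in-context 'polishing' prompt from expert-written
    (paper, brief) exemplar pairs. Format is a hypothetical sketch."""
    parts = [
        "Rewrite the draft policy brief to match the tone, structure, and "
        "clarity of the exemplars. Preserve all citations and factual content.\n"
    ]
    for i, (paper, brief) in enumerate(exemplars, 1):
        parts.append(f"### Exemplar {i}\nPaper: {paper}\nBrief: {brief}\n")
    parts.append(f"### Draft to polish\n{draft_brief}")
    return "\n".join(parts)
```

In the corpus pipeline, three exemplars drawn from the 85-brief reference set would fill the `exemplars` slot, and the assembled prompt would be sent to the polishing model.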

4. Architecture of Policy Brief Generation Tasks

Sci2Pol-Corpus is fundamentally oriented toward the training and evaluation of models on multi-stage policy brief generation. Sci2Pol-Bench, the associated benchmark, decomposes the human writing process into five stages: Autocompletion, Understanding, Summarization, Generation, and Verification. The Generation stage is specifically constructed to test the model’s ability to produce high-quality policy briefs.

For Task 11 (Policy Problem Generation), the policy brief is decomposed into five semantic components:

C = \{\text{background},\ \text{existing\_problem},\ \text{consequence},\ \text{attention\_problem},\ \text{supporting\_detail}\}

A component-wise LLM scoring is conducted, formalized as:

S_{\mathrm{raw}} = \sum_{c\in C}\left[\mathrm{prob}_{\mathrm{imp}}(c)\times\mathrm{prob}_{\mathrm{qual}}(c)\right]

where \mathrm{prob}_{\mathrm{imp}}(c) and \mathrm{prob}_{\mathrm{qual}}(c) denote the LLM-assigned importance and quality scores for component c. Scores are scaled (e.g., by multiplying by 20) to yield normalized metrics in the [0, 100] range. This multi-component rubric is iterated across additional generation tasks focusing on completeness, accuracy, clarity, and actionability.
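The component-wise score can be computed directly from the formula above. A minimal sketch, assuming the importance and quality scores are probabilities in [0, 1] for each of the five components:

```python
def score_brief(component_scores, scale=20.0):
    """Compute S_raw = sum over components of prob_imp(c) * prob_qual(c),
    then scale to [0, 100].

    component_scores: dict mapping each semantic component (background,
    existing_problem, consequence, attention_problem, supporting_detail)
    to a (prob_imp, prob_qual) tuple of LLM-assigned scores in [0, 1].
    """
    s_raw = sum(imp * qual for imp, qual in component_scores.values())
    return s_raw * scale  # five components x max 1.0 x 20 -> up to 100
```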

The adoption of LLM-based evaluation reflects the inadequacy of classical metrics such as ROUGE and BERTScore for brief writing, necessitating reference-based expert-aligned judgment.

5. Fine-Tuning and Model Evaluation on Sci2Pol-Bench

Three open-source LLMs—LLaMA-3.1-8B-Instruct, Gemma-12B-Instruct, and Gemma-27B-Instruct—were fine-tuned explicitly on the Sci2Pol-Corpus pairs. Supervised fine-tuning yielded consistent and substantial performance gains across the Sci2Pol-Bench benchmark. Quantitatively, LLaMA-3.1-8B exhibited an average improvement of +7.64 points, Gemma-12B +3.12, and Gemma-27B +2.03 over respective baselines. Notably, post-training Gemma-27B outperformed much larger commercial models, including GPT-4o and DeepSeek-V3 (671B), in critical policy brief writing tasks.

This result substantiates the premise that domain-specific supervision, using a carefully constructed corpus of policy-anchored scientific summaries, can enable smaller LLMs to exceed the performance of far larger foundation models.

6. Structure of Dataset and Supervision Signals

Each data point in Sci2Pol-Corpus consists of:

  • The full scientific paper (or abstract)
  • The corresponding high-quality, expert-polished policy brief
  • Linked metadata, including document identifiers, citation details, and policy context

The supervision signals emphasize key transitions: accurate translation of findings, integration of methods and conclusions into policy recommendations, and preservation of factual content with appropriate citation style. The multi-stage LLM filtering and polishing ensure that dataset quality adheres strictly to the target domain standards.
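The per-record structure described above can be expressed as a schema. The field names below are illustrative assumptions; the source specifies the content of each data point but not its serialized format:

```python
from dataclasses import dataclass, field

@dataclass
class Sci2PolRecord:
    """Illustrative schema for one Sci2Pol-Corpus entry
    (field names are assumptions, not the released format)."""
    paper_text: str    # full scientific paper, or its abstract
    policy_brief: str  # corresponding expert-polished policy brief
    paper_id: str = ""   # e.g., a SciSciNet-style document identifier
    policy_id: str = ""  # e.g., an Overton-style policy record identifier
    metadata: dict = field(default_factory=dict)  # citation details, policy context
```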

A plausible implication is that such high-fidelity alignment enables robust generalization on varied policy domains, so long as the candidate policy document is scientifically focused.

7. Significance and Future Research Directions

Sci2Pol-Corpus provides the first large-scale, systematically curated training set for scientific-to-policy brief generation, enabling rigorous evaluation and improvement of LLMs for this specialized task. Its construction methodology—large-scale retrieval, LLM-based semantic alignment, and exemplar-driven in-context revision—establishes a scalable blueprint for future datasets in related science-to-policy translation domains.

A plausible implication is the extension of this corpus to multilingual briefs, domain adaptation for other scientific fields, and the inclusion of additional task rubrics aligned to policy-maker requirements. The demonstrated efficacy of smaller, fine-tuned models points toward sustainable and accessible solutions for government, NGO, and academic stakeholders seeking to leverage automated science-to-policy translation.

In summary, Sci2Pol-Corpus constitutes a pivotal resource for enabling high-quality, efficient, and expert-aligned policy brief generation from scientific work, advancing both the methodology and practical utility of LLMs in scientific communication with policy audiences (Wu et al., 25 Sep 2025).
