SkillFactory: Self-Distillation For Learning Cognitive Behaviors (2512.04072v1)
Abstract: Reasoning models leveraging long chains of thought employ various cognitive skills, such as verification of their answers, backtracking, retrying by an alternate method, and more. Previous work has shown that when a base LLM exhibits these skills, training that model further with reinforcement learning (RL) can learn to leverage them. How can we get models to leverage skills that aren't exhibited by base models? Our work, SkillFactory, is a method for fine-tuning models to roughly learn these skills during a supervised fine-tuning (SFT) stage prior to RL. Our approach does not rely on distillation from a stronger model, but instead uses samples from the model itself, rearranged to provide training data in the format of those skills. These "silver" SFT traces may be imperfect, but are nevertheless effective for priming a model to acquire skills during RL. Our evaluation shows that (1) starting from SkillFactory SFT initialization helps a model to generalize to harder variants of a task post-RL, despite lower performance pre-RL; (2) cognitive skills are indeed used by the model; (3) RLed SkillFactory models are more robust to regression on out-of-domain tasks than RLed base models. Our work suggests that inductive biases learned prior to RL help models learn robust cognitive skill use.
Explain it Like I'm 14
Overview
This paper isn’t about new scientific results. It’s a clear, step-by-step guide that tells authors exactly how to format their research papers when submitting to the ICLR 2026 conference. Think of it like a “dress code” for papers: it makes sure all submissions look neat, consistent, and easy to read.
Key Objectives and Questions
The paper aims to answer simple, practical questions authors have when preparing a paper:
- How should the paper look (fonts, margins, page size)?
- How many pages are allowed?
- How do you organize headings and sections?
- How do you add figures and tables properly?
- How do you write citations and references?
- What tools and files should you use to build the paper?
Methods and Approach
Instead of doing experiments, the authors provide rules, examples, and ready-made “style files” that authors can use.
Here’s what that means in everyday language:
- LaTeX and style files: LaTeX is like a powerful word processor used for scientific papers, especially ones with math. The style files are templates that automatically set the correct fonts, margins, spacing, and layout so your paper follows the rules.
- Authors must use the official ICLR style file: iclr2026_conference.sty, plus the matching bibliography file iclr2026_conference.bst.
- There’s a starter file (iclr2026_conference.tex) you can fill in with your own content.
- Submission website: Papers are submitted online through OpenReview (https://openreview.net/).
- “Camera-ready” format: If your paper is accepted, you add \iclrfinalcopy in your LaTeX file to adjust the layout for the final published version.
- Citations and references: The paper requires the natbib package, and it shows how to cite:
- \citet{...} for citations in the sentence (e.g., “See Smith (2020)”).
- \citep{...} for a parenthetical citation (e.g., “... (Smith, 2020)”).
- References can use any consistent style, listed in alphabetical order.
- Figures and tables:
- Use clear, computer-made images (no hand-drawn), with captions and proper numbering.
- Add images with the LaTeX graphicx package using \includegraphics[width=...]{...}.
- Keep figure captions close to their figures and table titles above their tables.
- Page setup and file formats:
- US Letter paper size (not A4).
- If you generate PDF from LaTeX, you can use pdflatex. If you need PostScript, the paper shows the exact commands to convert to PDF.
- Images should be in PDF (for pdflatex) or EPS (for traditional LaTeX workflows).
- Avoid layout problems:
- Don’t manually move figures with special commands; let LaTeX place them.
- If a long word won’t break properly, give LaTeX a hint using \- to add a hyphenation point.
- Standard notation (optional):
- The paper includes a suggestion to use standard math symbols from the “Deep Learning” textbook. This helps keep symbols consistent across papers.
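The rules above can be put together in a minimal skeleton. This is a sketch, not the official starter file: the title, the bibliography file name references.bib, and the citation key smith2020 are hypothetical placeholders, and it assumes the style files sit in the same folder as the document.

```latex
\documentclass{article}
\usepackage{iclr2026_conference,times} % official style file, assumed to be in this folder
\usepackage{natbib}                    % citation package the guide requires (harmless if the style already loads it)

% \iclrfinalcopy  % uncomment only for the accepted, camera-ready version

\title{A Hypothetical Paper Title}     % placeholder title
\author{Anonymous authors}

\begin{document}
\maketitle

\begin{abstract}
One-paragraph summary of the paper.
\end{abstract}

\section{Introduction}
% Textual citation: the author name is part of the sentence.
\citet{smith2020} propose a baseline.
% Parenthetical citation: the whole reference sits in parentheses.
Later work extends this idea \citep{smith2020}.
% Explicit hyphenation hints for a long word that will not break on its own.
A long compound such as hyper\-para\-meter\-ization may need break points.

\bibliography{references}                % hypothetical references.bib
\bibliographystyle{iclr2026_conference}  % picks up iclr2026_conference.bst
\end{document}
```

Leaving \iclrfinalcopy commented out keeps the submission layout; uncommenting it is the only change needed to switch to the camera-ready layout.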
Main Findings or Results
Because this is an instruction guide, there aren’t scientific “results.” Instead, the important outcomes are the rules themselves:
- Clear formatting standards (fonts, spacing, margins, headings).
- Strict page limits: 9 pages for the initial submission (plus unlimited citations), increasing to 10 pages for the rebuttal/camera-ready.
- Standard ways to handle figures, tables, citations, and references.
- A smooth workflow for creating correctly formatted PDF files.
- A recommendation for common math notation to keep papers consistent.
Why this matters:
- Consistency makes papers easier to read and review.
- Using templates prevents technical problems that could distract from the research.
- Fairness: everyone follows the same rules, so reviewers focus on ideas, not formatting.
Implications and Impact
When authors follow this guide:
- Reviewers can quickly read and compare papers.
- Authors avoid rejection for fixable formatting mistakes.
- The conference proceedings look professional and clean.
- New researchers (including students) get a clear, reliable path to preparing their work.
In short, this paper helps the scientific community communicate better by setting a simple, shared standard for how papers should look and be organized.
Knowledge Gaps
Knowledge gaps, limitations, and open questions
Based on the provided text (an ICLR 2026 LaTeX formatting template and instructions), the following unresolved issues and gaps remain that future researchers, organizers, or tooling developers could concretely address:
- Impact of formatting choices on readability and review outcomes: no empirical evidence that mandated font, margins, and typographic constraints improve reviewer accuracy, speed, or fairness.
- Accessibility standards are absent: no guidance on alt text for figures/tables, screen-reader compatibility for math, colorblind-safe palettes, or minimum contrast ratios.
- No automated format-compliance tooling: lack of an official linter/validator (e.g., CI script or Overleaf plugin) to detect violations (margins, fonts, page limits, figure captions, reference style).
- Outdated and incomplete PDF production pipeline: instructions center on dvips/ps2pdf with only a brief mention of pdflatex; no guidance for xelatex/lualatex, font embedding, PDF/A compliance, or setting PDF metadata (Title/Author/Keywords).
- Internationalization and paper size: US Letter requirement is inflexible; no sanctioned A4 workflow or auto-conversion guidance for non-US authors; no guidance for CJK or right-to-left scripts, diacritics, and language-specific hyphenation.
- Package policy is unspecified: no whitelist/blacklist of LaTeX packages (e.g., microtype, fontspec, subcaption, biblatex) and no guidance on shell-escape or security considerations.
- Figure and table quality guidelines are missing: no minimum DPI, vector vs raster recommendations, color gamut (sRGB), line thickness standards, or reproducibility guidelines for visualizations.
- Float placement and subfigures: lack of clear rules for figures/tables positioning, subfigure usage, and avoiding orphaned captions across pages.
- Referencing policy is under-specified: “any style is acceptable” creates inconsistency; no requirements for DOIs, arXiv identifiers, URLs, access dates, or limits on very long author lists; no BibLaTeX support guidance.
- Double-blind anonymization: minimal guidance on removing metadata, handling self-citations (e.g., “Anonymous (2025)”), acknowledging prior submissions, or obfuscating code/data links during review.
- Supplementary materials and artifacts: no policy for code/data submission, artifact evaluation, reproducibility checklists, or permitted content/format for appendices and supplementary files.
- Camera-ready changes beyond \iclrfinalcopy: unclear what content modifications are allowed (e.g., broader literature review, additional experiments, updated numbers) versus strictly prohibited edits.
- Notation standardization is incomplete: the “Default Notation” table contains placeholders and malformed entries, offering no actionable standard or macro set for common ML symbols and equations.
- Math typesetting best practices: no guidance for multi-line equations, alignment, numbering, theorem environments, or consistent variable naming to improve clarity and accessibility.
- Accessibility of math for screen readers: no recommendation for MathML export, tagging, or tools to produce machine-readable mathematics from LaTeX.
- Ethics and responsible AI statements: absent guidance on ethics declarations, dataset licenses, human subject approvals, model risks, or conflict-of-interest disclosures.
- Author identity and contributions: optional “Author Contributions” section lacks structure (e.g., CRediT taxonomy) and there is no support for ORCID integration or multi-affiliation best practices.
- Color printing contingencies: the template notes color may be used but provides no requirement that figures remain interpretable when printed in grayscale.
- Consistency and quality of the sample bibliography: duplicate entries, malformed fields, and non-ASCII characters appear; no instructions for handling extremely long author lists (e.g., “et al.” after N authors).
- Guidance for non-LaTeX authors: no official Word/Docx or Markdown templates, nor conversion pipelines to the LaTeX style for broader accessibility.
- Build reproducibility and security: no containerized build instructions (e.g., Docker images), deterministic compilation settings, or prohibition of risky macros/commands.
- Hyphenation and microtypography: advice is limited to \-; no recommendations for microtype, language-specific hyphenation patterns, or strategies to avoid overfull boxes without manual tweaks.
- Licensing and reuse of style files: the template does not specify the license for iclr2026_conference.sty/.bst or permissible modifications for institutional repositories.
- Guidance on page-limit enforcement: no automated method to count text pages (excluding references/appendices), nor rules for dense formatting workarounds and their detection.
- Metadata and discoverability: no requirements for structured metadata (keywords, subject areas), which affects indexing and downstream discoverability in digital libraries.
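Two of the gaps above (PDF metadata and microtypography) can already be addressed by authors with standard packages. The preamble fragment below is a sketch only: whether the ICLR style permits these packages is not stated in the text, and the title and keywords are hypothetical placeholders.

```latex
% Preamble fragment; assumes the conference style does not forbid these packages.
\usepackage{microtype}  % microtypographic adjustments that often reduce overfull boxes
\usepackage{hyperref}
\hypersetup{
  pdftitle    = {A Hypothetical Paper Title},     % placeholder
  pdfkeywords = {keyword one, keyword two},       % placeholder
  % pdfauthor = {...}  % left unset here so the PDF stays anonymous during double-blind review
}
```

The author field is left out on purpose: embedding names in PDF metadata would conflict with the double-blind concern raised in the same list.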
Glossary
- Backtracking: A problem-solving strategy that revisits and reverses prior steps when they lead to errors. "Backtracking In-Context"
- Bootstrapping: A self-training method where a model leverages its own outputs to improve performance. "Bootstrapping Reasoning With Reasoning"
- Camera ready: The final, publication-ready version that meets all formatting requirements. "camera ready requirements"
- Chain-of-Thought: An approach where models produce explicit step-by-step reasoning traces. "Demystifying Long Chain-of-Thought Reasoning in {LLM}s"
- \citet: A natbib command for in-text (non-parenthetical) citations. "using \verb|\citet{}|"
- \citep: A natbib command for parenthetical citations. "using \verb|\citep{}|"
- Distillation: Training a smaller model to imitate a larger or ensemble model. "without Distillation"
- dvips: A utility that converts DVI files to PostScript. "dvips mypaper.dvi -t letter -Ppdf -G0 -o mypaper.ps"
- EPS: Encapsulated PostScript, a vector graphics format used in LaTeX workflows. "EPS figures"
- Flush left: Text aligned to the left margin without indentation. "flush left"
- Gaussian distribution: A continuous probability distribution characterized by mean and covariance. "Gaussian distribution over $\mathbf{x}$ with mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Sigma}$"
- includegraphics: A LaTeX command to insert and size images. "\includegraphics[width=0.8\linewidth]{myfile.eps}"
- Jacobian matrix: The matrix of first-order partial derivatives of a vector-valued function. "Jacobian matrix $J \in \mathbb{R}^{m \times n}$ of $f: \mathbb{R}^n \rightarrow \mathbb{R}^m$"
- Kullback-Leibler divergence: A measure of how one probability distribution diverges from another. "Kullback-Leibler divergence of P and Q"
- LaTeX: A document preparation system for scientific typesetting. "Submissions must be made using \LaTeX{}"
- Lp norm: A generalized vector norm parameterized by $p$. "$L^p$ norm of $\mathbf{x}$"
- Small caps: A typographic style with small uppercase letters. "in small caps"
- Softplus: A smooth approximation to ReLU, defined as $\log(1 + \exp(x))$. "Softplus, $\log(1 + \exp(x))$"
- test-time scaling: Increasing computation or search during inference to improve performance. "s1: Simple test-time scaling"
- US Letter: A standard paper size used in the US (8.5×11 inches). "paper size ``US Letter''"
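For the mathematical entries in this glossary, the standard textbook definitions can be written out as follows. These are the usual forms, not text quoted from the paper; the symbols $\zeta$, $J$, $P$, and $Q$ are chosen here for illustration, and the block assumes the amsmath and amssymb packages.

```latex
% Requires \usepackage{amsmath,amssymb} in the preamble.
\begin{align*}
  \zeta(x) &= \log\bigl(1 + \exp(x)\bigr)
    && \text{(softplus)} \\
  \lVert \mathbf{x} \rVert_p &= \Bigl(\sum_i \lvert x_i \rvert^p\Bigr)^{1/p}
    && \text{($L^p$ norm, } p \ge 1\text{)} \\
  D_{\mathrm{KL}}(P \,\Vert\, Q) &= \mathbb{E}_{x \sim P}\bigl[\log P(x) - \log Q(x)\bigr]
    && \text{(Kullback-Leibler divergence)} \\
  J_{i,j} &= \frac{\partial f_i}{\partial x_j}, \quad
    J \in \mathbb{R}^{m \times n} \text{ for } f : \mathbb{R}^n \to \mathbb{R}^m
    && \text{(Jacobian matrix)}
\end{align*}
```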
Practical Applications
Overview
The provided document is an ICLR 2026 LaTeX style guide that details formatting, submission workflows, citation practices (natbib), figure/table standards, default mathematical notation (via dlbook_notation), and PDF preparation. While it does not present new scientific findings, it codifies practical methods and constraints that can be directly operationalized into tools, processes, and policies for scholarly publishing and technical document production.
Below are actionable applications derived from these instructions, organized by time horizon.
Immediate Applications
The following applications can be deployed now using existing LaTeX toolchains, Overleaf, CI/CD systems, and standard publishing workflows.
- Conference submission preflight checker
- Sector: software (developer tools), academia, publishing
- Description: A validator that compiles manuscripts and checks page limits, the text area (5.5 × 9 inches with a 1.5-inch left margin), fonts (Times 10 pt, specific headings), figure/table placement, footnote rules, natbib citation usage, US Letter size, and the camera-ready flag (\iclrfinalcopy).
- Tools/products/workflows: CLI tool, Overleaf plugin, GitHub Action for LaTeX repositories; auto-generated compliance report before OpenReview submission.
- Assumptions/dependencies: Authors use LaTeX and the ICLR style files; reliable TeX Live/MiKTeX environment; static style rules.
- Camera-ready switch and page delta auditor
- Sector: academia, publishing
- Description: Script that toggles \iclrfinalcopy and reports changes in page count/spacing, ensuring compliance with rebuttal/camera-ready limits.
- Tools/products/workflows: Makefile targets or pre-commit hooks; Overleaf template with one-click switch.
- Assumptions/dependencies: Correct use of official .sty and .bst files; reproducible LaTeX builds.
- Figure and table quality assurance
- Sector: publishing, academia, design tooling
- Description: Checker for figure resolution, non-hand-drawn requirement, caption proximity (no separation), B/W legibility, color contrast, table centering/legibility.
- Tools/products/workflows: “Figure Preflight” script; matplotlib/Plotly extensions that enforce width as a fraction of \linewidth (e.g., width=0.8\linewidth).
- Assumptions/dependencies: Access to source image files; consistent LaTeX figure environments; heuristics for B/W readability.
- Citation style linter for natbib
- Sector: academia, publishing, education
- Description: Lints for correct use of \citet{} vs \citep{}, alphabetical references, BibTeX consistency, and completeness of metadata.
- Tools/products/workflows: BibTeX/Natbib linter; Overleaf extension; journal submission pipeline check.
- Assumptions/dependencies: BibTeX files and natbib package in use; consistent citation keys.
- Default notation pack adoption (dlbook_notation)
- Sector: education, academia
- Description: Encourage standardized math notation for ML papers, slides, and course notes; detection of deviations and suggestions.
- Tools/products/workflows: “DL Notation Pack” macro import; course templates; LaTeX snippet library.
- Assumptions/dependencies: Authors opt in to math_commands.tex; community acceptance of standard notation.
- PDF preparation pipeline for US Letter
- Sector: software (DevOps), publishing
- Description: Automated build recipes to produce US Letter PDFs via pdflatex, or via dvips -t letter -Ppdf -G0 followed by ps2pdf.
- Tools/products/workflows: CI pipelines (GitHub Actions, GitLab CI) with preconfigured TeX toolchain; Docker images.
- Assumptions/dependencies: TeX Live/MiKTeX present; authors include correct graphics formats (.pdf instead of .eps when building with pdflatex).
- Hyphenation and line-break helper
- Sector: writing tools, academia
- Description: Suggests \- hints where LaTeX fails to hyphenate, reducing width overflow and margin issues.
- Tools/products/workflows: Overleaf plugin or VSCode LaTeX extension.
- Assumptions/dependencies: Text in English or supported languages; author willingness to accept automated hints.
- Department/journal templates and author training
- Sector: education, policy (institutional), academia
- Description: Official templates and short courses for students/researchers on formatting best practices, figure/table standards, and submission workflows.
- Tools/products/workflows: Template repositories; internal guidelines; onboarding sessions.
- Assumptions/dependencies: Institutional adoption; availability of trainers and documentation.
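The figure and table checks listed above are easier to picture against a compliant example. The fragment below is a sketch of what such a checker would expect to find: the file name accuracy_plot.pdf, the labels, and the table contents are hypothetical, and the cells are left as placeholders rather than filled with made-up numbers.

```latex
% Preamble: \usepackage{graphicx}

\begin{figure}[t]
  \centering
  % Width given as a fraction of \linewidth; a PDF image is assumed for pdflatex builds.
  \includegraphics[width=0.8\linewidth]{accuracy_plot.pdf} % hypothetical file
  \caption{Validation accuracy versus training steps.}      % caption stays inside the float, next to the figure
  \label{fig:accuracy}
\end{figure}

\begin{table}[t]
  \caption{Hypothetical comparison of two methods.}  % table title goes above the table
  \label{tab:results}
  \centering
  \begin{tabular}{lcc}
    \hline
    Method   & Task A & Task B \\
    \hline
    Method 1 & --     & --     \\
    Method 2 & --     & --     \\
    \hline
  \end{tabular}
\end{table}
```

Keeping \caption inside the float is what prevents the caption from being separated from its figure or table, and sizing relative to \linewidth keeps the image inside the text area without manual adjustment.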
Long-Term Applications
These applications require additional development, standardization efforts, platform integration, or broader community adoption.
- AI-driven “ConferenceReady” formatter
- Sector: software (authoring tools), publishing
- Description: An assistant that ingests drafts (Word/Google Docs/Markdown) and converts them to fully compliant LaTeX, fixing citations, figures, headings, margins, and notation automatically.
- Tools/products/workflows: Cloud service or editor plugin; interactive corrections and style explanations.
- Assumptions/dependencies: Robust document conversion, accurate LaTeX generation, evolving style specs; user trust in automated edits.
- Machine-readable style specification and standard validator
- Sector: policy (scholarly publishing), software, academia
- Description: A formal schema for conference/journal style rules and a universal validator that reduces review overhead and enforces fairness in submissions.
- Tools/products/workflows: “StyleSpec” standard; validators integrated with OpenReview and journal submission systems; versioned rule sets.
- Assumptions/dependencies: Multi-stakeholder agreement (conferences, publishers); maintenance of rule versions; support across toolchains.
- Integrated figure design assistant (B/W and print-safe)
- Sector: design tools, scientific visualization
- Description: Recommends palettes, contrast, resolution, and layout that remain legible in black-and-white print; flags captions at risk of separation; suggests fixes.
- Tools/products/workflows: Plugins for matplotlib/Seaborn/Plotly; preflight checks in Overleaf; “ColorSafe for Papers.”
- Assumptions/dependencies: Standards for legibility thresholds; integration with visualization libraries; acceptance by authors.
- Notation ontology and cross-paper consistency analytics
- Sector: research discovery, education
- Description: A knowledge graph mapping mathematical symbols and conventions across ML papers, enabling automated readability checks and consistent notation suggestions.
- Tools/products/workflows: “NotationGraph” service; editor plugins that recommend standard symbols and definitions.
- Assumptions/dependencies: Large-scale LaTeX parsing; community buy-in for standardization; handling domain-specific exceptions.
- Auto-correction inside submission platforms
- Sector: publishing platforms (OpenReview), academia
- Description: Real-time style compliance feedback and optional auto-fixes during submission (e.g., incorrect page size, missing natbib usage, broken captions).
- Tools/products/workflows: OpenReview extensions; pre-submission sandbox with guided fixes.
- Assumptions/dependencies: Platform APIs and willingness to integrate; safeguards to avoid destructive changes; logging for transparency.
- Intelligent page-length optimizer
- Sector: writing tools, academia
- Description: Suggests restructuring, condensation, and layout changes to meet page limits while preserving content clarity; differentiates text vs. citation pages.
- Tools/products/workflows: LLM-based rewriting module; semantic compression; revision maps.
- Assumptions/dependencies: High-quality summarization and layout inference; alignment with authors’ intent; ethical considerations.
- Multi-style interoperability and conversion framework
- Sector: publishing, software
- Description: Converts manuscripts across conference/journal formats (ICLR, NeurIPS, ACL, etc.) via a meta-style layer and rule-based transformations.
- Tools/products/workflows: Cross-style converters; repository of style mappings; test harnesses.
- Assumptions/dependencies: Accurate mapping of rules; frequent style updates; community-maintained catalogs.
- Accessibility-first scientific document tooling
- Sector: policy (accessibility), academia, publishing
- Description: Extend style checks to ensure alt-text, font legibility, tagged PDFs, and accessible tables/figures, aligned with emerging accessibility policies.
- Tools/products/workflows: Accessibility validators; auto-generation of alt-text from captions; remediation tools.
- Assumptions/dependencies: Consensus standards (PDF/UA, WCAG for scientific docs); reliable auto-alt-text; author review workflows.
- End-to-end print preflight and archival readiness
- Sector: libraries, publishing
- Description: Services that ensure documents meet print constraints (US Letter), are robust to archival conversion (PDF/A), and retain figure/table fidelity.
- Tools/products/workflows: Prepress microservices; integration with institutional repositories; long-term preservation checks.
- Assumptions/dependencies: Adoption by libraries/publishers; tooling for PDF/A and color management; consistent metadata pipelines.
- Embedded compliance in visualization and authoring pipelines
- Sector: software (IDE/editor plugins), data science
- Description: IDE-level enforcement (VSCode/JetBrains/Overleaf) to set figure widths as fractions of \linewidth, enforce caption proximity, and flag margin risks during writing.
- Tools/products/workflows: Real-time LaTeX linting; Jupyter-to-LaTeX exporters with style-aware defaults.
- Assumptions/dependencies: Rich editor integrations; reliable static analysis of LaTeX; standardized authoring habits.
Notes on feasibility across all applications:
- Many solutions depend on LaTeX adoption and strict adherence to official ICLR style files.
- Specifications evolve; tools must track versioned rules and provide transparent updates.
- Visual and accessibility checks require heuristics and, for higher accuracy, ML models; author override and human review should remain part of the workflow.
- Platform-level integrations (e.g., OpenReview) rely on APIs and governance decisions beyond a single tool developer’s control.