Co-Authoring with AI: How I Wrote a Physics Paper About AI, Using AI

Published 5 Apr 2026 in physics.comp-ph, physics.ed-ph, physics.soc-ph, and quant-ph | (2604.04081v1)

Abstract: The rapid integration of LLMs into scientific writing fundamentally challenges traditional definitions of authorship, responsibility, and scientific integrity. As researchers transition from using computers as deterministic tools to managing them as ``virtual collaborators,'' the nature of human contribution must be re-evaluated. Using the drafting process of a recent computational physics manuscript as a case study, this essay explores the indispensable role of the Human-in-the-Loop (HITL). We demonstrate that while AI excels at structural organization and syntax generation, the human author bears the ultimate responsibility for enforcing rigorous physical logic, maintaining academic diplomacy, and anticipating peer-review critiques. In this paradigm, the human contribution shifts from writing boilerplate text to acting as a Principal Investigator who actively mentors and steers the AI's reasoning. To ensure accountability and preserve the integrity of the scientific record in this new era, I argue that the community must mandate the publication of full, unedited AI interaction transcripts as standard supplementary material.

Abstract PDF Upgrade to Chat

Authors (1)

Yi Zhou

Summary

The paper demonstrates a human-in-the-loop methodology where LLMs serve as active research collaborators through staged, context-rich prompting.
The paper highlights the crucial role of corrective human mentorship in ensuring physical accuracy, modern taxonomy, and academic precision.
The paper advocates for complete transcript transparency of human-AI exchanges to uphold accountability and reproducibility in research.

Human-in-the-Loop Authorship with LLMs: A Case Study in Virtual Research Cohorts

Introduction

The manuscript "Co-Authoring with AI: How I Wrote a Physics Paper About AI, Using AI" (2604.04081) offers an in-depth autoethnographic account of collaborating with LLMs in the production of a computational physics manuscript. It systematically interrogates authorship, HITL methodology, academic rigor, and the epistemological impact of integrating LLMs into STEM research pipelines. The work reframes industrial LLMs from passive, deterministic tools to participatory, mentorable research collaborators, and offers concrete procedural recommendations for ensuring scientific accountability in the emergent human-AI co-authorship paradigm.

Transitioning from Tool to Active Collaborator

Classical computational workflows in physics treat computers as deterministic executors—compilers, solvers, and function libraries manipulated by human operators. The case study demonstrates a decisive transition: through explicit academic role assignments, LLMs are recontextualized as a cohort with differentiated research functions (theorist, blueprint designer, coder), orchestrated by a human PI who enforces scientific logic and project direction.

Figure 1: The division of LLMs into discrete academic roles—junior theorist, senior postdoc, coder—all mentored in a managed cohort by the human PI, forming a "Virtual Research Group".

The case demonstrates that "end-to-end" prompting approaches yield generic, shallow, or hallucinatory outputs. Instead, high-fidelity results emerge from meticulously staged, supervised workflows, with iterative grounding, context expansion, and direct intervention from a scientifically literate human lead. Explicitly, the manuscript underscores that responsibility for logical rigor, physical accuracy, and professional tone remains non-transferrable—an inescapable domain of the expert human. The case further illuminates the bidirectional genesis of scientific concepts, where key terminology ("Virtual Research Group", "Universal API") is not dictated uni-directionally but emerges dialogically in iterative human-AI exchanges.

The "Inside-Out" Writing Protocol

The methodology subverts conventional AI manuscript drafting by rejecting the "Introduction-first" approach. Context is fully "loaded" into the LLM via concatenation of theoretical motivation, experimental protocol, LaTeX specifications, and unsanitized developmental transcripts. This explicit context expansion is essential for disallowing parametric generalization and enforcing scientific specificity.

Crucially, pivotal core concepts originate through negotiated dialogue, rather than unidirectional author specification or zero-shot generation. The manuscript’s theoretical architecture—terminology, workflow decomposition, and focal claims—emerges through iterative human-AI exchange, with the LLM acting as critical-sounding board and co-inventor, and the human PI curating logical boundaries.

Human-Driven Rigor: Corrective Mentorship and Academic Standards

Several verbatim conversational interventions are documented, highlighting indispensable moments of HITL correction and scientific steering:

Logical Correction: AI claims of "continuous mathematics" in tensor network contexts are flagged as physically inaccurate and revised to reflect the discrete nature of spin lattice models.
Taxonomical Modernity: Legacy terminology ("hidden topological order") is superseded with modern classifications (SPT order) following PI correction.
Diplomatic Framing: Overly blunt criticism of extant open-source libraries is redrafted with appropriate academic diplomacy.

These interventions substantiate the claim that, without active HITL oversight, large models may perpetuate outdated nomenclature, misrepresent state-of-the-art standards, or generate community-alienating rhetoric.

Defensive Scholarship: Anticipating Reviewer Critique

The paper explicitly rehearses the process of preemptively addressing canonical reviewer objections, central for high-tier manuscript acceptance:

Data Contamination: Differences between LLM-generated code and known open-source scripts are foregrounded as empirical evidence against parametric memorization, documenting unique syntactic and algorithmic features derived in-context.
Model Capability Paradox: Apparent contradictions in agentic role assignments (i.e., model performance variations when context is staged) are transformed into evidence of the importance of staged, context-rich prompting over zero-shot approaches.
Algorithmic Precision: Naive asymptotic claims are refined for strictness (e.g., distinguishing between single-site and two-site bottlenecks in memory scaling), demonstrating that reviewer-level pedantry can be successfully anticipated and addressed through human intervention.

AI-Guided, Human-Intervened Visual Communication

Beyond text, the workflow extends to the co-production of scientifically faithful visuals. LLMs serve as "art directors," translating high-level physics criteria and diagrammatic fidelity into rigorously engineered prompts for generative image models. The transcript reveals that left unchecked, generative visual models hallucinate unphysical artifacts—necessitating textual LLMs as mediators for visual grounding, with the human PI as the final judge of physical accuracy.

Implications for Authorship, Integrity, and Transparency

The work advances a critical claim: the trajectory of human contribution in research writing has shifted from generation of boilerplate syntax to high-level intellectual steering—enforcing scientific rigor, curating conceptual structure, and maintaining disciplinary standards.

A strong policy recommendation is advanced: in an era where LLMs contribute structurally and semantically to manuscripts, authors must submit complete, unedited transcripts of all human-AI exchanges as supplementary data. This is justified not merely to placate concerns of "AI-assisted authorship," but to ensure reproducibility, authenticity, and accountability in the historical scientific record.

Conclusion

This case study elucidates best practices, limitations, and theoretical implications of integrating LLMs as "Virtual Research Groups" within computational research. It demonstrates that, contrary to the discourse of LLM-oracle automation, the human remains indispensable in the enforcement of physical reality, logical rigor, and academic standards. The practical impact is immediate: researchers adopting such workflows must invest attention in prompt engineering, staged collaboration, and transcript curation. The theoretical implications extend to the domains of epistemology, academic ethics, and the preservation of scholarly accountability. Transcript-based transparency will become an essential adjunct to future scientific communication, redefining the norms of authorship in computational science.

Markdown Report Issue