- The paper introduces a shared program state abstraction that enables direct interoperability between LLM prompts and Python programs, reducing manual data marshaling.
- It implements the natural function interface in the Nightjar system, achieving a 39.6% reduction in lines of code and a 4–19% improvement in task accuracy.
- The approach scales efficiently via pass-by-reference, though it introduces runtime overhead and safety challenges that warrant further research.
Shared Program State Between Prompts and Programs
Overview
The paper "Sharing State Between Prompts and Programs" (2512.14805) introduces the concept of shared program state as a formal programming abstraction enabling direct interoperability between natural language code (prompts executed by LLMs) and the internal state of host programming languages such as Python. The authors present a schema, termed the natural function interface (NFI), to generalize and formalize the interaction between natural and formal code. Implementation is demonstrated via the Nightjar programming system, which allows natural code embedded within Python programs to read, write, and control host-program execution state without manual data marshaling or isolated LLM tool calls. The paper provides strong empirical evidence that this paradigm yields more concise programs and, in many cases, higher task accuracy compared to manually orchestrated LLM-program boundaries.
Natural language programming—writing executable code as natural language prompts for LLMs—is increasingly used for abstract tasks like code generation and reasoning. The dominant paradigm in LLM-integrated programming ecosystems is isolated program state: natural code and formal code operate in disjoint environments, requiring manual transfer of data structures and program state across the LLM-program boundary. Prior works, including literate programming tools and modern code interpreter APIs, either treat natural language as documentation or employ rigid tool-call mechanisms, preventing fluid integration of prompt and program semantics.
Shared program state shifts this paradigm by granting natural code direct access to live program variables, objects in the heap, and control flow constructs. This abstraction is realized through the NFI schema, which generalizes previous approaches such as tool calling and custom code interpreters to a more expressive model incorporating algebraic effects and handlers.
Shared Program State Abstraction
At its core, shared program state permits natural code executed by an LLM agent to interact with the formal program state seamlessly. The NFI provides a structured contract through:
- Values: Data types transferrable across the natural-formal boundary.
- Effects: Requests emitted by natural code to read, write, or mutate program state or trigger control transitions.
- Handlers: Runtime implementations that fulfill effect requests—reading/writing variables, dereferencing objects, and orchestrating control flow.
This model enables natural language blocks to issue fine-grained manipulation requests such as variable lookups, heap updates, and context switching (e.g., loop breaks), which are handled by the host programming language runtime.
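The values/effects/handlers contract can be illustrated with a minimal sketch. Everything named here (`Lookup`, `Assign`, `Goto`, `Handler`, `scripted_natural_code`) is hypothetical and chosen for illustration; it is not Nightjar's API, and a fixed script of effects stands in for the LLM so the example runs without a model.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Union


@dataclass
class Lookup:
    """Effect: request the value of a variable (hypothetical name)."""
    name: str


@dataclass
class Assign:
    """Effect: request that a variable be written (hypothetical name)."""
    name: str
    value: Any


@dataclass
class Goto:
    """Effect: request a control transition such as 'break' (hypothetical name)."""
    label: str


Effect = Union[Lookup, Assign, Goto]


class Handler:
    """Fulfils effect requests against a dict that stands in for program state."""

    def __init__(self, variables: Dict[str, Any]) -> None:
        self.variables = variables
        self.pending_label: Optional[str] = None

    def handle(self, effect: Effect) -> Any:
        if isinstance(effect, Lookup):
            return self.variables[effect.name]
        if isinstance(effect, Assign):
            self.variables[effect.name] = effect.value
            return None
        if isinstance(effect, Goto):
            # Record the requested transition; the host decides how to act on it.
            self.pending_label = effect.label
            return None
        raise TypeError(f"unknown effect: {effect!r}")


def scripted_natural_code(handler: Handler) -> None:
    """Stand-in for an LLM: emits a fixed sequence of effects."""
    scores = handler.handle(Lookup("scores"))
    handler.handle(Assign("best", max(scores)))
    handler.handle(Goto("break"))


state = {"scores": [3, 9, 4]}
handler = Handler(state)
scripted_natural_code(handler)
```

After the run, `state` holds the value written by the `Assign` effect and `handler.pending_label` holds the requested transition. The point of the design is that the natural code never touches the state directly: every read, write, and jump goes through the handler.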
Figure 1: An example execution in which natural code (bound to an LLM) reads formal program state, emits requests for data, and executes control transitions, as orchestrated by effect handlers.
Nightjar Programming System and Implementation
Nightjar is a practical realization of this abstraction for Python. Using a specialized syntax, variable references and assignments are parsed to route LLM-issued effects directly to the Python runtime. The system includes optimizations such as eager variable loading (reducing inference latency), specialized Python effects (transitioning from language-agnostic primitive effects to Python-specific eval/exec semantics), and effect caching.
To address Python’s lack of low-level goto constructs, Nightjar injects synthetic program labels using try-except blocks and exposes control primitives to the LLM as effectable labels (e.g., break, continue, return).
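One way to picture the try-except approach is the following sketch, which is an assumption about the general technique rather than Nightjar's generated code. `ControlSignal`, `run_natural_block`, and `scripted` are hypothetical names, and a scripted function replaces the LLM's decision.

```python
from typing import Callable, List, Optional


class ControlSignal(Exception):
    """Raised when natural code requests a control transition (hypothetical)."""

    def __init__(self, label: str) -> None:
        super().__init__(label)
        self.label = label


def run_natural_block(item: int, request: Callable[[int], Optional[str]]) -> None:
    """Stand-in for executing a natural code block on one loop item.

    `request` plays the role of the LLM and returns a label or None.
    """
    label = request(item)
    if label is not None:
        raise ControlSignal(label)


def process(items: List[int], request: Callable[[int], Optional[str]]) -> List[int]:
    kept = []
    for item in items:
        try:
            run_natural_block(item, request)
        except ControlSignal as signal:
            # The except clause sits inside the loop body, so the real
            # Python `break` and `continue` statements are legal here.
            if signal.label == "break":
                break
            if signal.label == "continue":
                continue
            raise
        kept.append(item)
    return kept


def scripted(item: int) -> Optional[str]:
    """Scripted decisions: skip negatives, stop at the first value above 100."""
    if item < 0:
        return "continue"
    if item > 100:
        return "break"
    return None


result = process([1, -2, 3, 500, 4], scripted)
```

The exception carries the requested label out of the natural block, and the surrounding try-except translates it into an ordinary Python control statement at the point where that statement is syntactically valid.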
Empirical Evaluation
The authors introduce SPSBench, a suite of 25 programs leveraging shared program state, to benchmark program conciseness and execution accuracy. The evaluation compares several implementations:
- Manual implementations requiring explicit serialization and marshaling
- Manual implementations using standard code interpreter tools (OpenAI/Anthropic)
- Nightjar implementations (both optimized and baseline)
Key metrics include lines of code, accuracy (pass rate), and runtime (effect trace length and execution time).
Figure 2: Average pass rate over SPSBench benchmarks using Claude-Sonnet-4-20250514 across baselines and Nightjar; standard deviation reflects execution variability.
Figure 3: Benchmark performance using GPT-4.1-2025-04-14; Nightjar consistently matches or exceeds manual baseline accuracy.
Nightjar demonstrates a 39.6% average reduction in lines of code and a 4–19% improvement in task accuracy compared to manual implementations. The primary tradeoff is runtime overhead: Nightjar executions take 0.4x to 4.3x as long as manual implementations, which the authors attribute to iterative effect emission and handler invocation.
Scalable State Interaction
A critical strength of shared program state is scalability. When formal program data structures become prohibitively large, pass-by-copy data encoding (as required by isolated state APIs) becomes intractable. Nightjar, via pass-by-reference and effect-driven access, maintains efficiency by querying only the relevant data fragments.
Figure 4: Performance scaling for operations on large graphs; pass-by-copy (manual baseline) exhibits context-length failure, while Nightjar (pass-by-reference) maintains correct operation as graph size grows.
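The contrast between the two strategies can be sketched as follows. This is an illustrative toy, not the paper's benchmark: `GraphRef`, `build_chain_graph`, and the byte counter are hypothetical, and JSON length is used only as a rough proxy for how much data would have to enter the model's context.

```python
import json
from typing import Dict, List, Set


def build_chain_graph(n: int) -> Dict[int, List[int]]:
    """Adjacency list for a simple chain 0 -> 1 -> ... -> n-1."""
    return {i: ([i + 1] if i + 1 < n else []) for i in range(n)}


def pass_by_copy_payload(graph: Dict[int, List[int]]) -> str:
    """Pass-by-copy: the whole structure is serialized up front."""
    return json.dumps(graph)


class GraphRef:
    """Pass-by-reference: only the fragments that are asked for cross the boundary."""

    def __init__(self, graph: Dict[int, List[int]]) -> None:
        self._graph = graph
        self.bytes_sent = 0

    def neighbors(self, node: int) -> List[int]:
        result = self._graph[node]
        # Count only the serialized size of the fragment actually returned.
        self.bytes_sent += len(json.dumps(result))
        return result


def reachable_within(ref: GraphRef, start: int, hops: int) -> Set[int]:
    """Nodes reachable from `start` in at most `hops` steps, via the reference."""
    seen = {start}
    frontier = {start}
    for _ in range(hops):
        frontier = {n for node in frontier for n in ref.neighbors(node)} - seen
        seen |= frontier
    return seen


graph = build_chain_graph(10_000)
copy_size = len(pass_by_copy_payload(graph))
ref = GraphRef(graph)
nearby = reachable_within(ref, 0, 3)
```

In this toy, the pass-by-copy payload grows with the size of the graph, while the data transferred through `GraphRef` depends only on how many nodes the query visits.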
Effect Trace Analysis
Program runtime scales linearly with the number of emitted effects: each effect involves an LLM inference, so runtime is proportional to the number of effect-handling steps required.
Figure 5: Number of emitted effects in a program execution is tightly correlated with total runtime, highlighting the importance of effect trace optimization.
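A back-of-the-envelope version of this linear relationship is sketched below. The function names, the per-effect latency, and the fixed cost are hypothetical placeholders, not measurements from the paper. The sketch also assumes that eager variable loading removes the need for separate lookup effects, which is one plausible reading of that optimization rather than a documented detail.

```python
def trace_length(variables_read: int, writes: int, eager_loading: bool) -> int:
    """Number of effects emitted by a natural block under this toy model.

    Assumption: with eager loading, variable values are supplied up front,
    so no lookup effects are emitted for them.
    """
    lookups = 0 if eager_loading else variables_read
    return lookups + writes


def estimated_runtime(n_effects: int, seconds_per_effect: float, fixed_seconds: float) -> float:
    """Linear cost model: a fixed cost plus one inference per effect."""
    return fixed_seconds + n_effects * seconds_per_effect


# Hypothetical numbers, for illustration only.
baseline = estimated_runtime(trace_length(3, 1, eager_loading=False), 2.0, 1.0)
optimized = estimated_runtime(trace_length(3, 1, eager_loading=True), 2.0, 1.0)
```

Under a model of this shape, any optimization that shortens the effect trace reduces runtime by a proportional amount, which is why trace length is a natural target for optimization.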
Implications, Safety, and Future Work
Shared program state blurs the boundary between natural and formal code, potentially improving programmer productivity and reducing boilerplate. The system-level safety implications are nontrivial: direct LLM access to live program state can introduce security risks and unintended side effects. The authors acknowledge that handler safety mechanisms are incomplete and propose future work inspired by language-based isolation, single-ownership abstractions, deterministic agentic execution protocols, and secure DSL embedding.
Performance engineering remains an open domain; further advances may include bytecode-level optimizations, parallel LLM generation, advanced caching, and prompt compilation. In addition, the paradigm calls for robust program analysis, debugging, and test generation frameworks adapted to mixed natural-formal codebases.
Conclusion
The shared program state abstraction, as formalized via the natural function interface and implemented in Nightjar, fundamentally advances LLM-program interoperability. By delegating state management from programmers to runtime effect handlers, it enables concise, accurate programs integrating natural and formal code. While runtime overhead and safety semantics require continued research, shared program state offers a promising foundation for next-generation agentic programming frameworks where rich, cooperative interactions between prompts and programs are the default.