Tracers for debugging and program exploration

Published 10 Apr 2026 in cs.PL and cs.HC | (2604.09301v1)

Abstract: Programmers often use an iterative process of hypothesis generation ("perhaps this function is called twice?") and hypothesis testing ("let's count how many times this breakpoint fires") to understand the behavior of unfamiliar or malfunctioning software. Existing debugging tools are much better suited to testing hypotheses than to generating them. Step debuggers, for example, present isolated snapshots of the program's state, leaving it to the programmer to mentally reconstruct the evolution of that state over time. We advocate for a different approach: building a debugging and program-exploration tool around a trace, or complete history, of the program's execution. Our key claim is that the user should see every line as executed (in time order) rather than as written (in syntax order). We discuss design choices, preliminary results, and interesting challenges.


Authors (2)

Summary

The paper presents a tracer paradigm that records full execution traces, enabling comprehensive debugging and program exploration.
It leverages automated instrumentation to capture every statement’s variable values and control flow, reducing setup and cognitive overhead.
The approach supports temporal reasoning and UI-based querying of the trace, which the authors argue makes it easier to generate hypotheses during debugging.

Tracers for Debugging and Program Exploration: A Synthesis

Context and Motivation

Debugging and program comprehension remain central challenges in software development. The prevalent tools, step debuggers (steppers) and printf-style logging, mainly support hypothesis testing rather than hypothesis generation. Both require the programmer to anticipate where the interesting behavior lies. Both also impose significant cognitive overhead: steppers offer little temporal context, so the programmer must reconstruct how state evolved over time, and logging requires repeated manual instrumentation.

The paper "Tracers for debugging and program exploration" (2604.09301) proposes a paradigm shift: centering debugging tools on a comprehensive trace, an explicit, navigable record of a program’s execution at the granularity of every statement, including concrete variable values and control-flow events. By presenting program behavior as a sequence in time rather than as isolated snapshots anchored to the source text, this approach targets the cognitive and practical limitations of stepping and logging tools.

Limitations of Prevailing Debugging Paradigms

Stepping Debuggers

Steppers display lexical context (the current line and function) and let the programmer step forward, and in some tools backward, through the code. Their central limitations are:

Loss of Temporal Continuity: Lexical context switches (e.g., stepping into a function) disrupt temporal reasoning, disconnecting the programmer from the broader program narrative.
High Cost of Exploration: Navigation mistakes, such as stepping over a call one meant to step into, usually cannot be undone, so the programmer must rerun the program and navigate back to the same point, which is tedious and error-prone.
Partial History: The call stack shows the chain of currently active calls, which exposes recursion, but it keeps no record of completed calls or earlier loop iterations. This is a significant gap in imperative programs, where much of the relevant history happens inside loops.

Logger-Based Approaches

The use of print/log statements is widespread due to simplicity and minimal tooling requirements. However, key deficits are:

High Setup Cost: As programs grow, the work of deciding where to place log statements and what each one should print scales poorly.
Implicit Control Flow: Logs contain only what was manually printed, so the control-flow relationships between messages are often ambiguous, and the unstructured output is hard to query automatically.

The Tracer: Conceptual and Practical Design

A tracer records a program’s complete execution trace: every executed statement, in order, annotated with variable values and control flow. This record changes the exploration workflow:

Temporal Context by Default: All statements are presented in execution order, directly supporting temporal reasoning about the computation and state flow, and eliminating the need for manual reconstruction.
Separation of Tool State from Program State: Because the execution is fully recorded, exploration and navigation only change the view, so any operation can be undone without re-running the program.
Symmetric Treatment of Control Flow: Tracers represent recursive calls and loop iterations the same way, as entries in the trace, so both can be navigated and queried structurally.
Minimal Setup: The program only needs to be re-run under the tracer; automatic instrumentation captures the relevant execution state without hand-written log statements.
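To make the recording step concrete, here is a minimal sketch of a statement-level tracer in Python built on the standard `sys.settrace` hook. It is an illustrative assumption, not the authors' implementation: `Event`, `trace_calls`, and the example function `total` are hypothetical names, and each event stores only a shallow copy of the frame's locals taken just before the line runs.

```python
import sys
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Event:
    step: int                # position in execution order
    func: str                # function whose line is about to run
    lineno: int              # source line about to run
    depth: int               # call depth; 1 = the traced entry function
    local_vars: dict[str, Any] = field(default_factory=dict)  # shallow snapshot


def trace_calls(fn: Callable, *args: Any, **kwargs: Any) -> tuple[Any, list[Event]]:
    """Run fn(*args, **kwargs) under sys.settrace and return (result, events).

    Every Python frame entered during the call is traced, including library
    code written in Python; functions implemented in C are not visible here.
    """
    events: list[Event] = []
    depth = 0

    def local_tracer(frame, event, arg):
        nonlocal depth
        if event == "line":
            # Snapshot the locals *before* this line executes.
            events.append(Event(len(events), frame.f_code.co_name,
                                frame.f_lineno, depth, dict(frame.f_locals)))
        elif event == "return":
            depth -= 1
        return local_tracer

    def global_tracer(frame, event, arg):
        # Invoked on each "call" event; returning local_tracer traces that frame.
        nonlocal depth
        depth += 1
        return local_tracer

    previous = sys.gettrace()
    sys.settrace(global_tracer)
    try:
        result = fn(*args, **kwargs)
    finally:
        sys.settrace(previous)  # restore any debugger/coverage tracer
    return result, events


def total(xs):
    s = 0
    for x in xs:
        s += x
    return s


if __name__ == "__main__":
    result, events = trace_calls(total, [1, 2, 3])
    for e in events:
        print(e.step, e.func, e.lineno, e.local_vars)
```

Because the trace is a plain list in execution order, the three executions of `s += x` appear as three separate events rather than as one line of source. A practical tracer would also need to filter library frames, copy mutable values deeply, and bound memory use.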

Several UI affordances make the trace practical to work with: structural queries over syntactic and semantic patterns, navigation aids (bookmarks, minimaps), side-by-side source and trace views, and click-to-inspect views of object and value lifetimes.
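The separation of tool state from program state, together with bookmarks and click-to-inspect value lifetimes, can be illustrated with a small view object over an already-recorded trace. This is a hypothetical sketch: `TraceView`, `value_history`, and the hard-coded `TRACE` (pairs of source line and locals-before-the-line, roughly what summing `[1, 2, 3]` in a loop would produce) are illustrative names and data, not the paper's interface.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Hypothetical recorded trace: (source line, locals just before that line runs).
TRACE: list[tuple[int, dict[str, Any]]] = [
    (2, {}),
    (3, {"s": 0}),
    (4, {"s": 0, "x": 1}),
    (3, {"s": 1, "x": 1}),
    (4, {"s": 1, "x": 2}),
    (3, {"s": 3, "x": 2}),
    (4, {"s": 3, "x": 3}),
    (3, {"s": 6, "x": 3}),
    (5, {"s": 6, "x": 3}),
]

_MISSING = object()  # sentinel: variable not bound at this step


@dataclass
class TraceView:
    """UI state over a recorded trace. Navigation only moves a cursor,
    so every move can be undone without re-running the program."""
    trace: list
    cursor: int = 0
    bookmarks: dict[str, int] = field(default_factory=dict)
    history: list[int] = field(default_factory=list)  # earlier cursors, for undo

    def goto(self, step: int) -> None:
        if not 0 <= step < len(self.trace):
            raise IndexError(step)
        self.history.append(self.cursor)
        self.cursor = step

    def next_hit(self, lineno: int) -> Optional[int]:
        """Jump to the next execution of `lineno` after the cursor, like a
        breakpoint hit that can later be undone."""
        for step in range(self.cursor + 1, len(self.trace)):
            if self.trace[step][0] == lineno:
                self.goto(step)
                return step
        return None

    def bookmark(self, name: str) -> None:
        self.bookmarks[name] = self.cursor

    def undo(self) -> None:
        if self.history:
            self.cursor = self.history.pop()


def value_history(trace, name: str) -> list[tuple[int, Any]]:
    """Steps at which `name` holds a new value: a click-to-inspect lifetime view."""
    changes = []
    prev = _MISSING
    for step, (_, local_vars) in enumerate(trace):
        value = local_vars.get(name, _MISSING)
        if value is not _MISSING and (prev is _MISSING or value != prev):
            changes.append((step, value))
        prev = value
    return changes
```

Calling `next_hit(4)` twice on a fresh view moves the cursor to steps 2 and 4; `undo()` then walks back through 2 and 0, and nothing is re-executed. `value_history(TRACE, "s")` returns `[(1, 0), (3, 1), (5, 3), (7, 6)]`.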

Evaluation of Practicality

Addressing concerns about performance and usability, the paper argues that for many human-scale programs (e.g., in education, research, and scripting), trace sizes remain manageable and can be queried efficiently. Empirical data from similar systems and the relevant literature suggest that querying gigabyte-scale traces within seconds using standard tools is feasible, and that interface design, particularly collapsing and expanding trace regions and targeted search, can mitigate information overload.
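Collapsing and expanding trace regions can be sketched as a filter over call depth. The sketch assumes each event carries its call depth (as a tracer that counts call and return events can record); `collapse` and the sample `EVENTS` are hypothetical. Each hidden run is replaced by a single `(None, count)` summary row, so the view stays in time order while shrinking.

```python
from typing import Optional

# Hypothetical trace events: (call depth, function, source line).
EVENTS = [
    (0, "main", 10),
    (0, "main", 11),
    (1, "helper", 2),
    (1, "helper", 3),
    (2, "inner", 7),
    (1, "helper", 4),
    (0, "main", 12),
]


def collapse(events, max_depth: int) -> list[tuple[Optional[str], int]]:
    """Show events at depth <= max_depth as (function, line) rows and replace
    each run of deeper events with one (None, hidden_count) row.
    Raising max_depth expands the view; lowering it collapses more."""
    rows: list[tuple[Optional[str], int]] = []
    hidden = 0
    for depth, func, lineno in events:
        if depth > max_depth:
            hidden += 1
            continue
        if hidden:
            rows.append((None, hidden))
            hidden = 0
        rows.append((func, lineno))
    if hidden:  # the trace ended inside a collapsed region
        rows.append((None, hidden))
    return rows
```

With `max_depth=0`, the four steps spent in `helper` and `inner` shrink to one row, `(None, 4)`; with `max_depth=2` every event is shown.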

Relation to Prior Work

Steppers and Omniscient Debuggers

Prior omniscient and record-and-replay debuggers (e.g., TOD [PothierTanterBackFutureOmniscient2009], ZStep [LiebermanFryZStep95Reversible1998], Pernosco, rr [OCallahanEtAlEngineeringRecordReplay2017]) record fine-grained execution data, and some offer backward navigation and value provenance. However, most retain step-oriented, source-centric UIs rather than making direct interaction with the trace the centerpiece of exploration.
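Value provenance ("where did this value come from?") reduces to a backward search over recorded states. The following is a hypothetical sketch, not how TOD, Pernosco, or rr implement it: it assumes a trace of pairs of source line and locals-before-the-line, and finds the step whose line produced a variable's current value by comparing consecutive snapshots. Because it compares values, it cannot see a write that stores an equal value; systems that record writes directly avoid that blind spot.

```python
from typing import Any, Optional

_MISSING = object()  # sentinel: variable not bound at this step

# Hypothetical trace of f(5) for this source:
#   1  def f(n):
#   2      y = n * 2
#   3      z = y + 1
#   4      y = 0
#   5      return z
TRACE: list[tuple[int, dict[str, Any]]] = [
    (2, {"n": 5}),
    (3, {"n": 5, "y": 10}),
    (4, {"n": 5, "y": 10, "z": 11}),
    (5, {"n": 5, "y": 0, "z": 11}),
]


def last_write(trace, step: int, name: str) -> Optional[int]:
    """Return the step whose line gave `name` the value it has at `step`.

    Returns None if `name` is unbound at `step`, or if its value never
    changes within the trace (e.g. a parameter supplied by the caller).
    """
    if trace[step][1].get(name, _MISSING) is _MISSING:
        return None
    for k in range(step, 0, -1):
        before = trace[k - 1][1].get(name, _MISSING)
        if before is _MISSING or before != trace[k][1][name]:
            return k - 1  # the line executed at step k-1 changed the value
    return None
```

For example, asking where `z` got its value at the `return` step (step 3) yields step 1, which executed line 3 (`z = y + 1`); the same question about `y` yields step 2, line 4.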

Tracers Proper

Tracers in the proposed sense already exist, mainly in educational and research settings (e.g., CMeRun for C++ [EtheredgeCMeRunProgramLogic2004], Backstop and Traceglasses [SakuraiEtAlTraceglassesTracebasedDebugger2010] for Java, snoop for Python). They typically rely on automated instrumentation and present program flow in execution order, annotated with values, but often lack the rich interactive UI and scalable querying the authors envision.

Trace-Based Analysis

Trace-based approaches are widely employed for fault localization, hypothesis testing, and debugging support tools (e.g., Whyline [KoMyersDesigningWhylineDebugging2008], Hypothesizer [AlaboudiLaTozaHypothesizerHypothesisBasedDebugger2023], Delta Debugging [CleveZellerLocatingCausesProgram2005]). These systems leverage execution traces for diffing, path profiling, and program spectra analysis to localize anomalies or facilitate comprehension, underscoring the foundational value of trace data.
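Trace diffing, one of the uses named above, can be sketched with Python's standard `difflib`: compare the executed-line sequences of a passing and a failing run, report the first step where they diverge, and align the remainder to show what the failing run did differently. The line sequences are hypothetical, and this is a plain sequence diff, not the cause-transition technique of the cited work.

```python
import difflib
from typing import Optional, Sequence

# Hypothetical executed-line sequences from two runs of the same loop;
# the failing run takes an extra branch (line 5) on one iteration.
PASSING = [2, 3, 4, 3, 4, 3, 6]
FAILING = [2, 3, 4, 5, 3, 4, 3, 6]


def first_divergence(a: Sequence, b: Sequence) -> Optional[int]:
    """Index of the first step where two traces differ, or None if equal.
    If one trace is a prefix of the other, they diverge where it ends."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


def diff_regions(a: Sequence, b: Sequence) -> list[tuple[str, int, int, int, int]]:
    """Non-matching regions as difflib opcodes (tag, a_start, a_end, b_start, b_end)."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]
```

Here both functions point at the same place: the failing run first differs at step 3, where it executes line 5, which the passing run never reaches; `diff_regions` reports this as a single `insert` opcode, `("insert", 3, 3, 3, 4)`.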

Querying Traces

Trace query languages such as PTQL [GoldsmithEtAlRelationalQueriesProgram2005] and PQL [MartinEtAlFindingApplicationErrors2005] express questions about execution as relational or event-pattern queries, so that large, structured traces can be searched declaratively rather than by manual navigation.
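To illustrate what declarative trace queries offer, the sketch below loads a recorded trace into an in-memory SQLite database using Python's standard `sqlite3` module and answers two hypothesis-testing questions: how often each line ran (a hit count for every potential breakpoint) and the first step at which `s` exceeds 3. This is plain SQL over a table layout invented for the example, not PTQL or PQL syntax, and the trace data is hypothetical.

```python
import sqlite3
from typing import Any, Optional

# Hypothetical trace: (step, function, line, locals just before the line runs).
TRACE: list[tuple[int, str, int, dict[str, Any]]] = [
    (0, "total", 2, {}),
    (1, "total", 3, {"s": 0}),
    (2, "total", 4, {"s": 0, "x": 1}),
    (3, "total", 3, {"s": 1, "x": 1}),
    (4, "total", 4, {"s": 1, "x": 2}),
    (5, "total", 3, {"s": 3, "x": 2}),
    (6, "total", 4, {"s": 3, "x": 3}),
    (7, "total", 3, {"s": 6, "x": 3}),
    (8, "total", 5, {"s": 6, "x": 3}),
]


def load(trace) -> sqlite3.Connection:
    """Store the trace as two relations: steps(step, func, lineno) and
    vals(step, name, value), with one vals row per bound local per step."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE steps (step INTEGER PRIMARY KEY, func TEXT, lineno INTEGER)")
    db.execute("CREATE TABLE vals (step INTEGER, name TEXT, value INTEGER)")  # sample values are ints
    for step, func, lineno, local_vars in trace:
        db.execute("INSERT INTO steps VALUES (?, ?, ?)", (step, func, lineno))
        db.executemany("INSERT INTO vals VALUES (?, ?, ?)",
                       [(step, name, value) for name, value in local_vars.items()])
    return db


def hit_counts(db: sqlite3.Connection) -> list[tuple[int, int]]:
    """How many times did each line run, over the whole execution?"""
    return db.execute(
        "SELECT lineno, COUNT(*) FROM steps GROUP BY lineno ORDER BY lineno"
    ).fetchall()


def first_above(db: sqlite3.Connection, name: str, threshold: int) -> Optional[tuple]:
    """First (step, line, value) at which `name` is observed above `threshold`."""
    return db.execute(
        "SELECT s.step, s.lineno, v.value FROM vals AS v "
        "JOIN steps AS s ON s.step = v.step "
        "WHERE v.name = ? AND v.value > ? ORDER BY s.step LIMIT 1",
        (name, threshold),
    ).fetchone()
```

On this trace, `hit_counts` reports that line 3 (the loop header) ran four times and line 4 three times, and `first_above(db, "s", 3)` returns `(7, 3, 6)`, since the snapshots record locals before each line runs.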

Implications and Future Directions

Practical Implications

The tracer model is most useful where exploratory debugging and program understanding matter more than repeated, finely tuned fault localization. Early experience with prototypes at EPFL and related student projects suggests that it fits research, prototyping, and education. Because navigation is decoupled from execution state and both data and control flow are fully recorded, it suits learning environments, design exploration, and hypothesis-driven debugging.

Theoretical Implications

Explicit trace-first representations blur the boundary between static and dynamic program understanding. Hierarchically structured execution traces may serve as the substrate for emerging techniques in automated fault localization, execution pattern mining, or provenance analysis. As LLM-driven code assistance tools evolve, fine-grained execution traces could become central for explainable AI-driven program diagnostics and repair.

Future Developments

Potential research avenues include:

Integration with expressive, user-friendly query languages, supporting both value- and control-flow-oriented queries.
UI features to scale with large programs (e.g., hierarchical trace visualization, automatic summarization/motif detection [AlimadadiEtAlInferringHierarchicalMotifs2018]).
Hybridization with live and literate programming environments, enabling dynamic, example-driven documentation (see literate tracing [SotoudehLiterateTracing2025]).
Extension to multi-threaded, distributed, and non-imperative paradigms, though challenges around non-determinism and trace data volume must be addressed.

Conclusion

The tracer paradigm outlined in (2604.09301) rethinks debugging and exploration tooling by prioritizing temporally ordered, comprehensive execution traces over step-based navigation and ad hoc logging. The approach aims to reduce cognitive and setup burdens, strengthen the hypothesis-generation phase of both debugging and comprehension, and open the door to richer interaction, analysis, and integration with AI and program-analysis systems. It synthesizes and extends a diverse body of research, and its wide adoption, especially in human-facing programming domains, could reshape common debugging practice.
