Hook-Pipeline Viability for Evaluation Scaffolding

Evaluate whether Claude Code’s existing hook pipeline can host generator–evaluator separation, sprint contracts, and post-hoc checks within its current context-cost envelope.

Background

The harness currently exposes 27 hook events that participate in permissioning, lifecycle, orchestration, and context management. The authors ask whether these same hooks can also support richer evaluation scaffolding without exceeding the available context budget.

This question complements the preceding open problem, which asks where evaluation scaffolding should reside. Here the issue is whether the current hook substrate is sufficient in practice to host it.
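
One way to make the "context-cost envelope" concrete is a back-of-the-envelope budget check. The sketch below is purely illustrative: the `HookCost` model, the hook names, and every token figure are assumptions made for this example and are taken neither from the paper nor from Claude Code's actual hook API. It only shows the arithmetic of asking whether added evaluation hooks still fit alongside the existing ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HookCost:
    """Hypothetical per-hook cost model; every field is an illustrative assumption."""
    name: str
    tokens_per_call: int    # context tokens one invocation injects
    calls_per_session: int  # how often the hook is expected to fire


def total_tokens(hooks: list[HookCost]) -> int:
    """Sum the context tokens injected by all hooks over one session."""
    return sum(h.tokens_per_call * h.calls_per_session for h in hooks)


def fits_envelope(baseline: list[HookCost], added: list[HookCost], budget_tokens: int) -> bool:
    """Return True if baseline hooks plus added hooks stay within the budget."""
    return total_tokens(baseline) + total_tokens(added) <= budget_tokens


# Made-up numbers for illustration only, not measurements.
baseline = [
    HookCost("permission_check", 50, 40),
    HookCost("lifecycle_notice", 30, 10),
]
evaluation = [
    HookCost("sprint_contract", 400, 1),    # contract stated once per sprint
    HookCost("evaluator_verdict", 250, 8),  # separate evaluator reports back
    HookCost("post_hoc_check", 150, 5),     # checks run after the fact
]

if __name__ == "__main__":
    for budget in (8000, 5000):
        print(budget, fits_envelope(baseline, evaluation, budget))
```

With these made-up figures the added scaffolding fits an 8000-token budget but not a 5000-token one. Answering the open question would require replacing the assumed costs with measured ones.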

References

Second, whether the existing hook pipeline of \Cref{sec:ext} can host such scaffolding within its current context-cost envelope is a further open question.

Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems (2604.14228 - Liu et al., 14 Apr 2026) in Section 12.1 (Silent Failure and the Observability–Evaluation Gap)