Investigate tractability of scaling verification and program synthesis from toy to frontier systems

Investigate the tractability of scaling mechanistic interpretability–inspired approaches to formal verification and program synthesis from toy models to frontier AI systems, and identify the technical barriers to such scaling.

Background

The paper discusses recent results on toy models in which mechanistic analysis was used to construct accuracy proofs and to synthesize programs. It highlights their promise, but also the difficulty of extending them to large, general-purpose systems.

Understanding the conceptual and computational constraints on scaling could enable formal assurances of safety-relevant properties and improve control over advanced AI.
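To make the computational side of this concrete, here is a minimal sketch of the most naive kind of accuracy proof: checking a model on every possible input. It assumes a hypothetical toy task (predicting the maximum of K tokens) and uses plain Python functions as stand-ins for trained models. The task, the function names, and the vocabulary and context sizes are illustrative assumptions, not taken from the source. Roughly speaking, mechanistic analysis is meant to let a proof exploit the model's internal structure instead of enumerating inputs like this.

```python
from itertools import product
from math import log10


def exhaustive_accuracy(model, vocab_size, seq_len):
    """Exact accuracy of `model` on the (hypothetical) max-of-K task.

    Enumerates every token sequence, so runtime grows as
    vocab_size ** seq_len; only feasible for tiny toy settings.
    """
    correct = 0
    total = 0
    for seq in product(range(vocab_size), repeat=seq_len):
        total += 1
        if model(seq) == max(seq):
            correct += 1
    return correct / total


def input_count(vocab_size, seq_len):
    """Number of distinct sequences an exhaustive check must cover."""
    return vocab_size ** seq_len


def input_count_log10(vocab_size, seq_len):
    """Base-10 log of the input count, for sizes too large to print."""
    return seq_len * log10(vocab_size)


# Hypothetical stand-ins for trained models (not real networks).
def perfect_model(seq):
    return max(seq)


def first_token_model(seq):
    return seq[0]


if __name__ == "__main__":
    print(exhaustive_accuracy(perfect_model, 4, 3))      # 1.0
    print(exhaustive_accuracy(first_token_model, 2, 2))  # 0.75
    print(input_count(64, 4))                            # 16777216
    # Illustrative (assumed) sizes: 50,000-token vocabulary, 1,000-token context.
    print(round(input_count_log10(50_000, 1_000)))       # 4699
```

Even at these assumed, fairly modest sizes an exhaustive check would face roughly 10^4699 sequences, which illustrates why brute-force verification does not carry over from toy models to frontier systems.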

References

Several open questions remain about the tractability of scaling these approaches from toy models to frontier systems.

— Open Problems in Mechanistic Interpretability (2501.16496 - Sharkey et al., 27 Jan 2025) in Using mechanistic interpretability for better predictions about AI systems — Predicting behavior in novel situations (Section 3.3.1, paragraph on formal verification and program synthesis)
