Papers
Topics
Authors
Recent
Search
2000 character limit reached

JunoBench: A Benchmark Dataset of Crashes in Python Machine Learning Jupyter Notebooks

Published 20 Oct 2025 in cs.SE | (2510.18013v1)

Abstract: Jupyter notebooks are widely used for ML prototyping. Yet few debugging tools are designed for ML code in notebooks, potentially due to the lack of benchmarks. We introduce JunoBench, the first benchmark dataset of real-world crashes in Python-based ML notebooks. JunoBench has 111 curated and reproducible crashes from public Kaggle notebooks, each paired with a verifiable fix, ranging over popular ML libraries, including TensorFlow/Keras, PyTorch, Scikit-learn, Pandas, and NumPy, as well as notebook-specific out-of-order execution issue. To support reproducibility and ease of use, JunoBench offers a unified execution environment where crashes and fixes can be reliably reproduced. By providing realistic crashes and their resolutions, JunoBench facilitates bug detection, localization, and repair tailored to the interactive and iterative nature of notebook-based ML development.

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We found no open problems mentioned in this paper.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 1 tweet with 5 likes about this paper.