Systematic Evaluation of Generative Machine Learning Capability to Simulate Distributions of Observables at the Large Hadron Collider
Abstract: Monte Carlo simulations are a crucial component in analysing Standard Model and new physics processes at the Large Hadron Collider. This paper explores how well generative models can complement the statistics of classical Monte Carlo simulations in the final stage of data analysis, by generating additional synthetic data that follows the same kinematic distributions to high precision for a limited set of analysis-specific observables. Several deep generative models are adapted for this task, and their performance is systematically evaluated on a well-known benchmark sample containing beyond-the-Standard-Model Higgs boson production and the corresponding irreducible background. The paper evaluates autoregressive models and normalizing flows, and investigates their applicability under different model configurations. The best-performing model is selected for further evaluation using a set of statistical procedures and a simplified physics analysis. By implementing and performing a series of statistical tests and evaluations, we show that a machine-learning-based generative procedure can produce synthetic data that matches the original samples closely enough to be incorporated in the final stage of a physics analysis with a given systematic uncertainty.