OmniJet-$\alpha$: The first cross-task foundation model for particle physics (2403.05618v2)
Abstract: Foundation models are multi-dataset and multi-task machine learning methods that, once pre-trained, can be fine-tuned for a large variety of downstream applications. The successful development of such general-purpose models for physics data would be a major breakthrough, as they could improve the achievable physics performance while drastically reducing the required amount of training time and data. We report significant progress on this challenge on several fronts. First, a comprehensive set of evaluation methods is introduced to judge the quality of an encoding from physics data into a representation suitable for the autoregressive generation of particle jets with transformer architectures (the common backbone of foundation models). These measures motivate the choice of a higher-fidelity tokenization compared to previous works. Finally, we demonstrate transfer learning between an unsupervised problem (jet generation) and a classic supervised task (jet tagging) with our new OmniJet-$\alpha$ model. This is the first successful transfer between two different and actively studied classes of tasks and constitutes a major step in the building of foundation models for particle physics.
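The abstract describes a three-stage pipeline: particle-level features are tokenized into a discrete vocabulary, a transformer is pre-trained autoregressively to generate jets token by token, and the same backbone is then reused for supervised jet tagging. The following PyTorch sketch illustrates that idea at toy scale; all class names, dimensions, the nearest-neighbour tokenizer, and the mean-pooled classification head are illustrative assumptions, not the actual OmniJet-$\alpha$ implementation.

```python
import torch
import torch.nn as nn

VOCAB = 512      # hypothetical codebook size
MAX_LEN = 64     # hypothetical maximum number of constituents per jet
DIM = 64         # hypothetical embedding width


class ToyTokenizer(nn.Module):
    """Toy VQ-style encoder: per-particle features -> nearest codebook index."""

    def __init__(self, n_features=3, dim=DIM, vocab=VOCAB):
        super().__init__()
        self.encoder = nn.Linear(n_features, dim)
        self.codebook = nn.Embedding(vocab, dim)

    def forward(self, particles):                      # (batch, n_particles, n_features)
        z = self.encoder(particles)
        flat = z.reshape(-1, z.size(-1))
        idx = torch.cdist(flat, self.codebook.weight).argmin(dim=-1)
        return idx.view(z.shape[:-1])                   # (batch, n_particles) integer tokens


class Backbone(nn.Module):
    """GPT-style transformer with a causal mask, shared between generation and tagging."""

    def __init__(self, vocab=VOCAB, dim=DIM, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.pos = nn.Embedding(MAX_LEN, dim)
        layer = nn.TransformerEncoderLayer(dim, n_heads, 4 * dim, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)

    def forward(self, tokens):                          # (batch, seq_len)
        positions = torch.arange(tokens.size(1), device=tokens.device)
        x = self.embed(tokens) + self.pos(positions)
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1)).to(tokens.device)
        return self.blocks(x, mask=mask)                # (batch, seq_len, dim)


# Unsupervised pre-training objective (jet generation): predict the next token.
backbone = Backbone()
gen_head = nn.Linear(DIM, VOCAB)
tokens = ToyTokenizer()(torch.randn(8, 32, 3))          # 8 toy jets, 32 constituents each
logits = gen_head(backbone(tokens[:, :-1]))              # (8, 31, VOCAB)
gen_loss = nn.functional.cross_entropy(logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1))

# Transfer to supervised jet tagging: keep the pre-trained backbone, swap in a new head.
tag_head = nn.Linear(DIM, 2)                             # e.g. top vs. QCD
labels = torch.randint(0, 2, (8,))
tag_logits = tag_head(backbone(tokens).mean(dim=1))      # mean-pool over the token sequence
tag_loss = nn.functional.cross_entropy(tag_logits, labels)
```

The point of the sketch is the shared backbone: the generative pre-training loss and the tagging loss differ only in the head attached to the same transformer, which is what makes cross-task transfer possible.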