Masked Particle Modeling on Sets: Towards Self-Supervised High Energy Physics Foundation Models (2401.13537v3)

Published 24 Jan 2024 in hep-ph, cs.LG, hep-ex, and physics.data-an

Abstract: We propose masked particle modeling (MPM) as a self-supervised method for learning generic, transferable, and reusable representations on unordered sets of inputs for use in high energy physics (HEP) scientific data. This work provides a novel scheme to perform masked modeling based pre-training to learn permutation invariant functions on sets. More generally, this work provides a step towards building large foundation models for HEP that can be generically pre-trained with self-supervised learning and later fine-tuned for a variety of down-stream tasks. In MPM, particles in a set are masked and the training objective is to recover their identity, as defined by a discretized token representation of a pre-trained vector quantized variational autoencoder. We study the efficacy of the method in samples of high energy jets at collider physics experiments, including studies on the impact of discretization, permutation invariance, and ordering. We also study the fine-tuning capability of the model, showing that it can be adapted to tasks such as supervised and weakly supervised jet classification, and that the model can transfer efficiently with small fine-tuning data sets to new classes and new data domains.

References (62)
  1. Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, Aditi Raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou,  and Percy Liang, “On the opportunities and risks of foundation models,”  (2022), arXiv:2108.07258 [cs.LG] .
  2. Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov,  and Luke Zettlemoyer, “Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension,”  (2019), arXiv:1910.13461 [cs.CL] .
  3. Jacob Devlin, Ming-Wei Chang, Kenton Lee,  and Kristina Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,”  (2019), arXiv:1810.04805 [cs.CL] .
  4. OpenAI, “Gpt-4 technical report,”  (2023), arXiv:2303.08774 [cs.CL] .
  5. Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever,  and Dario Amodei, “Language models are few-shot learners,” in Advances in Neural Information Processing Systems, Vol. 33, edited by H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan,  and H. Lin (Curran Associates, Inc., 2020) pp. 1877–1901.
  6. Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit,  and Neil Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,”  (2021), arXiv:2010.11929 [cs.CV] .
  7. Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski,  and Armand Joulin, “Emerging properties in self-supervised vision transformers,” in 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021 (IEEE, 2021) pp. 9630–9640.
  8. Hangbo Bao, Li Dong, Songhao Piao,  and Furu Wei, “Beit: Bert pre-training of image transformers,”  (2022), arXiv:2106.08254 [cs.CV] .
  9. Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen,  and Ilya Sutskever, “Zero-shot text-to-image generation,”  (2021), arXiv:2102.12092 [cs.CV] .
  10. Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katherine Millican, Malcolm Reynolds, Roman Ring, Eliza Rutherford, Serkan Cabi, Tengda Han, Zhitao Gong, Sina Samangooei, Marianne Monteiro, Jacob L. Menick, Sebastian Borgeaud, Andy Brock, Aida Nematzadeh, Sahand Sharifzadeh, Mikolaj Binkowski, Ricardo Barreira, Oriol Vinyals, Andrew Zisserman,  and Karén Simonyan, “Flamingo: a visual language model for few-shot learning,” in NeurIPS (2022).
  11. Xinlei Chen, Saining Xie,  and Kaiming He, “An empirical study of training self-supervised vision transformers,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2021) pp. 9640–9649.
  12. Shuangfei Zhai, Navdeep Jaitly, Jason Ramapuram, Dan Busbridge, Tatiana Likhomanenko, Joseph Y Cheng, Walter Talbott, Chen Huang, Hanlin Goh,  and Joshua M Susskind, “Position prediction as an effective pretraining strategy,” in Proceedings of the 39th International Conference on Machine Learning, Proceedings of Machine Learning Research, Vol. 162, edited by Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu,  and Sivan Sabato (PMLR, 2022) pp. 26010–26027.
  13. Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma,  and Rob Fergus, “Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences,” Proceedings of the National Academy of Sciences 118, e2016239118 (2021).
  14. J. Ross, B. Belgodere,  and V. Chenthamarakshan, “Large-scale chemical language representations capture molecular structure and properties,” Nature Machine Intellegence 4, 1256–1264 (2022).
  15. J. Pan, “Large language model for molecular chemistry,” Nature Communication Science 3 (2023).
  16. Francois Lanusse, Liam Parker, Siavash Golkar, Miles Cranmer, Alberto Bietti, Michael Eickenberg, Geraud Krawezik, Michael McCabe, Ruben Ohana, Mariel Pettee, Bruno Regaldo-Saint Blancard, Tiberiu Tesileanu, Kyunghyun Cho,  and Shirley Ho, “Astroclip: Cross-modal pre-training for astronomical foundation models,”  (2023), arXiv:2310.03024 [astro-ph.IM] .
  17. Mike Walmsley, Inigo Val Slijepcevic, Micah Bowles,  and Anna M. M. Scaife, “Towards galaxy foundation models with hybrid contrastive learning,”  (2022), arXiv:2206.11927 [cs.CV] .
  18. Barry M. Dillon, Gregor Kasieczka, Hans Olischlager, Tilman Plehn, Peter Sorrenson,  and Lorenz Vogel, “Symmetries, safety, and self-supervision,” SciPost Phys. 12, 188 (2022).
  19. Rupert Tombs and Christopher G. Lester, “A method to challenge symmetries in data with self-supervised learning,” Journal of Instrumentation 17, P08024 (2022).
  20. Tomoe Kishimoto, Masahiro Morinaga, Masahiko Saito,  and Junichi Tanaka, “Pre-training strategy using real particle collision data for event classification in collider physics,”  (2023), arXiv:2312.06909 [hep-ex] .
  21. Huilin Qu, Congqiao Li,  and Sitian Qian, “Particle transformer for jet tagging,”  (2022a), arXiv:2202.03772 [hep-ph] .
  22. Vinicius Mikuni and Florencia Canelli, “Point cloud transformers applied to collider physics,” Machine Learning: Science and Technology 2, 035027 (2021).
  23. Benno K ach, Dirk Krücker,  and Isabell Melzer-Pellmann, “Point cloud generation using transformer encoders and normalising flows,”  (2022), arXiv:2211.13623 [hep-ex] .
  24. Raghav Kansal, Anni Li, Javier Duarte, Nadezda Chernyavskaya, Maurizio Pierini, Breno Orzari,  and Thiago Tomei, “Evaluating generative models in high energy physics,” Physical Review D 107 (2023), 10.1103/physrevd.107.076017.
  25. Michael James Fenton, Alexander Shmakov, Ta-Wei Ho, Shih-Chieh Hsu, Daniel Whiteson,  and Pierre Baldi, “Permutationless many-jet event reconstruction with symmetry preserving attention networks,” Physical Review D 105 (2022), 10.1103/physrevd.105.112008.
  26. ATLAS Collaboration (ATLAS), Transformer Neural Networks for Identifying Boosted Higgs Bosons decaying into b⁢b¯𝑏normal-¯𝑏b\bar{b}italic_b over¯ start_ARG italic_b end_ARG and c⁢c¯𝑐normal-¯𝑐c\bar{c}italic_c over¯ start_ARG italic_c end_ARG in ATLAS, Tech. Rep. (CERN, Geneva, 2023).
  27. Rachel E. C. Smith, Inês Ochoa, Rúben Inácio, Jonathan Shoemaker,  and Michael Kagan, “Differentiable vertex fitting for jet flavour tagging,”  (2023), arXiv:2310.12804 [hep-ex] .
  28. Akio Tomiya and Yuki Nagai, “Equivariant transformer is all you need,”  (2023), arXiv:2310.13222 [hep-lat] .
  29. Benno K ach and Isabell Melzer-Pellmann, “Attention to mean-fields for particle cloud generation,”  (2023), arXiv:2305.15254 [hep-ex] .
  30. John Andrew Raine, Matthew Leigh, Knut Zoch,  and Tobias Golling, “ν2superscript𝜈2\nu^{2}italic_ν start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT-flows: Fast and improved neutrino reconstruction in multi-neutrino final states with conditional normalizing flows,”  (2023a), arXiv:2307.02405 [hep-ph] .
  31. Thorben Finke, Michael Krämer, Alexander Mück,  and Jan Tönshoff, “Learning the language of qcd jets with transformers,” Journal of High Energy Physics 2023, 184 (2023).
  32. Anja Butter, Nathan Huetsch, Sofia Palacios Schweitzer, Tilman Plehn, Peter Sorrenson,  and Jonas Spinner, “Jet diffusion versus jetgpt – modern networks for the lhc,”  (2023), arXiv:2305.10475 [hep-ph] .
  33. Matthias Vigl, Nicole Hartman,  and Lukas Heinrich, “Finetuning foundation models for joint analysis optimization,”  (2024).
  34. Aaron van den Oord, Oriol Vinyals,  and Koray Kavukcuoglu, “Neural discrete representation learning,” arXiv preprint arXiv:1711.00937  (2017).
  35. James MacQueen et al., “Some methods for classification and analysis of multivariate observations,” in Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, Vol. 1 (Oakland, CA, USA, 1967) pp. 281–297.
  36. David Arthur and Sergei Vassilvitskii, “K-means++: The advantages of careful seeding,” in Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’07 (Society for Industrial and Applied Mathematics, USA, 2007) p. 1027–1035.
  37. Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt,  and Gaël Varoquaux, “API design for machine learning software: experiences from the scikit-learn project,” in ECML PKDD Workshop: Languages for Data Mining and Machine Learning (2013) pp. 108–122.
  38. Huilin Qu, Congqiao Li,  and Sitian Qian, “JetClass: A Large-Scale Dataset for Deep Learning in Jet Physics,”  (2022b).
  39. Johan Alwall, R Frederix, S Frixione, V Hirschi, Fabio Maltoni, Olivier Mattelaer, H-S Shao, T Stelzer, P Torrielli,  and M Zaro, “The automated computation of tree-level and next-to-leading order differential cross sections, and their matching to parton shower simulations,” JHEP 07, 79.
  40. Torbjörn Sjöstrand, Stephen Mrenna,  and Peter Skands, “A brief introduction to pythia 8.1,” Comput. Phys. Commun. 178, 852–867.
  41. Pierre Artoisenet, Rikkert Frederix, Olivier Mattelaer,  and Robbert Rietkerk, “Automatic spin-entangled decays of heavy resonances in monte carlo simulations,” JHEP 03, 15.
  42. J. de Favereau, C. Delaere, P. Demin, A. Giammanco, V. Lemaître, A. Mertens, M. Selvaggi,  and The DELPHES 3 collaboration, “Delphes 3: a modular framework for fast simulation of a generic collider experiment,” Journal of High Energy Physics 2014, 57 (2014).
  43. Matteo Cacciari, Gavin P Salam,  and Gregory Soyez, “The anti-kt jet clustering algorithm,” JHEP 04, 063.
  44. Sam Shleifer, Jason Weston,  and Myle Ott, “Normformer: Improved transformer pretraining with extra normalization,” arXiv preprint arXiv:2110.09456  (2021).
  45. Diederik P Kingma and Jimmy Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980  (2014).
  46. Ilya Loshchilov and Frank Hutter, “Decoupled weight decay regularization,” arXiv preprint arXiv:1711.05101  (2017).
  47. Minyoung Huh, Brian Cheung, Pulkit Agrawal,  and Phillip Isola, “Straightening out the straight-through estimator: Overcoming optimization challenges in vector quantized networks,” arXiv preprint arXiv:2305.08842  (2023).
  48. Jonas Gehring, Michael Auli, David Grangier, Denis Yarats,  and Yann N. Dauphin, “Convolutional sequence to sequence learning,”  (2017), arXiv:1705.03122 [cs.CL] .
  49. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. Gomez, L. Kaiser,  and I. Polosukhin, “Attention is all you need,” CoRR abs/1706.03762 (2017).
  50. John Andrew Raine, Samuel Klein, Debajyoti Sengupta,  and Tobias Golling, “Curtains for your sliding window: Constructing unobserved regions by transforming adjacent intervals,” Frontiers in Big Data 6 (2023b), 10.3389/fdata.2023.899345.
  51. Anna Hallin, Joshua Isaacson, Gregor Kasieczka, Claudius Krause, Benjamin Nachman, Tobias Quadfasel, Matthias Schlaffer, David Shih,  and Manuel Sommerhalder, “Classifying anomalies through outer density estimation (cathode),” arXiv preprint arXiv:2109.00546  (2021).
  52. Georges Aad, Brad Abbott, Dale Charles Abbott, A Abed Abud, Kira Abeling, Deshan Kavishka Abhayasinghe, Syed Haider Abidi, OS AbouZeid, Nadine L Abraham, Halina Abramowicz, et al., “Dijet resonance search with weak supervision using s= 13 tev p p collisions in the atlas detector,” Physical review letters 125, 131801 (2020).
  53. Anders Andreassen, Benjamin Nachman,  and David Shih, “Simulation Assisted Likelihood-free Anomaly Detection,” Phys. Rev. D 101, 095004 (2020), arXiv:2001.05001 [hep-ph] .
  54. Tobias Golling, Samuel Klein, Radha Mastandrea,  and Benjamin Nachman, “Flow-enhanced transportation for anomaly detection,” Phys. Rev. D 107, 096025 (2023), arXiv:2212.11285 [hep-ph] .
  55. Jack H. Collins, Kiel Howe,  and Benjamin Nachman, “Extending the search for new resonances with machine learning,” Phys. Rev. D99, 014038 (2019), arXiv:1902.02634 [hep-ph] .
  56. Mattias Birman, Benjamin Nachman, Raphael Sebbah, Gal Sela, Ophir Turetz,  and Shikma Bressler, “Data-directed search for new physics based on symmetries of the sm,” The European Physical Journal C 82, 508 (2022).
  57. Erik Buhmann, Cedric Ewen, Gregor Kasieczka, Vinicius Mikuni, Benjamin Nachman,  and David Shih, “Full phase space resonant anomaly detection,”  (2023), arXiv:2310.06897 [hep-ph] .
  58. Debajyoti Sengupta, Matthew Leigh, John Andrew Raine, Samuel Klein,  and Tobias Golling, “Improving new physics searches with diffusion models for event observables and jet constituents,”  (2023), arXiv:2312.10130 [physics.data-an] .
  59. Edmund Witkowski, Benjamin Nachman,  and Daniel Whiteson, “Learning to isolate muons in data,” arXiv preprint arXiv:2306.15737  (2023).
  60. Yoshua Bengio, Nicholas Léonard,  and Aaron Courville, “Estimating or propagating gradients through stochastic neurons for conditional computation,” arXiv preprint arXiv:1308.3432  (2013).
  61. Minyoung Huh, “vqtorch: PyTorch package for vector quantization,” https://github.com/minyoungg/vqtorch (2022).
  62. Laurens van der Maaten and Geoffrey Hinton, “Visualizing data using t-sne,” Journal of Machine Learning Research 9, 2579–2605 (2008).

Summary

  • The paper introduces masked particle modeling (MPM), a self-supervised learning framework that recovers the discretized identities of masked particles in collider physics data.
  • The methodology employs VQ-VAE tokenization and a permutation invariant design to pre-train foundation models for diverse HEP tasks.
  • The results demonstrate improved jet classification and robust generalization to unseen classes and domains, reducing reliance on extensive labeled datasets.

Overview of Masked Particle Modeling on Sets

The paper "Masked Particle Modeling on Sets: Towards Self-Supervised High Energy Physics Foundation Models" introduces a novel self-supervised learning (SSL) approach specifically designed for high energy physics (HEP) data. The authors propose a strategy called Masked Particle Modeling (MPM), which is aimed at learning generic, transferable, and reusable representations from unordered sets of particles in collider physics experiments. This strategy draws inspiration from masked modeling techniques successfully applied in other domains, such as NLP and Computer Vision (CV).

Goals and Methodology

The primary objective of MPM is to construct large foundation models for HEP that can be pre-trained in a self-supervised manner and subsequently fine-tuned for various downstream tasks such as jet classification. In contrast to traditional supervised learning approaches, which depend heavily on labeled data and are prone to overfitting to their training domain, MPM leverages unlabeled data for robust and domain-generalizable feature extraction.

In the MPM framework, particles within a jet are masked, and the model is trained to predict the identities of the masked particles from the information carried by the unmasked ones. This is analogous to the masked language modeling used in models like BERT, but adapted to the unordered, continuous data typical of HEP. Various strategies for masking, ordering, and predicting particle features are investigated within this method.
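
To make the training objective concrete, below is a minimal sketch of one MPM pre-training step in PyTorch. It is illustrative only: the names (mpm_step, tokenizer, mask_token, mask_frac) are assumptions rather than the authors' code, and details such as the masking scheme and fraction differ in the paper.

```python
import torch
import torch.nn.functional as F

def mpm_step(backbone, head, tokenizer, particles, mask_token, mask_frac=0.3):
    """particles: (batch, n_particles, n_features); mask_token: learned (n_features,) vector."""
    with torch.no_grad():
        target_ids = tokenizer(particles)              # (batch, n_particles) discrete token IDs

    # Randomly mask a fraction of particles; masked slots are replaced by the mask vector.
    mask = torch.rand(particles.shape[:2], device=particles.device) < mask_frac
    masked_inputs = torch.where(mask.unsqueeze(-1), mask_token, particles)

    hidden = backbone(masked_inputs)                   # permutation-equivariant encoder
    logits = head(hidden)                              # (batch, n_particles, codebook_size)

    # Cross-entropy only on the masked positions, against the frozen tokenizer's token IDs.
    return F.cross_entropy(logits[mask], target_ids[mask])
```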

Tokenization and Permutation Invariance

A central challenge tackled by the authors is discretizing continuous particle features while preserving permutation invariance over unordered sets of particles. To create discrete tokens from continuous particle features, the authors use a Vector Quantized Variational Autoencoder (VQ-VAE). The paper also compares alternative tokenization schemes, including direct binning of features and k-means clustering, to evaluate their effectiveness in pre-training.
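
As a rough illustration of what the VQ-VAE tokenizer contributes, the snippet below shows the core vector-quantization step: each particle's encoded latent vector is snapped to the index of its nearest codebook entry, and that index serves as the discrete prediction target. The encoder, latent dimension, and codebook size here are placeholders, not the paper's configuration.

```python
import torch

def quantize(latents, codebook):
    """latents: (n_particles, latent_dim); codebook: (codebook_size, latent_dim)."""
    distances = torch.cdist(latents, codebook)   # pairwise distances to all codebook entries
    return distances.argmin(dim=-1)              # index of the nearest entry = discrete token

# Toy usage with made-up sizes: 128 particles, 8-d latents, 512 codebook entries.
tokens = quantize(torch.randn(128, 8), torch.randn(512, 8))
```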

The authors also address the permutation invariance of particle sets by experimenting with ordering strategies. They find that ordering particles exclusively in the prediction head significantly improves performance while keeping the backbone permutation invariant. This subtle but important adjustment lets the model adapt to various downstream tasks without an implicit sequence bias.
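
The sketch below illustrates the "order only in the prediction head" idea, assuming a permutation-equivariant backbone and that transverse momentum (pT) is the first input feature; the wrapper and its names are hypothetical and not the authors' implementation.

```python
import torch
import torch.nn as nn

class OrderedHead(nn.Module):
    """Backbone sees the unordered set; outputs are read out in pT-sorted order only in the head."""

    def __init__(self, backbone, head):
        super().__init__()
        self.backbone, self.head = backbone, head

    def forward(self, particles):                      # (batch, n_particles, n_features)
        hidden = self.backbone(particles)              # unordered set representation
        # Sort per jet by descending pT (assumed to be feature 0) only for the read-out.
        order = particles[..., 0].argsort(dim=-1, descending=True)
        hidden_sorted = torch.gather(hidden, 1, order.unsqueeze(-1).expand_as(hidden))
        return self.head(hidden_sorted)                # predictions in a canonical order
```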

Fine-Tuning and Evaluation

The pre-trained models are evaluated on several downstream tasks to measure their effectiveness:

  • In-context Classification: The model is fine-tuned and tested on the same JetClass dataset used for pre-training. The results demonstrate substantial performance improvements, particularly with small labeled datasets, underscoring the effectiveness of SSL pre-training.
  • Out-of-context Classification: To evaluate the generalizability of the learned representations, the model is pre-trained on a subset of classes and fine-tuned on new, unseen classes. The model continues to exhibit strong performance, indicating that the backbone has learned generalizable features useful across different particle jet categories.
  • Out-of-domain Classification: The model's adaptability to different datasets (RODEM) further validates its generalizability. Even with domain shifts, the pre-trained models show superior performance compared to models trained from scratch, highlighting their potential to mitigate domain-related discrepancies in HEP data analytics.

Additionally, the paper explores the utility of weakly supervised learning by using "noisy" labels in fine-tuning. The pre-trained models achieve significant improvements in performance compared to fully supervised models trained from scratch, suggesting their practical applicability in real-world scenarios where clean labels might not always be available.
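
A hedged sketch of this weakly supervised setup is given below, in the spirit of classification without labels (CWoLa): jets carry only the label of the mixed sample they come from, and the pre-trained backbone plus a small classifier head is fine-tuned on those noisy labels. The class names, mean pooling, and optimizer handling are illustrative assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class JetClassifier(nn.Module):
    def __init__(self, backbone, hidden_dim, n_classes=2):
        super().__init__()
        self.backbone = backbone                   # pre-trained MPM encoder
        self.classifier = nn.Linear(hidden_dim, n_classes)

    def forward(self, particles):
        hidden = self.backbone(particles)          # (batch, n_particles, hidden_dim)
        pooled = hidden.mean(dim=1)                # simple permutation-invariant pooling
        return self.classifier(pooled)

def weak_step(model, optimizer, jets, mixture_labels):
    """mixture_labels: which mixed sample each jet came from (0 or 1), not true class labels."""
    optimizer.zero_grad()
    loss = F.cross_entropy(model(jets), mixture_labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```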

Implications and Future Directions

The implications of adopting MPM in HEP are multifaceted. This approach can significantly reduce reliance on large labeled datasets, which are often expensive and time-consuming to generate. Moreover, the potential to mitigate domain shifts by pre-training on real experimental data while fine-tuning on simulated data opens new avenues for robust model deployment in high energy physics.

Future work could involve scaling the MPM framework to larger models and datasets, further enhancing its representation learning capability. Additionally, exploring other SSL techniques and how they can be combined with MPM could yield more refined and powerful foundation models for HEP.

Furthermore, this paper sets a precedent for cross-domain application of SSL techniques, suggesting that methodologies successful in NLP and CV can be adapted for scientific data with appropriate modifications. This could spur innovation not just within HEP but across various scientific disciplines that deal with large, complex, and unlabeled datasets.

Conclusion

This paper presents a comprehensive approach to self-supervised learning in high energy physics, demonstrating the feasibility and advantages of masked particle modeling. The results underscore the potential of SSL to revolutionize data analysis in HEP, providing a pathway for more efficient and generalizable models capable of addressing the complexities inherent to high energy particle data.