Moments of Clarity: Streamlining Latent Spaces in Machine Learning using Moment Pooling (2403.08854v2)

Published 13 Mar 2024 in hep-ph, cs.LG, and stat.ML

Abstract: Many machine learning applications involve learning a latent representation of data, which is often high-dimensional and difficult to directly interpret. In this work, we propose "Moment Pooling", a natural extension of Deep Sets networks which drastically decreases the latent space dimensionality of these networks while maintaining or even improving performance. Moment Pooling generalizes the summation in Deep Sets to arbitrary multivariate moments, which enables the model to achieve a much higher effective latent dimensionality for a fixed latent dimension. We demonstrate Moment Pooling on the collider physics task of quark/gluon jet classification by extending Energy Flow Networks (EFNs) to Moment EFNs. We find that Moment EFNs with latent dimensions as small as 1 perform similarly to ordinary EFNs with higher latent dimension. This small latent dimension allows for the internal representation to be directly visualized and interpreted, which in turn enables the learned internal jet representation to be extracted in closed form.

Summary

  • The paper introduces Moment Pooling to reduce latent space dimensionality while preserving model performance.
  • It extends Deep Sets with higher-order moment aggregation, maintaining or improving quark/gluon jet classification performance in EFNs at much smaller latent dimension.
  • The method enhances interpretability by linking low-dimensional latent representations to known physical observables.

Streamlining Latent Spaces in Machine Learning with Moment Pooling

Introduction

In machine learning, and especially in collider physics applications, learned latent representations of data present both a challenge and an opportunity. Conventional architectures rely on high-dimensional latent spaces that, while effective, are difficult to interpret and computationally costly. To obtain more compact and interpretable models, this paper introduces Moment Pooling.

Moment Pooling: A Natural Extension to Deep Sets

Deep Sets, and by extension Energy Flow Networks (EFNs), provide a powerful framework for learning from unordered sets of data, but in practice they rely on high-dimensional latent spaces to perform well. Moment Pooling generalizes these architectures, enabling drastic reductions in latent dimensionality without compromising performance. By extending the summation in Deep Sets to arbitrary multivariate moments of the per-element features, Moment Pooling packs far more information into each latent dimension.
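
Schematically, and in notation reconstructed from the abstract rather than taken from the paper itself: an ordinary Deep Sets or EFN model pools per-element features with a weighted sum, while Moment Pooling passes every multivariate moment of those features up to some order k to the downstream network. Here Φ is the learned per-element map, z_i is the per-element weight (the energy fraction in an EFN; identically 1 in a plain Deep Sets model), and the latent indices run from 1 to the latent dimension L:

```latex
\underbrace{\mathcal{O}_a = \sum_i z_i \, \Phi_a(x_i)}_{\text{ordinary sum pooling}}
\quad\longrightarrow\quad
M_{a_1 \cdots a_r} = \sum_i z_i \, \Phi_{a_1}(x_i) \cdots \Phi_{a_r}(x_i),
\qquad 1 \le r \le k .
```

Since the moments are symmetric in their indices, only the combinations with a_1 ≤ ⋯ ≤ a_r are distinct.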

Moment Pooling's effectiveness is demonstrated by applying it to EFNs on a collider physics task, where it matches or exceeds the accuracy of ordinary EFNs while using far smaller latent spaces. The resulting models, dubbed Moment EFNs, are therefore both simpler and easier to interpret.
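
To make the pooling operation concrete, here is a minimal NumPy sketch, written for this summary rather than taken from the authors' code; the function name moment_pool and all variable names are illustrative. It enumerates each distinct symmetric moment once:

```python
import itertools
import numpy as np

def moment_pool(phi, z, order):
    """Pool per-particle features into multivariate moments.

    phi   : (n_particles, latent_dim) array of per-particle features Phi(x_i)
    z     : (n_particles,) array of weights (energy fractions for an EFN;
            use np.ones(n_particles) for a plain Deep Sets model)
    order : highest moment order k

    Returns a 1D array of every distinct moment
        M_{a1..ar} = sum_i z_i * phi[i, a1] * ... * phi[i, ar]
    for 1 <= r <= order, keeping one copy of each symmetric moment
    (indices a1 <= a2 <= ... <= ar).
    """
    _, latent_dim = phi.shape
    moments = []
    for r in range(1, order + 1):
        for idx in itertools.combinations_with_replacement(range(latent_dim), r):
            per_particle = np.prod(phi[:, list(idx)], axis=1)  # Phi_{a1} * ... * Phi_{ar}
            moments.append(np.dot(z, per_particle))            # weighted sum over the set
    return np.array(moments)

# Example: latent_dim = 2 at order 2 exposes 2 + 3 = 5 pooled features.
rng = np.random.default_rng(0)
phi = rng.random((30, 2))   # stand-in for a trained per-particle network
z = rng.random(30)
z /= z.sum()                # normalized energy fractions
print(moment_pool(phi, z, order=2).shape)  # (5,)
```

In a full Moment EFN, phi would come from the trained per-particle network and the returned moments would feed the downstream classifier.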

Empirical Validation: Quark/Gluon Jet Classification

The efficacy of Moment Pooling is validated empirically on quark/gluon jet classification. A series of Moment EFNs with varying moment orders and latent dimensions were trained and evaluated. For a fixed latent dimension, increasing the order of the moments (and thus the effective latent dimension) consistently improves performance. Notably, Moment EFNs with a single latent dimension, once extended to higher-order moments, perform on par with ordinary EFNs of substantially higher latent dimension, which both reduces computational cost and eases interpretation.
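
A quick way to see the gain in effective latent dimensionality (this is the standard count of monomials of degree at most k in ℓ variables; the paper's precise definition of effective latent dimension may differ):

```latex
\ell_{\mathrm{eff}} \;=\; \binom{\ell + k}{k} - 1 \;\sim\; \frac{\ell^{k}}{k!}
\quad \text{for } \ell \gg k ,
```

so even a single latent dimension ℓ = 1 at order k = 4 yields four independent pooled features M_1, …, M_4.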

Interpretability: Beyond Performance Metrics

A key advantage of Moment EFNs, and of Moment Pooling more broadly, is interpretability. When the latent dimension is small, the learned latent representations can be analyzed directly. Such analyses reveal that Moment EFNs learn representations closely resembling known physical observables, providing not just a strong classifier but also insight into the underlying physics.
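
As a schematic illustration of why a small latent dimension aids interpretation (our example, not a specific result from the paper): for an order-2 Moment EFN with a single latent dimension, the downstream classifier sees only two pooled numbers,

```latex
M_1 = \sum_i z_i \, \Phi(\hat p_i), \qquad M_2 = \sum_i z_i \, \Phi(\hat p_i)^2 ,
```

so the single learned function Φ can be plotted directly over the jet's angular plane and, when simple enough, matched to a closed-form observable.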

Future Directions

Moment Pooling and Moment EFNs mark a step forward for machine learning in fields where interpretability and computational efficiency are paramount. Because the method applies to any set-based data, its reach extends well beyond collider physics. Further exploration of its theoretical underpinnings, extensions, and applications remains a promising avenue for future research.

Conclusion

This paper introduced Moment Pooling, a powerful extension to Deep Sets architectures, and demonstrated its efficacy through the development and analysis of Moment EFNs. The results highlight Moment Pooling's ability to drastically reduce the complexity of machine learning models without sacrificing performance, thereby opening new doors for efficient and interpretable machine learning across various domains.