Moments of Clarity: Streamlining Latent Spaces in Machine Learning using Moment Pooling (2403.08854v2)
Abstract: Many machine learning applications involve learning a latent representation of data, which is often high-dimensional and difficult to interpret directly. In this work, we propose "Moment Pooling", a natural extension of Deep Sets networks that drastically decreases the latent space dimensionality of these networks while maintaining or even improving performance. Moment Pooling generalizes the summation in Deep Sets to arbitrary multivariate moments, which enables the model to achieve a much higher effective latent dimensionality for a fixed latent dimension. We demonstrate Moment Pooling on the collider physics task of quark/gluon jet classification by extending Energy Flow Networks (EFNs) to Moment EFNs. We find that Moment EFNs with latent dimensions as small as 1 perform similarly to ordinary EFNs with higher latent dimension. This small latent dimension allows the internal representation to be directly visualized and interpreted, which in turn enables the learned internal jet representation to be extracted in closed form.
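The core idea — replacing the Deep Sets sum with all multivariate moments of the per-particle latent features up to some order — can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions, not the paper's implementation: the function name `moment_pool`, its signature, and the weighted-sum convention (mimicking the energy weighting of EFNs) are assumptions, and a real Moment EFN would compute these moments inside a trainable network. Ordinary Deep Sets / EFN pooling corresponds to the `order=1` case.

```python
import numpy as np
from itertools import combinations_with_replacement

def moment_pool(phi, weights, order=2):
    """Pool per-particle latent features into multivariate moments.

    phi     : (n_particles, latent_dim) array of per-particle features.
    weights : (n_particles,) array of per-particle weights
              (e.g. energy fractions, as in an EFN).
    order   : highest moment order to include; order=1 recovers the
              ordinary Deep Sets weighted sum.
    """
    n_particles, latent_dim = phi.shape
    feats = []
    for k in range(1, order + 1):
        # Unordered index tuples (a1 <= a2 <= ... <= ak) avoid
        # duplicating symmetric moments like M_ab and M_ba.
        for idx in combinations_with_replacement(range(latent_dim), k):
            # Multivariate moment: sum_i w_i * phi_i[a1] * ... * phi_i[ak]
            feats.append(np.sum(weights * np.prod(phi[:, idx], axis=1)))
    return np.array(feats)
```

For `latent_dim = L` and moments up to `order = k`, the pooled vector has one entry per multiset of indices, i.e. an effective dimensionality that grows combinatorially in `k` even when `L = 1` — which is why a very small latent dimension can still carry enough information for classification.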