Improving the performance of weak supervision searches using transfer and meta-learning

Published 11 Dec 2023 in hep-ph, cs.LG, and hep-ex (arXiv:2312.06152v2)

Abstract: Weak supervision searches have in principle the advantages of both being able to train on experimental data and being able to learn distinctive signal properties. However, the practical applicability of such searches is limited by the fact that successfully training a neural network via weak supervision can require a large amount of signal. In this work, we seek to create neural networks that can learn from less experimental signal by using transfer and meta-learning. The general idea is to first train a neural network on simulations, thereby learning concepts that can be reused or becoming a more efficient learner. The neural network would then be trained on experimental data and should require less signal because of its previous training. We find that transfer and meta-learning can substantially improve the performance of weak supervision searches.


Summary

  • The paper significantly reduces the minimum signal requirement for weakly supervised learning by integrating transfer and meta-learning protocols.
  • It leverages transfer learning through pretraining on simulated signals and fine-tuning on experimental data to enhance feature extraction.
  • Meta-transfer learning is applied to refine model adaptability and robustness against systematic uncertainties in high-dimensional collider searches.

Enhancing Weak Supervision Signal Searches via Transfer and Meta-Learning

Introduction and Motivation

The paper "Improving the performance of weak supervision searches using transfer and meta-learning" (2312.06152) presents a systematic approach to mitigating the data inefficiency inherent in weak supervision strategies, specifically in the context of collider physics searches. The authors address the limitations of fully supervised methods (primarily their susceptibility to simulation artifacts and model misspecification) and the deficits of unsupervised approaches, such as their inability to learn signal-specific features. The focus instead lies on Classification Without Labels (CWoLa), which can train directly on weakly labeled experimental data but is hindered by the large amount of signal required for effective neural-network training.

The authors' principal innovation is to lower the CWoLa learning threshold by equipping neural networks with robust inductive biases, realized via transfer and meta-learning protocols. The study evaluates these methods on jet images from dark shower scenarios, generated with the Pythia Hidden Valley framework, benchmarking the ID and DD decay scenarios at several jet-image resolutions.
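The core CWoLa property, that a classifier trained only to distinguish two mixed samples with different signal fractions also learns to separate signal from background, can be illustrated with a toy numpy sketch (a 1D stand-in, not the paper's setup; distributions, sample sizes, and fractions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mixture(n, f_sig):
    """1D toy events: signal ~ N(2, 1), background ~ N(0, 1)."""
    n_sig = int(f_sig * n)
    return np.concatenate([rng.normal(2.0, 1.0, n_sig),
                           rng.normal(0.0, 1.0, n - n_sig)])

# Two mixed samples with different (unknown) signal fractions, standing
# in for signal-region and sideband data; labels mark the sample only.
x = np.concatenate([make_mixture(5000, 0.3), make_mixture(5000, 0.0)])
y = np.concatenate([np.ones(5000), np.zeros(5000)])

# Logistic regression trained on the noisy sample labels alone.
w, b = 0.0, 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))
    w -= 0.1 * np.mean((p - y) * x)
    b -= 0.1 * np.mean(p - y)

# The learned score nonetheless ranks true signal above true background.
sig = rng.normal(2.0, 1.0, 2000)
bkg = rng.normal(0.0, 1.0, 2000)
auc = np.mean((w * sig)[:, None] > (w * bkg)[None, :])
```

With w > 0 the learned score is monotonic in x, so the AUC approaches the fully supervised optimum even though no per-event truth labels were used; the practical difficulty the paper targets is that this only works once the signal fraction is large enough.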

Event Generation and Data Representation

Signal events stem from pp → Z′ → q̄_D q_D, with subsequent showering and hadronization in a dark sector, producing distinct jet images. These are contrasted against QCD-dominated SM backgrounds, with both datasets processed through realistic detector simulation (Delphes) and stringent kinematic selection representative of experimental workflows.

Jet images, serving as high-dimensional inputs for the CNNs, are centered, rotated, and flipped to canonical orientations before being discretized to 25×25, 50×50, and 75×75 resolutions.

Figure 1: Sample 2D p_T jet histograms for a signal event prior to preprocessing, showing raw localization in (η, φ) space.

This preprocessing is critical for ensuring that learning is focused on physically relevant differences rather than trivial coordinate misalignments.
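The standard centre-rotate-flip-pixelate pipeline can be sketched in numpy as follows (generic, assumed steps; the paper's exact conventions, image half-width, and flip criterion may differ):

```python
import numpy as np

def preprocess(eta, phi, pt, n_pix=25, half_width=1.25):
    """Toy jet-image pipeline: centre, rotate, flip, pixelate."""
    # 1. Centre on the pT-weighted centroid.
    eta = eta - np.average(eta, weights=pt)
    phi = phi - np.average(phi, weights=pt)
    # 2. Rotate the principal axis of the pT distribution to a fixed axis.
    cov = np.cov(np.vstack([eta, phi]), aweights=pt)
    theta = 0.5 * np.arctan2(2 * cov[0, 1], cov[0, 0] - cov[1, 1])
    c, s = np.cos(-theta), np.sin(-theta)
    eta, phi = c * eta - s * phi, s * eta + c * phi
    # 3. Flip so that most of the pT lies in one half-plane.
    if np.sum(pt[eta < 0]) > np.sum(pt[eta > 0]):
        eta = -eta
    # 4. Pixelate into an n_pix x n_pix pT-weighted histogram.
    edges = np.linspace(-half_width, half_width, n_pix + 1)
    img, _, _ = np.histogram2d(eta, phi, bins=[edges, edges], weights=pt)
    return img

# Example: a toy jet of 50 constituents.
rng = np.random.default_rng(1)
img = preprocess(rng.normal(0, 0.3, 50), rng.normal(0, 0.3, 50),
                 rng.uniform(1.0, 10.0, 50))
```

Each step removes a symmetry (translation, rotation, reflection) that carries no physics information, which is exactly why the text calls the preprocessing critical.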

Baseline: CWoLa Performance

The study employs a dual-jet CNN architecture with shared trainable feature extractors, processing jet images from both the signal region (SR) and the sideband (SB) and combining the two per-jet outputs via a product operation. The CWoLa baseline demonstrates that, while effective at higher signal yields, a clear learning threshold persists: below it, the network fails to differentiate signal from background, and performance deteriorates further as input dimensionality increases.
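The shared-extractor-plus-product structure can be sketched as follows (a toy numpy MLP on flattened images stands in for the paper's CNN; all shapes and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared feature-extractor and head weights, applied to both jets.
W1 = rng.normal(0, 0.1, (625, 16))   # flattened 25x25 image -> 16 features
w_head = rng.normal(0, 0.1, 16)      # shared per-jet classifier head

def jet_score(img):
    """Per-jet score in (0, 1) from the shared extractor + head."""
    h = np.maximum(0.0, img.reshape(-1) @ W1)          # ReLU features
    return 1.0 / (1.0 + np.exp(-h @ w_head))

def event_score(img_a, img_b):
    """Event-level output: product of the two per-jet scores."""
    return jet_score(img_a) * jet_score(img_b)

img_a = rng.uniform(0, 1, (25, 25))
img_b = rng.uniform(0, 1, (25, 25))
s = event_score(img_a, img_b)
```

The product combination makes the event score invariant under swapping the two jets, a natural choice for a dijet final state.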

Transfer Learning: Substantially Lowering the Learning Threshold

Transfer learning is operationalized through pretraining on an aggregate of simulated signals (excluding the evaluation benchmark), after which only the classifier head is reinitialized and fine-tuned on experimental (SR/SB mixed) data with the feature extractor frozen. This approach enables the CNN to internalize generalizable representations pertinent to dark jet phenomenology (such as multiplicity, thrust), directly enhancing downstream CWoLa learning efficiency.

Empirically, the adoption of transfer learning:

  • Substantially lowers the required signal fraction for competitive discovery significance.
  • Yields a more stable and lower-variance estimator due to priors encoded in the feature extractor.
  • Demonstrates pronounced benefits as jet image resolution (and thus task complexity) increases.
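The freeze-and-fine-tune recipe can be illustrated with a toy two-layer network in numpy (hypothetical data and sizes; the paper freezes a CNN feature extractor and reinitialises only the classifier head):

```python
import numpy as np

rng = np.random.default_rng(0)

def train(x, y, W1, w2, steps=500, lr=0.5, freeze_features=False):
    """Two-layer net (ReLU + logistic head); optionally freeze layer 1."""
    for _ in range(steps):
        h = np.maximum(0.0, x @ W1)
        p = 1.0 / (1.0 + np.exp(-h @ w2))
        g = p - y
        w2 = w2 - lr * h.T @ g / len(y)
        if not freeze_features:
            gh = (g[:, None] * w2[None, :]) * (h > 0)
            W1 = W1 - lr * x.T @ gh / len(y)
    return W1, w2

# Toy task: label is the sign of the sum of the first three features.
def make_data(n):
    x = rng.normal(0.0, 1.0, (n, 10))
    return x, (x[:, :3].sum(axis=1) > 0).astype(float)

# 1. Pretrain on plentiful "simulated" events.
x_sim, y_sim = make_data(4000)
W1 = rng.normal(0, 0.3, (10, 8))
w2 = rng.normal(0, 0.3, 8)
W1, w2 = train(x_sim, y_sim, W1, w2)

# 2. Reinitialise only the head, freeze the extractor, and fine-tune
#    on a small "experimental" sample.
x_dat, y_dat = make_data(200)
w2 = rng.normal(0, 0.3, 8)                    # fresh classifier head
W1_frozen = W1.copy()
W1, w2 = train(x_dat, y_dat, W1, w2, freeze_features=True)

# Accuracy of the fine-tuned head on the small sample.
acc = np.mean(((1 / (1 + np.exp(-np.maximum(0, x_dat @ W1) @ w2))) > 0.5)
              == (y_dat > 0.5))
```

Because only the head is trained in step 2, the small sample has far fewer parameters to constrain, which is the mechanism behind the lowered signal requirement.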

Meta-Transfer Learning: Further Efficiencies through “Learning-to-Learn”

To push beyond the static inductive bias imparted by transfer learning, the authors implement an adapted meta-transfer learning (MTL) algorithm where scaling and shifting layers are meta-optimized across a family of dark shower tasks. Unlike transfer learning, MTL targets the network’s adaptability: meta-training episodes alternately update classifier parameters and per-filter modulation variables to maximize future fine-tuning speed and accuracy.

Key aspects:

  • Pretraining fixes feature extractor weights, with meta-updates applied only to scaling, shifting, and classifier weights.
  • Each meta-training episode encompasses all off-benchmark models, mimicking a cross-model generalization scenario.
  • Fine-tuning is conducted with only the classifier weights and meta-learned modulations.
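The scaling-and-shifting modulation can be sketched as follows (toy shapes; assumed to follow the general per-filter scale/shift idea of meta-transfer learning, applied on top of frozen pretrained weights):

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen, pretrained convolution filters: (n_filters, k, k) plus biases.
W_frozen = rng.normal(0, 0.1, (4, 3, 3))
b_frozen = np.zeros(4)

# Meta-learned modulation: one scale and one shift per filter. Only
# these (and the classifier head) receive gradients during fine-tuning.
scale = np.ones(4)      # identity initialisation
shift = np.zeros(4)

def modulated_filters():
    """Effective filters: frozen weights scaled per filter, biases shifted."""
    return scale[:, None, None] * W_frozen, b_frozen + shift

# At identity initialisation the pretrained network is reproduced
# exactly, before any fine-tuning step.
W_eff, b_eff = modulated_filters()
```

The design choice here is parameter economy: with k×k filters, fine-tuning touches two numbers per filter instead of k² + 1, so very little signal suffices to adapt the network.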

MTL delivers modest but systematic improvements over pure transfer learning on low- and mid-resolution jet images, notably reducing the minimal signal requirement for successful extraction. For high-resolution images, the improvement is less pronounced with baseline kernels but can be enhanced by appropriately adjusting kernel sizes.

Robustness to Systematic Uncertainties

The study extends its analysis by quantifying the effect of systematic uncertainties on the background estimation, integrating modified significance calculations with a 1% systematic uncertainty. Results demonstrate that, although all methods experience compression in achievable significance, the relative advantage of transfer and meta-learning over pure CWoLa remains unaffected.

Figure 2: Comparison of transfer learning and pure CWoLa performance in the presence of 1% background systematic uncertainty, demonstrating maintained improvement.
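For reference, a widely used form of the discovery significance with a background systematic (the Asimov formula with background uncertainty; assumed to correspond to the estimate used here) can be coded directly:

```python
import math

def significance(s, b, sigma_b):
    """Asimov discovery significance for s signal and b background
    events with an absolute background uncertainty sigma_b."""
    t1 = (s + b) * math.log((s + b) * (b + sigma_b**2)
                            / (b**2 + (s + b) * sigma_b**2))
    t2 = (b**2 / sigma_b**2) * math.log1p(
        sigma_b**2 * s / (b * (b + sigma_b**2)))
    return math.sqrt(2.0 * (t1 - t2))

# A 1% background systematic compresses the achievable significance:
z_stat = significance(10.0, 100.0, 1e-3)   # negligible systematic
z_syst = significance(10.0, 100.0, 1.0)    # sigma_b = 1% of b
```

As sigma_b → 0 this reduces to the familiar sqrt(2((s+b)ln(1+s/b) − s)); a nonzero sigma_b strictly lowers Z, which is the "compression" visible in Figure 2.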

Implications and Future Prospects

The findings carry significant practical and theoretical implications:

  • Collider Experiment Readiness: The demonstrated reduction in training signal requirements brings weakly supervised methods closer to practical deployment in rare signal searches, such as non-standard dark sector signatures.
  • Generalizability: While demonstrated on Hidden Valley models, the framework is applicable to any scenario where simulated signals are abundant but experimental signal statistics are limiting.
  • Frontiers in AI for HEP: The positive but saturating returns of current MTL highlight both the promise and limitations of current meta-learning paradigms and point toward the need for further algorithmic innovation tailored to high-dimensional, low-signal collider tasks.

Conclusion

This work establishes, through rigorous benchmarking and quantitative analysis, that transfer and meta-learning considerably enhance the practical utility of weakly supervised classifiers in LHC-type searches. The main theoretical outcome is the substantial reduction of the minimum signal needed for effective network training, as well as the construction of models with reduced variance and robustness to systematic effects. While transfer learning already achieves most of the attainable gain, meta-learning offers additional refinements, especially as model and data complexity scale.

The study suggests fertile ground for refinement—such as exploring alternative meta-learning algorithms, richer data augmentations, or active sampling curricula—and is positioned as a valuable methodology for future experimental searches requiring maximal sample efficiency under weak supervision.
