Improving the performance of weak supervision searches using transfer and meta-learning

Published 11 Dec 2023 in hep-ph, cs.LG, and hep-ex (arXiv:2312.06152v2)

Abstract: Weak supervision searches have in principle the advantages of both being able to train on experimental data and being able to learn distinctive signal properties. However, the practical applicability of such searches is limited by the fact that successfully training a neural network via weak supervision can require a large amount of signal. In this work, we seek to create neural networks that can learn from less experimental signal by using transfer and meta-learning. The general idea is to first train a neural network on simulations, thereby learning concepts that can be reused or becoming a more efficient learner. The neural network would then be trained on experimental data and should require less signal because of its previous training. We find that transfer and meta-learning can substantially improve the performance of weak supervision searches.


Summary

  • The paper significantly reduces the minimum signal requirement for weakly supervised learning by integrating transfer and meta-learning protocols.
  • It leverages transfer learning through pretraining on simulated signals and fine-tuning on experimental data to enhance feature extraction.
  • Meta-transfer learning is applied to refine model adaptability and robustness against systematic uncertainties in high-dimensional collider searches.

Enhancing Weak Supervision Signal Searches via Transfer and Meta-Learning

Introduction and Motivation

The paper "Improving the performance of weak supervision searches using transfer and meta-learning" (2312.06152) presents a systematic approach to mitigating the data inefficiency inherent in weak supervision strategies, specifically in the context of collider physics searches. The authors address the limitations of fully supervised methods (primarily their susceptibility to simulation artifacts and model misspecification) and the deficits of unsupervised approaches, such as their inability to learn signal-specific features. The focus instead lies on Classification Without Labels (CWoLa), which can train directly on weakly labeled experimental data but is hindered by the large amount of signal required for effective neural-network training.

The authors' principal innovation is to lower the CWoLa learning threshold by equipping neural networks with robust inductive biases, realized via transfer and meta-learning protocols. The study evaluates these methods on jet images from dark shower scenarios, generated with the Pythia Hidden Valley framework, benchmarking the ID and DD decay scenarios at several jet-image resolutions.
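The core CWoLa property, that a classifier trained only to distinguish two mixed samples with different signal fractions also learns to separate signal from background, can be illustrated with a toy numpy sketch (a 1D stand-in, not the paper's setup; distributions, sample sizes, and fractions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mixture(n, f_sig):
    """1D toy events: signal ~ N(2, 1), background ~ N(0, 1)."""
    n_sig = int(f_sig * n)
    return np.concatenate([rng.normal(2.0, 1.0, n_sig),
                           rng.normal(0.0, 1.0, n - n_sig)])

# Two mixed samples with different (unknown) signal fractions, standing
# in for signal-region and sideband data; labels mark the sample only.
x = np.concatenate([make_mixture(5000, 0.3), make_mixture(5000, 0.0)])
y = np.concatenate([np.ones(5000), np.zeros(5000)])

# Logistic regression trained on the noisy sample labels alone.
w, b = 0.0, 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))
    w -= 0.1 * np.mean((p - y) * x)
    b -= 0.1 * np.mean(p - y)

# The learned score nonetheless ranks true signal above true background.
sig = rng.normal(2.0, 1.0, 2000)
bkg = rng.normal(0.0, 1.0, 2000)
auc = np.mean((w * sig)[:, None] > (w * bkg)[None, :])
```

With w > 0 the learned score is monotonic in x, so the AUC approaches the fully supervised optimum even though no per-event truth labels were used; the practical difficulty the paper targets is that this only works once the signal fraction is large enough.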

Event Generation and Data Representation

Signal events stem from pp → Z′ → q̄_D q_D, with subsequent showering and hadronization in a dark sector, producing distinct jet images. These are contrasted against QCD-dominated SM backgrounds, with both datasets processed through realistic detector simulation (Delphes) and stringent kinematic selection representative of experimental workflows.

Jet images, serving as high-dimensional inputs for the CNNs, are centered, rotated, and flipped to canonical orientations before being discretized to 25×25, 50×50, and 75×75 resolutions.

Figure 1: Sample 2D p_T jet histograms for a signal event prior to preprocessing, showing raw localization in (η, φ) space.

This preprocessing is critical for ensuring that learning is focused on physically relevant differences rather than trivial coordinate misalignments.
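The standard centre-rotate-flip-pixelate pipeline can be sketched in numpy as follows (generic, assumed steps; the paper's exact conventions, image half-width, and flip criterion may differ):

```python
import numpy as np

def preprocess(eta, phi, pt, n_pix=25, half_width=1.25):
    """Toy jet-image pipeline: centre, rotate, flip, pixelate."""
    # 1. Centre on the pT-weighted centroid.
    eta = eta - np.average(eta, weights=pt)
    phi = phi - np.average(phi, weights=pt)
    # 2. Rotate the principal axis of the pT distribution to a fixed axis.
    cov = np.cov(np.vstack([eta, phi]), aweights=pt)
    theta = 0.5 * np.arctan2(2 * cov[0, 1], cov[0, 0] - cov[1, 1])
    c, s = np.cos(-theta), np.sin(-theta)
    eta, phi = c * eta - s * phi, s * eta + c * phi
    # 3. Flip so that most of the pT lies in one half-plane.
    if np.sum(pt[eta < 0]) > np.sum(pt[eta > 0]):
        eta = -eta
    # 4. Pixelate into an n_pix x n_pix pT-weighted histogram.
    edges = np.linspace(-half_width, half_width, n_pix + 1)
    img, _, _ = np.histogram2d(eta, phi, bins=[edges, edges], weights=pt)
    return img

# Example: a toy jet of 50 constituents.
rng = np.random.default_rng(1)
img = preprocess(rng.normal(0, 0.3, 50), rng.normal(0, 0.3, 50),
                 rng.uniform(1.0, 10.0, 50))
```

Each step removes a symmetry (translation, rotation, reflection) that carries no physics information, which is exactly why the text calls the preprocessing critical.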

Baseline: CWoLa Performance

The study employs a dual-jet CNN architecture with shared trainable feature extractors, processing jet images from both the signal region (SR) and the sideband (SB) and combining the two per-jet outputs via a product operation. The CWoLa baseline demonstrates that, while effective at higher signal yields, a clear learning threshold persists: below it, the network fails to differentiate signal from background, and performance deteriorates further as input dimensionality increases.
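The shared-extractor-plus-product structure can be sketched as follows (a toy numpy MLP on flattened images stands in for the paper's CNN; all shapes and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared feature-extractor and head weights, applied to both jets.
W1 = rng.normal(0, 0.1, (625, 16))   # flattened 25x25 image -> 16 features
w_head = rng.normal(0, 0.1, 16)      # shared per-jet classifier head

def jet_score(img):
    """Per-jet score in (0, 1) from the shared extractor + head."""
    h = np.maximum(0.0, img.reshape(-1) @ W1)          # ReLU features
    return 1.0 / (1.0 + np.exp(-h @ w_head))

def event_score(img_a, img_b):
    """Event-level output: product of the two per-jet scores."""
    return jet_score(img_a) * jet_score(img_b)

img_a = rng.uniform(0, 1, (25, 25))
img_b = rng.uniform(0, 1, (25, 25))
s = event_score(img_a, img_b)
```

The product combination makes the event score invariant under swapping the two jets, a natural choice for a dijet final state.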

Transfer Learning: Substantially Lowering the Learning Threshold

Transfer learning is operationalized through pretraining on an aggregate of simulated signals (excluding the evaluation benchmark), after which only the classifier head is reinitialized and fine-tuned on experimental (SR/SB mixed) data with the feature extractor frozen. This approach enables the CNN to internalize generalizable representations pertinent to dark jet phenomenology (such as multiplicity, thrust), directly enhancing downstream CWoLa learning efficiency.

Empirically, the adoption of transfer learning:

  • Substantially lowers the required signal fraction for competitive discovery significance.
  • Yields a more stable and lower-variance estimator due to priors encoded in the feature extractor.
  • Demonstrates pronounced benefits as jet image resolution (and thus task complexity) increases.
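The freeze-and-fine-tune recipe can be illustrated with a toy two-layer network in numpy (hypothetical data and sizes; the paper freezes a CNN feature extractor and reinitialises only the classifier head):

```python
import numpy as np

rng = np.random.default_rng(0)

def train(x, y, W1, w2, steps=500, lr=0.5, freeze_features=False):
    """Two-layer net (ReLU + logistic head); optionally freeze layer 1."""
    for _ in range(steps):
        h = np.maximum(0.0, x @ W1)
        p = 1.0 / (1.0 + np.exp(-h @ w2))
        g = p - y
        w2 = w2 - lr * h.T @ g / len(y)
        if not freeze_features:
            gh = (g[:, None] * w2[None, :]) * (h > 0)
            W1 = W1 - lr * x.T @ gh / len(y)
    return W1, w2

# Toy task: label is the sign of the sum of the first three features.
def make_data(n):
    x = rng.normal(0.0, 1.0, (n, 10))
    return x, (x[:, :3].sum(axis=1) > 0).astype(float)

# 1. Pretrain on plentiful "simulated" events.
x_sim, y_sim = make_data(4000)
W1 = rng.normal(0, 0.3, (10, 8))
w2 = rng.normal(0, 0.3, 8)
W1, w2 = train(x_sim, y_sim, W1, w2)

# 2. Reinitialise only the head, freeze the extractor, and fine-tune
#    on a small "experimental" sample.
x_dat, y_dat = make_data(200)
w2 = rng.normal(0, 0.3, 8)                    # fresh classifier head
W1_frozen = W1.copy()
W1, w2 = train(x_dat, y_dat, W1, w2, freeze_features=True)

# Accuracy of the fine-tuned head on the small sample.
acc = np.mean(((1 / (1 + np.exp(-np.maximum(0, x_dat @ W1) @ w2))) > 0.5)
              == (y_dat > 0.5))
```

Because only the head is trained in step 2, the small sample has far fewer parameters to constrain, which is the mechanism behind the lowered signal requirement.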

Meta-Transfer Learning: Further Efficiencies through “Learning-to-Learn”

To push beyond the static inductive bias imparted by transfer learning, the authors implement an adapted meta-transfer learning (MTL) algorithm where scaling and shifting layers are meta-optimized across a family of dark shower tasks. Unlike transfer learning, MTL targets the network’s adaptability: meta-training episodes alternately update classifier parameters and per-filter modulation variables to maximize future fine-tuning speed and accuracy.

Key aspects:

  • Pretraining fixes feature extractor weights, with meta-updates applied only to scaling, shifting, and classifier weights.
  • Each meta-training episode encompasses all off-benchmark models, mimicking a cross-model generalization scenario.
  • Fine-tuning is conducted with only the classifier weights and meta-learned modulations.
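The scaling-and-shifting modulation can be sketched as follows (toy shapes; assumed to follow the general per-filter scale/shift idea of meta-transfer learning, applied on top of frozen pretrained weights):

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen, pretrained convolution filters: (n_filters, k, k) plus biases.
W_frozen = rng.normal(0, 0.1, (4, 3, 3))
b_frozen = np.zeros(4)

# Meta-learned modulation: one scale and one shift per filter. Only
# these (and the classifier head) receive gradients during fine-tuning.
scale = np.ones(4)      # identity initialisation
shift = np.zeros(4)

def modulated_filters():
    """Effective filters: frozen weights scaled per filter, biases shifted."""
    return scale[:, None, None] * W_frozen, b_frozen + shift

# At identity initialisation the pretrained network is reproduced
# exactly, before any fine-tuning step.
W_eff, b_eff = modulated_filters()
```

The design choice here is parameter economy: with k×k filters, fine-tuning touches two numbers per filter instead of k² + 1, so very little signal suffices to adapt the network.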

MTL delivers modest but systematic improvements over pure transfer learning on low- and mid-resolution jet images, notably reducing the minimal signal requirement for successful extraction. For high-resolution images, the improvement is less pronounced with baseline kernels but can be enhanced by appropriately adjusting kernel sizes.

Robustness to Systematic Uncertainties

The study extends its analysis by quantifying the effect of systematic uncertainties on the background estimation, integrating modified significance calculations with a 1% systematic uncertainty. Results demonstrate that, although all methods experience compression in achievable significance, the relative advantage of transfer and meta-learning over pure CWoLa remains unaffected.

Figure 2: Comparison of transfer learning and pure CWoLa performance in the presence of 1% background systematic uncertainty, demonstrating maintained improvement.
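For reference, a widely used form of the discovery significance with a background systematic (the Asimov formula with background uncertainty; assumed to correspond to the estimate used here) can be coded directly:

```python
import math

def significance(s, b, sigma_b):
    """Asimov discovery significance for s signal and b background
    events with an absolute background uncertainty sigma_b."""
    t1 = (s + b) * math.log((s + b) * (b + sigma_b**2)
                            / (b**2 + (s + b) * sigma_b**2))
    t2 = (b**2 / sigma_b**2) * math.log1p(
        sigma_b**2 * s / (b * (b + sigma_b**2)))
    return math.sqrt(2.0 * (t1 - t2))

# A 1% background systematic compresses the achievable significance:
z_stat = significance(10.0, 100.0, 1e-3)   # negligible systematic
z_syst = significance(10.0, 100.0, 1.0)    # sigma_b = 1% of b
```

As sigma_b → 0 this reduces to the familiar sqrt(2((s+b)ln(1+s/b) − s)); a nonzero sigma_b strictly lowers Z, which is the "compression" visible in Figure 2.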

Implications and Future Prospects

The findings carry significant practical and theoretical implications:

  • Collider Experiment Readiness: The demonstrated reduction in training signal requirements brings weakly supervised methods closer to practical deployment in rare signal searches, such as non-standard dark sector signatures.
  • Generalizability: While demonstrated on Hidden Valley models, the framework is applicable to any scenario where simulated signals are abundant but experimental signal statistics are limiting.
  • Frontiers in AI for HEP: The positive but saturating returns of current MTL highlight both the promise and limitations of current meta-learning paradigms and point toward the need for further algorithmic innovation tailored to high-dimensional, low-signal collider tasks.

Conclusion

This work establishes, through rigorous benchmarking and quantitative analysis, that transfer and meta-learning considerably enhance the practical utility of weakly supervised classifiers in LHC-type searches. The main theoretical outcome is the substantial reduction of the minimum signal needed for effective network training, as well as the construction of models with reduced variance and robustness to systematic effects. While transfer learning already achieves most of the attainable gain, meta-learning offers additional refinements, especially as model and data complexity scale.

The study suggests fertile ground for refinement—such as exploring alternative meta-learning algorithms, richer data augmentations, or active sampling curricula—and is positioned as a valuable methodology for future experimental searches requiring maximal sample efficiency under weak supervision.
