Strong CWoLa: Binary Classification Without Background Simulation (2503.14876v1)
Abstract: Supervised deep learning methods have been successful in high energy physics, and the trend within the field is to move away from high-level reconstructed variables toward lower-level, higher-dimensional features. Supervised methods require labelled data, which is typically provided by a simulator. As the number of features increases, simulation accuracy decreases, leading to greater domain shift between training and testing data when using lower-level features. This work demonstrates that the classification without labels (CWoLa) paradigm can be used to remove the need for background simulation when training supervised classifiers. This can result in classifiers that perform better on real data than those trained on simulated data.
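The idea in the abstract can be illustrated with a toy sketch. This is not the paper's implementation, only one reading of the paradigm under stated assumptions: simulated signal is labelled 1, unlabelled "real" data (assumed here to be 95% background with 5% signal contamination) is labelled 0, and no simulated background is used. The Gaussian toy distributions, the 5% signal fraction, and the helper names `fit_logistic` and `auc` are all hypothetical choices made for illustration.

```python
import numpy as np

def auc(scores_pos, scores_neg):
    """Rank-based AUC: probability that a random positive outscores a random negative."""
    scores = np.concatenate([scores_pos, scores_neg])
    ranks = scores.argsort().argsort() + 1  # 1-based ranks (ties negligible for continuous scores)
    n_pos, n_neg = len(scores_pos), len(scores_neg)
    return (ranks[:n_pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def fit_logistic(X, y, lr=0.1, steps=2000):
    """Fit p(y=1|x) = sigmoid(X @ w + b) by plain gradient descent on the log loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

rng = np.random.default_rng(0)
n = 20000
sig_mean = np.array([2.0, 0.0])  # assumption: signal is shifted in feature 0 only
bkg_mean = np.array([0.0, 0.0])  # assumption: background is centred at the origin

# Class 1: simulated signal. Class 0: unlabelled data, mostly background.
sim_signal = rng.normal(sig_mean, 1.0, size=(n, 2))
n_sig_in_data = int(0.05 * n)  # assumed signal contamination in the data
data = np.concatenate([
    rng.normal(bkg_mean, 1.0, size=(n - n_sig_in_data, 2)),
    rng.normal(sig_mean, 1.0, size=(n_sig_in_data, 2)),
])

X = np.concatenate([sim_signal, data])
y = np.concatenate([np.ones(n), np.zeros(n)])
w, b = fit_logistic(X, y)

# Evaluate on pure signal vs pure background, which the classifier never saw labelled.
test_sig = rng.normal(sig_mean, 1.0, size=(5000, 2))
test_bkg = rng.normal(bkg_mean, 1.0, size=(5000, 2))
cwola_auc = auc(test_sig @ w + b, test_bkg @ w + b)
print(f"signal-vs-background AUC of the data-trained classifier: {cwola_auc:.3f}")
```

In this toy setup the classifier is trained only to separate simulated signal from data, yet its score still ranks true signal above true background, because the data sample is background-dominated. The toy does not model the domain shift between simulation and real data that motivates the paper.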