- The paper demonstrates that enhancements to the SVM implementation in the ROOT TMVA toolkit, together with new cross-validation tools, improve generalization in HEP multivariate analyses.
- The methodology covers hard and soft margin SVMs and kernel functions, illustrated with the Higgs Machine Learning Challenge dataset.
- Using k-fold cross-validation and ROC analysis, SVMs achieve performance comparable to a boosted decision tree, with closer agreement between training and test samples.
Support Vector Machines and Generalization in High Energy Physics
Introduction
Support Vector Machines (SVMs) are a robust machine learning method utilized across various fields, including High Energy Physics (HEP), due to their effectiveness in multivariate analysis (MVA). This paper investigates SVMs' advantages in the context of HEP, focusing on their generalization capabilities compared to other algorithms like neural networks and decision trees. The authors discuss improvements to SVM functionalities within the ROOT-based Toolkit for Multivariate Analysis (TMVA) and introduce new cross-validation tools to enhance SVM generalization in HEP applications.
Support Vector Machines
Hard Margin SVM
The hard margin SVM applies to linearly separable data and defines the maximal margin hyperplane between the two classes. The paper outlines its mathematical formulation, introducing the geometric margin γ and the dual representation obtained via Lagrange multipliers. A distinction is made between functional and geometric margins, with a focus on the role of support vectors in defining them.
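For reference, the standard textbook statement of the problem (consistent with the quantities the paper defines, though the notation here is ours, not quoted from the paper): maximizing the geometric margin γ = 1/‖w‖ is equivalent to

```latex
% Hard margin SVM: primal problem
\min_{w,\,b}\ \tfrac{1}{2}\lVert w\rVert^{2}
\quad \text{subject to} \quad
y_i \left( w \cdot x_i + b \right) \ge 1, \qquad i = 1,\dots,N

% Dual problem, obtained by introducing Lagrange multipliers \alpha_i
\max_{\alpha}\ \sum_i \alpha_i
 - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j \, y_i y_j \,(x_i \cdot x_j)
\quad \text{subject to} \quad
\alpha_i \ge 0, \quad \sum_i \alpha_i y_i = 0
```

Only the points with α_i > 0, the support vectors, enter the solution w = Σ_i α_i y_i x_i, which is what makes the classifier sparse in the training data.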
Soft Margin SVM
For real-world applications, data may not be perfectly linearly separable due to noise and variations; thus, the paper describes the soft margin SVM, which introduces slack variables ξ_i and the cost parameter C, enabling some misclassification. This approach relaxes the constraints to allow data points on the incorrect side of the decision boundary, explained via modifications to the Lagrangian form and constraints on the α_i parameters.
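In the standard formulation (again textbook form, matching the quantities described above), the slack variables and cost parameter enter as:

```latex
% Soft margin SVM: slack variables \xi_i permit margin violations,
% with the cost parameter C setting the misclassification penalty
\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\lVert w\rVert^{2} + C \sum_i \xi_i
\quad \text{subject to} \quad
y_i \left( w \cdot x_i + b \right) \ge 1 - \xi_i, \qquad \xi_i \ge 0

% In the dual, the only change is the box constraint on the multipliers:
0 \le \alpha_i \le C
```

A large C approaches the hard margin limit by penalizing violations heavily; a small C tolerates more misclassification in exchange for a wider margin.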
Kernel Functions
In scenarios requiring non-linear classification, kernel functions map data into higher-dimensional feature spaces without explicit knowledge of the transformation—a process known as the Kernel Trick. The paper delineates conditions for valid kernel functions, including symmetry and Mercer's condition. It further provides implementations of polynomial, radial basis function (RBF), and multi-Gaussian kernels within TMVA, emphasizing their respective roles and configurations.
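The standard forms of the kernels named above are as follows (the multi-Gaussian expression reflects the per-dimension-width idea described for TMVA; the exact TMVA parameterization should be checked against its documentation):

```latex
% Polynomial kernel of order d
K(\mathbf{x}, \mathbf{y}) = (\mathbf{x} \cdot \mathbf{y} + c)^{d}

% Radial basis function (RBF) kernel with a single width parameter \gamma
K(\mathbf{x}, \mathbf{y}) = \exp\left( -\gamma \, \lVert \mathbf{x} - \mathbf{y} \rVert^{2} \right)

% Multi-Gaussian kernel: an independent width \gamma_k per input dimension
K(\mathbf{x}, \mathbf{y}) = \exp\left( -\sum_k \gamma_k \, (x_k - y_k)^{2} \right)
```

Mercer's condition amounts to requiring that the Gram matrix K_ij = K(x_i, x_j) be symmetric and positive semi-definite for any finite set of points, which guarantees the kernel corresponds to an inner product in some feature space.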
Higgs Boson Example
The paper presents a practical example using the Higgs Machine Learning Challenge dataset, comparing the performance of SVMs with different kernel functions against a Boosted Decision Tree; six discriminating variables are used for classifier training. While ROC curves show similar outcomes across classifiers, the need for generalization checks is highlighted for robust performance assessments. A minimal sketch of such a setup in TMVA follows.
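This is an illustrative ROOT macro, not the paper's actual analysis code: the file, tree, and variable names are hypothetical placeholders, and the SVM option string (in particular Kernel=RBF) follows TMVA conventions but should be verified against the TMVA Users Guide for your ROOT version.

```cpp
// Illustrative ROOT macro: train an RBF-kernel SVM and a BDT on a
// signal/background sample with TMVA (ROOT >= 6.08 DataLoader API).
// File, tree, and variable names are hypothetical placeholders.
#include "TFile.h"
#include "TTree.h"
#include "TMVA/Factory.h"
#include "TMVA/DataLoader.h"
#include "TMVA/Types.h"

void higgs_classification() {
   TFile* input = TFile::Open("higgsml.root");   // hypothetical input file
   TTree* sig = (TTree*)input->Get("TreeS");     // signal tree (placeholder name)
   TTree* bkg = (TTree*)input->Get("TreeB");     // background tree (placeholder name)

   TFile* output = TFile::Open("TMVA_higgs.root", "RECREATE");
   TMVA::Factory factory("HiggsSVM", output, "!V:AnalysisType=Classification");

   TMVA::DataLoader loader("dataset");
   // Six discriminating variables; the names below are placeholders,
   // not the six variables chosen in the paper.
   for (auto v : {"var1", "var2", "var3", "var4", "var5", "var6"})
      loader.AddVariable(v, 'F');
   loader.AddSignalTree(sig, 1.0);
   loader.AddBackgroundTree(bkg, 1.0);
   loader.PrepareTrainingAndTestTree("", "SplitMode=Random:NormMode=NumEvents");

   // SVM with an RBF kernel and a boosted decision tree for comparison.
   // The Kernel option reflects the kernel support described in the paper;
   // verify the exact option names for your TMVA version.
   factory.BookMethod(&loader, TMVA::Types::kSVM, "SVM_RBF",
                      "Kernel=RBF:Gamma=0.25:C=1.0:Tol=0.001:VarTransform=Norm");
   factory.BookMethod(&loader, TMVA::Types::kBDT, "BDT",
                      "NTrees=400:MaxDepth=3:BoostType=AdaBoost");

   factory.TrainAllMethods();
   factory.TestAllMethods();
   factory.EvaluateAllMethods();   // ROC curves are written to the output file
   output->Close();
}
```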
Generalization Techniques
Hold-out Validation
Hold-out validation splits the dataset into training and testing subsets, optimizes the classifier hyperparameters on the training data, and assesses performance on the test data. The paper notes the limitation that the resulting error estimate can be biased by the particular choice of split.
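In TMVA the hold-out split is configured when preparing the trees; continuing the sketch above (event counts are purely illustrative):

```cpp
// Hold-out split: fixed numbers of signal and background events are reserved
// for training, the rest for testing. Reuses `loader` from the macro above;
// option names follow the TMVA Users Guide.
loader.PrepareTrainingAndTestTree("",
    "nTrain_Signal=5000:nTest_Signal=5000:"
    "nTrain_Background=5000:nTest_Background=5000:"
    "SplitMode=Random:SplitSeed=100:NormMode=NumEvents");
```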
k-fold Cross-validation
For limited datasets, k-fold cross-validation avoids the hold-out constraints by dividing the data into k folds, iteratively training the model on k−1 folds and testing on the remaining fold; the error rate averaged across folds provides the performance estimate. The paper discusses the trade-offs involved in choosing k and advises, where statistics allow, segmenting the data further into separate training, validation, and testing sets. A sketch using TMVA's cross-validation interface follows.
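Recent TMVA versions ship a CrossValidation class in this spirit; the sketch below is an illustration under stated assumptions (the option strings and exact interface should be checked against the TMVA documentation for your ROOT release), reusing the DataLoader from the earlier macro:

```cpp
// 5-fold cross-validation of the RBF-kernel SVM with TMVA.
// Reuses `loader` and `output` from the classification macro above.
#include "TMVA/CrossValidation.h"

TMVA::CrossValidation cv("HiggsSVM_CV", &loader, output,
                         "!V:AnalysisType=Classification:NumFolds=5");
cv.BookMethod(TMVA::Types::kSVM, "SVM_RBF",
              "Kernel=RBF:Gamma=0.25:C=1.0:Tol=0.001");
cv.Evaluate();   // trains one model per fold and averages the per-fold results
```

Every event is then used for both training and testing, at the cost of training k models instead of one.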
Higgs Boson Example Revisited
The paper shows improved outcomes using 5-fold cross-validation over hold-out validation for SVMs with RBF kernels: ROC analysis confirms better classifier performance and closer agreement between the training and test samples, demonstrating the enhanced generalization obtained with cross-validation.
Conclusion
The paper covers the theoretical and practical aspects of SVMs in HEP, highlighting their efficacy in multivariate analysis when generalization is controlled through cross-validation. The enhancements to the ROOT TMVA framework make these refined SVM configurations available for HEP data analyses and open avenues for further exploration and performance optimization of such models in particle physics.