- The paper introduces GLO, a Genetic Loss-function Optimization framework that evolves loss functions to achieve faster convergence and improved accuracy on image classification tasks.
- It employs evolutionary computation, including genetic programming and CMA-ES, to automatically discover loss functions that outperform traditional cross-entropy.
- Empirical results on MNIST and CIFAR-10 demonstrate enhanced data efficiency and robustness, underscoring the framework's potential for automated machine learning optimization.
Improved Training Speed, Accuracy, and Data Utilization Through Loss Function Optimization
The paper by Santiago Gonzalez and Risto Miikkulainen explores meta-learning of loss functions, introducing the Genetic Loss-function Optimization (GLO) framework. GLO leverages evolutionary computation (EC) to discover and refine new loss functions, improving neural network performance on tasks such as image classification, a domain where traditional loss functions like cross-entropy have long been the default.
Genetic Loss-function Optimization Overview
Whereas meta-learning has traditionally targeted facets of neural networks such as hyperparameters and architectures, GLO extends it to the loss function itself. The method proceeds in two phases. First, candidate loss functions are constructed with genetic programming: each candidate is represented as a tree structure, and new candidates are produced by recombination and mutation of the fitter ones. Second, the coefficients within the best discovered functions are fine-tuned using the covariance matrix adaptation evolution strategy (CMA-ES). A sketch of the first phase appears below.
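To make the first phase concrete, here is a minimal, self-contained sketch of genetic programming over loss functions: candidates are encoded as expression trees over the target x and the prediction y, and evolved by mutation and truncation selection. The operator set, the probe-set fitness, and all names below are illustrative assumptions rather than the authors' implementation; in GLO itself, a candidate's fitness comes from actually training a network with that loss and measuring validation performance, and recombination is used alongside mutation.

```python
import random
import numpy as np

# Binary and unary operators allowed in candidate loss trees; division and log
# are guarded so that random trees do not immediately produce NaNs.
OPS = {
    "+": np.add,
    "-": np.subtract,
    "*": np.multiply,
    "/": lambda a, b: a / (b + 1e-8),
}
UNARY = {"log": lambda a: np.log(np.abs(a) + 1e-8), "neg": np.negative}
LEAVES = ["x", "y", "1"]          # x: true label, y: predicted probability

def random_tree(depth=3):
    """Grow a random expression tree, encoded as nested tuples."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(LEAVES)
    if random.random() < 0.3:
        return (random.choice(list(UNARY)), random_tree(depth - 1))
    return (random.choice(list(OPS)), random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x, y):
    """Evaluate a tree element-wise on targets x and predictions y."""
    if tree == "x":
        return x
    if tree == "y":
        return y
    if tree == "1":
        return np.ones_like(y)
    op, *children = tree
    if op in UNARY:
        return UNARY[op](evaluate(children[0], x, y))
    return OPS[op](evaluate(children[0], x, y), evaluate(children[1], x, y))

def mutate(tree, p=0.2):
    """With probability p, replace a subtree with a freshly grown one."""
    if random.random() < p:
        return random_tree(depth=2)
    if isinstance(tree, str):
        return tree
    op, *children = tree
    return (op, *(mutate(c, p) for c in children))

def fitness(tree):
    """Toy stand-in for GLO's real fitness (training a network with the
    candidate loss and measuring validation accuracy): reward losses that
    penalize confident wrong predictions more than confident correct ones."""
    y = np.linspace(0.01, 0.99, 50)
    correct = evaluate(tree, np.ones_like(y), y).mean()   # x = 1
    wrong = evaluate(tree, np.zeros_like(y), y).mean()    # x = 0
    if not np.all(np.isfinite([correct, wrong])):
        return -np.inf
    return wrong - correct

population = [random_tree() for _ in range(50)]
for generation in range(20):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]                              # truncation selection
    population = parents + [mutate(random.choice(parents)) for _ in range(40)]

best = max(population, key=fitness)
print("best loss tree:", best)
```

In the second phase, scalar coefficients inserted into the best discovered tree would then be tuned with an off-the-shelf CMA-ES optimizer (for example, the pycma package), using the same training-based fitness.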
Empirical Validation with Image Classification Tasks
GLO's effectiveness is validated empirically on two well-known datasets: MNIST and CIFAR-10. The former is a simple, standard benchmark for image classification, while the latter poses a more complex, realistic challenge. GLO discovered a novel loss function, termed Baikal, that outperformed cross-entropy on several metrics, including training speed, test accuracy, and data efficiency.
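For concreteness, the sketch below implements a Baikal-style loss next to standard cross-entropy, assuming the form reported in the paper, -(1/n) * sum_i (log(y_i) - x_i / y_i), where x is the one-hot target and y the softmax output; the epsilon clamping and the toy batch are illustrative additions.

```python
import numpy as np

def cross_entropy(x, y, eps=1e-7):
    """Standard cross-entropy for one-hot targets x and softmax outputs y."""
    y = np.clip(y, eps, 1.0)
    return -np.mean(np.sum(x * np.log(y), axis=1))

def baikal_style(x, y, eps=1e-7):
    """Baikal-style loss: -mean(log(y) - x / y) over classes and samples."""
    y = np.clip(y, eps, 1.0)
    return -np.mean(np.log(y) - x / y)

# Toy batch: 3 samples, 4 classes.
x = np.eye(4)[[0, 2, 1]]                       # one-hot targets
y = np.array([[0.70, 0.10, 0.10, 0.10],        # confident and correct
              [0.25, 0.25, 0.40, 0.10],        # hesitant but correct
              [0.10, 0.20, 0.60, 0.10]])       # confident and wrong
print("cross-entropy:", cross_entropy(x, y))
print("baikal-style :", baikal_style(x, y))
```

Under this assumed form, the x/y term dominates when the probability assigned to the correct class is small, so severe mistakes are penalized sharply, while the loss does not reward pushing probabilities all the way to 0 or 1, which is consistent with the regularization effect discussed below.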
Key Insights and Results
- Training Speed and Accuracy: Baikal and its optimized variant BaikalCMA demonstrated faster convergence and higher accuracy compared to models using the traditional cross-entropy loss on both MNIST and CIFAR-10 datasets. This suggests a more effective learning process and better performance within fixed time constraints.
- Data Utilization: The paper highlights Baikal's superior performance when training data is limited, indicating reduced overfitting. This suggests that the discovered loss functions introduce an implicit form of regularization that enhances generalization (see the evaluation sketch after this list).
- Transferability: Baikal, initially discovered on MNIST, showed its robustness and adaptability by also improving performance when transferred to the CIFAR-10 dataset.
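As a rough illustration of how such a data-utilization comparison can be run, the sketch below trains the same small network on training sets of different sizes, once with cross-entropy and once with a Baikal-style loss, and reports test accuracy. The synthetic data, architecture, and fixed training budget are stand-ins for the paper's MNIST and CIFAR-10 experiments, and the Baikal form is the same hedged assumption as above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
W_TRUE = torch.randn(20, 4)                    # fixed labeling function

def make_data(n):
    """Synthetic 4-class data from a fixed, noisy linear labeling function."""
    x = torch.randn(n, 20)
    y = (x @ W_TRUE + 0.5 * torch.randn(n, 4)).argmax(dim=1)
    return x, y

def baikal_style(logits, targets, eps=1e-6):
    """-mean(log(y) - x / y) with softmax outputs y and one-hot targets x."""
    y = torch.softmax(logits, dim=1).clamp_min(eps)
    x = F.one_hot(targets, y.shape[1]).float()
    return -(torch.log(y) - x / y).mean()

def train_and_eval(loss_fn, n_train, x_test, y_test):
    """Train a small MLP with the given loss under a fixed budget; return test accuracy."""
    x_train, y_train = make_data(n_train)
    model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 4))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(200):
        opt.zero_grad()
        loss_fn(model(x_train), y_train).backward()
        opt.step()
    with torch.no_grad():
        return (model(x_test).argmax(dim=1) == y_test).float().mean().item()

x_test, y_test = make_data(2000)
for n_train in [50, 200, 1000]:                # varying training-set sizes
    acc_ce = train_and_eval(F.cross_entropy, n_train, x_test, y_test)
    acc_bk = train_and_eval(baikal_style, n_train, x_test, y_test)
    print(f"n={n_train:5d}  cross-entropy={acc_ce:.3f}  baikal-style={acc_bk:.3f}")
```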
Theoretical Implications and Future Directions
The paper argues that optimizing loss functions adds a crucial new dimension to meta-learning, providing a pathway toward fully automated machine learning solutions. The framework could be further extended to other domains such as generative adversarial networks (GANs), where harmonizing the objectives of the generator and discriminator networks is pivotal.
Future Research Directions:
- Exploring the joint optimization of loss functions with network architectures and hyperparameters to unearth synergies among these different facets of neural network training.
- Applying GLO to a wider variety of tasks and datasets to diversify the types of discovered loss functions and their applications.
- Investigating co-evolutionary approaches where different components of neural networks are optimized simultaneously, potentially facilitating even more efficient and capable algorithms.
Conclusion
GLO presents a novel approach to loss function optimization, establishing it as a promising addition to the meta-learning toolkit. Gonzalez and Miikkulainen's work demonstrates clear empirical benefits and opens new pathways for research and application within automated machine learning and beyond. The results substantiate the utility of evolutionary strategies for exploring and optimizing the space of loss functions, broadening the methodologies available for neural network training.