Evolving Loss Functions for Specific Image Augmentation Techniques (2404.06633v1)
Abstract: Previous work in Neural Loss Function Search (NLFS) has shown a lack of correlation between smaller surrogate functions and large convolutional neural networks with massive regularization. We expand upon this research by revealing another disparity: a lack of correlation across different types of image augmentation techniques. We show that loss functions can perform well with certain image augmentation techniques while performing poorly with others. We exploit this disparity by performing an evolutionary search on five types of image augmentation techniques in the hope of finding augmentation-specific loss functions. The best loss functions from each evolution were then transferred to WideResNet-28-10 on CIFAR-10 and CIFAR-100 across all five augmentation techniques. The best of those were then evaluated by fine-tuning EfficientNetV2Small on the CARS, Oxford-Flowers, and Caltech datasets across all five augmentation techniques. Multiple loss functions were found that outperformed cross-entropy across multiple experiments. In the end, we found a single loss function, which we call the inverse Bessel logarithm loss, that outperformed cross-entropy across the majority of experiments.
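To make the evaluation pipeline concrete, below is a minimal sketch (not the authors' released code) of how a candidate loss function might be scored under one specific augmentation technique on a small surrogate network, the fitness signal an evolutionary search of this kind would use. The surrogate architecture, hyperparameters, and the placeholder `candidate_loss` are all assumptions; the abstract does not give the evolved losses' formulas, so the placeholder simply falls back to cross-entropy.

```python
# Hedged sketch of scoring one candidate loss under one augmentation technique.
# Everything here (surrogate model, hyperparameters, candidate_loss) is assumed,
# not taken from the paper.
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms


def candidate_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Placeholder for an evolved loss. An actual candidate (e.g. the paper's
    inverse Bessel logarithm loss) would replace this body; here we fall back
    to cross-entropy so the sketch runs."""
    return F.cross_entropy(logits, targets)


def evaluate_loss_under_augmentation(loss_fn, augmentation, epochs: int = 1) -> float:
    """Train a small surrogate on CIFAR-10 with a single augmentation technique
    and return test accuracy, used as the fitness for ranking candidate losses."""
    train_tf = transforms.Compose([augmentation, transforms.ToTensor()])
    train_set = datasets.CIFAR10("data", train=True, download=True, transform=train_tf)
    test_set = datasets.CIFAR10("data", train=False, download=True,
                                transform=transforms.ToTensor())
    train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
    test_loader = DataLoader(test_set, batch_size=256)

    # Stand-in surrogate network; the paper's surrogate is not specified here.
    model = models.resnet18(num_classes=10)
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)

    model.train()
    for _ in range(epochs):
        for x, y in train_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

    model.eval()
    correct = 0
    with torch.no_grad():
        for x, y in test_loader:
            correct += (model(x).argmax(dim=1) == y).sum().item()
    return correct / len(test_set)


# One augmentation technique per search run (e.g. RandAugment); the paper
# searches separately for each of its five techniques.
acc = evaluate_loss_under_augmentation(candidate_loss, transforms.RandAugment())
print(f"surrogate accuracy under RandAugment: {acc:.3f}")
```

In a full search, this fitness call would sit inside an evolutionary loop over candidate loss expressions, with the best survivors later transferred to the larger WideResNet-28-10 and EfficientNetV2Small runs described above.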