Generalized Activation via Multivariate Projection (2309.17194v2)
Abstract: Activation functions are essential for introducing nonlinearity into neural networks, with the Rectified Linear Unit (ReLU) often favored for its simplicity and effectiveness. Motivated by the structural similarity between a shallow Feedforward Neural Network (FNN) and a single iteration of the Projected Gradient Descent (PGD) algorithm, a standard method for solving constrained optimization problems, we view ReLU as a projection from R onto the nonnegative half-line R+. Building on this interpretation, we generalize ReLU by replacing it with the projection onto a convex cone, such as the Second-Order Cone (SOC), which yields the Multivariate Projection Unit (MPU), an activation function with multiple inputs and multiple outputs. We further prove that FNNs activated by SOC projections have greater expressive power than those using ReLU. Experiments on widely adopted architectures further corroborate the effectiveness of the MPU compared with a broad range of existing activation functions.
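The abstract does not specify how the MPU groups features into cones, so the sketch below is only a minimal illustration of the underlying operation: the standard closed-form Euclidean projection onto a second-order cone, applied as a multi-input, multi-output activation. The split of a feature group into a vector part and a scalar part is an assumption made for this example, not the paper's implementation.

```python
# Illustrative sketch (not the paper's implementation): Euclidean projection
# onto the second-order cone K = {(x, t) : ||x||_2 <= t}, used as a
# multi-input, multi-output activation. How features are split into the
# vector part x and the scalar part t is an assumption for this example.
import torch


def soc_projection(x: torch.Tensor, t: torch.Tensor, eps: float = 1e-12):
    """Project (x, t) onto {(x, t) : ||x||_2 <= t} using the closed form:
    keep the point if ||x|| <= t, map it to 0 if ||x|| <= -t, and otherwise
    scale it onto the cone boundary by the factor (||x|| + t) / (2 ||x||)."""
    norm = torch.linalg.norm(x, dim=-1)              # ||x||, shape (...,)
    inside = norm <= t                               # already in the cone
    polar = norm <= -t                               # in the polar cone -> 0
    scale = 0.5 * (1.0 + t / norm.clamp_min(eps))    # boundary-case factor
    zeros_x, zeros_t = torch.zeros_like(x), torch.zeros_like(t)
    x_out = torch.where(inside[..., None], x,
                        torch.where(polar[..., None], zeros_x, scale[..., None] * x))
    t_out = torch.where(inside, t, torch.where(polar, zeros_t, scale * norm))
    return x_out, t_out


# Toy usage: treat the last feature of each 3-dimensional group as the cone's
# scalar part (an arbitrary choice for illustration).
h = torch.randn(8, 3)                                # batch of pre-activations
x_proj, t_proj = soc_projection(h[:, :2], h[:, 2])
```

As a sanity check, with an empty vector part the cone degenerates to the nonnegative half-line and the projection reduces to max(t, 0), i.e., ReLU, consistent with the interpretation in the abstract.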