Your Network May Need to Be Rewritten: Network Adversarial Based on High-Dimensional Function Graph Decomposition (2405.03712v1)
Abstract: Past research on networks built around a single, low-dimensional activation function has run into internal covariate shift and gradient deviation problems. How to use combinations of functions to complete the properties that any single activation function lacks remains a relatively under-explored area. We propose a network adversarial method to address these challenges; it is the first method to use different activation functions within a single network. Starting from the activation function already present in the network, we construct an adversarial function whose derivative image has the opposite properties, and the two are applied alternately as the activation functions of different network layers. For more complex cases, we propose high-dimensional function graph decomposition (HD-FGD), which splits the function into separate parts and passes each part through a linear layer. By integrating the inverse of the partial derivative of each decomposed term and following the computational rules of the decomposition process, we obtain the adversarial function. Either the network adversarial method or HD-FGD alone can effectively replace the traditional MLP + activation function design. With these methods we achieve substantial improvements over standard activation functions in both training efficiency and predictive accuracy. The article derives adversarial counterparts for several prevalent activation functions, presenting alternatives that can be seamlessly integrated into existing models without adverse effects. We will release the code as open source after the conference review process is completed.
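The abstract describes alternating a base activation function with an adversarial counterpart, whose derivative image has opposite properties, across successive layers. Below is a minimal PyTorch sketch of that alternation only; the `AdversarialReLU` used here (min(x, 0), whose derivative is nonzero exactly where ReLU's is zero) is an illustrative stand-in rather than the paper's actual construction, and the layer sizes and pairing scheme are assumptions.

```python
import torch
import torch.nn as nn


class AdversarialReLU(nn.Module):
    """Illustrative counterpart to ReLU with the 'opposite' derivative image:
    its derivative is 1 where ReLU's is 0 and 0 where ReLU's is 1.
    This is an assumption for the sketch, not the paper's construction."""

    def forward(self, x):
        return torch.minimum(x, torch.zeros_like(x))  # min(x, 0)


class AlternatingMLP(nn.Module):
    """MLP that alternates the base activation and its adversarial counterpart
    across layers, following the abstract's description of the network
    adversarial method (hidden sizes and layer pairing are assumptions)."""

    def __init__(self, dims=(64, 128, 128, 10)):
        super().__init__()
        layers = []
        for i, (d_in, d_out) in enumerate(zip(dims[:-1], dims[1:])):
            layers.append(nn.Linear(d_in, d_out))
            if i < len(dims) - 2:  # no activation after the output layer
                layers.append(nn.ReLU() if i % 2 == 0 else AdversarialReLU())
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)


if __name__ == "__main__":
    model = AlternatingMLP()
    out = model(torch.randn(8, 64))
    print(out.shape)  # torch.Size([8, 10])
```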