
A Novel Paradigm for Neural Computation: X-Net with Learnable Neurons and Adaptable Structure (2401.01772v2)

Published 3 Jan 2024 in cs.AI and cs.NI

Abstract: Multilayer perceptrons (MLPs) have permeated disciplines ranging from bioinformatics to financial analytics, where they have become an indispensable tool of contemporary scientific research. However, MLPs have clear drawbacks: (1) the activation functions are of a single, fixed type, which limits the network's representational ability and often forces complex networks to be used for simple problems; (2) the network structure is not adaptive, so it easily becomes redundant or insufficient. In this work, we propose X-Net, a novel neural network paradigm that promises to replace MLPs. X-Net dynamically learns activation functions for individual neurons based on derivative information during training, improving the network's representational ability for specific tasks. At the same time, X-Net can precisely adjust the network structure at the neuron level to accommodate tasks of varying complexity and reduce computational cost. We show that X-Net outperforms MLPs in representational capability: it achieves comparable or better performance than MLPs on regression and classification tasks with far fewer parameters. On average, X-Net uses only 3% of the parameters of an MLP, and as little as 1.1% on some tasks. We also demonstrate X-Net's ability to support scientific discovery on data from disciplines such as energy, environment, and aerospace, where it helps scientists uncover new mathematical or physical laws.
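The abstract names two mechanisms: per-neuron learnable activation functions and neuron-level structural adaptation. As a rough illustration of the first idea only, the sketch below shows a layer whose neurons each learn a soft mixture over a small basis of candidate activations. This is a minimal PyTorch toy built on assumptions: it is not the paper's X-Net algorithm (which selects activations from derivative information during training), and the class names, basis set, and mixture scheme are invented here for illustration.

```python
# Illustrative sketch only, NOT the X-Net method from the paper:
# each hidden neuron learns a softmax-weighted mixture over a fixed basis
# of primitive activations, so different neurons can settle on different shapes.
import torch
import torch.nn as nn

class LearnableActivation(nn.Module):
    """Per-neuron activation as a trainable convex mixture of basis functions."""
    def __init__(self, num_neurons: int):
        super().__init__()
        # candidate primitives (hypothetical choice; X-Net's library may differ)
        self.basis = [torch.relu, torch.tanh, torch.sin, lambda x: x]
        # one mixture-logit vector per neuron: (num_neurons, num_basis)
        self.logits = nn.Parameter(torch.zeros(num_neurons, len(self.basis)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.logits, dim=-1)            # (neurons, basis)
        stacked = torch.stack([f(x) for f in self.basis], -1)   # (..., neurons, basis)
        return (stacked * weights).sum(-1)                      # (..., neurons)

class ToyLearnableActivationNet(nn.Module):
    """One hidden layer whose neurons each learn their own activation."""
    def __init__(self, in_dim: int, hidden: int, out_dim: int):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.act = LearnableActivation(hidden)
        self.fc2 = nn.Linear(hidden, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc2(self.act(self.fc1(x)))

# Usage: fit a toy 1-D regression target.
model = ToyLearnableActivationNet(1, 8, 1)
x = torch.linspace(-3, 3, 64).unsqueeze(-1)
y = torch.sin(x)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()
```

The mixture weights make the activation choice differentiable, so it trains with ordinary backpropagation; the paper's structural adaptation (adding or pruning neurons) is not modeled here.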
