A Statistical-Modelling Approach to Feedforward Neural Network Model Selection (2207.04248v5)
Abstract: Feedforward neural networks (FNNs) can be viewed as non-linear regression models, where covariates enter the model through a combination of weighted summations and non-linear functions. Although these models have some similarities to the approaches used within statistical modelling, the majority of neural network research has been conducted outside of the field of statistics. This has resulted in a lack of statistically-based methodology and, in particular, little emphasis on model parsimony. Determining the input-layer structure is analogous to variable selection, while the structure of the hidden layer relates to model complexity. In practice, neural network model selection is often carried out by comparing models using out-of-sample performance. In contrast, the construction of an associated likelihood function opens the door to information-criteria-based variable and architecture selection. A novel model selection method, which performs both input- and hidden-node selection, is proposed using the Bayesian information criterion (BIC) for FNNs. Choosing BIC over out-of-sample performance as the model selection objective function increases the probability of recovering the true model, while parsimoniously achieving favourable out-of-sample performance. Simulation studies are used to evaluate and justify the proposed method, and applications to real data are investigated.
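Because the FNN here is a non-linear regression model with an associated Gaussian likelihood, BIC can be computed directly from a fitted network: with residual sum of squares RSS, the profiled log-likelihood is ℓ = -(n/2)(log(2π·RSS/n) + 1) and BIC = -2ℓ + K·log(n), where K counts the network weights plus the error variance. The following is a minimal sketch of BIC-based hidden-node selection under these assumptions, fitted with the CRAN package nnet; the simulated data and the `bic_nnet` helper are illustrative constructions, not the paper's selectnn implementation.

```r
## Minimal sketch: BIC-based selection of the number of hidden nodes
## for a single-hidden-layer regression FNN (assumptions noted above).
library(nnet)

set.seed(1)
n  <- 500
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)   # x3 is an irrelevant input
y  <- tanh(2 * x1) + 0.5 * x2 + rnorm(n, sd = 0.3)
dat <- data.frame(y, x1, x2, x3)

## BIC under a Gaussian likelihood: sigma2_hat = RSS / n is profiled out,
## and K = number of network weights + 1 (the error variance).
bic_nnet <- function(fit, n) {
  rss    <- sum(fit$residuals^2)
  loglik <- -n / 2 * (log(2 * pi * rss / n) + 1)
  K      <- length(fit$wts) + 1
  -2 * loglik + K * log(n)
}

## Architecture selection: compare hidden-layer sizes q by BIC.
## (Refitting from several random starts per q guards against local optima.)
for (q in 1:4) {
  fit <- nnet(y ~ x1 + x2 + x3, data = dat, size = q,
              linout = TRUE, trace = FALSE, maxit = 2000)
  cat("q =", q, " BIC =", round(bic_nnet(fit, n), 1), "\n")
}
```

Input selection proceeds in the same way: refit with different covariate subsets (e.g. dropping x3) and compare BIC values across models, mirroring how the proposed method treats variable selection and hidden-node selection within a single criterion.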