Adopting Robustness and Optimality in Fitting and Learning (1510.03826v4)

Published 13 Oct 2015 in cs.LG, cs.NE, and math.OC

Abstract: We generalize a modified exponentialized estimator by pushing the robust-optimal (RO) index $\lambda$ toward $-\infty$, achieving robustness to outliers by optimizing a quasi-Minimin function. Robustness is realized and controlled adaptively by the RO index without any predefined threshold. Optimality is guaranteed by expanding the convexity region of the Hessian matrix, which largely avoids local optima. A detailed quantitative analysis of both robustness and optimality is provided. Experimental results on fitting tasks for three noisy non-convex functions, and on digit recognition with the MNIST dataset, consolidate these conclusions.
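The abstract does not spell out the estimator's form, but exponentialized (risk-averting) criteria in this literature typically take a log-mean-exp shape, $J_\lambda = \frac{1}{\lambda}\log\frac{1}{N}\sum_i e^{\lambda e_i}$ over per-sample errors $e_i$: as $\lambda \to +\infty$ the criterion approaches the maximum error (risk-averting), while pushing $\lambda \to -\infty$, as the paper does, drives it toward the minimum error, so large outlier errors are adaptively downweighted with no fixed threshold. The following sketch is purely illustrative of that assumed form, not the paper's exact estimator; the function name and the log-sum-exp stabilization are my own choices.

```python
import numpy as np

def risk_averting_loss(errors, lam):
    """Illustrative exponentialized criterion (assumed log-mean-exp form):
        J_lambda = (1/lambda) * log( mean( exp(lambda * e_i) ) )
    lam < 0 emphasizes the smallest errors (robust / quasi-Minimin);
    lam > 0 emphasizes the largest errors (risk-averting / quasi-Minimax).
    """
    errors = np.asarray(errors, dtype=float)
    z = lam * errors
    # log-sum-exp trick for numerical stability at large |lambda|
    m = np.max(z)
    log_mean_exp = m + np.log(np.mean(np.exp(z - m)))
    return log_mean_exp / lam

# A strongly negative lambda nearly ignores the outlier error (100.0),
# pulling the criterion toward the smallest per-sample error.
j_robust = risk_averting_loss([1.0, 2.0, 100.0], lam=-50.0)
# A lambda near zero behaves like the ordinary mean of the errors.
j_mean = risk_averting_loss([1.0, 2.0, 100.0], lam=-1e-3)
```

With `lam=-50` the value sits close to the minimum error 1.0, while `lam` near zero yields roughly the arithmetic mean; the adaptivity claimed in the abstract corresponds to moving $\lambda$ along this continuum during training rather than clipping errors at a preset cutoff.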
