Investigating the Histogram Loss in Regression (2402.13425v2)

Published 20 Feb 2024 in cs.LG, cs.AI, and stat.ML

Abstract: It is becoming increasingly common in regression to train neural networks that model the entire conditional distribution, even when only the mean is needed for prediction. This additional modeling often brings a performance gain, but the reasons behind the improvement are not fully understood. This paper investigates a recent approach to regression, the Histogram Loss, which learns the conditional distribution of the target variable by minimizing the cross-entropy between a target distribution and a flexible histogram prediction. We design theoretical and empirical analyses to determine why and when this performance gain appears, and how different components of the loss contribute to it. Our results suggest that the benefits of learning distributions in this setup come from improvements in optimization rather than from modeling extra information. We then demonstrate the viability of the Histogram Loss in common deep learning applications without the need for costly hyperparameter tuning.
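The loss described in the abstract can be sketched in a few lines: discretize the target range into fixed bins, place a Gaussian centered at the true target and take its per-bin mass as the target distribution (the HL-Gaussian variant), then minimize cross-entropy against a softmax histogram predicted by the network. The sketch below is a minimal NumPy illustration under those assumptions; the function names, bin layout, and smoothing width are illustrative, not the authors' implementation.

```python
import math
import numpy as np

def gaussian_target(y, bin_edges, sigma):
    """Per-bin mass of a Gaussian centered at the target y,
    truncated to the histogram support (HL-Gaussian target)."""
    z = (bin_edges - y) / (sigma * math.sqrt(2.0))
    cdf = np.array([0.5 * (1.0 + math.erf(zi)) for zi in z])
    p = cdf[1:] - cdf[:-1]      # Gaussian mass falling into each bin
    return p / p.sum()          # renormalize mass cut off at the edges

def histogram_loss(y, logits, bin_edges, sigma):
    """Cross-entropy between the Gaussian target distribution and a
    predicted histogram (softmax over the network's bin logits)."""
    p = gaussian_target(y, bin_edges, sigma)
    q = np.exp(logits - logits.max())
    q /= q.sum()                # predicted histogram
    return -np.sum(p * np.log(q + 1e-12))
```

Training minimizes this cross-entropy per example; at prediction time the point estimate is the mean of the predicted histogram, i.e. the sum of `q` times the bin centers.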
