How Inverse Conditional Flows Can Serve as a Substitute for Distributional Regression (arXiv:2405.05429v3)
Abstract: Neural network representations of simple models, such as linear regression, are increasingly studied to better understand the underlying principles of deep learning algorithms. However, neural representations of distributional regression models, such as the Cox model, have received little attention so far. We close this gap by proposing a framework for distributional regression using inverse flow transformations (DRIFT), which includes neural representations of the aforementioned models. We empirically demonstrate that the neural representations of models in DRIFT can serve as substitutes for their classical statistical counterparts in several applications involving continuous, ordered, time-series, and survival outcomes. We confirm that models in DRIFT empirically match the performance of several statistical methods in terms of estimation of partial effects, prediction, and aleatoric uncertainty quantification. DRIFT covers both interpretable statistical models and flexible neural networks, opening up new avenues in both statistical modeling and deep learning.
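To make the core idea concrete, below is a minimal, self-contained sketch of the transformation-model view the abstract describes: learn a transformation h(y | x), monotone in y, so that h(y | x) follows a standard normal reference distribution, fitting by maximum likelihood via the change-of-variables formula log f(y | x) = log φ(h(y | x)) + log ∂h/∂y. The specific choices here — a sum-of-sigmoids basis with fixed knots, a standard normal reference, and a linear shift in x — are illustrative assumptions for this sketch, not the paper's DRIFT architecture.

```python
import torch

torch.manual_seed(0)

# Toy heteroscedastic data: y | x is non-Gaussian in scale.
n = 2000
x = torch.rand(n, 1)
y = torch.sin(3.0 * x) + (0.2 + 0.5 * x) * torch.randn(n, 1)

# Monotone-in-y transformation:
#   h(y | x) = sum_k softplus(w_k) * sigmoid((y - c_k) / s) + x @ beta + b0
K = 10
c = torch.linspace(float(y.min()), float(y.max()), K)  # fixed knots over the observed y range
w = torch.zeros(K, requires_grad=True)                 # softplus(w) > 0 keeps h increasing in y
s = 0.3                                                # fixed bandwidth (illustrative choice)
beta = torch.zeros(1, 1, requires_grad=True)           # linear shift term in x
b0 = torch.zeros(1, requires_grad=True)

def h_and_dh(y, x):
    sig = torch.sigmoid((y - c) / s)                   # (n, K)
    pos_w = torch.nn.functional.softplus(w)            # positive basis weights
    h = sig @ pos_w.unsqueeze(-1) + x @ beta + b0      # (n, 1)
    dh = (sig * (1 - sig) / s) @ pos_w.unsqueeze(-1)   # dh/dy > 0 by construction
    return h, dh

opt = torch.optim.Adam([w, beta, b0], lr=0.05)
std_normal = torch.distributions.Normal(0.0, 1.0)
for _ in range(500):
    opt.zero_grad()
    h, dh = h_and_dh(y, x)
    # Negative log-likelihood via change of variables: -log phi(h) - log dh/dy
    nll = -(std_normal.log_prob(h) + dh.log()).mean()
    nll.backward()
    opt.step()

# The fitted model yields a full conditional distribution: F(y | x) = Phi(h(y | x)).
h_new, _ = h_and_dh(torch.tensor([[0.5]]), torch.tensor([[0.3]]))
print("F(y <= 0.5 | x = 0.3) ≈", std_normal.cdf(h_new).item())
```

Classical special cases fall out of this template: with h affine in y the model reduces to Gaussian linear regression, while other reference distributions and bases recover, for example, Cox-type survival models — which is how such a framework can include neural representations of those statistical models.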