Multivariate Trend Filtering for Lattice Data (2112.14758v2)

Published 29 Dec 2021 in stat.ML, cs.LG, math.ST, and stat.TH

Abstract: We study a multivariate version of trend filtering, called Kronecker trend filtering or KTF, for the case in which the design points form a lattice in $d$ dimensions. KTF is a natural extension of univariate trend filtering (Steidl et al., 2006; Kim et al., 2009; Tibshirani, 2014), and is defined by minimizing a penalized least squares problem whose penalty term sums the absolute (higher-order) differences of the parameter to be estimated along each of the coordinate directions. The corresponding penalty operator can be written in terms of Kronecker products of univariate trend filtering penalty operators, hence the name Kronecker trend filtering. Equivalently, one can view KTF in terms of an $\ell_1$-penalized basis regression problem where the basis functions are tensor products of falling factorial functions, a piecewise polynomial (discrete spline) basis that underlies univariate trend filtering. This paper is a unification and extension of the results in Sadhanala et al. (2016, 2017). We develop a complete set of theoretical results that describe the behavior of $k^{\mathrm{th}}$ order Kronecker trend filtering in $d$ dimensions, for every $k \geq 0$ and $d \geq 1$. This reveals a number of interesting phenomena, including the dominance of KTF over linear smoothers in estimating heterogeneously smooth functions, and a phase transition at $d = 2(k+1)$, a boundary past which (on the high dimension-to-smoothness side) linear smoothers fail to be consistent entirely. We also leverage recent results on discrete splines from Tibshirani (2020), in particular, discrete spline interpolation results that enable us to extend the KTF estimate to any off-lattice location in constant time (independent of the size of the lattice $n$).
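To make the definition concrete: writing $D^{(k+1)}_{n}$ for the univariate $(k+1)$st order discrete difference operator on $n$ points, the KTF estimate on an $n_1 \times \cdots \times n_d$ lattice solves

$$ \hat{\theta} = \mathop{\mathrm{argmin}}_{\theta} \; \tfrac{1}{2} \| y - \theta \|_2^2 + \lambda \sum_{j=1}^{d} \big\| \big( I_{n_1} \otimes \cdots \otimes D^{(k+1)}_{n_j} \otimes \cdots \otimes I_{n_d} \big) \theta \big\|_1. $$

Below is a minimal sketch of this construction in Python, assuming NumPy, SciPy, and CVXPY are available; the generic convex solver is our illustrative choice, not the specialized algorithms developed in the paper or its references.

```python
# A minimal sketch of KTF on a d-dimensional lattice: build the penalty
# operator from Kronecker products of univariate difference operators,
# then solve the penalized least squares problem with a generic solver.
import numpy as np
import scipy.sparse as sp
import cvxpy as cp

def diff_operator(n, k):
    """Univariate (k+1)st order discrete difference operator on n points."""
    Dk = sp.diags([-1.0, 1.0], [0, 1], shape=(n - 1, n), format="csr")
    for j in range(1, k + 1):
        Dj = sp.diags([-1.0, 1.0], [0, 1], shape=(n - 1 - j, n - j), format="csr")
        Dk = Dj @ Dk
    return Dk

def ktf_penalty(ns, k):
    """Stack the Kronecker products I x ... x D^(k+1) x ... x I, one per axis.

    With C-order (row-major) flattening of the lattice, the block for a given
    axis takes (k+1)st differences along that coordinate direction.
    """
    blocks = []
    for axis in range(len(ns)):
        mats = [sp.identity(n, format="csr") for n in ns]
        mats[axis] = diff_operator(ns[axis], k)
        K = mats[0]
        for M in mats[1:]:
            K = sp.kron(K, M, format="csr")
        blocks.append(K)
    return sp.vstack(blocks, format="csr")

def ktf(y, ns, k, lam):
    """Solve the KTF problem for flattened lattice data y (C order)."""
    D = ktf_penalty(ns, k)
    theta = cp.Variable(int(np.prod(ns)))
    obj = 0.5 * cp.sum_squares(y - theta) + lam * cp.norm1(D @ theta)
    cp.Problem(cp.Minimize(obj)).solve()
    return theta.value.reshape(ns)

# Example: KTF with k = 1 (piecewise linear fits) on a 20 x 20 grid.
rng = np.random.default_rng(0)
y = (np.add.outer(np.linspace(0, 1, 20), np.linspace(0, 1, 20))
     + 0.1 * rng.standard_normal((20, 20))).ravel()
theta_hat = ktf(y, (20, 20), k=1, lam=1.0)
```

With $k = 0$ and $d = 2$, the penalty reduces to the anisotropic total variation familiar from image denoising, which is the special case studied in many of the TV references below.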

References (94)
  1. A TV based restoration model with local constraints. Journal of Scientific Computing, 34(3):209–236, 2008.
  2. Modular proximal optimization for multidimensional total-variation regularization. Journal of Machine Learning Research, 19(56):1–82, 2018.
  3. Fused density estimation: Theory and methods. Journal of the Royal Statistical Society: Series B, 81(5):839–860, 2019.
  4. Aurélien F. Bibaut and Mark J. van der Laan. Fast rates for empirical risk minimization over càdlàg functions with bounded sectional variation norm. arXiv: 1907.09244, 2019.
  5. Gaussian model selection. Journal of the European Mathematical Society, 3(3):203–268, 2001.
  6. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2011.
  7. New multiscale transforms, minimum total variation synthesis: Applications to edge-preserving image reconstruction. Signal Processing, 82(11):1519–1543, 2002.
  8. Antonin Chambolle. An algorithm for total variation minimization and applications. Journal of Mathematical Imaging and Vision, 20(1):89–97, 2004.
  9. Antonin Chambolle. Total variation minimization and a class of binary MRF models. In Energy Minimization Methods in Computer Vision and Pattern Recognition, pages 136–152. Springer, 2005.
  10. Image recovery via total variation minimization and related problems. Numerische Mathematik, 76(2):167–188, 1997.
  11. High-order total variation-based image restoration. SIAM Journal on Scientific Computing, 22(2):503–516, 2000.
  12. Aspects of total variation regularized $L^1$ function approximation. SIAM Journal on Applied Mathematics, 65(5):1817–1837, 2005.
  13. Adaptive estimation of multivariate piecewise polynomials and bounded variation functions by optimal decision trees. Annals of Statistics, 49(5):2531–2551, 2021a.
  14. New risk bounds for 2d total variation denoising. IEEE Transactions on Information Theory, 67(6):4060–4091, 2021b.
  15. An Introduction to Wavelets. Academic Press, 1992a.
  16. Compactly supported box-spline wavelets. Approximation Theory and its Applications, 8(3):77–100, 1992b.
  17. On the prediction performance of the Lasso. Bernoulli, 23(1):552–581, 2017.
  18. Ingrid Daubechies. Ten Lectures on Wavelets. Society for Industrial and Applied Mathematics, 1992.
  19. Frame-constrained total variation regularization for white noise regression. Annals of Statistics, 49(3), 2021.
  20. Constructive Approximation. Springer, 1993.
  21. Wavelets. Acta Numerica, 1:1–56, 1992.
  22. Hyperbolic wavelet approximation. Constructive Approximation, 14(1):1–26, 1998.
  23. Automated regularization parameter selection in multi-scale total variation models for image restoration. Journal of Mathematical Imaging and Vision, 40(1):82–104, 2011.
  24. David L. Donoho. CART and best-ortho-basis: a connection. Annals of Statistics, 25(5):1870–1911, 1997.
  25. Minimax estimation via wavelet shrinkage. Annals of Statistics, 26(3):879–921, 1998.
  26. On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Mathematical Programming, 55(1):293–318, 1992.
  27. Bradley Efron. How biased is the apparent error rate of a prediction rule? Journal of the American Statistical Association, 81(394):461–470, 1986.
  28. Measure Theory and Fine Properties of Functions. CRC Press, 2015. Revised edition.
  29. Multivariate extensions of isotonic regression and total variation denoising via entire monotonicity and Hardy-Krause variation. Annals of Statistics, 49(2):769–792, 2021.
  30. Multiscale wavelets on trees, graphs and high dimensional data: theory and applications to semi supervised learning. In Proceedings of the Annual Conference on Learning Theory, 2010.
  31. Construction of tight frames on graphs and application to denoising. In Handbook of Big Data Analytics, pages 503–522. Springer, 2018.
  32. Universal pointwise selection rule in multivariate function estimation. Bernoulli, 14(4):1150–1190, 2008.
  33. Structural adaptation via $L_p$-norm oracle inequalities. Probability Theory and Related Fields, 143(1–2):41–71, 2009.
  34. Bandwidth selection in kernel density estimation: Oracle inequalities and adaptive minimax optimality. Annals of Statistics, 39(3):1608–1632, 2011.
  35. General selection rule from a family of linear estimators. Theory of Probability & Its Applications, 57(2):209–226, 2013.
  36. On adaptive minimax density estimation on $R^d$. Probability Theory and Related Fields, 159(3):479–543, 2014.
  37. Adaptive risk bounds in univariate total variation denoising and trend filtering. Annals of Statistics, 48(1):205–229, 2020.
  38. Generalized Additive Models. Chapman & Hall, 1990.
  39. Optimal rates for total variation denoising. In Proceedings of the Annual Conference on Learning Theory, 2016.
  40. Nicholas Johnson. A dynamic programming algorithm for the fused lasso and $l_0$-segmentation. Journal of Computational and Graphical Statistics, 22(2):246–260, 2013.
  41. Iain M. Johnstone. Gaussian Estimation: Sequence and Wavelet Models. Cambridge University Press, 2015. Draft version.
  42. Nonlinear estimation in anisotropic multi-index denoising. Probability Theory and Related Fields, 121(2):137–170, 2001.
  43. Nonlinear estimation in anisotropic multi-index denoising. Sparse case. Theory of Probability & Its Applications, 52(1):58–77, 2008.
  44. MARS via LASSO. arXiv: 2111.11694, 2021.
  45. $\ell_1$ trend filtering. SIAM Review, 51(2):339–360, 2009.
  46. Quantile smoothing splines. Biometrika, 81(4):673–680, 1994.
  47. Minimax Theory of Image Reconstruction. Springer, 2003.
  48. Oleg V. Lepski. Adaptive estimation over anisotropic functional classes via oracle approach. Annals of Statistics, 43(3):1178–1242, 2015.
  49. Optimal pointwise adaptive methods in nonparametric estimation. Annals of Statistics, 25(6):2512–2546, 1997.
  50. Optimal spatial adaptation to inhomogeneous smoothness: An approach based on kernel estimates with variable bandwidth selectors. Annals of Statistics, 25(3):929–947, 1997.
  51. Oleg V. Lepskii. On a problem of adaptive estimation in Gaussian white noise. Theory of Probability & Its Applications, 35(3):454–466, 1991.
  52. Oleg V. Lepskii. Asymptotically minimax adaptive estimation. I: Upper bounds. Optimally adaptive estimates. Theory of Probability & Its Applications, 36(4):682–697, 1992.
  53. Oleg V. Lepskii. Asymptotically minimax adaptive estimation. II. Schemes without optimal adaptation: Adaptive estimators. Theory of Probability & Its Applications, 37(3):433–448, 1993.
  54. A sharp error analysis for the fused lasso, with application to approximate changepoint screening. In Advances in Neural Information Processing Systems, 2017.
  55. Rudolph A. H. Lorentz and Wolodymyr R. Madych. Wavelets and generalized box splines. Applicable Analysis, 44(1–2):51–76, 1992.
  56. Stephane Mallat. A Wavelet Tour of Signal Processing. Academic Press, 2009. Third edition.
  57. Stephane G. Mallat. Multiresolution approximations and wavelet orthonormal bases of $L^2(R)$. Transactions of the American Mathematical Society, 315(1):69–87, 1989a.
  58. Stephane G. Mallat. A theory for multiresolution signal decomposition: The wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(7):674–693, 1989b.
  59. Enno Mammen. Nonparametric regression under qualitative smoothness assumptions. Annals of Statistics, 19(2):741–759, 1991.
  60. Enno Mammen and Sara van de Geer. Locally adaptive regression splines. Annals of Statistics, 25(1):387–413, 1997.
  61. Yves Meyer. Principe d’incertitude, bases Hilbertiennes et algebres d’operateurs. Séminaire Bourbaki, 145–146(662):209–223, 1987.
  62. Yves Meyer. Ondelettes et Opérateurs. Hermann, 1990.
  63. Progress in Wavelet Analysis and Applications. Atlantica Séguier Frontières, 1993.
  64. Signal processing by the nonparametric maximum likelihood. Problems of Information Transmission, 20(3):29–46, 1984.
  65. Rate of convergence of nonparametric estimates of maximum-likelihood type. Problems of Information Transmission, 21(4):17–33, 1985.
  66. Discrete (Legendre) orthogonal polynomials—a survey. International Journal for Numerical Methods in Engineering, 8(4):743–770, 1974.
  67. Michael H. Neumann. Multivariate wavelet thresholding in anisotropic function spaces. Statistica Sinica, 10(2):399–431, 2000.
  68. Michael H. Neumann and Rainer von Sachs. Wavelet thresholding in anisotropic function classes and application to adaptive estimation of evolutionary spectra. Annals of Statistics, 25(1):38–76, 1997.
  69. Francesco Ortelli and Sara van de Geer. Prediction bounds for higher order total variation regularized least squares. Annals of Statistics, 49(5), 2021a.
  70. Francesco Ortelli and Sara van de Geer. Tensor denoising with trend filtering. arXiv: 2101.10692, 2021b.
  71. Oscar Hernan Madrid Padilla and James G. Scott. Nonparametric density estimation by histogram trend filtering. arXiv: 1509.04348, 2016.
  72. The DFS fused lasso: Linear-time denoising over general graphs. Journal of Machine Learning Research, 18(176):1–36, 2018.
  73. Adaptive non-parametric regression with the k-nn fused lasso. Biometrika, 107(2):293–310, 2020.
  74. Fast and flexible ADMM algorithms for trend filtering. Journal of Computational and Graphical Statistics, 25(3):839–858, 2016.
  75. Wavelets and pre-wavelets in low dimensions. Journal of Approximation Theory, 71(1):18–38, 1992.
  76. Total variation based image restoration with free local constraints. In Proceedings of the International Conference on Image Processing, pages 31–35, 1994.
  77. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena, 60(1):259–268, 1992.
  78. Veeranjaneyulu Sadhanala. Nonparametric Methods with Total Variation Type Regularization. PhD thesis, Machine Learning Department, Carnegie Mellon University, 2019.
  79. Additive models via trend filtering. Annals of Statistics, 47(6):3032–3068, 2019.
  80. Total variation classes beyond 1d: Minimax rates, and the limitations of linear smoothers. In Advances in Neural Information Processing Systems, 2016.
  81. Higher-total variation classes on grids: Minimax theory and trend filtering methods. In Advances in Neural Information Processing Systems, 2017.
  82. Detecting activations over graphs using spanning tree wavelet bases. In Proceedings of the International Conference on Artificial Intelligence and Statistics, 2013.
  83. Splines in higher order TV regularization. International Journal of Computer Vision, 70(3):214–255, 2006.
  84. Charles Stein. Estimation of the mean of a multivariate normal distribution. Annals of Statistics, 9(6):1135–1151, 1981.
  85. Ryan J. Tibshirani. Adaptive piecewise polynomial estimation via trend filtering. Annals of Statistics, 42(1):285–323, 2014.
  86. Ryan J. Tibshirani. Dykstra’s algorithm, ADMM, and coordinate descent: Connections, insights, and extensions. In Advances in Neural Information Processing Systems, 2017.
  87. Ryan J. Tibshirani. Divided differences, falling factorials, and discrete splines: Another look at trend filtering and related problems. arXiv: 2003.03886, 2020.
  88. The solution path of the generalized lasso. Annals of Statistics, 39(3):1335–1371, 2011.
  89. Degrees of freedom in lasso problems. Annals of Statistics, 40(2):1198–1232, 2012.
  90. Alexandre B. Tsybakov. Introduction to Nonparametric Estimation. Springer, 2009.
  91. Iterative methods for total variation denoising. SIAM Journal on Scientific Computing, 17(1):227–238, 1996.
  92. The falling factorial basis and its statistical applications. In Proceedings of the International Conference on Machine Learning, 2014.
  93. Trend filtering on graphs. Journal of Machine Learning Research, 17(105):1–41, 2016.
  94. Steven Siwei Ye and Oscar Hernan Madrid Padilla. Non-parametric quantile regression via the k-nn fused lasso. Journal of Machine Learning Research, 22(111):1–38, 2021.