
Max-Linear Regression by Convex Programming (2103.07020v2)

Published 12 Mar 2021 in stat.ML, cs.IT, cs.LG, math.IT, math.ST, and stat.TH

Abstract: We consider the multivariate max-linear regression problem where the model parameters $\boldsymbol{\beta}_{1},\dotsc,\boldsymbol{\beta}_{k}\in\mathbb{R}^{p}$ need to be estimated from $n$ independent samples of the (noisy) observations $y = \max_{1\leq j \leq k} \boldsymbol{\beta}_{j}^{\mathsf{T}} \boldsymbol{x} + \mathrm{noise}$. The max-linear model vastly generalizes the conventional linear model, and it can approximate any convex function to arbitrary accuracy when the number of linear models $k$ is large enough. However, the inherent nonlinearity of the max-linear model renders the estimation of the regression parameters computationally challenging. In particular, no estimator based on convex programming is known in the literature. We formulate and analyze a scalable convex program given by anchored regression (AR) as the estimator for the max-linear regression problem. Under the standard Gaussian observation setting, we present a non-asymptotic performance guarantee showing that the convex program recovers the parameters with high probability. When the $k$ linear components are equally likely to achieve the maximum, our result shows that a sufficient number of noise-free observations for exact recovery scales as $k^{4}p$ up to a logarithmic factor. This sample complexity coincides with that of alternating minimization (Ghosh et al., 2021). Moreover, the same sample complexity applies when the observations are corrupted with arbitrary deterministic noise. We provide empirical results showing that our method performs as our theoretical result predicts and is competitive with the alternating minimization algorithm, particularly in the presence of multiplicative Bernoulli noise. Furthermore, we also show empirically that a recursive application of AR can significantly improve the estimation accuracy.
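
The key observation behind the convex formulation is that, in the noise-free case, the constraint $\max_{j} \boldsymbol{\beta}_{j}^{\mathsf{T}}\boldsymbol{x}_{i} \leq y_{i}$ is equivalent to the $k$ linear constraints $\boldsymbol{\beta}_{j}^{\mathsf{T}}\boldsymbol{x}_{i} \leq y_{i}$, so anchored regression reduces to a single linear program: maximize the correlation of the parameters with a set of anchor vectors over that feasible polytope. The following Python sketch illustrates this idea on synthetic data; it is not the authors' reference implementation, and the sample sizes, the use of cvxpy, and the anchor construction (perturbed copies of the truth, purely for illustration) are all assumptions.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, p, k = 2000, 5, 3  # samples, dimension, number of linear components

# Ground-truth parameters and a standard Gaussian design.
beta_true = rng.standard_normal((k, p))
X = rng.standard_normal((n, p))
y = (X @ beta_true.T).max(axis=1)  # noise-free max-linear observations

# Anchors: the theory only requires vectors sufficiently correlated with
# the true parameters; perturbed copies of the truth are a stand-in here.
anchors = beta_true + 0.5 * rng.standard_normal((k, p))

# Anchored regression: maximize correlation with the anchors subject to
# every linear component lying below the observed maximum -- a single LP.
B = cp.Variable((k, p))
objective = cp.Maximize(cp.sum(cp.multiply(anchors, B)))
constraints = [X @ B.T <= np.tile(y[:, None], (1, k))]
cp.Problem(objective, constraints).solve()

err = np.linalg.norm(B.value - beta_true) / np.linalg.norm(beta_true)
print(f"relative estimation error: {err:.3e}")
```

The recursive variant mentioned in the abstract corresponds to re-solving this program with the anchors replaced by the previous solution `B.value`, which in the authors' experiments significantly improves estimation accuracy.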

References (29)
  1. A. Ghosh, A. Pananjady, A. Guntuboyina, and K. Ramchandran, “Max-affine regression: Parameter estimation for Gaussian designs,” IEEE Transactions on Information Theory, vol. 68, no. 3, pp. 1851–1885, 2021.
  2. S. Bahmani and J. Romberg, “Phase retrieval meets statistical learning theory: A flexible convex relaxation,” in Artificial Intelligence and Statistics, 2017, pp. 252–260.
  3. S. Bahmani, “Estimation from nonlinear observations via convex programming with application to bilinear regression,” Electronic Journal of Statistics, vol. 13, no. 1, pp. 1978–2011, 2019.
  4. S. Bahmani and J. Romberg, “Solving equations of random convex functions via anchored regression,” Foundations of Computational Mathematics, vol. 19, no. 4, pp. 813–841, 2019.
  5. T. Goldstein and C. Studer, “Phasemax: Convex phase retrieval via basis pursuit,” IEEE Transactions on Information Theory, vol. 64, no. 4, pp. 2675–2689, 2018.
  6. E. J. Candes, T. Strohmer, and V. Voroninski, “Phaselift: Exact and stable signal recovery from magnitude measurements via convex programming,” Communications on Pure and Applied Mathematics, vol. 66, no. 8, pp. 1241–1274, 2013.
  7. I. Waldspurger, A. d’Aspremont, and S. Mallat, “Phase recovery, maxcut and complex semidefinite programming,” Mathematical Programming, vol. 149, no. 1-2, pp. 47–81, 2015.
  8. R. Balestriero and R. Baraniuk, “Mad max: Affine spline insights into deep learning,” arXiv preprint arXiv:1805.06576, 2018.
  9. R. Balestriero, R. Cosentino, B. Aazhang, and R. Baraniuk, “The geometry of deep networks: Power diagram subdivision,” in Advances in Neural Information Processing Systems, 2019, pp. 15806–15815.
  10. R. Balestriero, S. Paris, and R. Baraniuk, “Max-affine spline insights into deep generative networks,” arXiv preprint arXiv:2002.11912, 2020.
  11. A. Siahkamari, V. Saligrama, D. Castanon, and B. Kulis, “Learning Bregman divergences,” arXiv preprint arXiv:1905.11545, 2019.
  12. G. Balázs, “Convex regression: Theory, practice, and applications,” Ph.D. dissertation, University of Alberta, 2016.
  13. A. Magnani and S. P. Boyd, “Convex piecewise-linear fitting,” Optimization and Engineering, vol. 10, no. 1, pp. 1–17, 2009.
  14. A. Toriello and J. P. Vielma, “Fitting piecewise linear continuous functions,” European Journal of Operational Research, vol. 219, no. 1, pp. 86–95, 2012.
  15. L. A. Hannah and D. B. Dunson, “Multivariate convex regression with adaptive partitioning,” The Journal of Machine Learning Research, vol. 14, no. 1, pp. 3261–3294, 2013.
  16. V. T. Ho, H. A. Le Thi, and T. P. Dinh, “DCA with successive DC decomposition for convex piecewise-linear fitting,” in International Conference on Computer Science, Applied Mathematics and Applications. Springer, 2019, pp. 39–51.
  17. P. D. Tao and L. T. H. An, “A DC optimization algorithm for solving the trust-region subproblem,” SIAM Journal on Optimization, vol. 8, no. 2, pp. 476–505, 1998.
  18. E. J. Candes and Y. Plan, “Matrix completion with noise,” Proceedings of the IEEE, vol. 98, no. 6, pp. 925–936, 2010.
  19. J. van den Brand, “A deterministic linear program solver in current matrix multiplication time,” in Proceedings of the Thirty-First Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM, 2020, pp. 259–278.
  20. Gurobi Optimization, LLC, “Gurobi optimizer reference manual,” 2021. [Online]. Available: http://www.gurobi.com
  21. A. Ghosh, A. Pananjady, A. Guntuboyina, and K. Ramchandran, “Max-affine regression: Provable, tractable, and near-optimal statistical estimation,” arXiv preprint arXiv:1906.09255, 2019.
  22. I. Diakonikolas, J. H. Park, and C. Tzamos, “ReLU regression with Massart noise,” Advances in Neural Information Processing Systems, vol. 34, 2021.
  23. E. John and E. A. Yıldırım, “Implementation of warm-start strategies in interior-point methods for linear programming in fixed dimension,” Computational Optimization and Applications, vol. 41, no. 2, pp. 151–183, 2008.
  24. V. N. Vapnik and A. Y. Chervonenkis, “On the uniform convergence of relative frequencies of events to their probabilities,” in Measures of Complexity. Springer, 2015, pp. 11–30.
  25. Y. S. Tan and R. Vershynin, “Phase retrieval via randomized Kaczmarz: Theoretical guarantees,” Information and Inference: A Journal of the IMA, vol. 8, no. 1, pp. 97–123, 2019.
  26. S. Dirksen, “Tail bounds via generic chaining,” Electronic Journal of Probability, vol. 20, 2015.
  27. R. M. Dudley, “The sizes of compact subsets of Hilbert space and continuity of Gaussian processes,” Journal of Functional Analysis, vol. 1, no. 3, pp. 290–330, 1967.
  28. B. Carl, “Inequalities of Bernstein–Jackson-type and the degree of compactness of operators in Banach spaces,” Annales de l’Institut Fourier, vol. 35, no. 3, pp. 79–118, 1985.
  29. M. Junge and K. Lee, “Generalized notions of sparsity and restricted isometry property. Part I: A unified framework,” Information and Inference: A Journal of the IMA, vol. 9, no. 1, pp. 157–193, 2020.