
On Sinkhorn's Algorithm and Choice Modeling (2310.00260v2)

Published 30 Sep 2023 in math.OC, cs.LG, and econ.EM

Abstract: For a broad class of models widely used in practice for choice and ranking data based on Luce's choice axiom, including the Bradley–Terry–Luce and Plackett–Luce models, we show that the associated maximum likelihood estimation problems are equivalent to a classic matrix balancing problem with target row and column sums. This perspective opens doors between two seemingly unrelated research areas, and allows us to unify existing algorithms in the choice modeling literature as special instances or analogs of Sinkhorn's celebrated algorithm for matrix balancing. We draw inspiration from these connections and resolve some open problems in the study of Sinkhorn's algorithm. We establish the global linear convergence of Sinkhorn's algorithm for non-negative matrices whenever finite scaling matrices exist, and characterize its linear convergence rate in terms of the algebraic connectivity of a weighted bipartite graph. We further derive the sharp asymptotic rate of linear convergence, which generalizes a classic result of Knight (2008). To our knowledge, these are the first quantitative linear convergence results for Sinkhorn's algorithm for general non-negative matrices and positive marginals. Our results highlight the importance of connectivity and orthogonality structures in matrix balancing and Sinkhorn's algorithm, which could be of independent interest. More broadly, the connections we establish in this paper between matrix balancing and choice modeling could also help motivate further transmission of ideas and lead to interesting results in both disciplines.
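
To make the matrix balancing problem in the abstract concrete, here is a minimal pure-Python sketch of the standard Sinkhorn iteration: given a non-negative matrix A and target row sums p and column sums q, it looks for positive diagonal scalings d0 and d1 such that the matrix with entries d0[i] * A[i][j] * d1[j] has the target marginals. The function name `sinkhorn`, its parameters, and the stopping rule are illustrative assumptions and are not taken from the paper. The sketch also assumes that every row and column of A has at least one positive entry and that finite scalings exist.

```python
from typing import List, Sequence, Tuple


def sinkhorn(
    A: Sequence[Sequence[float]],
    p: Sequence[float],
    q: Sequence[float],
    tol: float = 1e-10,
    max_iter: int = 10_000,
) -> Tuple[List[float], List[float], List[List[float]]]:
    """Alternately rescale columns and rows of A toward targets q and p.

    Hypothetical helper for illustration. Assumes A is non-negative, each
    row and column has a positive entry, and sum(p) == sum(q).
    Returns (d0, d1, scaled) with scaled[i][j] = d0[i] * A[i][j] * d1[j].
    """
    n, m = len(A), len(A[0])
    if abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("target row and column sums must have equal totals")

    d0 = [1.0] * n  # row scaling factors
    d1 = [1.0] * m  # column scaling factors

    for _ in range(max_iter):
        # Column step: pick d1 so every column sum equals its target q[j].
        for j in range(m):
            d1[j] = q[j] / sum(d0[i] * A[i][j] for i in range(n))
        # Row step: pick d0 so every row sum equals its target p[i].
        for i in range(n):
            d0[i] = p[i] / sum(A[i][j] * d1[j] for j in range(m))
        # Rows now match exactly, so only the column residual is checked.
        col_err = max(
            abs(sum(d0[i] * A[i][j] * d1[j] for i in range(n)) - q[j])
            for j in range(m)
        )
        if col_err < tol:
            break

    scaled = [[d0[i] * A[i][j] * d1[j] for j in range(m)] for i in range(n)]
    return d0, d1, scaled


if __name__ == "__main__":
    # Example: balance a strictly positive 2x2 matrix to rows (3, 1), columns (2, 2).
    d0, d1, S = sinkhorn([[1.0, 2.0], [3.0, 4.0]], [3.0, 1.0], [2.0, 2.0])
    print("row sums:", [sum(row) for row in S])
    print("column sums:", [sum(S[i][j] for i in range(2)) for j in range(2)])
```

The example uses a strictly positive matrix, for which the iteration is known to converge. For matrices with zero entries, whether finite scalings exist depends on the zero pattern, which is the setting the abstract's convergence results address.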

References (90)
  1. Anderson JE, Van Wincoop E (2003) Gravity with gravitas: A solution to the border puzzle. American economic review 93(1):170–192.
  2. Bacharach M (1965) Estimating nonnegative matrices from marginal data. International Economic Review 6(3):294–310.
  3. Bacharach M (1970) Biproportional matrices and input-output change, volume 16 (CUP Archive).
  4. Balinski M, Pukelsheim F (2006) Matrices and politics.
  5. Batsell RR, Polking JC (1985) A new class of market share models. Marketing Science 4(3):177–198.
  6. Bauer FL (1963) Optimally scaled matrices. Numerische Mathematik 5(1):73–87.
  7. Beck A, Tetruashvili L (2013) On the convergence of block coordinate descent type methods. SIAM journal on Optimization 23(4):2037–2060.
  8. Berkson J (1944) Application of the logistic function to bio-assay. Journal of the American Statistical Association 39(227):357–365.
  9. Bertsekas DP (1997) Nonlinear programming. Journal of the Operational Research Society 48(3):334–334.
  10. Beurling A (1960) An automorphism of product measures. Annals of Mathematics 189–200.
  11. Birch M (1963) Maximum likelihood in three-way contingency tables. Journal of the Royal Statistical Society Series B: Statistical Methodology 25(1):220–233.
  12. Bradley RA, Terry ME (1952) Rank analysis of incomplete block designs: I. the method of paired comparisons. Biometrika 39(3/4):324–345.
  13. Bregman LM (1967a) Proof of the convergence of sheleikhovskii’s method for a problem with transportation constraints. USSR Computational Mathematics and Mathematical Physics 7(1):191–204.
  14. Bregman LM (1967b) The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR computational mathematics and mathematical physics 7(3):200–217.
  15. Brualdi RA (1968) Convex sets of non-negative matrices. Canadian Journal of Mathematics 20:144–157.
  16. Bushell PJ (1973) Hilbert’s metric and positive contraction mappings in a banach space. Archive for Rational Mechanics and Analysis 52:330–338.
  17. Caron F, Doucet A (2012) Efficient bayesian inference for generalized bradley–terry models. Journal of Computational and Graphical Statistics 21(1):174–196.
  18. Chakrabarty D, Khanna S (2021) Better and simpler error analysis of the sinkhorn–knopp algorithm for matrix scaling. Mathematical Programming 188(1):395–407.
  19. Cuturi M (2013) Sinkhorn distances: Lightspeed computation of optimal transport. Advances in neural information processing systems 26.
  20. Deming WE, Stephan FF (1940) On a least squares adjustment of a sampled frequency table when the expected marginal totals are known. The Annals of Mathematical Statistics 11(4):427–444.
  21. Djoković D (1970) Note on nonnegative matrices. Proceedings of the American Mathematical Society 25(1):80–82.
  22. Dykstra O (1956) A note on the rank analysis of incomplete block designs–applications beyond the scope of existing tables. Biometrics 12(3):301–306.
  23. Elo AE (1978) The rating of chessplayers, past and present (Arco Pub.).
  24. Esfahani PM, Kuhn D (2018) Data-driven distributionally robust optimization using the wasserstein metric: performance guarantees and tractable reformulations. Mathematical Programming 171(1-2):115–166.
  25. Fiedler M (1973) Algebraic connectivity of graphs. Czechoslovak mathematical journal 23(2):298–305.
  26. Fienberg SE (1970) An iterative procedure for estimation in contingency tables. The Annals of Mathematical Statistics 41(3):907–917.
  27. Ford LR (1957) Solution of a ranking problem from binary comparisons. The American Mathematical Monthly 64(8P2):28–33.
  28. Ford LR, Fulkerson DR (1956) Maximal flow through a network. Canadian journal of Mathematics 8:399–404.
  29. Ford LR, Fulkerson DR (1957) A simple algorithm for finding maximal network flows and an application to the hitchcock problem. Canadian journal of Mathematics 9:210–218.
  30. Fortet R (1940) Résolution d’un systeme d’équations de m. schrödinger. J. Math. Pure Appl. IX 1:83–105.
  31. Franklin J, Lorenz J (1989) On the scaling of multidimensional matrices. Linear Algebra and its applications 114:717–735.
  32. Friedland S (2017) On schrödinger’s bridge problem. Sbornik: Mathematics 208(11):1705.
  33. Gale D, et al. (1957) A theorem on flows in networks. Pacific J. Math 7(2):1073–1082.
  34. Galichon A (2018) Optimal transport methods in economics (Princeton University Press).
  35. Galichon A (2021) The unreasonable effectiveness of optimal transport in economics. arXiv preprint arXiv:2107.04700.
  36. Galichon A, Salanié B (2021) Matching with trade-offs: Revealed preferences over competing characteristics. arXiv preprint arXiv:2102.12811.
  37. Georgiou TT, Pavon M (2015) Positive contraction mappings for classical and quantum schrödinger systems. Journal of Mathematical Physics 56(3):033301.
  38. Good IJ (1963) Maximum entropy for hypothesis formulation, especially for multidimensional contingency tables. The Annals of Mathematical Statistics 34(3):911–934.
  39. Gurvits L (2004) Classical complexity and quantum entanglement. Journal of Computer and System Sciences 69(3):448–484.
  40. Hall P (1935) On representatives of subsets. Journal of the London Mathematical Society 1(1):26–30.
  41. Hausman JA, Ruud PA (1987) Specifying and testing econometric models for rank-ordered data. Journal of econometrics 34(1-2):83–104.
  42. Hunter DR (2004) Mm algorithms for generalized bradley-terry models. The annals of statistics 32(1):384–406.
  43. Idel M (2016) A review of matrix scaling and sinkhorn’s normal form for matrices and positive maps. arXiv preprint arXiv:1609.06349.
  44. Ireland CT, Kullback S (1968) Contingency tables with given marginals. Biometrika 55(1):179–188.
  45. Kleinberg JM (1999) Authoritative sources in a hyperlinked environment. Journal of the ACM (JACM) 46(5):604–632.
  46. Knight PA (2008) The sinkhorn–knopp algorithm: convergence and applications. SIAM Journal on Matrix Analysis and Applications 30(1):261–275.
  47. Knight PA, Ruiz D (2013) A fast algorithm for matrix balancing. IMA Journal of Numerical Analysis 33(3):1029–1047.
  48. Kruithof J (1937) Telefoonverkeersrekening. De Ingenieur 52:15–25.
  49. Kullback S (1997) Information theory and statistics (Courier Corporation).
  50. Lamond B, Stewart NF (1981) Bregman’s balancing method. Transportation Research Part B: Methodological 15(4):239–248.
  51. Landau E (1895) Zur relativen wertbemessung der turnierresultate. Deutsches Wochenschach 11:366–369.
  52. Lange K (2016) MM optimization algorithms (SIAM).
  53. Léger F (2021) A gradient descent perspective on sinkhorn. Applied Mathematics & Optimization 84(2):1843–1855.
  54. Leontief WW (1965) The structure of the us economy. Scientific American 212(4):25–35.
  55. Luce RD (1959) Individual choice behavior: A theoretical analysis (Wiley).
  56. Luo ZQ, Tseng P (1992) On the convergence of the coordinate descent method for convex differentiable minimization. Journal of Optimization Theory and Applications 72(1):7–35.
  57. McFadden D (1978) Modelling the choice of residential location. Spatial Interaction Theory and Planning Models.
  58. McFadden D (1981) Econometric models of probabilistic choice. Structural analysis of discrete data with econometric applications 198272.
  59. McFadden D, Train K (2000) Mixed mnl models for discrete response. Journal of applied Econometrics 15(5):447–470.
  60. McFadden D, et al. (1973) Conditional logit analysis of qualitative choice behavior.
  61. Newman M (2023) Efficient computation of rankings from pairwise comparisons. Journal of Machine Learning Research 24(238):1–25.
  62. Nguyen S (1984) Estimating origin destination matrices from observed flows (Elsevier Science Publishers BV).
  63. Plackett RL (1975) The analysis of permutations. Journal of the Royal Statistical Society: Series C (Applied Statistics) 24(2):193–202.
  64. Plane DA (1982) An information theoretic approach to the estimation of migration flows. Journal of Regional Science 22(4):441–456.
  65. Pukelsheim F (2006) Current issues of apportionment methods. Mathematics and democracy: recent advances in voting systems and collective choice, 167–176 (Springer).
  66. Pukelsheim F (2014) Biproportional scaling of matrices and the iterative proportional fitting procedure. Annals of Operations Research 215:269–283.
  67. Pukelsheim F, Simeone B (2009) On the iterative proportional fitting procedure: Structure of accumulation points and l1-error analysis.
  68. Pyatt G, Round JI (1985) Social accounting matrices: A basis for planning.
  69. Ruiz D (2001) A scaling algorithm to equilibrate both rows and columns norms in matrices. Technical report, CM-P00040415.
  70. Ruschendorf L (1995) Convergence of the iterative proportional fitting procedure. The Annals of Statistics 1160–1174.
  71. Schneider MH, Zenios SA (1990) A comparative study of algorithms for matrix balancing. Operations research 38(3):439–455.
  72. Schrödinger E (1931) Über die umkehrung der naturgesetze. Sitzungsberichte der preussischen Akademie der Wissenschaften, physikalisch-mathematische Klasse 8(9):144–153.
  73. Silva JS, Tenreyro S (2006) The log of gravity. The Review of Economics and statistics 88(4):641–658.
  74. Sinkhorn R (1964) A relationship between arbitrary positive matrices and doubly stochastic matrices. The annals of mathematical statistics 35(2):876–879.
  75. Sinkhorn R (1967) Diagonal equivalence to matrices with prescribed row and column sums. The American Mathematical Monthly 74(4):402–405.
  76. Sinkhorn R (1974) Diagonal equivalence to matrices with prescribed row and column sums. ii. Proceedings of the American Mathematical Society 45(2):195–198.
  77. Sinkhorn R, Knopp P (1967) Concerning nonnegative matrices and doubly stochastic matrices. Pacific Journal of Mathematics 21(2):343–348.
  78. Soules GW (1991) The rate of convergence of sinkhorn balancing. Linear algebra and its applications 150:3–40.
  79. Spielman DA (2007) Spectral graph theory and its applications. 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS’07), 29–38 (IEEE).
  80. Stone R (1962) Multiple classifications in social accounting. Bulletin de l’Institut International de Statistique 39(3):215–233.
  81. Stone R, Brown A (1971) A computable model of economic growth.
  82. Theil H (1967) Economics and information theory.
  83. Theil H, Rey G (1966) A quadratic programming approach to the estimation of transition probabilities. Management Science 12(9):714–721.
  84. Thionet P (1964) Note sur le remplissage d’un tableau à double entrée. Journal de la société française de statistique 105:228–247.
  85. Thurstone LL (1927) The method of paired comparisons for social values. The Journal of Abnormal and Social Psychology 21(4):384.
  86. Tomlin JA (2003) A new paradigm for ranking pages on the world wide web. Proceedings of the 12th international conference on World Wide Web, 350–355.
  87. Tseng P, Bertsekas DP (1987) Relaxation methods for problems with strictly convex separable costs and linear constraints. Mathematical Programming 38(3):303–321.
  88. Tversky A (1972) Elimination by aspects: A theory of choice. Psychological review 79(4):281.
  89. Yule GU (1912) On the methods of measuring association between two attributes. Journal of the Royal Statistical Society 75(6):579–652.
  90. Zermelo E (1929) Die berechnung der turnier-ergebnisse als ein maximumproblem der wahrscheinlichkeitsrechnung. Mathematische Zeitschrift 29(1):436–460.