
Inferring Dynamic Networks from Marginals with Iterative Proportional Fitting

Published 28 Feb 2024 in stat.ML, cs.LG, cs.SI, math.OC, math.ST, and stat.TH | (2402.18697v2)

Abstract: A common network inference problem, arising from real-world data constraints, is how to infer a dynamic network from its time-aggregated adjacency matrix and time-varying marginals (i.e., row and column sums). Prior approaches to this problem have repurposed the classic iterative proportional fitting (IPF) procedure, also known as Sinkhorn's algorithm, with promising empirical results. However, the statistical foundation for using IPF has not been well understood: under what settings does IPF provide principled estimation of a dynamic network from its marginals, and how well does it estimate the network? In this work, we establish such a setting, by identifying a generative network model whose maximum likelihood estimates are recovered by IPF. Our model both reveals implicit assumptions on the use of IPF in such settings and enables new analyses, such as structure-dependent error bounds on IPF's parameter estimates. When IPF fails to converge on sparse network data, we introduce a principled algorithm that guarantees IPF converges under minimal changes to the network structure. Finally, we conduct experiments with synthetic and real-world data, which demonstrate the practical value of our theoretical and algorithmic contributions.
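To make the setup concrete, here is a minimal sketch of the generic IPF (Sinkhorn) iteration the abstract refers to: a time-aggregated matrix is rescaled, alternating between rows and columns, until its marginals match the target row and column sums for one time step. This is not the authors' implementation. The names `ipf`, `X_agg`, `p_t`, and `q_t` and the toy numbers are assumptions made for illustration, and the sketch does not include the paper's procedure for handling non-convergence on sparse data.

```python
def ipf(X, p, q, max_iters=1000, tol=1e-9):
    """Rescale nonnegative matrix X so row sums approach p and column sums approach q.

    Illustrative sketch only: zero entries of X stay zero, and convergence
    is not guaranteed when X is sparse.
    """
    m, n = len(X), len(X[0])
    M = [list(row) for row in X]  # work on a copy
    for _ in range(max_iters):
        # Row step: scale each row to match its target row sum.
        for i in range(m):
            s = sum(M[i])
            if s > 0:
                M[i] = [v * p[i] / s for v in M[i]]
        # Column step: scale each column to match its target column sum.
        for j in range(n):
            s = sum(M[i][j] for i in range(m))
            if s > 0:
                for i in range(m):
                    M[i][j] *= q[j] / s
        # Columns now match q; stop once the rows also match p.
        if max(abs(sum(M[i]) - p[i]) for i in range(m)) < tol:
            break
    return M


# Hypothetical toy inputs: aggregated matrix and marginals for one time step.
X_agg = [[1.0, 2.0], [3.0, 4.0]]
p_t = [4.0, 6.0]  # target row sums
q_t = [5.0, 5.0]  # target column sums (same total as p_t)
M_t = ipf(X_agg, p_t, q_t)
```

The targets must share the same total (here 10) for a solution to exist. With a strictly positive input such as this one, the iteration converges; with zeros in `X_agg` it may not, which is the failure case the abstract mentions.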
