
Copula-based transferable models for synthetic population generation (2302.09193v3)

Published 17 Feb 2023 in stat.ML and cs.LG

Abstract: Population synthesis involves generating synthetic yet realistic representations of a target population of micro-agents for behavioral modeling and simulation. Traditional methods, often reliant on target population samples, such as census data or travel surveys, face limitations due to high costs and small sample sizes, particularly at smaller geographical scales. We propose a novel framework based on copulas to generate synthetic data for target populations where only empirical marginal distributions are known. This method utilizes samples from different populations with similar marginal dependencies, introduces a spatial component into population synthesis, and considers various information sources for more realistic generators. Concretely, the process involves normalizing the data and treating it as realizations of a given copula, and then training a generative model before incorporating the information on the marginals of the target population. Utilizing American Community Survey data, we assess our framework's performance through standardized root mean squared error (SRMSE) and so-called sampled zeros. We focus on its capacity to transfer a model learned from one population to another. Our experiments include transfer tests between regions at the same geographical level as well as to lower geographical levels, hence evaluating the framework's adaptability in varied spatial contexts. We compare Bayesian Networks, Variational Autoencoders, and Generative Adversarial Networks, both individually and combined with our copula framework. Results show that the copula enhances machine learning methods in matching the marginals of the reference data. Furthermore, it consistently surpasses Iterative Proportional Fitting in terms of SRMSE in the transferability experiments, while introducing unique observations not found in the original training sample.
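The pipeline outlined in the abstract (normalize the source data, treat it as draws from a copula, fit a generator, then impose the target marginals) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, a simple bootstrap of the normalized rows stands in for the Bayesian Network, VAE, or GAN generators that the paper compares, and the SRMSE shown is one common form whose normalization may differ from the paper's.

```python
import numpy as np


def to_pseudo_observations(x):
    """Rank-transform each column to the open interval (0, 1).

    Ties are broken by sort order, which is a simplification for
    categorical survey attributes.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1  # ranks 1..n per column
    return ranks / (n + 1.0)


def empirical_quantile(sample, u):
    """Inverse empirical CDF of `sample`, evaluated at the levels `u`."""
    s = np.sort(np.asarray(sample, dtype=float))
    idx = np.clip(np.ceil(u * len(s)).astype(int) - 1, 0, len(s) - 1)
    return s[idx]


def transfer(source, target_marginals, n_out, rng):
    """Carry the source dependence structure over to target marginals.

    ASSUMPTION: bootstrapping rows of the normalized data is only a
    placeholder for a trained generative model.
    """
    u = to_pseudo_observations(source)
    rows = rng.integers(0, u.shape[0], size=n_out)
    u_new = u[rows]
    cols = [
        empirical_quantile(marginal, u_new[:, j])
        for j, marginal in enumerate(target_marginals)
    ]
    return np.column_stack(cols)


def srmse(p_ref, p_syn):
    """Standardized RMSE between two vectors of bin frequencies.

    ASSUMPTION: RMSE over bins divided by the mean reference frequency.
    """
    p_ref = np.asarray(p_ref, dtype=float)
    p_syn = np.asarray(p_syn, dtype=float)
    n_bins = p_ref.size
    rmse = np.sqrt(np.sum((p_syn - p_ref) ** 2) / n_bins)
    return rmse / (np.sum(p_ref) / n_bins)
```

Calling `transfer(source, target_marginals, n_out, np.random.default_rng(0))` with a source sample of shape `(n, d)` and a list of `d` one-dimensional target samples returns an `(n_out, d)` array whose columns take values only from the corresponding target marginal, while the rank dependence comes from the source population.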

Authors (6)
  1. Pascal Jutras-Dubé (5 papers)
  2. Mohammad B. Al-Khasawneh (2 papers)
  3. Zhichao Yang (37 papers)
  4. Javier Bas (1 paper)
  5. Fabian Bastin (1 paper)
  6. Cinzia Cirillo (2 papers)
