Geographic Spines in the 2020 Census Disclosure Avoidance System (2203.16654v3)
Abstract: The 2020 Census Disclosure Avoidance System (DAS) is a formally private mechanism that first adds independent noise to cross-tabulations for a set of pre-specified hierarchical geographic units, known as the geographic spine. After post-processing these noisy measurements, DAS outputs a formally private database with fields indicating location on the standard census geographic spine, which is defined by the United States as a whole, states, counties, census tracts, block groups, and census blocks. This paper describes how the geographic spine used internally within DAS to define the initial noisy measurements impacts the accuracy of the output database. Specifically, tabulations tend to be most accurate for geographic areas that 1) can be derived by aggregating geographic units above the block level of the internal spine, and 2) are closer to the geographic units of the internal spine. After describing the accuracy tradeoffs relevant to the choice of internal DAS geographic spine, we provide the settings used to define the 2020 Census production DAS runs.
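The accuracy tradeoff described above can be illustrated with a toy simulation. The sketch below is not the actual DAS (which uses discrete noise distributions and nonnegativity/consistency post-processing); it only shows why a query answered by a single on-spine noisy measurement has far lower error than the same query assembled by summing many block-level noisy measurements, since independent errors add in variance.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 10.0      # illustrative per-measurement noise scale
n_blocks = 100    # blocks composing a hypothetical off-spine area
trials = 2000     # Monte Carlo repetitions

# Error when the query area is itself a unit on the internal spine:
# one noisy measurement, so the error is a single draw of scale sigma.
err_on = rng.normal(0, sigma, size=trials)

# Error when the area must be built by aggregating n_blocks block-level
# measurements: independent errors sum, so the standard deviation grows
# like sigma * sqrt(n_blocks).
err_off = rng.normal(0, sigma, size=(trials, n_blocks)).sum(axis=1)

rms_on = np.sqrt(np.mean(err_on**2))    # close to sigma
rms_off = np.sqrt(np.mean(err_off**2))  # close to sigma * sqrt(n_blocks)
print(f"on-spine RMS error:  {rms_on:.1f}")
print(f"off-spine RMS error: {rms_off:.1f}")
```

With these settings the off-spine RMS error is roughly ten times larger, which is why placing frequently tabulated areas on (or near) the internal spine improves accuracy.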
- Ryan Cumings-Menon
- John M. Abowd
- Robert Ashmead
- Daniel Kifer
- Philip Leclerc
- Jeffrey Ocker
- Michael Ratcliffe
- Pavel Zhuravlev