Improving Data Minimization through Decentralized Data Architectures
Abstract: In this research project, we investigate an alternative to the standard cloud-centralized data architecture. Specifically, we aim to leave part of the application data under the control of the individual data owners in decentralized personal data stores. Our primary goal is to improve data minimization, i.e., to keep more sensitive personal data under the control of its owners, while providing a straightforward and efficient framework for designing architectures that still allow applications to run and data to be analyzed. To this end, the centralized part of the schema contains aggregating views over the decentralized data. We propose a declarative language that extends SQL, with which architects specify the different kinds of tables and views at the schema level, along with the sensitive columns and the minimum granularity level of their aggregations. Local updates must be reflected in the centralized views while preserving privacy throughout intermediate calculations; for this, we pursue the integration of distributed materialized view maintenance and multi-party computation (MPC) techniques. Finally, we aim to implement this system, with the personal data stores residing either on mobile devices or in encrypted cloud storage, in order to evaluate its performance properties.
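To make the combination of incremental view maintenance and MPC more concrete, the following is a minimal sketch of one possible building block: additive secret sharing used to keep a centralized SUM view up to date from local deltas. It is not the system proposed in the abstract. The names `share`, `Aggregator`, `publish_delta`, `reveal_sum`, and the constant `MODULUS` are all hypothetical, and the sketch assumes non-colluding aggregation servers, non-negative totals smaller than `MODULUS`, and a publicly known group size.

```python
import secrets

# Hypothetical modulus for share arithmetic; totals are assumed to stay below it.
MODULUS = 2**61 - 1


def share(value, n_parties):
    """Split `value` into n additive shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


class Aggregator:
    """One of several servers assumed not to collude.

    Each server only ever sees random-looking shares, and keeps a running
    partial sum per group (its slice of the materialized view).
    """

    def __init__(self):
        self.partial = {}

    def apply_delta(self, group, delta_share):
        current = self.partial.get(group, 0)
        self.partial[group] = (current + delta_share) % MODULUS


def publish_delta(aggregators, group, old_value, new_value):
    """Run by a data owner: propagate a local update as a secret-shared delta."""
    delta = (new_value - old_value) % MODULUS
    for agg, delta_share in zip(aggregators, share(delta, len(aggregators))):
        agg.apply_delta(group, delta_share)


def reveal_sum(aggregators, group, group_size, min_group_size):
    """Combine the partial sums, but only if the group is coarse enough.

    `group_size` is assumed to be public here; `min_group_size` stands in
    for the minimum aggregation granularity declared at the schema level.
    """
    if group_size < min_group_size:
        return None
    return sum(a.partial.get(group, 0) for a in aggregators) % MODULUS


if __name__ == "__main__":
    servers = [Aggregator() for _ in range(3)]
    publish_delta(servers, "region_a", 0, 70)   # first owner inserts 70
    publish_delta(servers, "region_a", 0, 80)   # second owner inserts 80
    publish_delta(servers, "region_a", 70, 65)  # first owner updates 70 -> 65
    print(reveal_sum(servers, "region_a", group_size=2, min_group_size=2))
```

In this sketch, an update costs one message per aggregation server and never requires recomputing the view from scratch, which is the property that incremental view maintenance aims for. A real design would also have to protect group membership and handle owners that go offline, both of which are left out here.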