
Protecting Sensitive Tabular Data in Hybrid Clouds (2312.01354v1)

Published 3 Dec 2023 in cs.CR

Abstract: Regulated industries, such as healthcare and finance, are starting to move parts of their data and workloads to the public cloud. However, they are still reluctant to trust the public cloud with their most sensitive records, and hence keep them on their premises, leveraging the hybrid cloud architecture. We address the security and performance challenges of big data analytics using a hybrid cloud in a real-life use case from a hospital. In this use case, the hospital collects sensitive patient data and wants to run analytics on it in order to lower antibiotic resistance, a significant challenge in healthcare. We show that it is possible to run large-scale analytics on data that is securely stored in the public cloud, encrypted with Apache Parquet Modular Encryption (PME), without significant performance losses, even if the secret encryption keys are stored on-premises. PME is a standard mechanism for data encryption and key management, not specific to any public cloud, and therefore helps prevent vendor lock-in. It also provides privacy and integrity guarantees, and enables granular access control to the data. We also present an innovation in PME for lowering the performance hit incurred by calls to the Key Management Service. Our solution therefore enables protecting large amounts of sensitive data in hybrid clouds while still allowing valuable insights to be gained from it efficiently.
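The abstract does not say how the calls to the Key Management Service are made cheaper, so the following is only a generic sketch of one common way to reduce such round trips: caching unwrapped data keys on the analytics side for a limited time. It is not the authors' method. Every name here (`ToyOnPremKms`, `CachingKeyUnwrapper`, `count_kms_calls`, the master key ID `patients-mk`) is hypothetical, and the wrap/unwrap routine is a toy stand-in for a real KMS, not secure cryptography.

```python
import base64
import hashlib
import hmac
import os
import time
from typing import Dict, Tuple


class ToyOnPremKms:
    """Hypothetical stand-in for an on-premises KMS. Counts round trips.

    The wrapping below (XOR with an HMAC-derived pad) is a toy for
    illustration only; a real KMS would use an authenticated key-wrap scheme.
    """

    def __init__(self) -> None:
        self._master_keys: Dict[str, bytes] = {"patients-mk": os.urandom(32)}
        self.calls = 0  # number of simulated network round trips

    def _pad(self, master_key_id: str, nonce: bytes) -> bytes:
        key = self._master_keys[master_key_id]
        return hmac.new(key, nonce, hashlib.sha256).digest()

    def wrap(self, master_key_id: str, dek: bytes) -> str:
        self.calls += 1
        nonce = os.urandom(16)
        pad = self._pad(master_key_id, nonce)
        body = bytes(a ^ b for a, b in zip(dek, pad))
        return base64.b64encode(nonce + body).decode("ascii")

    def unwrap(self, master_key_id: str, wrapped: str) -> bytes:
        self.calls += 1
        raw = base64.b64decode(wrapped)
        nonce, body = raw[:16], raw[16:]
        pad = self._pad(master_key_id, nonce)
        return bytes(a ^ b for a, b in zip(body, pad))


class CachingKeyUnwrapper:
    """Caches unwrapped data keys so repeated reads skip the KMS call."""

    def __init__(self, kms: ToyOnPremKms, ttl_seconds: float = 600.0) -> None:
        self._kms = kms
        self._ttl = ttl_seconds
        self._cache: Dict[Tuple[str, str], Tuple[float, bytes]] = {}

    def unwrap(self, master_key_id: str, wrapped: str) -> bytes:
        now = time.monotonic()
        hit = self._cache.get((master_key_id, wrapped))
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # cache hit: no round trip to the KMS
        dek = self._kms.unwrap(master_key_id, wrapped)
        self._cache[(master_key_id, wrapped)] = (now, dek)
        return dek


def count_kms_calls(num_reads: int = 100, ttl_seconds: float = 600.0) -> int:
    """Wrap one data key, read it num_reads times, return total KMS calls."""
    kms = ToyOnPremKms()
    dek = os.urandom(32)  # 32-byte data encryption key
    wrapped = kms.wrap("patients-mk", dek)
    unwrapper = CachingKeyUnwrapper(kms, ttl_seconds)
    for _ in range(num_reads):
        assert unwrapper.unwrap("patients-mk", wrapped) == dek
    return kms.calls
```

With the cache enabled, a hundred reads of the same wrapped key cost one wrap call and one unwrap call. The time-to-live bounds how long an unwrapped key stays in memory outside the premises, which is the trade-off such a cache makes between latency and key exposure.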

