The Queen's Guard: A Secure Enforcement of Fine-grained Access Control In Distributed Data Analytics Platforms (2106.13123v4)

Published 24 Jun 2021 in cs.CR

Abstract: Distributed data analytics platforms (i.e., Apache Spark, Hadoop) provide high-level APIs to programmatically write analytics tasks that are run distributedly in multiple computing nodes. The design of these frameworks was primarily motivated by performance and usability. Thus, the security takes a back seat. Consequently, they do not inherently support fine-grained access control or offer any plugin mechanism to enable it, making them risky to be used in multi-tier organizational settings. There have been attempts to build "add-on" solutions to enable fine-grained access control for distributed data analytics platforms. In this paper, first, we show that straightforward enforcement of "add-on" access control is insecure under adversarial code execution. Specifically, we show that an attacker can abuse platform-provided APIs to evade access controls without leaving any traces. Second, we designed a two-layered (i.e., proactive and reactive) defense system to protect against API abuses. On submission of a user code, our proactive security layer statically screens it to find potential attack signatures prior to its execution. The reactive security layer employs code instrumentation-based runtime checks and sandboxed execution to throttle any exploits at runtime. Next, we propose a new fine-grained access control framework with an enhanced policy language that supports map and filter primitives. Finally, we build a system named SecureDL with our new access control framework and defense system on top of Apache Spark, which ensures secure access control policy enforcement under adversaries capable of executing code. To the best of our knowledge, this is the first fine-grained attribute-based access control framework for distributed data analytics platforms that is secure against platform API abuse attacks. Performance evaluation showed that the overhead due to added security is low.



Summary

The paper presents SecureDL, a dual-layered framework that uses proactive static analysis and reactive runtime checks to enforce access control.
It critiques insecure add-on solutions, revealing that transient API attacks can bypass traditional inline reference monitors in distributed platforms.
Empirical results show a minimal performance overhead of about 4% on a 6-node Hadoop cluster, demonstrating SecureDL’s practical viability.


Introduction

The paper "The Queen's Guard: A Secure Enforcement of Fine-grained Access Control In Distributed Data Analytics Platforms" addresses a significant security gap in distributed data analytics frameworks such as Apache Spark and Hadoop: these platforms have no native support for fine-grained access control, which makes them risky to use in multi-tier organizational settings where data security is paramount. The paper critiques existing "add-on" solutions and proposes a security framework named SecureDL that combines proactive and reactive defense layers to secure the execution of user-defined analytics tasks.

Identified Issues with Existing Solutions

The authors argue that existing attempts to add fine-grained access control to these platforms, often through inline reference monitors (IRMs) and code instrumentation, are fundamentally insecure when the adversary can execute code. Specifically, they show that an attacker can abuse platform-provided APIs to evade access controls stealthily, which they term "transient attacks." These attacks leave no noticeable traces, so monitoring and detection mechanisms that rely on such traces are ineffective against them.
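To make the idea concrete, here is a toy Python analogue of such an evasion. It is a sketch under stated assumptions, not the attack described in the paper: `DataSource`, `install_guard`, and `attacker_task` are hypothetical names, and the real setting involves JVM and platform APIs rather than Python attributes. The point is only that a monitor living in the same process as untrusted code can be switched off and restored without leaving a visible change.

```python
import functools

RECORDS = [
    {"name": "alice", "ssn": "111-11-1111"},
    {"name": "bob", "ssn": "222-22-2222"},
]


class DataSource:
    """Stand-in for a platform data reader (hypothetical, not a Spark API)."""

    def read(self):
        return [dict(r) for r in RECORDS]


def install_guard(cls):
    """Add-on monitor: wrap read() so the 'ssn' column is always dropped."""
    original = cls.read

    @functools.wraps(original)  # wraps() also stores the original as __wrapped__
    def guarded(self):
        return [{k: v for k, v in row.items() if k != "ssn"}
                for row in original(self)]

    cls.read = guarded


def attacker_task(source):
    """User-submitted code that runs in the same process as the monitor."""
    guarded = DataSource.read
    DataSource.read = guarded.__wrapped__  # temporarily remove the guard
    try:
        stolen = source.read()
    finally:
        DataSource.read = guarded          # put it back: no lasting change
    return stolen


install_guard(DataSource)
source = DataSource()
normal_view = source.read()        # guarded view, no 'ssn' field
leaked = attacker_task(source)     # full rows, including 'ssn'
after_attack = source.read()       # guard appears intact again
```

After `attacker_task` returns, the guarded method is back in place, so inspecting the monitor afterwards shows nothing unusual even though the protected column was read.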

Proposed Two-Layered Defense System

Proactive Defense Layer

The first layer of the two-layered defense is proactive: it statically analyzes user-submitted code before execution to find potential attack signatures, that is, suspicious API usages that could indicate an attempt to bypass access control. The analysis covers both blockable and non-blockable API abuses, so that detectable threats are stopped before any user code runs.
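A minimal sketch of this kind of pre-execution screening, written in Python with the standard `ast` module. The paper analyzes code submitted to a JVM-based platform, so this is only an analogue; the denylists `SUSPICIOUS_CALLS` and `SUSPICIOUS_MODULES` and the function `screen` are assumptions made for illustration, not the signatures SecureDL uses.

```python
import ast

# Hypothetical denylists; the paper's signatures target JVM and platform APIs.
SUSPICIOUS_CALLS = {"eval", "exec", "setattr", "getattr", "__import__"}
SUSPICIOUS_MODULES = {"ctypes", "importlib", "inspect"}


def screen(source_code):
    """Return sorted (line, finding) pairs for suspicious constructs.

    The code is parsed, never executed.
    """
    findings = []
    for node in ast.walk(ast.parse(source_code)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in SUSPICIOUS_CALLS:
                findings.append((node.lineno, "call to " + node.func.id))
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in SUSPICIOUS_MODULES:
                    findings.append((node.lineno, "import of " + alias.name))
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in SUSPICIOUS_MODULES:
                findings.append((node.lineno, "import from " + node.module))
    return sorted(findings)


benign = "rows = [r for r in data if r['age'] > 30]\n"
hostile = "import ctypes\nreader = getattr(platform_obj, '_raw_read')\n"

benign_report = screen(benign)    # no findings
hostile_report = screen(hostile)  # flags the import and the getattr call
```

A submission with a non-empty report would be rejected or escalated before it runs. A name assembled at run time would slip past a syntactic check like this one, which illustrates why static screening alone is not sufficient.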

Reactive Defense Layer

Because static code analysis cannot catch every exploit, the proposed system adds a reactive defense layer. This layer performs runtime checks and enforces sandboxed execution to stop attacks that slip past the proactive layer. Its key components are:

Binary Integrity Checking: Verifies the integrity of the trusted computing base (TCB).
Static Code Instrumentation-Based Runtime Checks: Monitor for potential exploits during execution.
Java Security Manager: Provides additional runtime protection by restricting harmful API invocations.
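The first of these components can be sketched as a hash-manifest check. This is a generic illustration in Python under stated assumptions, not SecureDL's implementation: the file name `monitor.bin` and the helpers `build_manifest` and `tampered_files` are hypothetical, and the paper does not specify the hashing scheme used here (SHA-256).

```python
import hashlib
import os
import tempfile


def sha256_of(path):
    """Hash a file in chunks so large binaries need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths):
    """Record trusted hashes once, while the files are known to be good."""
    return {path: sha256_of(path) for path in paths}


def tampered_files(manifest):
    """Return the paths whose current hash differs from the recorded one."""
    return sorted(p for p, expected in manifest.items()
                  if sha256_of(p) != expected)


# Demo with a throwaway file standing in for a trusted binary.
workdir = tempfile.mkdtemp()
monitor_path = os.path.join(workdir, "monitor.bin")
with open(monitor_path, "wb") as handle:
    handle.write(b"trusted enforcement code")

manifest = build_manifest([monitor_path])
clean_check = tampered_files(manifest)   # nothing modified yet

with open(monitor_path, "ab") as handle:
    handle.write(b" + injected payload")
dirty_check = tampered_files(manifest)   # the modified file is reported
```

In a real deployment the manifest itself would have to be stored where untrusted code cannot rewrite it; otherwise the check protects nothing.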

Enhanced Access Control Framework

The authors propose a new fine-grained attribute-based access control framework that extends traditional models by supporting both map and filter primitives. This framework, implemented in SecureDL on top of Apache Spark, allows more flexible policy enforcement across different data types and organizational requirements. Policies can be specified in Scala, which permits sophisticated enforcement logic, including for unstructured data.
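As a rough illustration of what map and filter primitives mean for a policy, the sketch below applies a row filter and a field-masking map based on a user attribute. It is written in Python for brevity, whereas the paper's policies are written in Scala, and the `Policy` class, `enforce` function, and attribute names are hypothetical rather than SecureDL's policy language.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Policy:
    applies_to: Callable[[dict], bool]                 # predicate on user attributes
    filter_fn: Optional[Callable[[dict], bool]] = None  # which rows stay visible
    map_fn: Optional[Callable[[dict], dict]] = None     # how visible rows are rewritten


def enforce(rows, user, policies):
    """Apply every policy that matches the user: filter first, then map."""
    out = list(rows)
    for policy in policies:
        if not policy.applies_to(user):
            continue
        if policy.filter_fn is not None:
            out = [r for r in out if policy.filter_fn(r)]
        if policy.map_fn is not None:
            out = [policy.map_fn(r) for r in out]
    return out


def mask_ssn(row):
    """Map primitive: keep only the last four digits of the SSN."""
    return {**row, "ssn": "***-**-" + row["ssn"][-4:]}


policies = [
    Policy(applies_to=lambda u: u["role"] == "analyst",
           filter_fn=lambda r: r["dept"] == "sales",
           map_fn=mask_ssn),
]

rows = [
    {"name": "alice", "dept": "sales", "ssn": "111-11-1111"},
    {"name": "bob", "dept": "hr", "ssn": "222-22-2222"},
]

analyst_view = enforce(rows, {"role": "analyst"}, policies)
admin_view = enforce(rows, {"role": "admin"}, policies)
```

Here an analyst sees only sales rows with masked SSNs. The sketch is default-allow for users that no policy matches, a simplification chosen to keep the example short; a production policy engine would more likely deny by default.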

Implementation Specifics

Aspect-Oriented Programming (AOP) is leveraged to inject the access control policies dynamically, ensuring no modifications to the core platform code are required. This framework-agnostic approach is tested by integrating with Apache Hive to demonstrate its applicability across different data processing systems.
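The effect of weaving advice around platform code without editing it can be approximated in Python by wrapping a method at load time. This is only an analogue of AOP, which on the JVM is done with dedicated weaving tools; `Engine`, `weave_after_returning`, and `even_values_only` are hypothetical names introduced for the sketch.

```python
import functools


class Engine:
    """Stand-in for platform code that we do not want to edit (hypothetical)."""

    def load(self, table):
        return [{"table": table, "value": n} for n in range(4)]


def weave_after_returning(cls, method_name, advice):
    """Replace cls.method_name with a wrapper that passes its result to advice."""
    target = getattr(cls, method_name)

    @functools.wraps(target)
    def wrapper(self, *args, **kwargs):
        return advice(target(self, *args, **kwargs))

    setattr(cls, method_name, wrapper)


def even_values_only(rows):
    """Advice: a policy that hides rows with odd values."""
    return [r for r in rows if r["value"] % 2 == 0]


# The Engine source is untouched; the policy is attached from the outside.
weave_after_returning(Engine, "load", even_values_only)
values = [r["value"] for r in Engine().load("t")]
```

Every caller of `Engine.load` now receives the policy-filtered result, while the class definition itself is unchanged.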

Performance Evaluation

The empirical evaluation reported in the paper indicates that the performance overhead introduced by SecureDL is small:

For a 6-node Hadoop cluster, a mean performance overhead of approximately 4% was observed.
The authors describe the overhead due to the added security measures as low, which supports the feasibility of the solution in real-world deployments.

Implications and Future Developments

Practical Implications:

SecureDL facilitates secure, fine-grained access control in distributed data analytics platforms, enhancing their suitability for deployment in security-conscious organizational environments. This reduces the risk of data breaches and unauthorized data accesses, ensuring that sensitive information is processed in compliance with organizational policies.

Theoretical Implications:

The paper's findings challenge the adequacy of existing IRM-based solutions and highlight the need for multi-layered security mechanisms. Establishing the dual-layered defense approach as a standard could spark further research into hybrid security models and the development of more resilient data processing frameworks.

Conclusion

The paper presents a well-founded critique of the current state of fine-grained access control in distributed data analytics platforms and proposes a comprehensive solution that effectively addresses security shortcomings. SecureDL, with its two-layered defense and enhanced policy language, represents a significant step forward in securing distributed data processing systems. Future research may build upon these findings to further refine and enhance security frameworks compatible with evolving AI and big data landscapes.
