
AIQL: Enabling Efficient Attack Investigation from System Monitoring Data (1806.02290v2)

Published 6 Jun 2018 in cs.CR and cs.DB

Abstract: The need for countering Advanced Persistent Threat (APT) attacks has led to the solutions that ubiquitously monitor system activities in each host, and perform timely attack investigation over the monitoring data for analyzing attack provenance. However, existing query systems based on relational databases and graph databases lack language constructs to express key properties of major attack behaviors, and often execute queries inefficiently since their semantics-agnostic design cannot exploit the properties of system monitoring data to speed up query execution. To address this problem, we propose a novel query system built on top of existing monitoring tools and databases, which is designed with novel types of optimizations to support timely attack investigation. Our system provides (1) domain-specific data model and storage for scaling the storage, (2) a domain-specific query language, Attack Investigation Query Language (AIQL) that integrates critical primitives for attack investigation, and (3) an optimized query engine based on the characteristics of the data and the semantics of the queries to efficiently schedule the query execution. We deployed our system in NEC Labs America comprising 150 hosts and evaluated it using 857 GB of real system monitoring data (containing 2.5 billion events). Our evaluations on a real-world APT attack and a broad set of attack behaviors show that our system surpasses existing systems in both efficiency (124x over PostgreSQL, 157x over Neo4j, and 16x over Greenplum) and conciseness (SQL, Neo4j Cypher, and Splunk SPL contain at least 2.4x more constraints than AIQL).

Citations (76)

Summary

  • The paper introduces a domain-specific language and data model tailored for efficiently expressing multi-step APT attack behaviors.
  • The paper presents an optimized query engine that exploits the spatial and temporal properties of monitoring data, reporting speedups of 124x over PostgreSQL, 157x over Neo4j, and 16x over Greenplum.
  • The paper evaluates AIQL on 857 GB of real monitoring data (2.5 billion events) collected from 150 hosts.

AIQL: Enabling Efficient Attack Investigation from System Monitoring Data

The paper presents AIQL, a query system for timely and efficient investigation of Advanced Persistent Threat (APT) attacks over system monitoring data. Query systems built on relational and graph databases lack language constructs for key attack behaviors, and their semantics-agnostic design prevents them from exploiting the properties of monitoring data to speed up execution. AIQL addresses both limitations with a domain-specific query language and an optimized query engine, built on top of existing monitoring tools and databases.

Core Contributions

The paper makes three main contributions that distinguish AIQL from existing systems:

  1. Domain-Specific Language and Model: AIQL provides a query language designed for attack investigation. It includes constructs for patterns and relationships typical of attack scenarios, such as multi-step attacks, dependency tracking, and anomaly detection. The language is backed by a domain-specific data model and storage that exploit the spatial and temporal characteristics of system monitoring data.
  2. Optimized Query Engine: The query engine schedules query execution based on the characteristics of the data and the semantics of the queries, using the data's spatial and temporal dimensions to speed up execution.
  3. Real-World Evaluation: Deployed on 150 hosts at NEC Labs America, AIQL was evaluated on 857 GB of real monitoring data (2.5 billion events). The reported results show speedups of 124x over PostgreSQL and 157x over Neo4j, and equivalent queries in SQL, Neo4j Cypher, and Splunk SPL contain at least 2.4x more constraints than their AIQL counterparts.
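
The idea of exploiting spatial and temporal characteristics can be pictured with a small sketch. It is not the paper's storage layer: it assumes that "spatial" means the host an event was recorded on and "temporal" means the calendar day of the event, and the names partition_key, build_partitions, scan, and EVENTS are hypothetical. The point is only that a query restricted to one host and one time window can skip every partition outside that host and window.

```python
from collections import defaultdict
from datetime import datetime, timezone


def ts(year, month, day, hour=0):
    """Build a UTC epoch timestamp for the sample data."""
    return datetime(year, month, day, hour, tzinfo=timezone.utc).timestamp()


def _day(timestamp):
    """Calendar day (UTC) that a timestamp falls on."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).date()


def partition_key(host_id, timestamp):
    # Assumption: one partition per (host, day) pair.
    return (host_id, _day(timestamp))


def build_partitions(events):
    """Group events into (host, day) partitions."""
    partitions = defaultdict(list)
    for event in events:
        partitions[partition_key(event["host"], event["ts"])].append(event)
    return partitions


def scan(partitions, host_id, start_ts, end_ts):
    """Return events on one host within [start_ts, end_ts].

    Partitions for other hosts or other days are skipped without
    looking at the events they contain.
    """
    start_day, end_day = _day(start_ts), _day(end_ts)
    hits = []
    for (host, day), bucket in partitions.items():
        if host != host_id or not (start_day <= day <= end_day):
            continue  # whole partition pruned
        hits.extend(e for e in bucket if start_ts <= e["ts"] <= end_ts)
    return hits


# Hypothetical sample data: two hosts, two days.
EVENTS = [
    {"host": "h1", "ts": ts(2018, 6, 6, 9), "op": "read"},
    {"host": "h1", "ts": ts(2018, 6, 7, 9), "op": "write"},
    {"host": "h2", "ts": ts(2018, 6, 6, 10), "op": "read"},
]
```

With this sample data, build_partitions(EVENTS) yields three partitions, and a scan for host h1 on 6 June 2018 reads only one of them.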

Methodological Innovations

AIQL introduces several methodological innovations:

  • Structuring Queries Around System Behaviors: Unlike conventional query languages, AIQL lets the analyst specify system behaviors explicitly as event patterns in a subject-operation-object format. Attribute constraints and temporal relationships between patterns extend this format, giving a more natural and concise expression of attack steps.
  • Relation-Based Query Scheduling: This optimization assigns each event pattern a pruning score based on the number of constraints it contains, then executes the patterns with higher pruning power first so that later steps work on less data.
  • Parallel and Distributed Execution: By partitioning data along spatial and temporal attributes, AIQL scales on massively parallel processing (MPP) databases such as Greenplum, with a reported 16x speedup over Greenplum.
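
The first two ideas above can be sketched together. The sketch is deliberately simplified and is not AIQL syntax or the paper's scheduler: it assumes that the pruning score is simply the count of attribute constraints in a pattern, and the names EventPattern, schedule, run_query, happened_before, and the sample events are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EventPattern:
    """One attack step in subject-operation-object form."""
    name: str
    subject: dict    # attribute constraints on the subject, e.g. a process
    operation: str   # e.g. "read", "write"
    obj: dict        # attribute constraints on the object, e.g. a file

    def pruning_score(self):
        # Assumption: one point per attribute constraint.
        return len(self.subject) + len(self.obj)

    def matches(self, event):
        return (
            event["op"] == self.operation
            and all(event["subject"].get(k) == v for k, v in self.subject.items())
            and all(event["object"].get(k) == v for k, v in self.obj.items())
        )


def schedule(patterns):
    """Order patterns so the most constrained ones run first."""
    return sorted(patterns, key=lambda p: p.pruning_score(), reverse=True)


def run_query(patterns, events):
    """Run patterns in scheduled order; give up as soon as one step has no match."""
    results = {}
    for pattern in schedule(patterns):
        results[pattern.name] = [e for e in events if pattern.matches(e)]
        if not results[pattern.name]:
            return {}  # a multi-step behavior cannot match if any step is absent
    return results


def happened_before(results, first, second):
    """Temporal relationship: timestamp pairs where `first` precedes `second`."""
    return [
        (a["ts"], b["ts"])
        for a in results.get(first, [])
        for b in results.get(second, [])
        if a["ts"] < b["ts"]
    ]


# Hypothetical two-step behavior: read a sensitive file, then write to a remote address.
P1 = EventPattern("evt1", {"exe": "cat"}, "read", {"path": "/etc/passwd"})
P2 = EventPattern("evt2", {}, "write", {"dst_ip": "10.0.0.9"})

EVENTS = [
    {"ts": 1, "op": "read", "subject": {"exe": "cat"}, "object": {"path": "/etc/passwd"}},
    {"ts": 2, "op": "write", "subject": {"exe": "curl"}, "object": {"dst_ip": "10.0.0.9"}},
    {"ts": 3, "op": "read", "subject": {"exe": "vim"}, "object": {"path": "/tmp/a"}},
]
```

Here evt1 carries two constraints and evt2 one, so schedule runs evt1 first regardless of the order in which the patterns are written. A real engine would also feed the results of the first pattern into the second to narrow its search; the sketch only shows the ordering and the early exit.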

Implications and Future Directions

The evaluation suggests that AIQL is practical for day-to-day attack investigation: it queries large volumes of monitoring data quickly and expresses complex attack patterns concisely. Its results on a 150-host deployment suggest potential for accommodating larger enterprise environments.

Looking forward, further refinement could focus on enhancing the pruning heuristic by incorporating data-driven statistical models. Additionally, expanding the model to include more types of system entities or more granular data could further improve the system's overall efficacy and scalability.

In conclusion, AIQL provides a framework for timely and efficient attack investigation over system monitoring data. Its evaluation on a real deployment lays a foundation for further work on large-scale system monitoring and forensics.