Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
125 tokens/sec
GPT-4o
53 tokens/sec
Gemini 2.5 Pro Pro
42 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

OneProvenance: Efficient Extraction of Dynamic Coarse-Grained Provenance from Database Logs [Technical Report] (2210.14047v2)

Published 25 Oct 2022 in cs.DB

Abstract: Provenance encodes information that connects datasets, their generation workflows, and associated metadata (e.g., who or when executed a query). As such, it is instrumental for a wide range of critical governance applications (e.g., observability and auditing). Unfortunately, in the context of database systems, extracting coarse-grained provenance is a long-standing problem due to the complexity and sheer volume of database workflows. Provenance extraction from query event logs has been recently proposed as favorable because, in principle, can result in meaningful provenance graphs for provenance applications. Current approaches, however, (a) add substantial overhead to the database and provenance extraction workflows and (b)~extract provenance that is noisy, omits query execution dependencies, and is not rich enough for upstream applications. To address these problems, we introduce OneProvenance: an efficient provenance extraction system from query event logs. OneProvenance addresses the unique challenges of log-based extraction by (a)~identifying query execution dependencies through efficient log analysis, (b) extracting provenance through novel event transformations that account for query dependencies, and (c)~introducing effective filtering optimizations. Our thorough experimental analysis shows that OneProvenance can improve extraction by up to ~18X compared to state-of-the-art baselines; our optimizations reduce the extraction noise and optimize performance even further. OneProvenance is deployed at scale by Microsoft Purview and actively supports customer provenance extraction needs (https://bit.ly/3N2JVGF).

Summary

We haven't generated a summary for this paper yet.