
Reinforced Approximate Exploratory Data Analysis (2212.06225v1)

Published 12 Dec 2022 in cs.LG, cs.AI, and cs.DB

Abstract: Exploratory data analytics (EDA) is a sequential decision-making process in which analysts choose subsequent queries that might lead to interesting insights, based on the previous queries and their results. Data processing systems often execute queries on samples to produce results with low latency. Different downsampling strategies preserve different statistics of the data and yield latency reductions of different magnitudes. The optimal choice of sampling strategy often depends on the particular context of the analysis flow and the hidden intent of the analyst. In this paper, we are the first to consider the impact of sampling, which introduces approximation errors, in interactive data exploration settings. We propose a Deep Reinforcement Learning (DRL) based framework that optimizes sample selection in order to keep the analysis and insight generation flow intact. Evaluations with 3 real datasets show that our technique can preserve the original insight generation flow while improving interaction latency, compared to baseline methods.
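
The abstract's claim that different downsampling strategies preserve different statistics can be illustrated with a small sketch. This is not the paper's DRL framework or its sampling code. It only contrasts two generic strategies (uniform and per-group stratified sampling) on synthetic data. The function names, the `region` and `sales` fields, and the data distribution are all assumptions made for illustration.

```python
import random
from collections import defaultdict
from statistics import mean


def uniform_sample(rows, fraction, rng):
    """Draw a simple random sample containing `fraction` of the rows."""
    k = max(1, int(len(rows) * fraction))
    return rng.sample(rows, k)


def stratified_sample(rows, key, per_group, rng):
    """Draw up to `per_group` rows from every distinct value of `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    sample = []
    for members in groups.values():
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample


# Synthetic data (hypothetical): one dominant group and one rare group.
rng = random.Random(0)
rows = [{"region": "common", "sales": rng.gauss(100, 10)} for _ in range(9990)]
rows += [{"region": "rare", "sales": rng.gauss(500, 10)} for _ in range(10)]

uni = uniform_sample(rows, 0.01, rng)
strat = stratified_sample(rows, "region", 50, rng)

full_mean = mean(r["sales"] for r in rows)
uni_mean = mean(r["sales"] for r in uni)
strat_mean = mean(r["sales"] for r in strat)

print("groups in uniform sample:   ", sorted({r["region"] for r in uni}))
print("groups in stratified sample:", sorted({r["region"] for r in strat}))
print("overall mean (full / uniform / stratified):",
      round(full_mean, 1), round(uni_mean, 1), round(strat_mean, 1))
```

In this setup the uniform sample tends to track the overall mean but can miss the rare group entirely, while the stratified sample always contains every group but over-represents the rare one, which skews the unweighted overall mean. Which error is acceptable depends on the next query the analyst issues, which is the context dependence the abstract describes.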

Authors (5)
  1. Shaddy Garg (5 papers)
  2. Subrata Mitra (20 papers)
  3. Tong Yu (119 papers)
  4. Yash Gadhia (2 papers)
  5. Arjun Kashettiwar (1 paper)
Citations (4)
