
Fantastic Data and How to Query Them (2201.05026v1)

Published 13 Jan 2022 in cs.AI, cs.CV, and cs.DB

Abstract: It is commonly acknowledged that the availability of huge amounts of (training) data is one of the most important factors behind many recent advances in AI. However, datasets are often designed for specific tasks in narrow AI subareas, and there is no unified way to manage and access them. This not only creates unnecessary overhead when training or deploying machine learning models but also limits the understanding of the data, which is very important for data-centric AI. In this paper, we present our vision of a unified framework for different datasets so that they can be integrated and queried easily, e.g., using standard query languages. We demonstrate this in our ongoing work to create a framework for datasets in computer vision and show its advantages in different scenarios. Our demonstration is available at https://vision.semkg.org.

Authors (5)
  1. Trung-Kien Tran
  2. Anh Le-Tuan
  3. Manh Nguyen-Duc
  4. Jicheng Yuan
  5. Danh Le-Phuoc
Citations (4)