
DSDL: Data Set Description Language for Bridging Modalities and Tasks in AI Data (2405.18315v1)

Published 28 May 2024 in cs.AI and cs.PL

Abstract: In the era of artificial intelligence, the diversity of data modalities and annotation formats often makes data unusable as-is, requiring researchers and developers with different needs to understand and convert it before use. To tackle this problem, this article introduces a framework called Dataset Description Language (DSDL) that aims to simplify dataset processing by providing a unified standard for AI datasets. DSDL follows three basic practical principles: it is generic, portable, and extensible. It uses a unified standard to express data of different modalities and structures, facilitates the dissemination of AI data, and extends easily to new modalities and tasks. The standardized specifications of DSDL reduce the workload for users in data dissemination, processing, and usage. To further improve user convenience, we provide predefined DSDL templates for various tasks, convert mainstream datasets to comply with DSDL specifications, and provide comprehensive documentation and DSDL tools. These efforts aim to simplify the use of AI data, thereby improving the efficiency of AI development.

Authors (10)
  1. Bin Wang
  2. Linke Ouyang
  3. Fan Wu
  4. Wenchang Ning
  5. Xiao Han
  6. Zhiyuan Zhao
  7. Jiahui Peng
  8. Yiying Jiang
  9. Dahua Lin
  10. Conghui He
