Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
41 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
41 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Data Collection and Labeling Techniques for Machine Learning (2407.12793v1)

Published 19 Jun 2024 in cs.DB, cs.AI, and cs.LG

Abstract: Data collection and labeling are critical bottlenecks in the deployment of machine learning applications. With the increasing complexity and diversity of applications, the need for efficient and scalable data collection and labeling techniques has become paramount. This paper provides a review of the state-of-the-art methods in data collection, data labeling, and the improvement of existing data and models. By integrating perspectives from both the machine learning and data management communities, we aim to provide a holistic view of the current landscape and identify future research directions.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (2)
  1. Qianyu Huang (2 papers)
  2. Tongfang Zhao (2 papers)
Citations (1)