
CHG Shapley: Efficient Data Valuation and Selection towards Trustworthy Machine Learning (2406.11730v3)

Published 17 Jun 2024 in cs.GT and cs.LG

Abstract: Understanding the decision-making process of machine learning models is crucial for ensuring trustworthy machine learning. Data Shapley, a landmark study on data valuation, advances this understanding by assessing the contribution of each datum to model performance. However, Data Shapley requires retraining the model many times, which makes it resource-intensive and time-consuming to apply to large datasets. To address this, we propose the CHG (compound of Hardness and Gradient) utility function, which approximates, in every training epoch, the utility of each data subset with respect to model performance. By deriving the closed-form Shapley value for each data point under the CHG utility function, we reduce the computational complexity to that of a single model retraining, achieving a quadratic improvement over existing marginal contribution-based methods. We further apply CHG Shapley to real-time data selection and conduct experiments in three settings: standard datasets, label noise datasets, and class imbalance datasets. These experiments demonstrate its effectiveness in identifying high-value and noisy data. By enabling efficient data valuation, CHG Shapley promotes trustworthy model training from a novel data-centric perspective. Our code is available at https://github.com/caihuaiguang/CHG-Shapley-for-Data-Valuation and https://github.com/caihuaiguang/CHG-Shapley-for-Data-Selection.
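To make the cost problem concrete, here is a minimal sketch of exact Data Shapley by subset enumeration, using the standard definition phi_i = (1/n) * sum over S subset of D\{i} of [U(S u {i}) - U(S)] / C(n-1, |S|). It is not the CHG utility and not the authors' code. The names `exact_data_shapley` and `CountingUtility` are hypothetical, and the toy label-coverage utility is an assumed stand-in for "performance of a model retrained on subset S". The counter shows that all 2^n subsets get evaluated, and in real Data Shapley each evaluation means one model retraining.

```python
"""Exact Data Shapley by subset enumeration (illustrative sketch only)."""
from itertools import combinations
from math import comb
from typing import Callable, Dict, FrozenSet, Hashable, Sequence

# A utility maps a subset of data points to a score (e.g. validation accuracy).
Utility = Callable[[FrozenSet[Hashable]], float]


class CountingUtility:
    """Hypothetical wrapper that records which subsets were evaluated.

    In real Data Shapley, each distinct subset evaluated means one retraining.
    """

    def __init__(self, fn: Utility) -> None:
        self.fn = fn
        self.seen: set = set()

    def __call__(self, subset: FrozenSet[Hashable]) -> float:
        self.seen.add(subset)
        return self.fn(subset)


def exact_data_shapley(points: Sequence[Hashable], utility: Utility) -> Dict[Hashable, float]:
    """Weighted average of marginal contributions over every subset without i."""
    n = len(points)
    values: Dict[Hashable, float] = {}
    for i in points:
        others = [j for j in points if j != i]
        total = 0.0
        for k in range(n):  # subset sizes 0 .. n-1
            weight = 1.0 / (n * comb(n - 1, k))
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (utility(s | {i}) - utility(s))
        values[i] = total
    return values


if __name__ == "__main__":
    # Toy assumption: utility = fraction of the 3 classes covered by the subset.
    labels = {0: "a", 1: "a", 2: "b", 3: "c"}

    def coverage(s: FrozenSet[Hashable]) -> float:
        return len({labels[j] for j in s}) / 3

    u = CountingUtility(coverage)
    print(exact_data_shapley(list(labels), u))
    print(len(u.seen))  # 16 = 2**4 distinct subsets, i.e. 16 "retrainings"
```

In this toy case the two redundant class-"a" points split their class's credit (1/6 each), while the unique points 2 and 3 each receive 1/3, and the values sum to U(D) - U(empty set) = 1. The exponential number of evaluations is the bottleneck that the abstract's closed-form CHG approach is designed to avoid.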

Citations (1)


Authors (1)
