Towards Replication-Robust Data Markets (2310.06000v3)

Published 9 Oct 2023 in econ.GN, cs.GT, and q-fin.EC

Abstract: Despite widespread adoption of machine learning throughout industry, many firms face a common challenge: relevant datasets are typically distributed amongst market competitors that are reluctant to share information. Recent works propose data markets to provide monetary incentives for collaborative machine learning, where agents share features with each other and are rewarded based on their contribution to improving the predictions of others. These contributions are determined by their relative Shapley value, which is computed by treating features as players and their interactions as a characteristic function game. However, in its standard form, this setup further provides an incentive for agents to replicate their data and act under multiple false identities in order to increase their own revenue and diminish that of others, restricting the use of such markets in practice. In this work, we develop a replication-robust data market for supervised learning problems. We adopt Pearl's do-calculus from causal reasoning to refine the characteristic function game by differentiating between observational and interventional conditional probabilities. By doing this, we derive Shapley value-based rewards that are robust to this malicious replication by design, whilst preserving desirable market properties.
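The replication incentive described in the abstract can be reproduced in a small toy game. This is a minimal sketch under stated assumptions, not the paper's market: it assumes a hypothetical characteristic function in which the value of a coalition of features is the variance it explains of a target Y = Z1 + Z2, where Z1 and Z2 are independent unit-variance latent signals and each feature observes exactly one of them. The agents A, B, C, the feature names, and the helpers `shapley_values`, `make_value`, and `payoffs_by_agent` are all illustrative. The code computes exact Shapley values by enumerating coalitions and sums them per agent; the do-calculus refinement proposed in the paper is not implemented here.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition (only viable for a few players)."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = Fraction(0)
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for each coalition S not containing i
            weight = Fraction(factorial(size) * factorial(n - size - 1), factorial(n))
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def make_value(signal_of):
    """Hypothetical toy game (assumption, not the paper's setup).

    Y = Z1 + Z2 with independent unit-variance components, and each feature is an
    exact copy of one component, so a coalition explains one unit of variance per
    distinct component it observes.
    """
    def value(coalition):
        return Fraction(len({signal_of[f] for f in coalition}))
    return value


def payoffs_by_agent(signal_of, owner_of):
    """Sum the Shapley values of all features submitted by the same agent."""
    phi = shapley_values(list(signal_of), make_value(signal_of))
    totals = {}
    for feature, share in phi.items():
        agent = owner_of[feature]
        totals[agent] = totals.get(agent, Fraction(0)) + share
    return totals


# Honest market: A and B each hold a feature revealing Z1; C holds one revealing Z2.
honest = payoffs_by_agent(
    signal_of={"a": "Z1", "b": "Z1", "c": "Z2"},
    owner_of={"a": "A", "b": "B", "c": "C"},
)

# A also submits a verbatim copy of its feature under a second identity.
replicated = payoffs_by_agent(
    signal_of={"a": "Z1", "a_copy": "Z1", "b": "Z1", "c": "Z2"},
    owner_of={"a": "A", "a_copy": "A", "b": "B", "c": "C"},
)

print({agent: str(v) for agent, v in honest.items()})      # {'A': '1/2', 'B': '1/2', 'C': '1'}
print({agent: str(v) for agent, v in replicated.items()})  # {'A': '2/3', 'B': '1/3', 'C': '1'}
```

In this toy game, replicating its feature raises A's payoff from 1/2 to 2/3 while B's falls from 1/2 to 1/3, and C is unaffected. Because the Shapley value splits the value of redundant information evenly among the features that carry it, each extra copy under a false identity claims an extra share at the expense of the honest agent holding the same information. Exact enumeration is exponential in the number of features, so this approach only suits small illustrative games.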
