The Shapley Value in Database Management (2401.06234v1)
Abstract: Attribution scores can be applied in data management to quantify the contribution of individual items to conclusions from the data, as part of the explanation of what led to these conclusions. In Artificial Intelligence, Machine Learning, and Data Management, some of the common scores are deployments of the Shapley value, a formula for profit sharing in cooperative game theory. Since its invention in the 1950s, the Shapley value has been used for contribution measurement in many fields, from economics to law, with its latest researched applications in modern machine learning. Recent studies investigated the application of the Shapley value to database management. This article gives an overview of recent results on the computational complexity of the Shapley value for measuring the contribution of tuples to query answers and to the extent of inconsistency with respect to integrity constraints. More specifically, the article highlights lower and upper bounds on the complexity of calculating the Shapley value, either exactly or approximately, as well as solutions for realizing the calculation in practice.
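To make the notion of a tuple's contribution concrete, the following is a minimal brute-force sketch, not the algorithms surveyed in the article. It assumes the common setup in which facts are split into endogenous facts (the players) and exogenous facts (taken for granted), and the game value of a set of endogenous facts is the Boolean query result on that set together with the exogenous facts. The names `shapley_values` and `query`, the relations `R` and `S`, and the toy facts are all hypothetical, chosen for illustration only. The enumeration is exponential in the number of endogenous facts, which is why the complexity bounds discussed in the article matter.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def shapley_values(endogenous, exogenous, query):
    """Exact Shapley value of each endogenous fact, by enumerating all subsets.

    `query` maps a set of facts to True/False (a Boolean query).
    This is a naive illustration, exponential in len(endogenous).
    """
    n = len(endogenous)
    exo = set(exogenous)
    values = {}
    for fact in endogenous:
        others = [g for g in endogenous if g != fact]
        total = Fraction(0)
        for k in range(n):
            # Fraction of orderings in which exactly k other facts precede `fact`,
            # for one specific choice of those k facts.
            weight = Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
            for subset in combinations(others, k):
                base = set(subset) | exo
                # Marginal contribution of `fact` to the query result (0 or 1 here).
                total += weight * (int(query(base | {fact})) - int(query(base)))
        values[fact] = total
    return values


def query(facts):
    """Hypothetical Boolean query: is there an x with both R(x) and S(x)?"""
    r = {x for (rel, x) in facts if rel == "R"}
    s = {x for (rel, x) in facts if rel == "S"}
    return bool(r & s)


# Toy database (illustrative): S(2) is exogenous, the other facts are endogenous.
endogenous = [("R", 1), ("S", 1), ("R", 2)]
exogenous = [("S", 2)]

vals = shapley_values(endogenous, exogenous, query)
for fact, value in vals.items():
    print(fact, value)
```

In this toy instance, R(2) satisfies the query on its own thanks to the exogenous S(2), so it receives the largest share (2/3), while R(1) and S(1) only help jointly and receive 1/6 each. The values sum to 1, the query result on the full database.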
Authors: Leopoldo Bertossi, Benny Kimelfeld, Ester Livshits, Mikaël Monet