Query Refinement for Diverse Top-$k$ Selection (2403.17786v2)
Abstract: Database queries are often used to select and rank items as decision support for many applications. As automated decision-making tools become more prevalent, there is a growing recognition of the need to diversify their outcomes. In this paper, we define and study the problem of modifying the selection conditions of an ORDER BY query so that the result of the modified query closely fits some user-defined notion of diversity while simultaneously maintaining the intent of the original query. We show the hardness of this problem and propose a Mixed Integer Linear Programming (MILP) based solution. We further present optimizations designed to enhance the scalability and applicability of the solution in real-life scenarios. We investigate the performance characteristics of our algorithm and show its efficiency and the usefulness of our optimizations.
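To make the problem statement concrete, here is a minimal sketch of refining one selection condition so that the top-$k$ result meets a diversity requirement. It is a brute-force illustration under stated assumptions, not the MILP-based solution the paper proposes. The table layout, the `gpa` and `score` attributes, the single lower-bound predicate, the "at least `min_count` items from one group" constraint, and the use of threshold distance as a proxy for preserving the original intent are all hypothetical choices made for this example.

```python
def top_k(rows, min_gpa, k):
    """Emulate: SELECT * WHERE gpa >= min_gpa ORDER BY score DESC LIMIT k."""
    selected = [r for r in rows if r["gpa"] >= min_gpa]
    return sorted(selected, key=lambda r: r["score"], reverse=True)[:k]


def refine_threshold(rows, original_min_gpa, k, group, min_count):
    """Brute-force search (hypothetical helper, not the paper's algorithm).

    Tries every distinct gpa value as a new threshold and returns the one
    closest to the original threshold whose top-k contains at least
    min_count rows from the given group. Returns None if none qualifies.
    """
    candidates = sorted({r["gpa"] for r in rows} | {original_min_gpa})
    best = None  # (distance from original threshold, threshold)
    for t in candidates:
        result = top_k(rows, t, k)
        count = sum(1 for r in result if r["group"] == group)
        if count >= min_count:
            distance = abs(t - original_min_gpa)
            if best is None or distance < best[0]:
                best = (distance, t)
    return None if best is None else best[1]


# Toy data, invented for illustration only.
rows = [
    {"name": "a", "gpa": 3.9, "score": 95, "group": "M"},
    {"name": "b", "gpa": 3.8, "score": 90, "group": "M"},
    {"name": "c", "gpa": 3.6, "score": 85, "group": "M"},
    {"name": "d", "gpa": 3.4, "score": 92, "group": "F"},
    {"name": "e", "gpa": 3.2, "score": 88, "group": "F"},
]

# Original query: gpa >= 3.5, top-3 by score, which returns only group "M".
refined = refine_threshold(rows, original_min_gpa=3.5, k=3, group="F", min_count=1)
refined_names = [r["name"] for r in top_k(rows, refined, 3)]
infeasible = refine_threshold(rows, original_min_gpa=3.5, k=3, group="F", min_count=3)
```

With this toy data, relaxing the threshold from 3.5 to 3.4 is the smallest change that brings a group "F" row into the top 3. Exhaustive enumeration like this grows quickly with the number of predicates and candidate values, which is one reason a solver-based formulation is attractive.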