
Query Refinement for Diverse Top-$k$ Selection (2403.17786v2)

Published 26 Mar 2024 in cs.DB

Abstract: Database queries are often used to select and rank items as decision support for many applications. As automated decision-making tools become more prevalent, there is a growing recognition of the need to diversify their outcomes. In this paper, we define and study the problem of modifying the selection conditions of an ORDER BY query so that the result of the modified query closely fits some user-defined notion of diversity while simultaneously maintaining the intent of the original query. We show the hardness of this problem and propose a Mixed Integer Linear Programming (MILP) based solution. We further present optimizations designed to enhance the scalability and applicability of the solution in real-life scenarios. We investigate the performance characteristics of our algorithm and show its efficiency and the usefulness of our optimizations.
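To make the problem statement concrete, here is a minimal brute-force sketch of refining a single selection threshold so that the top-k result meets a simple diversity constraint. It is not the paper's MILP formulation. The toy table, the column names (`gpa`, `score`, `group`), the "at least `min_count` items from a group" constraint, and the use of absolute threshold change as a proxy for preserving query intent are all illustrative assumptions.

```python
# Hypothetical toy relation; names and values are made up for illustration.
ROWS = [
    {"name": "a", "gpa": 3.9, "score": 95, "group": "A"},
    {"name": "b", "gpa": 3.8, "score": 90, "group": "A"},
    {"name": "c", "gpa": 3.7, "score": 88, "group": "A"},
    {"name": "d", "gpa": 3.6, "score": 93, "group": "B"},
    {"name": "e", "gpa": 3.5, "score": 85, "group": "B"},
    {"name": "f", "gpa": 3.4, "score": 97, "group": "B"},
]


def top_k(rows, min_gpa, k):
    """Emulate: SELECT * FROM rows WHERE gpa >= min_gpa ORDER BY score DESC LIMIT k."""
    selected = [r for r in rows if r["gpa"] >= min_gpa]
    return sorted(selected, key=lambda r: r["score"], reverse=True)[:k]


def refine(rows, original_min_gpa, k, group, min_count):
    """Return the threshold closest to the original whose top-k contains
    at least `min_count` rows from `group`, or None if no threshold works.

    Assumption: only thresholds equal to a gpa value in the data (or the
    original threshold) need to be tried, since other values select the
    same rows as one of these.
    """
    candidates = sorted({r["gpa"] for r in rows} | {original_min_gpa})
    feasible = [
        t for t in candidates
        if sum(r["group"] == group for r in top_k(rows, t, k)) >= min_count
    ]
    if not feasible:
        return None
    # Assumed proxy for "maintaining intent": smallest change to the threshold.
    return min(feasible, key=lambda t: abs(t - original_min_gpa))


best = refine(ROWS, original_min_gpa=3.7, k=3, group="B", min_count=1)
print(best, [r["name"] for r in top_k(ROWS, best, 3)])
```

With the original threshold of 3.7, the top-3 contains no rows from group B; relaxing it to 3.6 admits row `d`, which ranks second by score. Exhaustive search like this grows quickly with the number of predicates, which is consistent with the hardness result the abstract mentions and the motivation for a solver-based approach.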
