Context selection for LLM input within HQDL

Determine a principled method for selecting which additional relational database attributes, beyond the minimal primary-key/foreign-key columns and predefined value lists, should be provided as context to the large language model (LLM) in the HQDL pipeline when it generates missing values for hybrid queries.

Background

The paper introduces SWAN, a benchmark of beyond-database questions, and HQDL, a preliminary solution that generates missing columns or tables using LLMs and then executes hybrid SQL queries that join the generated data with the relational database.

HQDL currently supplies only minimal keys and predefined value lists in its prompts to the LLM. The authors note that other attributes in the database may be relevant and could improve generation accuracy and reliability, but how to choose such contextual attributes remains unresolved.
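
To make the gap concrete, the following minimal sketch shows where extra attributes would enter such a prompt. It is not the authors' implementation: the prompt wording, the function names `build_prompt` and `select_context`, and the example row are all hypothetical, and `select_context` is a naive placeholder (keep any non-null candidate attribute) standing in for the principled selection method that remains open.

```python
from typing import Mapping, Optional, Sequence


def build_prompt(
    target_column: str,
    key_values: Mapping[str, object],
    allowed_values: Optional[Sequence[str]] = None,
    extra_context: Optional[Mapping[str, object]] = None,
) -> str:
    """Assemble a prompt asking an LLM for one missing value.

    The prompt layout is an assumption made for illustration only.
    """
    lines = [f"Provide the value of '{target_column}' for the following row."]
    lines.append("Keys:")
    for name, value in key_values.items():
        lines.append(f"- {name}: {value}")
    # Predefined value list, when the target column is categorical.
    if allowed_values:
        lines.append("Choose one of: " + ", ".join(allowed_values))
    # Additional database attributes: the part whose selection is unresolved.
    if extra_context:
        lines.append("Additional attributes:")
        for name, value in extra_context.items():
            lines.append(f"- {name}: {value}")
    return "\n".join(lines)


def select_context(
    row: Mapping[str, object], candidate_attributes: Sequence[str]
) -> dict:
    """Naive placeholder: keep every candidate attribute that is present and non-null."""
    return {
        name: row[name]
        for name in candidate_attributes
        if row.get(name) is not None
    }


# Hypothetical example row; column names are illustrative.
row = {"hero_name": "Batman", "full_name": "Bruce Wayne", "height_cm": None}
keys = {"hero_name": row["hero_name"]}
publishers = ["DC Comics", "Marvel Comics"]

minimal_prompt = build_prompt("publisher", keys, publishers)
enriched_prompt = build_prompt(
    "publisher",
    keys,
    publishers,
    extra_context=select_context(row, ["full_name", "height_cm"]),
)
```

Here `minimal_prompt` corresponds to the keys-plus-value-list setting described above, while `enriched_prompt` adds whichever attributes the selection step returns. Any real selection criterion would replace the body of `select_context`.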

References

There are other attributes inside the relational database that may be relevant and it remains an open question on how to select the best context.

Hybrid Querying Over Relational Databases and Large Language Models (2408.00884 - Zhao et al., 1 Aug 2024) in Section 4.3, Limitations of HQDL