Effectiveness of LLMs for Retrieval from Semi-Structured Knowledge Bases

Determine the effectiveness of large language models (LLMs) when applied to retrieval tasks over semi-structured knowledge bases (SKBs), which integrate unstructured textual data (e.g., product descriptions, paper abstracts) with structured relational information (e.g., entity relations in knowledge graphs). Specifically, assess how well current LLM-driven retrieval systems satisfy the combined textual and relational requirements inherent in SKBs, and characterize their performance and limitations in this setting.

Background

The paper highlights a gap in existing research: most prior work and benchmarks study either purely textual retrieval or structured (SQL/knowledge graph) retrieval, leaving the combined semi-structured setting underexplored. Semi-structured knowledge bases blend unstructured text with structured entity relations, making retrieval tasks more complex than in purely textual or purely relational domains.
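To make the combined setting concrete, the following minimal sketch models an SKB as entities that each carry free text plus structured relations, and a retrieval function that must satisfy both sides at once. The data, entity ids, and keyword-overlap scoring are invented for illustration; the overlap score is a crude stand-in for an LLM or dense-retriever relevance score, not the STaRK method itself.

```python
# Toy semi-structured knowledge base (SKB): each entity has unstructured
# text plus structured relations. All entities and ids are hypothetical.
skb = {
    "p1": {"text": "waterproof hiking boots with ankle support",
           "relations": {"brand": "b1", "category": "footwear"}},
    "p2": {"text": "lightweight trail running shoes",
           "relations": {"brand": "b2", "category": "footwear"}},
    "p3": {"text": "insulated winter jacket",
           "relations": {"brand": "b1", "category": "outerwear"}},
}

def retrieve(query_text, relation_filter, skb):
    """Return entity ids satisfying all relational constraints,
    ranked by naive keyword overlap with the query text."""
    query_tokens = set(query_text.lower().split())
    hits = []
    for eid, ent in skb.items():
        # Structured side: every relational constraint must hold exactly.
        if all(ent["relations"].get(k) == v for k, v in relation_filter.items()):
            # Textual side: bag-of-words overlap as a placeholder for a
            # learned relevance score over the unstructured text.
            score = len(query_tokens & set(ent["text"].lower().split()))
            hits.append((eid, score))
    return [eid for eid, _ in sorted(hits, key=lambda h: -h[1])]

print(retrieve("waterproof boots for hiking",
               {"brand": "b1", "category": "footwear"}, skb))
# → ['p1']  (p2 fails the brand constraint, p3 fails the category constraint)
```

The point of the sketch is that neither side alone suffices: a purely textual retriever could surface p2 on lexical similarity despite violating the brand constraint, while a purely relational query over brand and category cannot rank p1 above p3 without reading the text.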

LLMs have shown promise in information retrieval over general knowledge bases such as Wikipedia. However, SKBs often involve private or domain-specific data and require reasoning over both textual attributes and relational constraints. The authors therefore frame the question of how effectively LLMs can be applied to retrieval from SKBs as an open challenge motivating the creation of the STaRK benchmark.

References

"Nevertheless, it remains an open question how effectively LLMs can be applied to the specific challenge of retrieval from SKBs." — STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases (Wu et al., arXiv:2404.13207, 19 Apr 2024), Section 1 (Introduction), "Limitations of prior works"