- The paper shows that most complex queries can be simplified to straightforward link prediction tasks using existing training data.
- The paper finds that state-of-the-art models perform poorly on genuinely complex multi-hop queries requiring full logical inference.
- The paper proposes new benchmarks that eliminate shortcut reasoning, ensuring evaluations truly reflect complex reasoning challenges.
Is Complex Query Answering Really Complex? An Academic Overview
The paper explores the domain of Complex Query Answering (CQA) on Knowledge Graphs (KGs), scrutinizing the perceived complexity of the benchmarks currently used in this area. It posits that these benchmarks may not be as complex as widely assumed, thus distorting the apparent progress in the field. The authors demonstrate that a substantial proportion of queries, up to 98% in some benchmarks, can be reduced to simpler problems such as link prediction, which inflates the apparent performance of state-of-the-art (SoTA) models.
Key Findings
- Query Simplification: The majority of queries in existing benchmarks reduce to predicting a single missing link, making them effectively simple link prediction tasks rather than complex reasoning challenges. The reduction works by reading the other hops of a query directly off links already observed in the training data (see the sketch after this list).
- Performance Analysis: When evaluated on genuinely complex queries that require reasoning across multiple hops, the performance of existing CQA models drops significantly. The authors therefore propose new benchmarks in which queries require inference beyond direct link prediction, making them more representative of the challenges posed by real-world KGs.
- Proposed Benchmarks: New, more challenging benchmarks are introduced as harder variants of the standard Freebase- and NELL-based datasets. They contain only full-inference queries, i.e., queries that cannot be reduced by exploiting links observed during training, and are therefore intended to measure the actual logical reasoning capabilities of models.
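To make the reduction concrete, the following is a minimal sketch, not the paper's code, of the idea behind query simplification and full-inference filtering for a 2-hop path query of the form (anchor, r1, ?x) AND (?x, r2, ?answer). The entity and relation names are illustrative assumptions; only the logic (checking whether a hop is already observed in the training graph) reflects the reduction the paper describes.

```python
# Toy training graph: a set of (head, relation, tail) triples.
train_edges = {
    ("albert_einstein", "born_in", "ulm"),
    ("marie_curie", "born_in", "warsaw"),
}

def observed_intermediates(anchor, r1, train_edges):
    """Intermediate entities ?x whose edge (anchor, r1, ?x) already appears in training data."""
    return [t for (h, r, t) in train_edges if h == anchor and r == r1]

def is_full_inference(anchor, r1, train_edges):
    """A 2-hop query is 'full inference' only if its first hop cannot be read off
    the training graph; otherwise it collapses to predicting the single remaining link."""
    return len(observed_intermediates(anchor, r1, train_edges)) == 0

# (albert_einstein, born_in, ulm) is a training edge, so this "complex" query
# only asks the model to predict (ulm, located_in, ?answer): plain link prediction.
print(is_full_inference("albert_einstein", "born_in", train_edges))  # False

# No training edge provides the first hop here, so both hops must be inferred.
print(is_full_inference("ada_lovelace", "born_in", train_edges))     # True
```

A benchmark restricted to queries for which this check returns True is, in spirit, what the proposed full-inference datasets aim for: no hop of the query can be answered by memorized training links.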
Implications and Future Directions
The implications of this research are twofold:
- Benchmark Design: The findings suggest a need to reevaluate how benchmarks are constructed and used in assessing AI models for tasks like CQA. Benchmarks should meaningfully challenge models to reason across multiple, non-trivial paths in a KG.
- Model Development: The paper encourages developing models capable of genuine logical reasoning, beyond the simpler task of link prediction. The demonstration that more complex reasoning is required in full-inference scenarios opens pathways for innovative neural-symbolic approaches that could blend traditional logic-based methods with ML.
As it critiques the existing landscape, the paper calls for the AI community to align benchmark difficulty with real-world complexity. Doing so not only reassesses existing models but also stimulates progress toward models truly capable of complex reasoning.
Conclusion
The discussions and proposals in this paper provide a critical reevaluation of the complexity of CQA tasks, suggesting a shift toward benchmarks and models that accurately reflect the reasoning demands of real-world applications. By ensuring models are tested under truly complex scenarios, the field can make genuine progress, fostering robust solutions that can navigate and draw inferences from large, interconnected knowledge bases.