On the feasibility of semantic query metrics (2503.18214v1)
Abstract: We consider the problem of defining semantic metrics for relational database queries. Informally, a semantic query metric for a query language $L$ is a metric function $\delta:L\times L\to \mathbb{N}$ where $\delta(Q_1, Q_2)$ represents the length of a shortest path between queries $Q_1$ and $Q_2$ in a graph. In this graph, nodes are queries from $L$, and edges connect semantically distinct queries where one query is maximally semantically contained in the other. Since query containment is undecidable for first-order queries, we focus on the simpler language of conjunctive queries. We establish that defining a semantic query metric is impossible even for conjunctive queries. Given this impossibility result, we identify a significant subclass of conjunctive queries where such a metric is feasible, and we establish the computational complexity of calculating distances within this language.
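To make the two ingredients of the definition concrete, the following is a minimal sketch and not the paper's construction. It assumes conjunctive queries are encoded as a pair `(head, body)` whose terms are all variables, and it decides containment with the classical homomorphism test (Chandra and Merlin): $Q_1 \subseteq Q_2$ holds exactly when some mapping of the variables of $Q_2$ sends its head to the head of $Q_1$ and each of its atoms to an atom of $Q_1$. The function names `contained_in` and `distance`, the query encoding, and the toy edge list are all hypothetical. The edges are supplied by hand, and the sketch makes no attempt to verify that they correspond to maximal containments.

```python
from collections import deque
from itertools import product


def contained_in(q1, q2):
    """Return True if q1 is contained in q2 (homomorphism from q2 into q1).

    A query is (head, body): head is a tuple of variables, body is a list of
    atoms (relation_name, tuple_of_variables). Constants are not supported.
    """
    head1, body1 = q1
    head2, body2 = q2
    if len(head1) != len(head2):
        return False
    vars2 = sorted({t for _, args in body2 for t in args} | set(head2))
    terms1 = sorted({t for _, args in body1 for t in args} | set(head1))
    atoms1 = set(body1)
    # Brute force over all variable mappings; exponential, fine for tiny queries.
    for image in product(terms1, repeat=len(vars2)):
        h = dict(zip(vars2, image))
        if tuple(h[v] for v in head2) != tuple(head1):
            continue
        if all((rel, tuple(h[t] for t in args)) in atoms1 for rel, args in body2):
            return True
    return False


def distance(edges, source, target):
    """Length of a shortest path in an undirected graph, or None if unreachable."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Q_edge(x) :- E(x, y)            Q_path2(x) :- E(x, y), E(y, z)
q_edge = (("x",), [("E", ("x", "y"))])
q_path2 = (("x",), [("E", ("x", "y")), ("E", ("y", "z"))])

# Hypothetical, hand-picked graph over four query labels.
toy_edges = [("A", "B"), ("B", "C"), ("A", "D")]

if __name__ == "__main__":
    print(contained_in(q_path2, q_edge))   # True
    print(contained_in(q_edge, q_path2))   # False
    print(distance(toy_edges, "A", "C"))   # 2
```

Breadth-first search suffices here because every edge has unit length, so the first time the target is dequeued its recorded depth is the value of $\delta$ on that toy graph. The difficulty the abstract points to lies in defining the edge set for a whole query language, which this sketch sidesteps by taking the edges as given.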