Query2Box: Box Embeddings for KG Queries
- Query2Box is an embedding-based framework that represents queries as axis-aligned boxes to capture set-theoretic operations in knowledge graphs.
- The framework transforms complex EPFO queries into disjunctive normal form, enabling efficient modeling of conjunction, disjunction, and existential quantification.
- Empirical evaluations show that Query2Box outperforms point-based models by up to 25% in Hits@3, demonstrating robust multi-hop and complex query reasoning.
Query2Box is an embedding-based framework for answering arbitrary existential positive first-order (EPFO) logical queries over knowledge graphs (KGs). It uses axis-aligned boxes (hyperrectangles) in vector space to represent both queries and their answer sets. Query2Box enables scalable and expressive reasoning over incomplete KGs by efficiently supporting conjunction ($\land$), disjunction ($\lor$), and existential quantification ($\exists$), and it improves on prior point-based query embeddings in both flexibility and empirical performance (Ren et al., 2020).
1. Fundamental Principles and Motivation
Traditional KG embedding models represent entities and queries as vectors in a latent space. This breaks down for complex queries: a single point cannot adequately capture the semantics of a query whose answer is a set of entities. Query2Box addresses this by representing queries as axis-aligned boxes, where the set of points inside the box geometrically encodes the possible answer entities. This enables the natural modeling of set-theoretic operations on queries, such as intersection (for conjunctions) and union (for disjunctions).
Key principles:
- Entities are represented as vectors (degenerate boxes of zero size).
- Queries correspond to axis-aligned boxes in $\mathbb{R}^d$, parameterized by centers and offsets.
- Logical operations on queries map to geometric operations on boxes, with conjunction as intersection and disjunction as union (over multiple boxes).
- The framework is scalable to large KGs and arbitrary EPFO query structures via disjunctive normal form (DNF) transformation.
2. Box Embeddings: Query and Entity Representation
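Each entity $v$ is embedded as a vector $\mathbf{v} \in \mathbb{R}^d$. A query $q$ is embedded as a box parameterized by center $\mathrm{Cen}(\mathbf{q}) \in \mathbb{R}^d$ and offset $\mathrm{Off}(\mathbf{q}) \in \mathbb{R}^d_{\geq 0}$:
$$\mathrm{Box}_{\mathbf{q}} = \{\mathbf{v} \in \mathbb{R}^d : \mathrm{Cen}(\mathbf{q}) - \mathrm{Off}(\mathbf{q}) \preceq \mathbf{v} \preceq \mathrm{Cen}(\mathbf{q}) + \mathrm{Off}(\mathbf{q})\},$$
where $\preceq$ denotes element-wise comparison.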
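The membership test implied by this definition can be sketched in a few lines of plain Python. The `Box` class and `contains` function below are illustrative names chosen for this example, not part of any released Query2Box code; the sketch only checks whether a point lies between the lower corner (center minus offset) and the upper corner (center plus offset) in every coordinate.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Axis-aligned box given by a center and a non-negative offset (illustrative type)."""
    center: List[float]
    offset: List[float]

    def __post_init__(self) -> None:
        if len(self.center) != len(self.offset):
            raise ValueError("center and offset must have the same dimension")
        if any(o < 0 for o in self.offset):
            raise ValueError("offsets must be non-negative")

    def lower(self) -> List[float]:
        # Lower corner: center - offset, coordinate by coordinate.
        return [c - o for c, o in zip(self.center, self.offset)]

    def upper(self) -> List[float]:
        # Upper corner: center + offset, coordinate by coordinate.
        return [c + o for c, o in zip(self.center, self.offset)]


def contains(box: Box, v: List[float]) -> bool:
    """True when lower <= v <= upper holds in every coordinate."""
    return all(lo <= x <= hi for lo, x, hi in zip(box.lower(), v, box.upper()))


box = Box(center=[0.0, 0.0], offset=[1.0, 2.0])
print(contains(box, [0.5, -1.5]))  # True: inside in both coordinates
print(contains(box, [1.5, 0.0]))   # False: first coordinate exceeds the upper corner
```

An entity corresponds to the special case of a zero offset, so `Box(center=v, offset=[0.0] * len(v))` contains only the point `v` itself.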
Geometric modeling of logical operations:
- Conjunction ($\land$): Intersection of boxes, computed via learned soft attention and elementwise minimum over offsets.
- Existential quantification ($\exists$): Implicitly handled through variable elimination in the computational graph.
- Projection (relation traversal): Implemented as vector addition in box space, i.e., $\mathbf{p} + \mathbf{r}$, with $\mathbf{r}$ the relation embedding, whose center and offset are added to those of the input box $\mathbf{p}$.
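A minimal sketch of the projection and intersection operators follows, written in plain Python for readability. In the actual model the attention scores and the offset-shrinking gate come from learned networks; here they are passed in as the hypothetical arguments `scores` and `gate_logits`, so the functions only illustrate the geometry and are not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

BoxT = Tuple[List[float], List[float]]  # (center, offset), an illustrative encoding


def project(box: BoxT, relation: BoxT) -> BoxT:
    """Relation traversal: add the relation's center and offset to the box."""
    (c, o), (rc, ro) = box, relation
    return [a + b for a, b in zip(c, rc)], [a + b for a, b in zip(o, ro)]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def intersect(boxes: Sequence[BoxT],
              scores: Sequence[Sequence[float]],
              gate_logits: Sequence[float]) -> BoxT:
    """Soft intersection of several boxes.

    scores[i][k]   : stand-in for a learned attention score of box i in dimension k
    gate_logits[k] : stand-in for a learned gate; its sigmoid shrinks the offset
    """
    dim = len(boxes[0][0])
    center, offset = [], []
    for k in range(dim):
        weights = softmax([s[k] for s in scores])
        # New center: attention-weighted average of the input centers.
        center.append(sum(w * b[0][k] for w, b in zip(weights, boxes)))
        # New offset: elementwise minimum of input offsets, shrunk by a factor in (0, 1).
        shrink = 1.0 / (1.0 + math.exp(-gate_logits[k]))
        offset.append(min(b[1][k] for b in boxes) * shrink)
    return center, offset


a = ([0.0, 0.0], [1.0, 1.0])
b = ([2.0, 2.0], [3.0, 3.0])
print(project(a, ([1.0, 2.0], [0.5, 0.0])))  # ([1.0, 2.0], [1.5, 1.0])
print(intersect([a, b], scores=[[0.0, 0.0], [0.0, 0.0]], gate_logits=[0.0, 0.0]))
```

Because the shrink factor lies strictly between 0 and 1, the resulting offset never exceeds the smallest input offset, which matches the intuition that an intersection cannot be larger than any of its operands.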
3. Handling Disjunction and DNF Transformation
Directly embedding arbitrary unions of boxes (for disjunctions) is provably intractable unless embedding dimensions grow proportionally to the number of entities. Query2Box addresses this via systematic query normalization:
- Queries are rewritten into disjunctive normal form (DNF): any EPFO query is reformulated as a union of conjunctive subqueries.
- Each conjunctive subquery is embedded as a box as described above.
- The total query embedding is a collection of boxes; an entity is an answer if it lies inside any of the boxes.
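The rewriting step can be illustrated on a simplified, purely propositional query tree. In Query2Box the transformation is applied to the query's computation graph, which also contains projection steps; the nested-tuple encoding and the `to_dnf` function below are assumptions made for this sketch and only show how a union nested inside a conjunction is distributed into a union of conjunctions.

```python
from itertools import product
from typing import List, Tuple, Union

# A query is either an atom (a string) or a tuple ("and" | "or", subquery, ...).
Query = Union[str, tuple]


def to_dnf(query: Query) -> List[Tuple[str, ...]]:
    """Return a list of conjunctions; each conjunction is a tuple of atoms."""
    if isinstance(query, str):
        return [(query,)]
    op, *args = query
    parts = [to_dnf(arg) for arg in args]
    if op == "or":
        # A union of DNFs is the concatenation of their conjunctions.
        return [conj for part in parts for conj in part]
    if op == "and":
        # Distribute: pick one conjunction from each operand and merge them.
        return [sum(combo, ()) for combo in product(*parts)]
    raise ValueError(f"unknown operator: {op!r}")


print(to_dnf(("and", "a", ("or", "b", "c"))))  # [('a', 'b'), ('a', 'c')]
```

Each resulting tuple would then be embedded as one box, and an entity counts as an answer if it falls inside any of them. The number of conjunctions can grow quickly when several unions are nested under a conjunction, since every combination of operands is enumerated.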
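Distance-based scoring:
$$\mathrm{dist}_{\mathrm{agg}}(\mathbf{v}; q) = \min_{i = 1, \dots, N} \mathrm{dist}_{\mathrm{box}}(\mathbf{v}; \mathbf{q}^{(i)}),$$
where $\mathrm{dist}_{\mathrm{box}} = \mathrm{dist}_{\mathrm{outside}} + \alpha \cdot \mathrm{dist}_{\mathrm{inside}}$ (with $0 < \alpha < 1$) penalizes both outside and inside distances, and $N$ is the number of DNF components.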
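The scoring rule can be sketched as follows, assuming the L1 form of the box distance: the outside part measures how far the entity lies beyond the box boundary, and the inside part measures how far the nearest in-box point is from the center, down-weighted by a factor `alpha` between 0 and 1. The function names and the default `alpha=0.5` are illustrative choices for this example, not values taken from the source.

```python
from typing import List, Sequence, Tuple

BoxT = Tuple[List[float], List[float]]  # (center, offset), an illustrative encoding


def dist_box(v: Sequence[float], box: BoxT, alpha: float = 0.5) -> float:
    """Entity-to-box distance: outside distance plus alpha times inside distance."""
    center, offset = box
    outside = 0.0
    inside = 0.0
    for x, c, o in zip(v, center, offset):
        lo, hi = c - o, c + o
        # Outside part: how far x lies beyond the box boundary (0 if within it).
        outside += max(x - hi, 0.0) + max(lo - x, 0.0)
        # Inside part: distance from the center to x clipped onto the box.
        clipped = min(hi, max(lo, x))
        inside += abs(c - clipped)
    return outside + alpha * inside


def dist_agg(v: Sequence[float], boxes: Sequence[BoxT], alpha: float = 0.5) -> float:
    """Distance to a DNF query: the minimum over its conjunctive boxes."""
    return min(dist_box(v, b, alpha) for b in boxes)


unit = ([0.0, 0.0], [1.0, 1.0])
small = ([2.0, 0.0], [0.5, 0.5])
print(dist_box([2.0, 0.0], unit))           # 1.5 = 1.0 outside + 0.5 * 1.0 inside
print(dist_agg([2.0, 0.0], [unit, small]))  # 0.0: the point is the center of `small`
```

Taking the minimum means an entity scores well as soon as it is close to any one of the boxes, which mirrors the semantics of a union.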
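Training loss:
$$\mathcal{L} = -\log \sigma\big(\gamma - \mathrm{dist}_{\mathrm{box}}(\mathbf{v}; \mathbf{q})\big) - \frac{1}{k} \sum_{j=1}^{k} \log \sigma\big(\mathrm{dist}_{\mathrm{box}}(\mathbf{v}'_j; \mathbf{q}) - \gamma\big),$$
where $\mathbf{v}$ is a positive answer, $\mathbf{v}'_j$ are the $k$ negative samples, $\sigma$ is the sigmoid function, and $\gamma$ is a margin.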
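As a rough illustration, the negative-sampling loss can be computed from precomputed distances. The sketch below assumes the standard form in which the positive answer is pushed to within the margin of the box and the negatives are pushed beyond it, averaged over the negatives; `margin_loss` and `log_sigmoid` are hypothetical helper names, and the distances are plain floats rather than tensors.

```python
import math
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def margin_loss(pos_dist: float, neg_dists: Sequence[float], gamma: float) -> float:
    """Loss for one query given the distance to a positive answer and to k negatives."""
    if not neg_dists:
        raise ValueError("at least one negative sample is required")
    # Positive term: small when the answer is closer to the box than the margin.
    pos_term = -log_sigmoid(gamma - pos_dist)
    # Negative term: small when negatives are farther from the box than the margin.
    neg_term = -sum(log_sigmoid(d - gamma) for d in neg_dists) / len(neg_dists)
    return pos_term + neg_term


good = margin_loss(pos_dist=0.1, neg_dists=[20.0, 25.0], gamma=10.0)
bad = margin_loss(pos_dist=20.0, neg_dists=[0.1, 0.2], gamma=10.0)
print(good < bad)  # True: a well-separated configuration has lower loss
```

In a real training loop the distances would come from a differentiable box-distance function so that gradients flow back into the entity, relation, and operator parameters.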
4. Empirical Evaluation and Performance Characteristics
Query2Box is evaluated on large KGs (FB15k, FB15k-237, NELL995), addressing query structures that span paths, intersections, and unions. The key experimental findings include:
- Consistent outperformance of prior point-based models (up to 25% relative improvement in Hits@3), particularly on queries combining intersection and union.
- Robust generalization to complex, multi-hop, and unseen query types.
- Improved learning from complex queries as compared to training solely on simple path queries.
- The attention-based (“soft”) intersection operation yields better accuracy than deterministic intersection.
- Adaptive offset (variable box size) per query type increases accuracy over fixed box size approaches.
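For readers unfamiliar with the metric, Hits@K is the fraction of test answers that the model ranks within the top K candidates, and a relative improvement compares two such scores as a ratio rather than a difference. The sketch below uses made-up ranks and scores purely for illustration; they are not results from the paper, and the function names are hypothetical.

```python
from typing import Sequence


def hits_at_k(ranks: Sequence[int], k: int = 3) -> float:
    """Fraction of answers whose rank (1 = best) is at most k."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`, e.g. 0.25 means 25%."""
    return (new - baseline) / baseline


# Illustrative ranks of four correct answers among all candidate entities.
print(hits_at_k([1, 2, 5, 10], k=3))  # 0.5: two of the four answers rank in the top 3
```

Under this definition, moving from a Hits@3 of 0.40 to 0.50 is a gain of 10 points in absolute terms but about 25% in relative terms, which is the sense in which the improvement above is reported.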
Summary Table: Logical Operators in Query2Box
| Logic Operator | Geometric Representation | KG Set-Theoretic Equivalent |
|---|---|---|
| Conjunction ($\land$) | Intersection of boxes | Set intersection |
| Existential ($\exists$) | Variable elimination/aggregation | Existential quantification |
| Disjunction ($\lor$) | Union via min distance over boxes | Set union (after DNF conversion) |
5. Theoretical Analysis and Scalability
Theoretical contributions of Query2Box include:
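- Negative result for direct union embedding: Representing arbitrary disjunctions in low dimensions is impossible. To model every union of $M$ conjunctive queries with pairwise disjoint answer sets, the distance function class must have VC dimension at least $M$, and in real KGs $M$ can grow with the number of entities.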
- DNF tractability: Any EPFO query can be represented as a tractable collection of boxes after DNF conversion without requiring an embedding space of prohibitive dimension.
- The framework scales to large numbers of entities and complex query structures, as all geometric operations exploit efficient neural network modules (attention, DeepSets) and are amenable to minibatch optimization.
6. Influence, Extensions, and Relation to Successors
Query2Box has directly influenced follow-up work in set-valued KG embeddings, including Concept2Box (concepts as boxes, entities as points) (Huang et al., 2023) and dual box modeling for Description Logic ontologies (Jackermeier et al., 2023). The Query2Box paradigm has also influenced VQA and vision-language benchmarks for user-referred, box-grounded queries (as seen in Box-QAymo (Etchegaray et al., 2025)), where the explicit mapping from user intent to geometric set specification (bounding box) reflects Query2Box’s set-based semantics.
A plausible implication is that the core box-based geometric formalism supports a range of applications beyond logical querying in KGs, encompassing hierarchical concept modeling, uncertainty quantification (via box volume), and referential reasoning in vision.
7. Limitations and Contemporary Developments
While Query2Box enables tractable and expressive query answering over incomplete KGs, it requires millions of generated training queries per query type and relies on an exact DNF transformation to handle disjunctive queries. More recent frameworks based on neural link predictors (e.g., CQD (Arakelyan et al., 2020)) have demonstrated comparable or superior performance with orders of magnitude less training data. They decompose complex queries into differentiable objectives over pre-trained atomic link predictors, which also makes the intermediate results easier to explain. However, Query2Box remains a foundational model for geometrically faithful, set-theoretic query representations in knowledge graphs.
References
- "Query2Box: Reasoning over Knowledge Graphs in Vector Space using Box Embeddings" (Ren et al., 2020)
- "Concept2Box: Joint Geometric Embeddings for Learning Two-View Knowledge Graphs" (Huang et al., 2023)
- "Dual Box Embeddings for the Description Logic EL++" (Jackermeier et al., 2023)
- "Complex Query Answering with Neural Link Predictors" (Arakelyan et al., 2020)
- "Box-QAymo: Box-Referring VQA Dataset for Autonomous Driving" (Etchegaray et al., 2025)