Two-Level Queries Model
- The Two-Level Queries Model is a framework that organizes queries into hierarchically related high-level groups and low-level items, so that computations at both levels stay consistent and interpretable.
- It employs formal techniques such as projection, de-projection, and consistency constraints to translate high-level aggregations into detailed low-level operations.
- Practical applications include optimizing database queries, enhancing machine learning explainability, and improving dynamic information retrieval with measurable performance gains.
A two-level queries model organizes query or explanation tasks into two hierarchically related layers, typically reflecting structure in the data, the tasks, or the semantics of the application domain. Such models appear throughout data management and machine learning, supporting increased expressiveness, clarity, and efficiency. They formalize the relationship and mutual constraints between different abstraction levels—such as high-level groups and low-level items, or structured sets and sequences—enabling consistent, interpretable, and optimizable computation and analytics.
1. Theoretical Foundations of Two-Level Queries
At its core, a two-level queries model distinguishes between two orthogonal or hierarchically related types of query or explanation:
- The first (higher) level targets groups, classes, sets, or abstractions (e.g., data subsets, high-level features, intents).
- The second (lower) level targets elements, members, or internal structure within those groups (e.g., individual records, sub-features, item sequences).
This framework arises in several paradigms:
- Relational and Concept-Oriented Models: In the Concept-Oriented Model (CoM) (0801.0131), nested ordered sets model both inclusion (hierarchies) and ordering (multi-dimensional relationships), enabling queries both at the group (concept) level and the instance level, with explicit operations for moving "up" (projection) and "down" (de-projection) the abstraction hierarchy.
- Logic and Deductive Databases: Query-subquery networks (QSQN) (Nguyen et al., 2012) directly instantiate a two-level model via explicit representation of queries (top-level goals) and subqueries (generated by rule bodies); all evaluation flows between these levels.
- Machine Learning Explainability: In hierarchical feature attribution models, such as C2FA (Yoshikawa et al., 23 May 2024), attributions are assigned simultaneously to high-level features (e.g., groups) and their low-level constituents, with mathematical constraints enforcing consistency between levels.
This dual-level structure sharply contrasts with "flat" models, where all items are treated uniformly, or "monolithic" approaches, where query/explanation mixes levels without structural awareness.
2. Formal Definitions and Algebraic Characterizations
Two-level models are usually accompanied by rigorous formalism:
Nested Ordered Sets (CoM)
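An element in a nested labelled ordered set is represented by two collections: its child elements (hierarchical structure/inclusion) and its references to super-elements along labelled dimensions (multi-dimensional/ordering).
Projection/De-projection (for moving across levels): projection moves "up" from a set of elements to the super-elements they reference along a given dimension, and de-projection moves "down" from a set of super-elements to the elements that reference them.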
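The two navigation operations can be illustrated with a minimal Python sketch. It is not CoM's own notation or implementation: the `Element` class, the `project` and `deproject` helpers, and the order/customer data are all hypothetical names chosen for illustration, and references are modeled simply as (dimension, super-element name) pairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """A low-level item with references to super-elements, keyed by dimension.

    Hypothetical structure for illustration only.
    """
    name: str
    refs: tuple = ()  # tuple of (dimension, super_element_name) pairs


def project(elements, dimension):
    """Move 'up': collect the super-elements referenced along `dimension`."""
    return {target for e in elements for dim, target in e.refs if dim == dimension}


def deproject(super_names, dimension, universe):
    """Move 'down': names of elements in `universe` referencing any of `super_names`."""
    return {
        e.name
        for e in universe
        if any(dim == dimension and target in super_names for dim, target in e.refs)
    }


orders = [
    Element("o1", (("customer", "alice"), ("product", "pen"))),
    Element("o2", (("customer", "bob"), ("product", "pen"))),
    Element("o3", (("customer", "alice"), ("product", "ink"))),
]

# Low level -> high level: which customers ordered a pen?
pen_orders = [o for o in orders if ("product", "pen") in o.refs]
customers_of_pen = project(pen_orders, "customer")

# High level -> low level: which orders belong to alice?
orders_of_alice = deproject({"alice"}, "customer", orders)
```

Chaining the two helpers (de-project along one dimension, then project along another) gives the kind of compositional navigation between levels that the model describes.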
Surrogate Models and Consistency in ML Attribution
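For nested inputs (groups with sub-features), feature attribution explanations are modeled with surrogate models at both the group level and the sub-feature level,
with the consistency property that the low-level attributions within each group sum to that group's high-level attribution, ensuring explanations at the two levels align (Yoshikawa et al., 23 May 2024).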
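As a rough illustration of what consistency between levels means, the sketch below checks whether each group's low-level attributions sum to its high-level attribution, and repairs a violation by shifting the low-level values equally within each group (the Euclidean projection onto the sum constraint). This is not the C2FA optimization procedure; the function names, the dictionary layout, and the numbers are assumptions made for the example.

```python
def is_consistent(high, low, tol=1e-9):
    """True if every group's low-level attributions sum to its high-level value."""
    return all(abs(high[g] - sum(low[g])) <= tol for g in high)


def enforce_consistency(high, low):
    """Shift each group's low-level attributions equally to match the group total.

    Illustrative repair step only; it is not the method of the cited paper.
    """
    adjusted = {}
    for group, values in low.items():
        gap = (high[group] - sum(values)) / len(values)
        adjusted[group] = [v + gap for v in values]
    return adjusted


# Hypothetical attributions: two groups with two and three sub-features.
high = {"wheels": 0.6, "windows": 0.4}
low = {"wheels": [0.25, 0.25], "windows": [0.1, 0.1, 0.05]}

repaired = enforce_consistency(high, low)
```

Here `low` violates the constraint (the "wheels" sub-features sum to 0.5, not 0.6), while `repaired` satisfies it up to floating-point tolerance.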
Query-Subquery Nets
Each "subquery" is explicitly represented as a pair consisting of a tuple of terms and a substitution,
and query evaluation propagates data and substitutions from queries to subqueries and back (Nguyen et al., 2012).
3. Architectural Realizations and Algorithms
Data Management Systems
- Partitioned Indexes: Two-level spatial indexing schemes (Tsitsigkos et al., 2020) split space into top-level partitions (tiles), then classify objects within each tile into secondary classes. This supports efficient, parallel, and duplicate-free spatial range queries.
- Multi-Model Data Query Engines: In Multi-SQL (Yan et al., 2020), query execution is partitioned into a Multi-SQL layer (logical/global optimization) and sub-queries delegated to native storage engines, which execute physical/local optimization; this two-level implementation maximizes the reuse of underlying engine capabilities.
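The partitioned-index idea can be sketched as a simplified Python grid. This is an assumption-laden illustration in the spirit of two-level spatial indexing, not the cited system: the class name `TwoLevelGrid`, the boolean class keys `(before_x, before_y)`, and the rectangle format `(xmin, ymin, xmax, ymax)` are all hypothetical. Each rectangle is stored in every tile it overlaps, tagged by whether it starts before that tile in x and/or y; a range query skips the classes that an earlier tile of the query has already covered, so no result is reported twice.

```python
from collections import defaultdict


def intersects(a, b):
    """Closed-interval overlap test for rectangles (xmin, ymin, xmax, ymax)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class TwoLevelGrid:
    """Level 1: uniform tiles. Level 2: per-tile classes keyed by
    (starts_before_tile_in_x, starts_before_tile_in_y)."""

    def __init__(self, extent, tiles_per_dim):
        self.n = tiles_per_dim
        self.size = extent / tiles_per_dim
        self.tiles = defaultdict(lambda: defaultdict(list))

    def _tile(self, v):
        # Clamp coordinates outside the extent to the border tiles.
        return min(self.n - 1, max(0, int(v // self.size)))

    def insert(self, oid, rect):
        x0, y0 = self._tile(rect[0]), self._tile(rect[1])
        for ix in range(x0, self._tile(rect[2]) + 1):
            for iy in range(y0, self._tile(rect[3]) + 1):
                self.tiles[(ix, iy)][(ix > x0, iy > y0)].append((oid, rect))

    def range_query(self, q):
        qx0, qy0 = self._tile(q[0]), self._tile(q[1])
        found = []
        for ix in range(qx0, self._tile(q[2]) + 1):
            for iy in range(qy0, self._tile(q[3]) + 1):
                tile = self.tiles.get((ix, iy))
                if tile is None:
                    continue
                for (before_x, before_y), items in tile.items():
                    # A rectangle starting before this tile was already
                    # examined in a previous tile of the query range.
                    if (before_x and ix > qx0) or (before_y and iy > qy0):
                        continue
                    found.extend(oid for oid, r in items if intersects(r, q))
        return found


grid = TwoLevelGrid(extent=100, tiles_per_dim=4)
grid.insert("a", (10, 10, 60, 30))   # spans several tiles
grid.insert("b", (70, 70, 80, 80))
grid.insert("c", (40, 5, 45, 8))
result = grid.range_query((20, 0, 90, 40))
```

Because the class test decides which tile reports each rectangle, results need no de-duplication pass, and tiles can be processed independently.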
Machine Learning and Explainability
- Hierarchical Feature Attribution: C2FA (Yoshikawa et al., 23 May 2024) fits surrogate models at both group and sub-feature level, with optimization enforcing the consistency property.
- Structured Sequence Generation: In RoomFormer (Yue et al., 2022), a transformer decoder outputs a set of sequences (polygon queries for rooms, each a sequence of corner positions), leveraging two-level queries for variable-size, variable-length prediction in structured output.
Information Retrieval
- Dynamic Rankings: Two-level dynamic ranking models (Raman et al., 2011) first display a diversified set of head results, then, as the user interacts with a result, dynamically expand with a sub-ranking tailored to the inferred intent—formally modeled via utility functions parameterized by both diversity and depth metrics.
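The head-then-expand interaction can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the model or the utility functions of the cited work: it assumes each document carries a known `intent` label and a relevance `score` (in practice intent is inferred from interaction), and the function names and example documents are hypothetical.

```python
def head_ranking(docs, width):
    """First level: the best-scoring document per intent, top `width` kept."""
    best = {}
    for doc in docs:
        current = best.get(doc["intent"])
        if current is None or doc["score"] > current["score"]:
            best[doc["intent"]] = doc
    return sorted(best.values(), key=lambda d: -d["score"])[:width]


def expand(docs, clicked, depth):
    """Second level: up to `depth` further documents sharing the clicked intent."""
    same_intent = [
        d for d in docs
        if d["intent"] == clicked["intent"] and d["id"] != clicked["id"]
    ]
    return sorted(same_intent, key=lambda d: -d["score"])[:depth]


# Hypothetical results for an ambiguous query.
docs = [
    {"id": "d1", "intent": "car", "score": 0.9},
    {"id": "d2", "intent": "car", "score": 0.8},
    {"id": "d3", "intent": "animal", "score": 0.7},
    {"id": "d4", "intent": "animal", "score": 0.6},
    {"id": "d5", "intent": "os", "score": 0.5},
    {"id": "d6", "intent": "car", "score": 0.4},
]

heads = head_ranking(docs, width=2)
sub_ranking = expand(docs, clicked=heads[1], depth=2)
```

The `width` and `depth` parameters loosely mirror the trade-off between diversity at the head level and depth within one intent.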
4. Consistency and Rewriting Between Levels
A distinguishing property of two-level models is the formal consistency constraint that links the outputs or semantics at both levels:
- In feature attribution, the sum of low-level attributions within a group equals the group’s high-level attribution (Yoshikawa et al., 23 May 2024).
- In data models, projection and de-projection operations are compositional, enabling constraints, filters, or aggregations at one level to propagate or be summarized at the other (0801.0131, 0901.2224).
- For analytic (group-by) queries, a two-level model such as the context model (Spyratos, 2023) would require that aggregation at the group level is defined by systematic summarization of measurements at the member level and that queries can be reformulated at either layer.
Such consistency enables query and explanation rewriting at both levels, often enabling optimization (e.g., propagating constraints down for filtering before aggregation, or merging surrogate fits for attribution).
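The filtering rewrite mentioned above can be illustrated with a small Python sketch. It assumes the filter is a predicate on the grouping key, which is the case where pushing it below the aggregation is safe; the `group_sum` helper, the column names, and the data are hypothetical.

```python
from collections import defaultdict


def group_sum(rows, key, value):
    """Member level -> group level: sum `value` over rows sharing `key`."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


rows = [
    {"region": "north", "amount": 10},
    {"region": "north", "amount": 5},
    {"region": "south", "amount": 7},
    {"region": "east", "amount": 3},
    {"region": "east", "amount": 4},
]
wanted = {"north", "east"}

# Group-level formulation: aggregate every group, then keep the wanted ones.
filter_late = {
    g: total
    for g, total in group_sum(rows, "region", "amount").items()
    if g in wanted
}

# Member-level rewrite: drop unwanted rows first, then aggregate.
filter_early = group_sum(
    [r for r in rows if r["region"] in wanted], "region", "amount"
)
```

Both formulations return the same totals, but the rewritten one aggregates fewer rows. A filter on the aggregated value itself (for example, totals above a threshold) could not be pushed down this way.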
5. Practical Applications and Empirical Results
| Domain/Task | Two-Level Approach | Quantitative Gains |
|---|---|---|
| Image & Text Feature Attribution (Yoshikawa et al., 23 May 2024) | Joint HiFA/LoFA with consistency property | ↑ NDCG/AUROC, ↓ error |
| Floorplan Reconstruction (Yue et al., 2022) | Polygon-level (room) queries + corner-level vertex queries | Room F1: 95.3% vs 73.4% |
| Dynamic Retrieval (Raman et al., 2011) | Diversified heads + personalized subrankings | Outperforms static rankings |
| SQL Synthesis (Baik et al., 2020) | NLQ + table sketch dual input, guided PQE | 62.5%↑ over NLQ baseline |
| Multi-model DB (Yan et al., 2020) | Multi-SQL logic + native engine optimization | Query speed gains |
Practical effects include human-intuitive explanations (at multiple granularities), scaling to large/complex data with parallel query processors (Tsitsigkos et al., 2020), and enablement of more expressive, sound, and efficient information retrieval systems.
6. Future Directions and Open Problems
Many research avenues remain in the optimization, expressivity, and generalization of two-level queries models:
- Extending consistency constraints to more complex or unbounded hierarchies.
- Designing query languages and data models with first-class two-level (or multi-level) operators, supporting compositional navigation and aggregation.
- Developing efficient, general algorithms for constrained multi-level attribution with strong theoretical guarantees.
- Exploring query rewriting and determinacy in richer logic/data systems (e.g., implicit-to-explicit synthesis for nested data (Benedikt et al., 2022)).
- Investigating how user specification at multiple levels (e.g., dual NLQ/PBE in Duoquest (Baik et al., 2020)) can improve system performance and usability.
A plausible implication is that, as data and tasks become more semantically structured and applications demand more transparent explanation and optimization, two-level and, more generally, multi-level query models will become central constructs in database, AI, and information retrieval systems.