Expert-Driven Query Design
- Expert-Driven Query Design is a paradigm that integrates domain expertise with algorithmic query synthesis to enhance expressivity and reliability in complex data environments.
- It leverages diverse modalities such as natural language, table sketches, and provenance inputs to accurately capture and refine expert intent in query construction.
- Empirical evaluations show that expert-guided systems can substantially reduce errors and improve query performance through interactive expansion and human-in-the-loop validation.
Expert-Driven Query Design refers to a class of query specification, construction, and evaluation paradigms in which domain experts play a central role in shaping, refining, or validating queries—often in complex, high-stakes data environments where standard, non-expert-driven methods yield suboptimal expressivity, coverage, or interpretability. This approach bridges expert intent, diverse input modalities (natural language, examples, sketches, or provenance), and algorithmic synthesis or expansion, yielding more robust, precise, and trustworthy information retrieval, database querying, or learning workflows.
1. Principles and Models of Expert-Driven Query Design
Expert-driven query design assumes that domain users (e.g., clinicians, scientific information specialists, data analysts) bring unique intent, domain knowledge, and interaction patterns that must be captured directly by the system. The paradigms that instantiate this approach fall into several families:
- Expert-in-the-loop NL→SQL Generation: Users articulate queries in natural language, with LLMs translating intent into executable SQL, incorporating human validation and editing before execution (Assor et al., 11 Sep 2025).
- Dual-Modal Specification: Users describe their intent through a combination of natural language and a lightweight formalism such as table sketches or programming-by-example (PBE), increasing both expressivity and disambiguation power (Baik et al., 2020).
- Provenance-Driven Query Learning: Users supply example outputs and further refine intent by directly identifying input tuples (“causes”), yielding concise, unique conjunctive queries via provenance semiring frameworks (Deutch et al., 2016).
- Interactive Query Expansion for Professional Search: Specialists iteratively refine structured Boolean queries with expansion recommendations, informed by linguistic context, embeddings, and ontologies, with the final scope under user control (Russell-Rose et al., 2021).
- Active Expert Query Selection: In sequential decision or imitation learning, the system actively queries experts for feedback only at points of maximal informativeness, guided by statistical or geometric criteria (e.g., conformal prediction) (Firouzkouhi et al., 29 Nov 2025).
- Evaluation Protocols for Expert-Finding: The construction of evaluative query sets (short, generic topic labels versus richer, document-derived queries) substantially affects benchmark outcomes, interpretability, and real-world relevance (Brochier et al., 2018).
A defining attribute is the interplay between algorithmic components and user-driven inputs/explanations, shaping both specification and evaluation.
2. Methodologies: Input Modalities, Synthesis, Expansion, and Evaluation
Expert-driven query design supports multiple input modalities, pairing them with synthesis algorithms tailored to maximize fidelity to expert intent, efficiency, and outcome quality.
Input Modalities
- Natural Language: Leveraged in LLM-driven NL→SQL systems, sometimes augmented by explicit explanations or feedback cycles (Assor et al., 11 Sep 2025); a minimal sketch of the propose-review-execute loop follows this list.
- Table Sketches / PBE: Users provide partial output tables or selection sketches; systems synthesize queries constrained to these examples (Baik et al., 2020).
- Intuitive Provenance: Users annotate example outputs with specific input tuples considered causally responsible, which the system converts to formal provenance semirings for query inference (Deutch et al., 2016).
- Boolean Clause Structure: Professional searchers edit structured queries, expanding clauses with recommendations controlled at the sub-expression level (Russell-Rose et al., 2021).
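To make the natural-language modality concrete, the following is a minimal sketch of a propose-review-execute loop in Python. It is not the ParcoursVis implementation: `nl_to_sql` is a hypothetical placeholder for whatever LLM client is available, and SQLite merely stands in for the target database.

```python
import sqlite3

def nl_to_sql(question: str, schema: str) -> str:
    """Hypothetical LLM call translating a natural-language question into SQL.
    Plug in any provider here; the loop below does not depend on the model."""
    raise NotImplementedError("connect an LLM client")

def expert_in_the_loop_query(question: str, conn: sqlite3.Connection):
    # Give the model the schema so it can ground table and column names.
    schema = "\n".join(row[0] for row in conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table'") if row[0])
    candidate_sql = nl_to_sql(question, schema)

    # Human validation gate: the expert sees the generated SQL and may accept,
    # edit, or abandon it before anything is executed against the database.
    print(f"Proposed SQL:\n{candidate_sql}")
    edited = input("Press Enter to accept, or type corrected SQL: ").strip()
    final_sql = edited or candidate_sql
    return conn.execute(final_sql).fetchall()
```

The essential point is the validation gate: no model-generated SQL touches the database until the expert has inspected and, if necessary, edited it.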
Query Synthesis and Expansion
- Guided Partial Enumeration: Neural models suggest likely query fragments, which are incrementally verified using user constraints (e.g., matching example tuples), enabling efficient best-first synthesis even in large SQL search spaces, with early pruning on partial queries (Baik et al., 2020).
- Provenance-Guided Inference: Efficient algorithms (bipartite matching, incremental merging) compute conjunctive queries that minimally generalize the set of input-to-output explanation pairs, reducing hypothesis space and ambiguity (Deutch et al., 2016); a simplified sketch of the generalization step follows this list.
- Interactive Expansion: Sub-expression-level expansion suggestions are generated using distributional language models (word embeddings) and ontologies; system transparency is maintained by displaying provenance for each term, and interaction remains under expert control (Russell-Rose et al., 2021).
- Active Query Selection: Expert labeling budgets are optimized using statistical thresholds (nearest-neighbor distances, conformal quantiles) on visited data states, ensuring coverage and query efficiency in imitation learning settings (Firouzkouhi et al., 29 Nov 2025).
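To illustrate the provenance-guided inference referenced above, the sketch below shows the core generalization step for a single output example: constants in the expert-identified cause tuples are replaced by variables, so repeated constants become join conditions and the output constants become head variables. This is a deliberate simplification of the Deutch et al. approach (which merges multiple examples and enforces minimality); the relations and values are invented for illustration.

```python
from itertools import count

def generalize_to_cq(output_tuple, causes):
    """Build a conjunctive-query string from one output example and the input
    tuples the expert marked as its causes.

    output_tuple: tuple of constants the user expects in the answer.
    causes: list of (relation_name, tuple_of_constants) the user identified
            as responsible for that output.

    Each distinct constant maps to a fresh variable; reusing the same variable
    wherever a constant repeats encodes join conditions, and the head reuses
    the variables assigned to the output constants.
    """
    fresh = count()
    var_of = {}

    def var(c):
        if c not in var_of:
            var_of[c] = f"x{next(fresh)}"
        return var_of[c]

    body = [f"{rel}({', '.join(var(c) for c in vals)})" for rel, vals in causes]
    head = f"Q({', '.join(var(c) for c in output_tuple)})"
    return f"{head} :- {', '.join(body)}"

# Example: the expert says output ('Alice', 'DB Lab') is caused by
# Researcher('Alice', 7) and Lab(7, 'DB Lab'); the shared constant 7 becomes a join.
print(generalize_to_cq(("Alice", "DB Lab"),
                       [("Researcher", ("Alice", 7)), ("Lab", (7, "DB Lab"))]))
# -> Q(x0, x2) :- Researcher(x0, x1), Lab(x1, x2)
```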
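The active query selection criterion in the last bullet can be sketched as follows, assuming NumPy, a Euclidean nearest-neighbor novelty score, and a standard split-conformal quantile; it is an illustrative approximation rather than the exact CRSAIL procedure.

```python
import numpy as np

def calibrate_query_threshold(calib_states: np.ndarray, alpha: float = 0.1) -> float:
    """Split-conformal calibration of a novelty threshold.

    Score each calibration state by its distance to the nearest *other*
    calibration state, then take the ceil((n+1)(1-alpha))/n empirical quantile.
    States whose nearest-neighbor distance exceeds this threshold are treated
    as novel, so the expert is queried for roughly at most an alpha fraction
    of in-distribution states.
    """
    diffs = calib_states[:, None, :] - calib_states[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dists, np.inf)          # exclude self-distance
    scores = dists.min(axis=1)
    n = len(scores)
    q = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return float(np.quantile(scores, q, method="higher"))

def should_query_expert(state: np.ndarray, visited: np.ndarray, tau: float) -> bool:
    """Query the expert only when the new state is farther from every visited
    state than the calibrated threshold tau (i.e. it looks novel)."""
    return float(np.linalg.norm(visited - state, axis=1).min()) > tau

# Usage: calibrate once on states gathered so far, then gate expert queries.
rng = np.random.default_rng(0)
visited = rng.normal(size=(200, 4))
tau = calibrate_query_threshold(visited, alpha=0.1)
print(should_query_expert(rng.normal(size=4), visited, tau))   # usually False
print(should_query_expert(np.full(4, 5.0), visited, tau))      # True: far from the data
```

The calibration step is what keeps the expected labeling budget predictable, since only states that look out-of-distribution trigger an expert query.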
Query Set and Evaluation Protocols
- Topic-Query versus Document-Query: In expert finding, evaluation is sensitive to the nature of the query sets—short generic topic queries versus realistic document-derived queries—impacting observed algorithmic performance, standard deviation, and robustness (Brochier et al., 2018).
3. Empirical Results and Comparative Performance
Results across paradigms confirm the practical advantages and subtle tradeoffs introduced by expert-driven query design.
| System/Study | Paradigm | Performance Highlights |
|---|---|---|
| Duoquest (Baik et al., 2020) | Dual NLQ + Table Sketch | +62.5% over NL-to-SQL only; Top-1 acc. doubles on Spider |
| Query-By-Provenance (Deutch et al., 2016) | Provenance-guided CQ inference | 2–4 examples suffice for most tasks; >90% recall w/5 ex. |
| ParcoursVis (Assor et al., 11 Sep 2025) | LLM NL→SQL + human-in-the-loop | 1 edit/query needed; faster than traditional widgets |
| Query Expansion (Russell-Rose et al., 2021) | Term expansion in Boolean search | Strict pipelining F₁=0.086; recommendations inline |
| Expert Set Construction (Brochier et al., 2018) | Topic- vs. document-query eval. | Document-query: lower inter-topic std, more robust |
| CRSAIL (Firouzkouhi et al., 29 Nov 2025) | Selective expert query in AIL | Up to 96% fewer queries, expert-level RL performance |
These results point system designers toward hybrid, multi-modal query collection and evaluation strategies, complemented by fine-grained, interactive controls for expert users.
4. Algorithmic and System Design Patterns
Enabling expert-driven query design requires several architectural and algorithmic considerations:
- Fusion of Modalities: Combining natural language, structured sketches, provenance explanations, and dynamic widgets (sliders, checkboxes) allows for simultaneous expressivity and control, as exemplified by LLM-powered systems that parse intent and re-expose filtering widgets as needed (Assor et al., 11 Sep 2025).
- Human-In-The-Loop Editing: Expert inspection and modification of system-generated queries (SQL or otherwise) is mandatory for transparency and correctness; explanations are paired with the generated code, surface mismatches between intent and output, and support rapid diagnosis and correction (Assor et al., 11 Sep 2025).
- Partial Query Verification and Pruning: Enumerators use neural confidence scores and staged SQL verification to prune infeasible candidates early, improving tractability for complex queries (Baik et al., 2020); a toy enumeration-and-pruning sketch follows this list.
- Context-Sensitive Recommendation: Term expansion suggestions are scoped at the Boolean clause level and labeled with provenance (MeSH, word2vec, etc.), supporting both precision-oriented and recall-oriented workflows (Russell-Rose et al., 2021); a second sketch after this list illustrates provenance-labeled suggestions.
- Active Query Budgeting: In learning settings, active expert query selection is grounded in state-novelty metrics, with a global query threshold calibrated via conformal quantiles to control expected budget utilization (Firouzkouhi et al., 29 Nov 2025).
- Evaluation Reporting: Both intra-topic (within-topic) and inter-topic (across topics) variability must be reported, especially for expert finding, to reflect true robustness (Brochier et al., 2018).
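The toy enumeration-and-pruning sketch promised above: candidate WHERE fragments are explored best-first by (mock) confidence, and any partial conjunction whose result no longer contains the expert's example tuples is discarded along with all of its extensions. The table, predicates, and scores are hypothetical, and the real Duoquest enumerator operates over full SQL candidates rather than a single selection clause.

```python
import heapq
import itertools
import sqlite3

def best_first_synthesis(conn, table, columns, predicates, examples, scores, limit=3):
    """Best-first enumeration of SELECT ... WHERE conjunctions, pruned by a table sketch.

    predicates: candidate WHERE fragments, e.g. "age > 60".
    examples:   set of projected tuples the expert expects in the output.
    scores:     mock per-fragment confidences (stand-in for a neural ranker).
    A partial query whose result no longer contains every example tuple is
    pruned immediately: adding further conjuncts can only shrink the result.
    """
    tie = itertools.count()                 # keeps heap comparisons well-defined
    heap = [(0.0, next(tie), [])]           # (negated summed score, id, chosen fragments)
    consistent = []
    while heap and len(consistent) < limit:
        neg_score, _, chosen = heapq.heappop(heap)
        where = " AND ".join(chosen) if chosen else "1=1"
        sql = f"SELECT {', '.join(columns)} FROM {table} WHERE {where}"
        if not examples <= set(conn.execute(sql).fetchall()):
            continue                        # prune this branch and all its extensions
        if chosen:
            consistent.append((-neg_score, sql))
        last = predicates.index(chosen[-1]) if chosen else -1
        for pred in predicates[last + 1:]:  # extend in a fixed order to avoid duplicates
            heapq.heappush(heap, (neg_score - scores[pred], next(tie), chosen + [pred]))
    return consistent

# Toy usage: the expert's sketch says the result must contain the row ("Ada",).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients(name TEXT, age INT, dept TEXT)")
conn.executemany("INSERT INTO patients VALUES (?, ?, ?)",
                 [("Ada", 67, "cardio"), ("Bo", 34, "cardio"), ("Cy", 71, "neuro")])
preds = ["age > 60", "dept = 'cardio'", "age < 40"]
confs = {"age > 60": 0.9, "dept = 'cardio'": 0.6, "age < 40": 0.4}
for score, sql in best_first_synthesis(conn, "patients", ["name"], preds, {("Ada",)}, confs):
    print(f"{score:.1f}  {sql}")
```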
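And the provenance-labeled expansion sketch: suggestions are scoped to one Boolean sub-expression and each carries the resource it came from, so the expert can accept or reject them individually. The embedding vectors and ontology entries below are invented stand-ins for word2vec and MeSH.

```python
import numpy as np

# Invented stand-ins for the real resources: an embedding table for unigrams and
# an ontology mapping terms to curated multi-word synonyms.
EMBEDDINGS = {
    "stroke":     np.array([0.90, 0.10, 0.30]),
    "infarction": np.array([0.80, 0.20, 0.35]),
    "ischemia":   np.array([0.85, 0.15, 0.40]),
    "fracture":   np.array([0.10, 0.90, 0.20]),
}
ONTOLOGY = {"stroke": ["cerebrovascular accident", "brain attack"]}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def suggest_expansions(clause_terms, k=2, min_sim=0.8):
    """Suggest expansion terms for a single Boolean sub-expression, each labeled
    with its provenance so the expert can judge (and reject) it individually."""
    suggestions = []
    for term in clause_terms:
        # Ontology-derived multi-word synonyms.
        for syn in ONTOLOGY.get(term, []):
            suggestions.append((syn, f"ontology:{term}"))
        # Embedding neighbors above a similarity threshold.
        if term in EMBEDDINGS:
            neighbors = sorted(
                ((cosine(EMBEDDINGS[term], v), w)
                 for w, v in EMBEDDINGS.items() if w != term),
                reverse=True,
            )
            suggestions += [(w, f"embedding:{term} (sim={s:.2f})")
                            for s, w in neighbors[:k] if s >= min_sim]
    return suggestions

# Expand one clause of a Boolean query; the expert decides what to OR in.
for term, provenance in suggest_expansions(["stroke"]):
    print(f"{term:30s} <- {provenance}")
```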
5. Comparative Effects of Query Set Construction and Evaluation Protocols
The characteristics of the queries used in evaluation—in terms of richness, length, and provenance—affect all downstream measurements of system quality:
- Short, Topic-Label Queries: Classical evaluations use brief descriptors as queries, which may bias systems toward models optimized for semantic propagation or keyword matching in limited contexts (Brochier et al., 2018). These are unrepresentative of real user search behavior.
- Document-Derived (Rich) Queries: Using full abstracts or document text as queries, sampled from the expert's own output, reveals intra-topic variability and supports more realistic benchmarking. Performance metrics drop in absolute terms but become more robust and representative; voting models may outperform propagation-based models as query length increases (Brochier et al., 2018).
- Variance Reporting: It is crucial to decompose and report performance variability both across queries (intra-topic) and across topics (inter-topic) to fully characterize algorithmic stability.
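A minimal sketch of the recommended variance reporting, assuming per-query scores (e.g., average precision) have already been computed for several document-derived queries per topic; the topics and numbers are invented.

```python
from statistics import mean, stdev

def variability_report(scores_by_topic: dict[str, list[float]]) -> dict:
    """Decompose evaluation variability for an expert-finding benchmark.

    scores_by_topic maps each topic to the per-query scores (e.g. average
    precision) obtained with several document-derived queries for that topic.
    """
    topic_means = {t: mean(s) for t, s in scores_by_topic.items()}
    intra = {t: (stdev(s) if len(s) > 1 else 0.0) for t, s in scores_by_topic.items()}
    return {
        "overall_mean": mean(topic_means.values()),
        "inter_topic_std": stdev(topic_means.values()),   # stability across topics
        "mean_intra_topic_std": mean(intra.values()),     # stability across queries
    }

# Hypothetical per-query AP scores for three topics, five queries each.
report = variability_report({
    "neurology":  [0.42, 0.39, 0.47, 0.35, 0.44],
    "databases":  [0.58, 0.61, 0.55, 0.60, 0.57],
    "statistics": [0.30, 0.28, 0.33, 0.26, 0.31],
})
print(report)
```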
6. Best Practices and Guidelines for Expert-Driven Query Frameworks
Synthesis of results yields several actionable recommendations:
- Multiple, Diverse Query Types Per Topic: Use a suite of varied, realistic queries per topic to expose variability and test robustness to query-construction bias (Brochier et al., 2018).
- Flexible Modality Fusion: Allow users to mix natural language, examples, table sketches, and provenance explanations; select query construction paradigm according to task type (Baik et al., 2020, Deutch et al., 2016).
- Transparency and Editable Output: All system-generated queries must be presented alongside natural language or provenance explanations, with direct editability (Assor et al., 11 Sep 2025).
- Contextualized Recommendation: Scope expansion and suggestions to sub-expressions; integrate domain ontologies for multi-word terms, and embeddings for unigrams (Russell-Rose et al., 2021).
- Active Expert Query Control: In sequential labeling or imitation learning, apply data-driven, calibrated thresholds for state selection, and prioritize budget efficiency (Firouzkouhi et al., 29 Nov 2025).
- Comprehensive Evaluation Reporting: Always report performance across both query variability and topic variability to characterize system robustness (Brochier et al., 2018).
- Curation and Validation: Where automatic query sampling is used, supplement with human-in-the-loop curation to avoid inclusion of off-topic or misleading exemplars (Brochier et al., 2018).
- Precision–Recall Controls: Implement tunable precision vs. recall settings in term expansion frameworks; allow experts to choose conservative vs. expansive strategies as needed (Russell-Rose et al., 2021).
7. Limitations and Open Directions
Despite its strengths, the expert-driven query paradigm entails several constraints:
- Dependency on Expert Proficiency: Efficacy diminishes if experts lack intuitive grasp of input-output causality (provenance) or cannot interpret or correct generated queries (Deutch et al., 2016, Assor et al., 11 Sep 2025).
- Modality and Domain Constraints: NL→SQL and dual-specification systems may falter on complex queries involving aggregation, nested logic, negation, or highly heterogeneous schemas (Baik et al., 2020, Deutch et al., 2016).
- Scalability: While partial enumeration and interactive pruning mitigate search complexity, extremely large schema spaces or example sets still pose performance challenges (Baik et al., 2020).
- Reliance on Model Quality: LLM-generated queries are susceptible to hallucination and schema-mapping errors, even when paired with explanations; curation and expert oversight remain necessary (Assor et al., 11 Sep 2025).
- Precision–Recall Trade-Offs: No single expansion or suggestion framework dominates universally; strict (precision-oriented) and loose (recall-oriented) controls must be offered, and rigorous evaluation protocols are needed (Russell-Rose et al., 2021).
- Limited Support for Complex Queries: Some frameworks do not yet support aggregates, recursion, set-valued results, or fine-grained provenance beyond select–project–join logic (Deutch et al., 2016).
A plausible implication is that further research should focus on richer schema-awareness, automated suggestion ranking via expert feedback logs, and support for advanced query constructs.
Expert-driven query design thus represents a multi-modal, user-centered paradigm unifying advances in human-computer interaction, formal query synthesis, algorithmic learning, and robustness evaluation. Its importance spans expert finding, information retrieval, data analytics, and sequential decision settings, with a central concern for capturing, operationalizing, and validating domain expertise at every stage of the query lifecycle (Brochier et al., 2018, Deutch et al., 2016, Assor et al., 11 Sep 2025, Firouzkouhi et al., 29 Nov 2025, Baik et al., 2020, Russell-Rose et al., 2021).