Query-Driven Text Summarization

Updated 29 August 2025
  • QDTS is a technique that generates concise summaries tailored to specific user queries by focusing on query-relevant content from diverse sources.
  • It utilizes both extractive methods (e.g., graph-based ranking, clustering) and abstractive models (e.g., dynamic query attention, pointer-generator networks) to optimize relevance.
  • Current research addresses challenges like data scarcity, complex reasoning, and evaluation gaps to enhance real-time, multi-modal summarization deployments.

Query-Driven Text Summarization (QDTS) refers to automated methods that generate concise summaries of textual (or, more broadly, multi-modal) data tailored specifically to a user-specified query. Unlike generic summarization—which aims to capture the overall salient content of documents—QDTS seeks to surface only those aspects of the data that directly address the user’s information need, providing targeted, high-utility condensation for search, QA, recommendation, and analytic systems.

1. Taxonomies and Task Formulation

QDTS methods are classified along several dimensions:

  • Summary type: Extractive (selecting existing text units) vs. Abstractive (generating novel sentences) (Yu et al., 2022).
  • Learning paradigm: Supervised (learning from annotated data) vs. Unsupervised (rule-based, clustering, or graph-based approaches).
  • Data scope: Single-document (SDS), multi-document (MDS), and, recently, multi-table and cross-modal summarization (Zhang et al., 8 May 2024, Zhao et al., 2023).
  • Query form: Keyword, natural language question, aspect, or hybrid forms (e.g., hierarchical section metadata (Zhu et al., 2019)).
  • Language and modality: Recent research extends QDTS beyond English to Chinese (Zhao et al., 2020), as well as to structured (tabular) data (Zhao et al., 2023, Zhang et al., 8 May 2024).

The target of QDTS is to learn a mapping $(q, D) \rightarrow S$, where $q$ is a query, $D$ is the document(s) (or table(s)), and $S$ is a summary that is maximally relevant and faithful to both $q$ and $D$. For table-based QDTS, this extends to $(q, \mathcal{T}) \rightarrow s$, where $\mathcal{T}$ is a set of tables (Zhang et al., 8 May 2024).

2. Key Methodological Paradigms

Extractive QDTS

Early QDTS models focused on query-aware selection of sentences or segments based on statistical or graph-theoretic relevance:

  • Graph-based ranking: Models such as TextRank (Thakkar et al., 2013) or hypergraph-based methods (Lierde et al., 2019) quantify sentence centrality and joint topical coverage. The latter formulate sentence selection as a hypergraph transversal problem (minimum-length set of sentences covering all query-relevant “themes”), leveraging submodular optimization for efficient greedy approximations and improved redundancy reduction compared to pairwise graph approaches.
  • Clustering: Query-directed clustering (QDC) filters documents/sentences by computing a normalized Google Distance (NGD) between document clusters and the query (Thakkar et al., 2013).
  • Pooling over pre-trained representations: Extractive methods built with BERT-based modules can implement query-focused pooling, weighting each sentence’s representation by its alignment to the query embedding (Zhu et al., 2019, Zhao et al., 2020).
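
As a concrete illustration of this style of query-focused pooling, here is a minimal NumPy sketch that scores sentences by cosine alignment with a query embedding and extracts the top-k; the function names and the fixed k are illustrative assumptions, not taken from the cited systems.

```python
import numpy as np

def query_focused_scores(sent_embs: np.ndarray, query_emb: np.ndarray) -> np.ndarray:
    """Score each sentence by cosine alignment with the query embedding."""
    sent_norm = sent_embs / (np.linalg.norm(sent_embs, axis=1, keepdims=True) + 1e-9)
    query_norm = query_emb / (np.linalg.norm(query_emb) + 1e-9)
    return sent_norm @ query_norm  # shape: (num_sentences,)

def extract_summary(sentences, sent_embs, query_emb, k=3):
    """Pick the k sentences best aligned with the query (extractive summary)."""
    scores = query_focused_scores(sent_embs, query_emb)
    top = np.argsort(-scores)[:k]
    return [sentences[i] for i in sorted(top)]  # keep original document order
```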

Abstractive QDTS

Neural encoder-decoder architectures, building on seq2seq and attention mechanisms, dominate recent abstractive QDTS:

  • Dynamic query attention: Rather than conditioning on a static query embedding, models revisit query tokens at every decoding timestep via an attention layer, dynamically modulating generation based on query-word relevance (Nema et al., 2017). The attention mechanism employs a parameterized score at time $t$: $a_{t,i}^q = v_q^{\top}\tanh(W_q s_t + U_q h_i^q)$, with normalized weights $\alpha_{t,i}^q$ and a dynamic query vector $q_t$ entering both the context vector computation and the decoder input (a sketch follows this list).
  • Pointer-generator networks: Hybrid models allow both novel word generation and direct copying from the source, critical for retaining precise query answers, named entities, and key facts (Hasselqvist et al., 2017). The generator/copy switch is trained jointly, with loss supervised for both pointer usage and attention focus.
  • Latent query models and plug-in queries: LaQSum (Xu et al., 2021) infers a distribution over “query relevance” for each document token, training with variational objectives and sequence tagging losses to discover content most likely implicated in human summaries. At inference, latent query posteriors can be updated (“plug-and-play”) using any form of user query without retraining.
  • Prefix-merging for few-shot QDTS: This method fuses prefix encodings learned from large text summarization and QA datasets; the merged prefix is then optimized for the QDTS downstream task using only a small set of target examples, efficiently leveraging knowledge transfer when annotated QDTS data are scarce (Yuan et al., 2022).
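
Below is a minimal NumPy sketch of the dynamic query attention step from Nema et al. (2017) as written above; the tensor shapes, the softmax helper, and the variable names are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

def dynamic_query_attention(s_t, query_hidden, W_q, U_q, v_q):
    """One decoding step of query attention (equation above).

    s_t:          decoder state at time t, shape (d_s,)
    query_hidden: query encoder states h_i^q, shape (m, d_q)
    W_q, U_q:     projection matrices, shapes (d_a, d_s) and (d_a, d_q)
    v_q:          scoring vector, shape (d_a,)
    Returns the attention weights alpha_{t,i}^q and the dynamic query vector q_t.
    """
    scores = np.array([v_q @ np.tanh(W_q @ s_t + U_q @ h_i) for h_i in query_hidden])
    alpha = softmax(scores)                       # alpha_{t,i}^q
    q_t = (alpha[:, None] * query_hidden).sum(0)  # dynamic query vector at step t
    return alpha, q_t
```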

Template- and Reasoning-Augmented QDTS

Recent work in QDTS for structured data (e.g., tables) and multi-modal sources employs template-based or chain-of-thought prompting to explicitly extract and inject intermediate logical facts (e.g., via ReFactor (Zhao et al., 2023) or reason-then-summarize prompting (Zhang et al., 8 May 2024)). This helps overcome failures of purely neural generative models in multi-hop and cross-table reasoning.
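
The sketch below illustrates the general reason-then-summarize prompting pattern; the prompt wording and the call_llm placeholder are assumptions for exposition and do not reproduce the prompts used in the cited papers.

```python
def build_reason_then_summarize_prompt(query: str, tables_markdown: str) -> str:
    """Assemble a two-stage prompt: extract query-relevant facts, then summarize them."""
    return (
        "You are given the following tables:\n"
        f"{tables_markdown}\n\n"
        f"Query: {query}\n\n"
        "Step 1: List the facts from the tables (including any joins or aggregations) "
        "that are needed to answer the query.\n"
        "Step 2: Using only those facts, write a concise summary that answers the query."
    )

# Usage (call_llm stands in for any LLM client):
# summary = call_llm(build_reason_then_summarize_prompt(query, tables_markdown))
```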

3. Datasets and Benchmarking

The development of QDTS benchmarks is pivotal for progress:

  • Query-driven textual benchmarks: DUC 2005/2006/2007 are historical standards for MDS QDTS in English (Yu et al., 2022). Debatepedia and variants thereof are widely used for SDS QDTS (Nema et al., 2017).
  • Augmented/automatic datasets: WikiRef (Zhu et al., 2019) and AQuaMuSe (Kulkarni et al., 2020) generate QDTS examples automatically using Wikipedia references or question-answering resources, scaling to hundreds of thousands of query-document-summary tuples and addressing the bottleneck of manual annotation.
  • Chinese QDTS: QBSUM (Zhao et al., 2020), built from QQ Browser logs, is the first large-scale Chinese dataset (>49K samples), offering new benchmarking opportunities and reflecting real-world user queries.
  • Tables and cross-modal: QTSumm (Zhao et al., 2023) (over 7K query-table-summary instances) and QFMTS (Zhang et al., 8 May 2024) (4.9K multi-table, multi-query pairs) push into structured data summarization, where queries drive aggregation across heterogeneous sources.
  • Technical and interactive datasets: QuOTeS (Ramirez-Orta et al., 2023) introduces human-labeled, query-focused technical summarization sets, obtained via interactive user labeling.

4. Evaluation, Human Factors, and System Deployment

Evaluation metrics: ROUGE-N, ROUGE-L, BLEU, METEOR, PARENT (for table faithfulness), and GSB (Good vs. Same vs. Bad) are employed across works; however, there is an acknowledged gap between automated metrics and human evaluative criteria—especially regarding factual reliability, coverage, and alignment with compact or oral reference styles (Yu et al., 2022, Yang et al., 2023).
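
For reference, ROUGE-N recall reduces to clipped n-gram overlap between a candidate summary and a reference; the sketch below is a simplified single-reference version (whitespace tokenization, no stemming or stopword handling), not the official ROUGE scorer.

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate: str, reference: str, n: int = 2) -> float:
    """Fraction of reference n-grams that also appear in the candidate (clipped counts)."""
    cand, ref = ngrams(candidate.split(), n), ngrams(reference.split(), n)
    if not ref:
        return 0.0
    overlap = sum(min(cnt, cand[gram]) for gram, cnt in ref.items())
    return overlap / sum(ref.values())
```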

User studies and interaction: High-recall, interactive protocols (e.g., continuous active learning in QuOTeS (Ramirez-Orta et al., 2023)) empower users to refine summaries iteratively, demonstrating improvements in relevance and user satisfaction compared to batch inference or static extraction.

Production deployments: Modern industrial deployments must satisfy web-scale, real-time constraints. Recent frameworks integrate:

  • Knowledge distillation: LLMs (e.g., ERNIE-Lite-8K) are distilled into fast, lightweight student models (e.g., “Hamburger”, a 0.1B-parameter variant) (Xiong et al., 28 Aug 2025).
  • Direct preference optimization (DPO): Online A/B testing feeds real user click data directly into loss optimization, aligning summary generation with actual user preferences and boosting online engagement, search satisfaction, and click-through rate (Xiong et al., 28 Aug 2025); a sketch of the objective appears after this list.
  • Inference optimization: FP8 quantization and lookahead decoding (with speculative candidate generation and verification) enable sub-60 ms per-query latency at scales of ~50,000 queries/sec on modern GPU clusters, making QDTS viable for industrial web platforms.
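
Below is a minimal sketch of the DPO objective as it would apply to a click-preferred versus a non-preferred summary for the same query; the variable names and the beta value are illustrative assumptions, not the configuration reported by Xiong et al.

```python
import numpy as np

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) summary pair.

    Inputs are summed token log-probabilities of each summary under the policy being
    trained and under the frozen reference model; beta controls how strongly the
    policy is regularized toward the reference.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    return -np.log(1.0 / (1.0 + np.exp(-margin)))  # -log(sigmoid(margin))
```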

5. Current Challenges and Research Directions

  • Data paucity and generalization: The scarcity of large, high-quality, and language-diverse query-driven datasets remains pressing, spurring a shift toward self-supervised, automatic generation (e.g., WikiRef, AQuaMuSe), parameter-efficient adaptation (prefix-merging), and plug-and-play query architectures (Zhu et al., 2019, Yuan et al., 2022, Xu et al., 2021).
  • Complex reasoning and structure: QDTS over multi-document and multi-table collections demands models capable of explicit reasoning (joins, aggregation), hybridizing chain-of-thought prompting, explicit fact extraction, and neural generation (Zhao et al., 2023, Zhang et al., 8 May 2024).
  • Evaluation gaps: Human evaluation often uncovers unfaithful, overly verbose, or misaligned outputs even when text-based metrics are high. There is an ongoing need for table/data-specific metrics, hallucination detection, and explainable evaluation pipelines (Zhao et al., 2023).
  • Repetition, redundancy, and diversity: Abstractive models, unless equipped with diversity-promoting attention or coverage penalties, tend to repeat phrases or fail to cover topics jointly. Orthogonalization-based constraints, e.g., $d_t' = d_t - \frac{d_t^{\top} d_{t-1}'}{d_{t-1}'^{\top} d_{t-1}'}\, d_{t-1}'$, directly address this (Nema et al., 2017, Yu et al., 2022); a sketch follows this list.
  • Prompting, prefix-based adaptation, and interpretability: Leading research explores prefix-pretrained architectures for few-shot QDTS, chain-of-thought prompting for structured data, and integrated visualizations/attention analyses to interpret how models fuse QA and summarization signals (Yuan et al., 2022, Zhao et al., 2023).
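
A minimal sketch of the orthogonalization step above, which subtracts from the current context vector its projection onto the previous (already orthogonalized) context to discourage repeated content; the function and argument names are illustrative.

```python
import numpy as np

def orthogonalize_context(d_t: np.ndarray, d_prev_prime: np.ndarray) -> np.ndarray:
    """Return d_t minus its projection onto d'_{t-1} (diversity constraint above)."""
    denom = d_prev_prime @ d_prev_prime
    if denom == 0.0:
        return d_t
    return d_t - ((d_t @ d_prev_prime) / denom) * d_prev_prime
```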

6. Representative Innovations and Thematic Advances

| System/Paper | Key Innovation | Notable Contribution |
|---|---|---|
| (Nema et al., 2017) | Query- and diversity-driven attention | Prevents repetition; dynamic query encoding |
| (Lierde et al., 2019) | Hypergraph transversal for extraction | Joint theme coverage; 6%+ gains in ROUGE-SU4 |
| (Xu et al., 2021) | Latent query model, plug-in adaptation | Zero-shot QDTS with active latent query calibration |
| (Yuan et al., 2022) | Prefix-merging, Fisher-aware adaptation | Few-shot learning via multi-task prefix configuration |
| (Zhao et al., 2023) | Fact extraction (ReFactor) for tables | Explicit reasoning/aggregation for query-focused tabular summarization |
| (Xiong et al., 28 Aug 2025) | Distilled generation model, DPO, lookahead decoding | Web-scale, real-time QDTS with state-of-the-art metrics |

These innovations directly target bottlenecks in redundancy, semantic relevance, scale, and deployment.


QDTS has evolved from early extractive and clustering-based systems into a multi-modal, model-agnostic field marked by flexible neural architectures, parameter- and data-efficient adaptation strategies, and increasing attention to structured data, large-scale deployment, and explainable evaluation. Ongoing research centers on bridging data and annotation gaps, fusing fine-grained reasoning with neural generation, and developing robust, human-aligned systems for real-world, query-centric applications.