Query Recommendation in Video Search
- The paper presents a novel trie-guided LLM framework that effectively constrains query generation to align with real user behavior.
- Query recommendation in video search is defined as an automated process of suggesting refined search phrases based on video content and user history.
- Methodologies integrate embedding-based retrieval with LLM generation and logits filtering to enhance literal quality and boost engagement metrics.
Query recommendation in video-related search refers to the automated process of suggesting related search queries—typically textual phrases—alongside or directly beneath displayed video content, aiming to help users refine, extend, or pivot their information seeking within short-video or large-scale video platforms. This capability has become a core feature of major video-centric applications, driven by the dual needs of enhancing user experience and facilitating efficient content discovery in repositories containing vast and ever-expanding video collections.
1. Foundations and Functional Objectives
Query recommendation in video-related search is fundamentally positioned at the intersection of information retrieval, recommender systems, and multimedia content analysis. Its primary objective is to bridge the gap between a user's often ambiguous information needs and the underlying, richly structured but semantically challenging video corpora. Unlike conventional search engines that rely solely on user-supplied queries, video-related search leverages contextual cues from both the video currently being viewed (item context) and the user's navigation history. The recommended queries are typically intended to:
- Disambiguate user intent, especially when initial queries or video descriptions are vague.
- Accelerate exploration by suggesting relevant, trending, or semantically adjacent queries.
- Encourage continued engagement and session longevity by surfacing diverse facets of the video collection.
For short-video platforms such as Kuaishou or TikTok, this has emerged as the "item-to-query" (I2Q) recommendation scenario, wherein the system suggests queries in direct association with a currently displayed video, enabling "related search" at the point of consumption (Shao et al., 21 Jul 2025).
2. Technical Frameworks and Methodologies
Traditional Embedding-Based Retrieval
Early methods for matching videos with queries have relied on computing semantic similarity between vector embeddings of video items (e.g., derived from captions, ASR, or deep visual features) and candidate queries. For example, approaches such as SimCSE or BGE embed both sentences into a common space and retrieve the top-k nearest queries given a video context (Shao et al., 21 Jul 2025). While these methods offer strong baseline performance, they exhibit several limitations:
- Lack of fine-grained control over literal quality (formal correctness, avoidance of typos or misinformation).
- Insufficient alignment between semantic facets of video content and user intent, particularly for recommendations that require nuanced linguistic or contextual cues.
- Cold-start issues when novel videos or queries lack enough historical data.
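At its core, this retrieval step reduces to a nearest-neighbor search in the shared embedding space. A minimal sketch, assuming pre-computed unit-normalized embeddings rather than any specific SimCSE/BGE API:

```python
import numpy as np

def top_k_queries(video_emb: np.ndarray, query_embs: np.ndarray, k: int = 5):
    """Return indices of the k candidate queries most similar to the video.

    video_emb:  (d,) unit-normalized embedding of the video context.
    query_embs: (n, d) unit-normalized embeddings of candidate queries.
    """
    scores = query_embs @ video_emb      # cosine similarity for unit vectors
    return np.argsort(-scores)[:k]       # indices of the k highest scores

# Toy example: three orthogonal "query" embeddings in a 4-d space.
queries = np.eye(3, 4)
video = np.array([0.1, 0.9, 0.1, 0.0])
video /= np.linalg.norm(video)
print(top_k_queries(video, queries, k=2)[0])  # 1 (the closest query)
```

Production systems replace the brute-force matrix product with an approximate nearest-neighbor index, but the scoring logic is the same.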
LLM-Guided Query Generation with Trie Constraints
Recent advances have introduced LLMs into the query generation process, substantially enhancing the semantic and literal quality of recommended queries. The GREAT framework (Shao et al., 21 Jul 2025) exemplifies this transition. Its core methodology tightly integrates high-quality, behaviorally validated queries—those with high exposure and click-through rates—into a trie structure. This trie, constructed from real user behavior logs, serves two purposes:
- Training guidance: During fine-tuning, the trie is used in an auxiliary Next Token in Trie Prediction (NTTP) task. Here, the LLM is explicitly penalized if its next-token prediction diverges from the set of tokens valid according to the trie at each generation step. Formally, with $\mathcal{V}_t$ denoting the trie-valid token set at step $t$,

$$\mathcal{L}_{\text{NTTP}} = -\sum_{t} \log \sum_{v \in \mathcal{V}_t} p_\theta(v \mid x_{<t}),$$

and the total loss combines the standard language modeling loss $\mathcal{L}_{\text{LM}}$ with the weighted NTTP loss:

$$\mathcal{L} = \mathcal{L}_{\text{LM}} + \lambda \, \mathcal{L}_{\text{NTTP}}.$$
- Inference-time constraint: At each decoding step, the LLM's candidate tokens are dynamically restricted to those present in the trie at that position, effectively preventing the generation of low-quality or spurious queries.
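Both mechanisms above rest on the same primitive: given a generated prefix, look up the set of trie-valid next tokens. A minimal sketch with a word-level trie (word tokens here stand in for the LLM's subword tokens; the real system operates on tokenizer IDs):

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # token -> TrieNode
        self.is_end = False  # a validated query terminates here

def build_trie(queries, tokenize=str.split):
    """Insert each validated query into a token-level trie."""
    root = TrieNode()
    for q in queries:
        node = root
        for tok in tokenize(q):
            node = node.children.setdefault(tok, TrieNode())
        node.is_end = True
    return root

def allowed_tokens(root, prefix):
    """Tokens the decoder may emit after `prefix` (trie-based decoding)."""
    node = root
    for tok in prefix:
        node = node.children.get(tok)
        if node is None:
            return set()     # prefix left the trie: no valid continuation
    return set(node.children)

trie = build_trie(["how to cook pasta", "how to cook rice", "pasta recipes"])
print(sorted(allowed_tokens(trie, ["how", "to", "cook"])))  # ['pasta', 'rice']
```

During training the same `allowed_tokens` set defines the NTTP penalty mask; during inference it is used to zero out the logits of all other tokens before sampling.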
Post-Processing for Literal Quality Assurance
To further enhance output quality, a Logits Filter is applied to generated queries post hoc. Two scores are computed:
- Global Quality Score: $S_{\text{glb}}(q) = \frac{1}{|q|} \sum_{i=1}^{|q|} \log p_\theta(q_i \mid q_{<i})$
- Local Quality Score: $S_{\text{loc}}(q) = \min_{1 \le i \le |q|} p_\theta(q_i \mid q_{<i})$

where $q = (q_1, \dots, q_{|q|})$ is the generated query and $p_\theta(q_i \mid q_{<i})$ is the model probability of token $q_i$. Only queries exceeding pre-specified global ($\tau_{\text{glb}}$) and local ($\tau_{\text{loc}}$) thresholds are retained for presentation to users (Shao et al., 21 Jul 2025).
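A sketch of such a filter, using mean log-probability as the global score and minimum token probability as the local score (illustrative instantiations; the threshold values here are hypothetical):

```python
import math

def passes_logits_filter(token_probs, tau_global=-1.5, tau_local=0.05):
    """Keep a query only if both quality scores clear their thresholds.

    token_probs: model probability of each generated token, in order.
    Global score: mean log-probability over the query (overall fluency).
    Local score:  minimum single-token probability (catches one bad token,
                  e.g. a typo, even inside an otherwise confident query).
    """
    global_score = sum(math.log(p) for p in token_probs) / len(token_probs)
    local_score = min(token_probs)
    return global_score > tau_global and local_score > tau_local

print(passes_logits_filter([0.9, 0.8, 0.85]))  # confident query: True
print(passes_logits_filter([0.9, 0.8, 0.01]))  # one low-confidence token: False
```

The local score is what makes this a post-hoc safeguard: a single low-probability token is enough to drop the whole query, regardless of how fluent the rest is.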
3. Data Collection, Datasets, and Evaluation
The development and evaluation of query recommendation algorithms for video-related search have been notably constrained by the absence of large-scale, domain-specific datasets. The introduction of the KuaiRS dataset (Shao et al., 21 Jul 2025) addresses this gap:
- Sourced from Kuaishou, a platform with >400 million daily active users.
- Contains 1.02 million real video–query pairs.
- Each data point aggregates video captions (user-uploaded), OCR-extracted cover text, and user-clicked queries.
- Quality control measures include filter thresholds for exposure and clicks, semantic similarity checks (MBVR thresholded at 0.44), and manual curation to remove queries with errors or rumors.
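The automated part of this quality control can be sketched as a simple predicate over each candidate pair (the field names and the exposure/click thresholds are hypothetical; only the 0.44 similarity cutoff comes from the dataset description):

```python
SIM_THRESHOLD = 0.44  # MBVR semantic-similarity cutoff from the dataset description

def keep_pair(pair, min_exposure=100, min_clicks=10):
    """Decide whether a video-query pair enters the dataset.

    `pair` is a dict with hypothetical fields:
      exposure, clicks -- behavioral counters for the query on this video
      similarity       -- MBVR video-query semantic similarity
      flagged          -- True if manual curation found errors or rumors
    """
    return (pair["exposure"] >= min_exposure
            and pair["clicks"] >= min_clicks
            and pair["similarity"] >= SIM_THRESHOLD
            and not pair["flagged"])

sample = {"exposure": 5000, "clicks": 120, "similarity": 0.61, "flagged": False}
print(keep_pair(sample))  # True
```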
For evaluation, offline metrics such as Edit@k (average edit distance between generated queries and the ground truth) are used, alongside human evaluations of both semantic relevance and literal quality. Online A/B experiments measure exposure rate, click-through rate (CTR), and CTR on the search results page, with GREAT achieving improvements of +0.251% in exposure, +0.174% in CTR, and +0.396% in results-page CTR over strong baselines (Shao et al., 21 Jul 2025).
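The Edit@k metric can be sketched as follows, taking the minimum Levenshtein distance from the ground-truth query to the top-k generated queries and averaging over examples (this exact aggregation is an assumption; the paper's definition may differ in detail):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def edit_at_k(ground_truths, generated_lists, k):
    """Average, over examples, of the closest edit distance within top-k."""
    total = sum(min(levenshtein(gt, g) for g in gens[:k])
                for gt, gens in zip(ground_truths, generated_lists))
    return total / len(ground_truths)

print(edit_at_k(["cat videos"], [["cat video", "dog videos"]], k=2))  # 1.0
```

Lower values are better, since they indicate generated queries closer to what users actually searched.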
4. Addressing Core Challenges in Query Recommendation
Exposure, Relevance, and Literal Quality
One principal challenge is ensuring that recommended queries are both semantically relevant and of high literal quality (i.e., correct, concise phrasing without errors). Standard retrieval- or generation-only approaches frequently exhibit trade-offs: improving exposure or CTR at the expense of literal correctness, or vice versa. The tightly integrated trie-guided generation of GREAT overcomes this by:
- Forcing the LLM to generate only queries that match high-quality, high-engagement user queries.
- Regularly updating the trie (e.g., with a 15-day window) to reflect trending and timely user interests.
- Applying the logits filter, which avoids presenting queries with tokens assigned low confidence by the model, thereby filtering out outputs with typos or rumors.
Cold Start and User Intent Alignment
By grounding recommendations in real, frequently used queries (via the trie), GREAT mitigates cold-start issues, ensuring that new videos are matched with queries that not only reflect their content but also align with actual user search patterns. The dialogue-style prompt used for LLM inference (e.g., placing the video context in a "User: …" turn and generating from an "Assistant: …" turn) further improves alignment with user intent (Shao et al., 21 Jul 2025).
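A minimal sketch of such a prompt builder (the wording and context fields are hypothetical; the source indicates only a dialogue format with video context in the user turn and generation in the assistant turn):

```python
def build_prompt(caption: str, ocr_text: str) -> str:
    """Format video context as a two-turn dialogue input for the LLM."""
    context = f"Video caption: {caption}\nCover text (OCR): {ocr_text}"
    return (f"User: Here is a video a viewer is watching.\n{context}\n"
            f"Recommend a related search query for this video.\n"
            f"Assistant: ")

prompt = build_prompt("Homemade ramen in 10 minutes", "EASY RAMEN HACK")
print(prompt)
```

The generated query is then decoded as the continuation of the "Assistant:" turn, subject to the trie constraint described above.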
Efficiency, Maintainability, and Adaptation
Trie-based decoding efficiently prunes the LLM's search space during generation, resulting in faster, safer query suggestion. The modular design of GREAT allows for independent updates to the underlying query pool or video representation encoders, ensuring maintainability and easy adaptation to changing user behaviors or platform content.
5. Empirical Results and Comparative Performance
Extensive evaluation demonstrates that the GREAT framework outperforms both retrieval-based and LLM-only generation baselines in query recommendation accuracy and user satisfaction:
- Edit@1 and Edit@20 scores are lower (for Edit@1: 4.34 vs. 4.48–4.79; for Edit@20: 5.80 vs. up to 6.10), indicating closer matches to ground-truth queries.
- Online deployment on Kuaishou yields measurable increases in exposure and CTR on related search entrypoints.
- Human evaluation confirms increases in both query relevance (+5.5% to +7.5% over baselines) and literal quality (up to +6.0%).
These results are further supported by ablation studies that show cumulative effects of the NTTP loss, trie-guided decoding, and logits filter modules.
6. Limitations, Open Challenges, and Future Directions
Several open challenges persist in query recommendation for video-related search:
- Balancing multiple objectives: Simultaneously optimizing for literal quality, semantic effectiveness, and exposure remains non-trivial. Multi-objective optimization strategies are suggested as future work (Shao et al., 21 Jul 2025).
- Dynamic adaptation: The query-based trie must be updated frequently to reflect shifts in query popularity and trends in user interests.
- Modal expansion: While GREAT primarily focuses on text, future frameworks may benefit from incorporating additional video/modal information (visual concept extraction, cross-modal embeddings) to further enhance query relevance.
- Adaptive optimization: More sophisticated reinforcement learning or other adaptive strategies may be investigated to directly optimize business metrics or user satisfaction.
Key Advances
GREAT represents a significant methodological advance by (1) leveraging behaviorally anchored trie-guided LLM generation, (2) introducing dedicated training and inference constraints that enforce both naturalness and quality, and (3) demonstrating effectiveness through both large-scale, real-world datasets and online production experiments (Shao et al., 21 Jul 2025).
7. Summary Table: GREAT System Components and Functions
| Component | Description | Role in Recommendation |
|---|---|---|
| Query-based Trie | Trie of high-quality queries (tokens as nodes) | Guides candidate tokens during generation |
| NTTP Loss | Penalizes deviation from trie-approved tokens during training | Aligns model with curated user queries |
| Trie-based Decoding | At inference, restricts tokens to trie children | Prevents hallucinations, preserves quality |
| Logits Filter | Post-processing based on token-level and global query probabilities | Removes low-confidence or poor-quality queries |
| Prompt Construction | Formats video context into a dialogue-style input to the LLM | Improves model alignment with video content |
| Daily Trie Update | Maintains trie freshness over a sliding time window | Adapts recommendations to current trends |
8. Conclusion
Query recommendation in video-related search has evolved into a sophisticated, multi-stage process combining LLMs, strong behavioral signals from usage logs, and formally constrained decoding strategies. The GREAT framework, as instantiated and validated on real-world industrial data, provides robust evidence that integrating trie-anchored query pools and LLM generation, supported by targeted training/inference constraints and post-generation filtering, can significantly enhance the relevance, quality, and effectiveness of query suggestions. This approach addresses chronic limitations of prior embedding-only or unconstrained generation approaches and opens pathways for further research into multi-objective and dynamic query recommendation systems tailored to the needs of massive, rapidly evolving video search platforms (Shao et al., 21 Jul 2025).