CardRewriter: Query Rewriting Framework

Updated 18 October 2025
  • CardRewriter is an LLM-driven framework that uses multi-source knowledge cards to reformulate long-tail queries on short-video platforms, enhancing relevance and retrieval.
  • It employs a two-stage pipeline that aggregates multi-modal signals and uses dedicated models for knowledge card construction and query rewriting.
  • Deployed at scale on Kuaishou, CardRewriter demonstrates significant improvements in retrieval metrics, user experience, and content matching through tailored training and reward strategies.

CardRewriter is an LLM-driven framework engineered for domain-specific long-tail query rewriting on short-video platforms, featuring the construction of multi-source knowledge cards to guide query reformulation. It directly addresses the mismatch between user intent and proprietary content retrieval, circumventing limitations in LLM pretraining by incorporating platform-native heterogeneous signals. Since September 2025, CardRewriter has been deployed at scale on Kuaishou, serving hundreds of millions of users and demonstrating significant improvements in user experience and retrieval metrics (Gong et al., 11 Oct 2025).

1. Architecture and High-Level Workflow

CardRewriter operates in a two-stage pipeline: knowledge card construction and query rewriting, both optimized via dedicated models. Given a user-issued query $x$, the system aggregates multi-source platform knowledge $M$ (videos, live streams, micro dramas, and external documents), then invokes a card generation model $\mathcal{C}_{\theta}(x, M)$ to summarize $M$ as a single knowledge card $c$. This card $c$ and the original query $x$ are subsequently input to the rewriting model $\mathcal{G}_{\theta}(x, c)$, yielding a rewritten query $y$ that serves as the final input to the retrieval engine. The formal process:

$$y = \mathcal{G}_{\theta}(x, c), \qquad c = \mathcal{C}_{\theta}(x, M)$$

This mechanism injects platform-specific signals, enabling better correction of spelling errors, resolution of query ambiguity, and normalization toward retrievable proprietary content.
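
The following sketch illustrates this two-stage inference flow in Python. The `card_model` and `rewrite_model` objects and their `generate` methods are hypothetical stand-ins for the deployed LLMs; the actual interfaces are not specified in the source.

```python
# Minimal sketch of the CardRewriter inference pipeline (hypothetical interfaces).

def card_rewriter(query: str, knowledge_sources: list[dict],
                  card_model, rewrite_model) -> str:
    """Rewrite a long-tail query using a multi-source knowledge card."""
    # c = C_theta(x, M): summarize heterogeneous platform knowledge into one card.
    card = card_model.generate(query=query, knowledge=knowledge_sources)

    # y = G_theta(x, c): reformulate the query conditioned on the knowledge card.
    rewritten_query = rewrite_model.generate(query=query, card=card)

    return rewritten_query  # final input to the retrieval engine
```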

2. Multi-Source Knowledge Card Construction

The knowledge aggregation step encompasses:

  • Platform Retrieval: Top-$k$ relevant videos $v_i$ are gathered via in-platform search.
  • Multi-Modal Extraction: For each video, both visual components ($v_i^{vision} = \{key_1, key_2, key_3\}$) and textual components ($v_i^{text} = \{\text{title, caption, OCR, author, background music}\}$) are extracted.
  • High-Supply Query Expansion: The system retrieves similar queries using Q2Q (rule-based) and EMB (embedding-based) approaches, collecting associated videos for context expansion.
  • Open-Domain Augmentation: Relevant documents are fetched when proprietary data is sparse.

After duplicate elimination, the resultant knowledge set $M$ is summarized by the card generation model $\mathcal{C}_{\theta}$ into a compact knowledge card, distilling salient signals, resolving conflicting information, and producing a clean semantic context for rewriting guidance.
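
A rough sketch of this aggregation step is shown below. Every retrieval helper (`platform_search`, `q2q_similar`, `emb_similar`, `web_search`) and all video field names are hypothetical placeholders for the platform services described above, not actual Kuaishou APIs.

```python
# Illustrative multi-source knowledge aggregation prior to card generation.
# All helpers and field names are assumed placeholders, not real APIs.

def build_knowledge_set(query: str, k: int = 10, min_supply: int = 5) -> list[dict]:
    knowledge = []

    # Platform retrieval: top-k in-platform videos with multi-modal fields.
    for video in platform_search(query, top_k=k):
        knowledge.append({
            "vision": video.key_frames,          # v_i^{vision}
            "text": {                            # v_i^{text}
                "title": video.title, "caption": video.caption,
                "ocr": video.ocr, "author": video.author, "bgm": video.bgm,
            },
        })

    # High-supply query expansion via rule-based (Q2Q) and embedding (EMB) neighbors.
    for similar_query in q2q_similar(query) + emb_similar(query):
        for video in platform_search(similar_query, top_k=k):
            knowledge.append({"text": {"title": video.title, "caption": video.caption}})

    # Open-domain augmentation when proprietary supply is sparse.
    if len(knowledge) < min_supply:
        knowledge.extend({"text": {"doc": doc}} for doc in web_search(query))

    # Duplicate elimination (naive: dedup on serialized content).
    seen, deduped = set(), []
    for item in knowledge:
        key = str(item)
        if key not in seen:
            seen.add(key)
            deduped.append(item)
    return deduped  # knowledge set M passed to the card generation model
```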

3. Two-Stage Training Pipeline

Both the card generation and rewriting models are trained via a staged approach:

A. Supervised Fine-Tuning (SFT):

  • Training data $D_{s(o)} = \{(x, K, y)\}$ is curated from platform search logs, with $K$ denoting either multi-source knowledge (for the card model) or generated cards (for the rewriting model).
  • Quality filtering uses a relevance judge $\mathcal{R}_{Rel}$ and system preference signals.
  • The SFT loss is standard cross-entropy:

$$\mathcal{L}_{SFT}(\theta) = -\mathbb{E}_{(x, K, y) \in D_{s(o)}} \left[ \log \pi_{SFT}(y \mid x, K) \right]$$
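
In a PyTorch-style implementation this is ordinary token-level cross-entropy over the target rewrite (or card) conditioned on the prompt containing $x$ and $K$; a minimal sketch, with padding handling as an assumption:

```python
import torch
import torch.nn.functional as F

def sft_loss(logits: torch.Tensor, target_ids: torch.Tensor, pad_id: int = 0) -> torch.Tensor:
    """Cross-entropy SFT loss.

    logits:     [batch, seq_len, vocab] scores from pi_SFT conditioned on (x, K)
    target_ids: [batch, seq_len] gold rewrite (or knowledge card) tokens
    """
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # flatten batch and time dimensions
        target_ids.reshape(-1),
        ignore_index=pad_id,                  # ignore padding positions (assumed)
    )
```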

B. Group Relative Policy Optimization (GRPO):

  • Post-SFT, GRPO applies reinforcement learning. For each query $x$ in dataset $D_{GRPO}$, the model generates $G$ rollout trajectories $\{y_i\}$.
  • The objective maximizes advantage-weighted probability ratio, penalized by KL divergence from reference policy:

$$J_{GRPO}(\theta) = \mathbb{E}_{x, K;\, \{y_i\}} \left\{ \frac{1}{G} \sum_{i} \min\left[ r_i \hat{A}_i,\ \mathrm{clip}(r_i,\, 1-\epsilon,\, 1+\epsilon)\, \hat{A}_i \right] - \beta\, \mathrm{KL}\left[ \pi_{GRPO} \,\|\, \pi_{ref} \right] \right\}$$

with $r_i = \dfrac{\pi_{GRPO}(y_i \mid x, K)}{\pi_{old}(y_i \mid x, K)}$.
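
A compact sketch of this objective for a single query's group of $G$ rollouts is given below. The within-group standardization of rewards into advantages and the sequence-level KL estimate are common GRPO choices assumed here, not details stated in the source.

```python
import torch

def grpo_objective(logp_new: torch.Tensor, logp_old: torch.Tensor,
                   logp_ref: torch.Tensor, rewards: torch.Tensor,
                   eps: float = 0.2, beta: float = 0.01) -> torch.Tensor:
    """GRPO objective for one group of G rollouts (all tensors have shape [G]).

    logp_*: sequence log-probabilities of each rollout y_i under the current,
            old, and reference policies; rewards: R_Overall for each rollout.
    """
    # Group-relative advantages: standardize rewards within the group.
    advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

    # Probability ratio r_i and the clipped surrogate term.
    ratio = torch.exp(logp_new - logp_old)
    surrogate = torch.minimum(ratio * advantages,
                              torch.clamp(ratio, 1 - eps, 1 + eps) * advantages)

    # Crude sequence-level KL penalty against the frozen reference policy.
    kl = (logp_new - logp_ref).mean()

    return surrogate.mean() - beta * kl  # quantity to maximize
```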

4. Tailored Reward System

Training optimization relies on a composite reward $\mathcal{R}_{Overall}$, balancing:

  • Semantic Relevance ($\mathcal{R}_{Rel}$): Binary judge-based scoring for alignment of rewritten queries and knowledge cards with the original intent.
  • System-Level Retrieval Effectiveness ($\mathcal{R}_{Sys}$): Quantifies improvements in retrieval outcomes (e.g., hitrate, clicks).

When immediate system feedback is unavailable, a Bradley-Terry reward model approximates preference probabilities between candidate rewrites:

$$P(rq^+ \succ rq^- \mid x) = \frac{\exp(\mathcal{R}_{Sys}(x, rq^+))}{\exp(\mathcal{R}_{Sys}(x, rq^+)) + \exp(\mathcal{R}_{Sys}(x, rq^-))}$$
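
Numerically this is a two-way softmax over the reward-model scores; a minimal sketch:

```python
import math

def bt_preference(score_pos: float, score_neg: float) -> float:
    """P(rq+ preferred over rq- | x) under a Bradley-Terry model of R_Sys scores."""
    return math.exp(score_pos) / (math.exp(score_pos) + math.exp(score_neg))
```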

Overall reward is defined piecewise:

  • $\mathcal{R}_{Overall} = \mathcal{R}_{Sys}$ if $\mathcal{R}_{Sys} > 0$
  • $\mathcal{R}_{Overall} = 0.1$ if $\mathcal{R}_{Sys} = 0$ and $\mathcal{R}_{Rel} > 0$
  • $\mathcal{R}_{Overall} = 0$ otherwise

This design ensures that rewriting is not only semantically faithful but also tuned for improved retrieval efficacy.
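
The piecewise combination above translates directly into code; a minimal sketch of the composite reward:

```python
def overall_reward(r_sys: float, r_rel: float) -> float:
    """Composite reward R_Overall combining retrieval and relevance signals."""
    if r_sys > 0:
        return r_sys        # positive retrieval gain dominates when available
    if r_sys == 0 and r_rel > 0:
        return 0.1          # small credit for semantically faithful rewrites
    return 0.0              # otherwise no reward
```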

5. Performance Metrics and Experimental Outcomes

Both offline and online evaluations employ multi-faceted metrics:

Offline:

  • Relevance for knowledge cards (QC-Rel) and rewritten queries (QR-Rel), judged by advanced LLMs (e.g., Qwen3-235B-A22B).
  • Retrieval increment:

$$\text{Increment} = \frac{|\mathcal{V}_x \cup \mathcal{V}_y| - |\mathcal{V}_x|}{|\mathcal{V}_x|}$$

  • Hitrate@K: Fraction of queries for which the ground-truth video appears in the top-K results (see the sketch after this list).
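
The two offline retrieval metrics can be computed as follows; a minimal sketch, where $\mathcal{V}_x$ and $\mathcal{V}_y$ denote the sets of videos retrieved by the original and rewritten queries:

```python
def retrieval_increment(videos_x: set, videos_y: set) -> float:
    """Relative growth of the retrieved pool contributed by the rewritten query."""
    return (len(videos_x | videos_y) - len(videos_x)) / len(videos_x)

def hitrate_at_k(ranked_videos: list, ground_truth_id, k: int) -> float:
    """1.0 if the ground-truth video appears in the top-K results, else 0.0."""
    return 1.0 if ground_truth_id in ranked_videos[:k] else 0.0
```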

Online:

  • Long-View Rate (LVR): Proportion of queries yielding long-form views.
  • Click-Through Rate (CTR): Click ratio per query.
  • Initiative Query Reformulation Rate (IQRR): Percentage of queries users manually reformulate.

Reported results include $>85\%$ QR-Rel and substantial increases in hitrate. A/B tests yield +1.853% in LVR, +3.729% in CTR, and -2.630% in IQRR on covered traffic.

6. Deployment Strategy and System Impact

Due to strict latency requirements, CardRewriter adopts a near-line deployment. Targeted queries—those with moderate search volume, ambiguous intent, and low retrieval performance—undergo offline processing. The corresponding knowledge cards and rewritten queries (or pre-fetched video results) are cached in an online key-value store. When such a query occurs in real time, the system serves cached results for immediate response. This architecture facilitates large-scale deployment without compromising latency or relevance.
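
The serving path can be pictured as a simple cache lookup; a sketch assuming a hypothetical key-value client and retrieval engine interface:

```python
# Near-line serving sketch: offline jobs precompute knowledge cards and rewrites
# for targeted long-tail queries and store them in a key-value cache; the online
# path only performs a lookup. The cache and engine interfaces are hypothetical.

def serve_query(query: str, kv_cache, retrieval_engine):
    cached = kv_cache.get(query)  # populated by the offline CardRewriter job
    if cached is not None:
        # Serve using the cached rewrite (or pre-fetched results) at low latency.
        return retrieval_engine.search(cached["rewritten_query"])
    # Queries outside the targeted set fall back to the standard retrieval path.
    return retrieval_engine.search(query)
```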

CardRewriter has tangibly improved query rewriting and retrieval effectiveness on Kuaishou, enhancing user satisfaction and reducing the burden of manual query reformulation. The methodology demonstrates the feasibility of incorporating multi-modal, domain-specific knowledge for robust query rewriting in environments where user intent and content distribution are misaligned with generic LLM pretraining.

7. Technical Significance and Future Directions

CardRewriter’s principal innovation lies in the use of knowledge cards—a distilled, query-relevant summary of platform-specific data—to steer LLM-driven query rewriting. Combined with a principled two-stage training pipeline and a tailored reward design, it achieves strong results for proprietary content retrieval.

A plausible implication is that the approach is extensible beyond short-video platforms to other retrieval-intensive domains where user queries are long-tailed and platform content falls outside conventional LLM coverage. Future work may further refine knowledge aggregation, explore low-latency online rewriting, or integrate real-time user feedback to adapt cards and rewrite policies dynamically.
