CardRewriter: Query Rewriting Framework

Updated 18 October 2025
  • CardRewriter is an LLM-driven framework that uses multi-source knowledge cards to reformulate long-tail queries on short-video platforms, enhancing relevance and retrieval.
  • It employs a two-stage pipeline that aggregates multi-modal signals and uses dedicated models for knowledge card construction and query rewriting.
  • Deployed at scale on Kuaishou, CardRewriter demonstrates significant improvements in retrieval metrics, user experience, and content matching through tailored training and reward strategies.

CardRewriter is an LLM-driven framework engineered for domain-specific long-tail query rewriting on short-video platforms, featuring the construction of multi-source knowledge cards to guide query reformulation. It directly addresses the mismatch between user intent and proprietary content retrieval, circumventing limitations in LLM pretraining by incorporating platform-native heterogeneous signals. Since September 2025, CardRewriter has been deployed at scale on Kuaishou, serving hundreds of millions of users and demonstrating significant improvements in user experience and retrieval metrics (Gong et al., 11 Oct 2025).

1. Architecture and High-Level Workflow

CardRewriter operates in a two-stage pipeline: knowledge card construction and query rewriting, both optimized via dedicated models. Given a user-issued query $x$, the system aggregates multi-source platform knowledge $M$ (videos, live streams, micro dramas, and external documents), then invokes a card generation model $\mathcal{C}_{\theta}(x, M)$ to summarize $M$ as a single knowledge card $c$. This card $c$ and the original query $x$ are subsequently input to the rewriting model $\mathcal{G}_{\theta}(x, c)$, yielding a rewritten query $y$ that serves as the final input to the retrieval engine. The formal process:

$$y = \mathcal{G}_{\theta}(x, c), \qquad c = \mathcal{C}_{\theta}(x, M)$$

This mechanism injects platform-specific signals, enabling better correction of spelling errors, resolution of query ambiguity, and normalization toward retrievable proprietary content.
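
The following sketch illustrates this two-stage inference flow in Python. The `card_model` and `rewrite_model` objects and their `generate` methods are hypothetical stand-ins for the deployed LLMs; the actual interfaces are not specified in the source.

```python
# Minimal sketch of the CardRewriter inference pipeline (hypothetical interfaces).

def card_rewriter(query: str, knowledge_sources: list[dict],
                  card_model, rewrite_model) -> str:
    """Rewrite a long-tail query using a multi-source knowledge card."""
    # c = C_theta(x, M): summarize heterogeneous platform knowledge into one card.
    card = card_model.generate(query=query, knowledge=knowledge_sources)

    # y = G_theta(x, c): reformulate the query conditioned on the knowledge card.
    rewritten_query = rewrite_model.generate(query=query, card=card)

    return rewritten_query  # final input to the retrieval engine
```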

2. Multi-Source Knowledge Card Construction

The knowledge aggregation step encompasses:

  • Platform Retrieval: Top-$k$ relevant videos $v_i$ are gathered via in-platform search.
  • Multi-Modal Extraction: For each video, both visual components ($v_i^{vision} = \{key_1, key_2, key_3\}$) and textual components ($v_i^{text} = \{\text{title, caption, OCR, author, background music}\}$) are extracted.
  • High-Supply Query Expansion: The system retrieves similar queries using Q2Q (rule-based) and EMB (embedding-based) approaches, collecting associated videos for context expansion.
  • Open-Domain Augmentation: Relevant documents are fetched when proprietary data is sparse.

After duplicate elimination, the resultant knowledge set $M$ is summarized by the card generation model $\mathcal{C}_{\theta}$ into a compact knowledge card, distilling salient signals, resolving conflicting information, and producing a clean semantic context for rewriting guidance.
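
A rough sketch of this aggregation step is shown below. Every retrieval helper (`platform_search`, `q2q_similar`, `emb_similar`, `web_search`) and all video field names are hypothetical placeholders for the platform services described above, not actual Kuaishou APIs.

```python
# Illustrative multi-source knowledge aggregation prior to card generation.
# All helpers and field names are assumed placeholders, not real APIs.

def build_knowledge_set(query: str, k: int = 10, min_supply: int = 5) -> list[dict]:
    knowledge = []

    # Platform retrieval: top-k in-platform videos with multi-modal fields.
    for video in platform_search(query, top_k=k):
        knowledge.append({
            "vision": video.key_frames,          # v_i^{vision}
            "text": {                            # v_i^{text}
                "title": video.title, "caption": video.caption,
                "ocr": video.ocr, "author": video.author, "bgm": video.bgm,
            },
        })

    # High-supply query expansion via rule-based (Q2Q) and embedding (EMB) neighbors.
    for similar_query in q2q_similar(query) + emb_similar(query):
        for video in platform_search(similar_query, top_k=k):
            knowledge.append({"text": {"title": video.title, "caption": video.caption}})

    # Open-domain augmentation when proprietary supply is sparse.
    if len(knowledge) < min_supply:
        knowledge.extend({"text": {"doc": doc}} for doc in web_search(query))

    # Duplicate elimination (naive: dedup on serialized content).
    seen, deduped = set(), []
    for item in knowledge:
        key = str(item)
        if key not in seen:
            seen.add(key)
            deduped.append(item)
    return deduped  # knowledge set M passed to the card generation model
```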

3. Two-Stage Training Pipeline

Both the card generation and rewriting models are trained via a staged approach:

A. Supervised Fine-Tuning (SFT):

  • Training data $D_{s(o)} = \{(x, K, y)\}$ is curated from platform search logs, with $K$ denoting either multi-source knowledge (for the card model) or generated cards (for the rewriting model).
  • Quality filtering uses a relevance judge $\mathcal{R}_{Rel}$ and system preference signals.
  • The SFT loss is standard cross-entropy:

$$\mathcal{L}_{SFT}(\theta) = -\mathbb{E}_{(x, K, y) \in D_{s(o)}} \left[ \log \pi_{SFT}(y \mid x, K) \right]$$
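
In a PyTorch-style implementation this is ordinary token-level cross-entropy over the target rewrite (or card) conditioned on the prompt containing $x$ and $K$; a minimal sketch, with padding handling as an assumption:

```python
import torch
import torch.nn.functional as F

def sft_loss(logits: torch.Tensor, target_ids: torch.Tensor, pad_id: int = 0) -> torch.Tensor:
    """Cross-entropy SFT loss.

    logits:     [batch, seq_len, vocab] scores from pi_SFT conditioned on (x, K)
    target_ids: [batch, seq_len] gold rewrite (or knowledge card) tokens
    """
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # flatten batch and time dimensions
        target_ids.reshape(-1),
        ignore_index=pad_id,                  # ignore padding positions (assumed)
    )
```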

B. Group Relative Policy Optimization (GRPO):

  • Post-SFT, GRPO applies reinforcement learning. For each query $x$ in dataset $D_{GRPO}$, the model generates $G$ rollout trajectories $\{y_i\}$.
  • The objective maximizes advantage-weighted probability ratio, penalized by KL divergence from reference policy:

$$J_{GRPO}(\theta) = \mathbb{E}_{x, K;\, \{y_i\}} \left\{ \frac{1}{G} \sum_{i} \min\left[ r_i \hat{A}_i,\ \mathrm{clip}(r_i,\, 1-\epsilon,\, 1+\epsilon)\, \hat{A}_i \right] - \beta\, \mathrm{KL}\left[ \pi_{GRPO} \,\|\, \pi_{ref} \right] \right\}$$

with $r_i = \dfrac{\pi_{GRPO}(y_i \mid x, K)}{\pi_{old}(y_i \mid x, K)}$.
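
A compact sketch of this objective for a single query's group of $G$ rollouts is given below. The within-group standardization of rewards into advantages and the sequence-level KL estimate are common GRPO choices assumed here, not details stated in the source.

```python
import torch

def grpo_objective(logp_new: torch.Tensor, logp_old: torch.Tensor,
                   logp_ref: torch.Tensor, rewards: torch.Tensor,
                   eps: float = 0.2, beta: float = 0.01) -> torch.Tensor:
    """GRPO objective for one group of G rollouts (all tensors have shape [G]).

    logp_*: sequence log-probabilities of each rollout y_i under the current,
            old, and reference policies; rewards: R_Overall for each rollout.
    """
    # Group-relative advantages: standardize rewards within the group.
    advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

    # Probability ratio r_i and the clipped surrogate term.
    ratio = torch.exp(logp_new - logp_old)
    surrogate = torch.minimum(ratio * advantages,
                              torch.clamp(ratio, 1 - eps, 1 + eps) * advantages)

    # Crude sequence-level KL penalty against the frozen reference policy.
    kl = (logp_new - logp_ref).mean()

    return surrogate.mean() - beta * kl  # quantity to maximize
```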

4. Tailored Reward System

Training optimization relies on a composite reward $\mathcal{R}_{Overall}$, balancing:

  • Semantic Relevance ($\mathcal{R}_{Rel}$): Binary judge-based scoring for alignment of rewritten queries and knowledge cards with the original intent.
  • System-Level Retrieval Effectiveness ($\mathcal{R}_{Sys}$): Quantifies improvements in retrieval outcomes (e.g., hitrate, clicks).

When immediate system feedback is unavailable, a Bradley-Terry reward model approximates preference probabilities between candidate rewrites:

$$P(rq^+ \succ rq^- \mid x) = \frac{\exp(\mathcal{R}_{Sys}(x, rq^+))}{\exp(\mathcal{R}_{Sys}(x, rq^+)) + \exp(\mathcal{R}_{Sys}(x, rq^-))}$$
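
Numerically this is a two-way softmax over the reward-model scores; a minimal sketch:

```python
import math

def bt_preference(score_pos: float, score_neg: float) -> float:
    """P(rq+ preferred over rq- | x) under a Bradley-Terry model of R_Sys scores."""
    return math.exp(score_pos) / (math.exp(score_pos) + math.exp(score_neg))
```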

Overall reward is defined piecewise:

  • $\mathcal{R}_{Overall} = \mathcal{R}_{Sys}$ if $\mathcal{R}_{Sys} > 0$
  • $\mathcal{R}_{Overall} = 0.1$ if $\mathcal{R}_{Sys} = 0$ and $\mathcal{R}_{Rel} > 0$
  • $\mathcal{R}_{Overall} = 0$ otherwise

This design ensures that rewriting is not only semantically faithful but also tuned for improved retrieval efficacy.
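
The piecewise combination above translates directly into code; a minimal sketch of the composite reward:

```python
def overall_reward(r_sys: float, r_rel: float) -> float:
    """Composite reward R_Overall combining retrieval and relevance signals."""
    if r_sys > 0:
        return r_sys        # positive retrieval gain dominates when available
    if r_sys == 0 and r_rel > 0:
        return 0.1          # small credit for semantically faithful rewrites
    return 0.0              # otherwise no reward
```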

5. Performance Metrics and Experimental Outcomes

Both offline and online evaluations employ multi-faceted metrics:

Offline:

  • Relevance for knowledge cards (QC-Rel) and rewritten queries (QR-Rel), judged by advanced LLMs (e.g., Qwen3-235B-A22B).
  • Retrieval increment:

$$\text{Increment} = \frac{|\mathcal{V}_x \cup \mathcal{V}_y| - |\mathcal{V}_x|}{|\mathcal{V}_x|}$$

  • Hitrate@K: Fraction of queries for which the ground-truth video appears in the top-K results (see the sketch after this list).
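
The two offline retrieval metrics can be computed as follows; a minimal sketch, where $\mathcal{V}_x$ and $\mathcal{V}_y$ denote the sets of videos retrieved by the original and rewritten queries:

```python
def retrieval_increment(videos_x: set, videos_y: set) -> float:
    """Relative growth of the retrieved pool contributed by the rewritten query."""
    return (len(videos_x | videos_y) - len(videos_x)) / len(videos_x)

def hitrate_at_k(ranked_videos: list, ground_truth_id, k: int) -> float:
    """1.0 if the ground-truth video appears in the top-K results, else 0.0."""
    return 1.0 if ground_truth_id in ranked_videos[:k] else 0.0
```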

Online:

  • Long-View Rate (LVR): Proportion of queries yielding long-form views.
  • Click-Through Rate (CTR): Click ratio per query.
  • Initiative Query Reformulation Rate (IQRR): Percentage of queries users manually reformulate.

Reported results include $>85\%$ QR-Rel and substantial increases in hitrate. A/B tests yield +1.853% in LVR, +3.729% in CTR, and -2.630% in IQRR on covered traffic.

6. Deployment Strategy and System Impact

Due to strict latency requirements, CardRewriter adopts a near-line deployment. Targeted queries—those with moderate search volume, ambiguous intent, and low retrieval performance—undergo offline processing. The corresponding knowledge cards and rewritten queries (or pre-fetched video results) are cached in an online key-value store. When such a query occurs in real time, the system serves cached results for immediate response. This architecture facilitates large-scale deployment without compromising latency or relevance.
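
The serving path can be pictured as a simple cache lookup; a sketch assuming a hypothetical key-value client and retrieval engine interface:

```python
# Near-line serving sketch: offline jobs precompute knowledge cards and rewrites
# for targeted long-tail queries and store them in a key-value cache; the online
# path only performs a lookup. The cache and engine interfaces are hypothetical.

def serve_query(query: str, kv_cache, retrieval_engine):
    cached = kv_cache.get(query)  # populated by the offline CardRewriter job
    if cached is not None:
        # Serve using the cached rewrite (or pre-fetched results) at low latency.
        return retrieval_engine.search(cached["rewritten_query"])
    # Queries outside the targeted set fall back to the standard retrieval path.
    return retrieval_engine.search(query)
```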

CardRewriter has tangibly improved query rewriting and retrieval effectiveness on Kuaishou, enhancing user satisfaction and reducing the burden of manual query reformulation. The methodology demonstrates the feasibility of incorporating multi-modal, domain-specific knowledge for robust query rewriting in environments where user intent and content distribution are misaligned with generic LLM pretraining.

7. Technical Significance and Future Directions

CardRewriter’s principal innovation lies in the use of knowledge cards—a distilled, query-relevant summary of platform-specific data—to steer LLM-driven query rewriting. Combined with a principled two-stage training pipeline and a tailored reward design, it achieves strong results for proprietary content retrieval.

A plausible implication is that the approach is extensible beyond short-video platforms to other retrieval-intensive domains where user queries are long-tailed and platform content falls outside conventional LLM coverage. Future work may further refine knowledge aggregation, explore low-latency online rewriting, or integrate real-time user feedback to adapt cards and rewrite policies dynamically.
