Cold-Start Item Recommendations
- Cold-start item recommendation is the task of ranking new items with limited or no user interactions by integrating collaborative and content-based signals.
- Hybrid methods combine side-information, meta-learning, and LLM reasoning to generate adaptive embeddings that boost metrics like NDCG and AUC.
- Active exploration and adaptive querying strategies further refine recommendations, proving effective in streaming media, e-commerce, and content platforms.
Cold-start item recommendation is the task of inferring personalized rankings or relevance scores for newly introduced items that have little or no user interaction history. This scenario undermines classical collaborative filtering methods, which depend on user–item co-occurrence. As such, the cold-start challenge has become central in the design of industrial-scale recommendation systems across streaming media, e-commerce, and content platforms. Addressing this problem involves hybridizing collaborative and content-based signals, leveraging explicit side-information, employing meta-learning, using LLMs, and integrating active exploration strategies. The following sections delineate formal models, key methodological advances, prominent learning paradigms, practical deployment considerations, and empirical findings in state-of-the-art research.
1. Formal Models and Problem Definitions
The canonical cold-start item problem is defined over a user set $\mathcal{U}$, an item set $\mathcal{I}$, and observed interactions $\mathcal{D} \subseteq \mathcal{U} \times \mathcal{I}$. Cold-start items are those $i$ with at most $n$ observed interactions in $\mathcal{D}$, typically $n = 0$ or $1$. The aim is to learn a scoring function $f(u, i)$ that ranks cold items for users, maximizing metrics such as Hit Rate (HR@$k$), NDCG@$k$, or Recall@$k$ on test data where cold items have not appeared in training (Pembek et al., 25 Jul 2025, Zhang et al., 2023, Huang et al., 14 Feb 2024).
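As a concrete illustration of this evaluation protocol, the sketch below ranks one held-out cold item against a candidate pool by model score and computes HR@$k$ and NDCG@$k$; the ranking logic is the standard single-relevant-item formulation, and all names are illustrative.

```python
import numpy as np

def hit_rate_at_k(rank: int, k: int) -> float:
    """HR@k: 1 if the held-out cold item appears in the top k."""
    return 1.0 if rank <= k else 0.0

def ndcg_at_k(rank: int, k: int) -> float:
    """NDCG@k with a single relevant item: 1/log2(rank + 1) inside the top k."""
    return 1.0 / np.log2(rank + 1) if rank <= k else 0.0

def evaluate_cold_item(scores: np.ndarray, cold_idx: int, k: int = 10):
    """Rank the cold item among all candidates (1-indexed) and score it."""
    rank = 1 + int((scores > scores[cold_idx]).sum())
    return hit_rate_at_k(rank, k), ndcg_at_k(rank, k)

# Toy usage: 100 candidates, the cold item sits at index 0.
scores = np.random.rand(100)
scores[0] = 0.9
print(evaluate_cold_item(scores, cold_idx=0, k=10))
```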
Two prevalent settings dominate:
- Sequential Recommendation: Given a user’s interaction history $S_u = (i_1, \ldots, i_t)$ and a cold-item set $\mathcal{I}_c$, predict the next item $i_{t+1}$ (Pembek et al., 25 Jul 2025).
- General Ranking: For each held-out test pair $(u, i_c)$ with cold item $i_c$, ensure $i_c$ ranks high among a candidate item pool (Li et al., 23 Nov 2025).
In the most challenging setting, no side information is assumed. In practice, side-information vectors (e.g. metadata, text, images) are available for cold items, motivating hybrid approaches (Zhang et al., 2023, Zhao et al., 2022, Kim et al., 22 Apr 2024). The problem extends to streaming and continuous cold-start scenarios, where items may repeatedly enter a cold state due to nonstationarity and temporal sparsity (Bernardi et al., 2015).
2. Content-Driven and Hybrid Embedding Approaches
Embedding-based models represent users and items in a shared space, scoring relevance via an inner product $\hat{y}_{ui} = \mathbf{e}_u^{\top} \mathbf{e}_i$. Cold-start solutions in this regime fall into three principal categories:
- Content-to-Embedding Mapping: Leverage side-information $\mathbf{x}_i$ to synthesize or initialize the item embedding $\mathbf{e}_i$. Conditional variational autoencoders (CVAE/CVAE-like) generate warm-up embeddings by minimizing reconstruction and alignment losses (Zhao et al., 2022, Zhang et al., 2023). Distributional constraints (e.g., 2-Wasserstein, adversarial) ensure the generated $\mathbf{e}_i$ aligns with the warm-item embedding distribution (Zhang et al., 2023).
- Bounded Adaptation with Content Initialization: Rather than freezing or fully fine-tuning the content-based embedding $\mathbf{e}_i^{c}$, a bounded-norm delta $\boldsymbol{\delta}_i$ is learned, yielding $\mathbf{e}_i = \mathbf{e}_i^{c} + \boldsymbol{\delta}_i$ with $\lVert \boldsymbol{\delta}_i \rVert_2 \le \rho$ (Pembek et al., 25 Jul 2025). The bound $\rho$ trades off semantic anchoring (from content) against adaptation to collaborative signals. Appropriate norm constraints (e.g., $\rho$ chosen so that $\lVert \boldsymbol{\delta}_i \rVert \ll \lVert \mathbf{e}_i^{c} \rVert$) preserve cold-item expressivity while preventing drift; a minimal training sketch follows this list.
- Multimodal Representation Learning: Modern architectures fuse raw content in multiple modalities (text, image, audio, video) using Transformer encoders, concatenated or cross-attended across modalities, yielding an item embedding $\mathbf{e}_i$ computed directly from content. End-to-end contrastive ranking and inter-modal alignment losses drive semantic consistency and enable scoring of unseen items solely from their content (Kim et al., 22 Apr 2024).
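As referenced above, here is a minimal PyTorch sketch of the bounded-delta scheme, assuming a precomputed content embedding matrix; the projection enforces $\lVert \boldsymbol{\delta}_i \rVert_2 \le \rho$ after each optimizer step. Class and parameter names are illustrative rather than those of (Pembek et al., 25 Jul 2025).

```python
import torch
import torch.nn as nn

class BoundedDeltaItemEmbedding(nn.Module):
    """Cold-item embeddings e_i = e_i^c + delta_i with ||delta_i||_2 <= rho."""

    def __init__(self, content_emb: torch.Tensor, rho: float = 0.1):
        super().__init__()
        # Frozen semantic anchor from the content encoder.
        self.content = nn.Parameter(content_emb.clone(), requires_grad=False)
        # Small trainable shift that absorbs collaborative signal.
        self.delta = nn.Parameter(torch.zeros_like(content_emb))
        self.rho = rho

    def forward(self) -> torch.Tensor:
        return self.content + self.delta

    @torch.no_grad()
    def project(self) -> None:
        """Clip each item's delta back into the rho-ball (call after optimizer.step())."""
        norms = self.delta.norm(dim=-1, keepdim=True).clamp(min=1e-12)
        self.delta.mul_((self.rho / norms).clamp(max=1.0))
```

In a training loop, `project()` is invoked after every gradient update, so the learned embedding can never drift more than $\rho$ from its content initialization.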
Robust empirical evidence demonstrates that such strategies surpass both pure content-based and naïve embedding-initialization baselines on MovieLens, Yahoo Movies, Taobao, and news domains, with AUC lifts of 4–6 points and NDCG gains up to 40% in the cold-start phase (Zhao et al., 2022, Kim et al., 22 Apr 2024, Zhang et al., 2023, Pembek et al., 25 Jul 2025).
3. Meta-Learning, Prompting, and Active Exploration
Meta-learning and prompt-learning have emerged to tackle extreme few-shot and zero-shot cold item regimes:
- Meta-Learning for Sequential/Streaming Environments: Meta-learning frameworks (e.g., Mecos, PAM) structure training as N-way K-shot tasks, extracting transferable knowledge from K support interactions per cold item and learning adaptation strategies that generalize to unseen items. Gradient-based or metric-based meta-learners, together with task partitioning by item popularity, dynamically reweight behavior-driven and content-driven features via learned gates across popularity regimes; data augmentation and self-supervision further stabilize cold-task optimization (Zheng et al., 2020, Luo et al., 18 Nov 2024).
- Prompt Tuning with Pinnacle Feedback: Rather than using only content prompts, encoding high-value positive feedback (pinnacle feedback) from selected users—those with maximal combined dwell time and engagement—bridges the semantic gap between auxiliary item descriptors and collaborative signals. Personalized prompt networks synthesize this feedback into adaptable item-conditional networks, yielding substantial improvements in click prediction for cold items (Jiang et al., 24 Dec 2024).
- Active Learning and Exploration: User selection for cold item queries can be cast as a constrained optimization—balancing high-probability responders, diversity, objectivity, and representativeness—via a joint quadratic program or greedy optimal design. Such querying strategies ensure maximum information gain per feedback budget, allowing rapid model updates and improved population-level predictions (Zhu et al., 2018, Anava et al., 2014); an illustrative selection sketch follows this list.
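As an illustration of the optimal-design idea referenced in the last bullet (not the exact QP or backward-greedy BGS algorithms of the cited works), the following sketch greedily selects raters for a new item by maximizing the log-determinant of the accumulated information matrix over user latent factors, a standard D-optimality proxy for information gain.

```python
import numpy as np

def greedy_d_optimal_raters(user_factors: np.ndarray, budget: int,
                            ridge: float = 1e-3) -> list:
    """Forward-greedy D-optimal selection of `budget` raters for a new item.

    Each step adds the user whose latent-factor row most increases
    log det(X^T X + ridge * I), i.e., most reduces posterior uncertainty
    about the new item's latent factors under a linear model.
    """
    d = user_factors.shape[1]
    info = ridge * np.eye(d)               # running information matrix
    selected, remaining = [], set(range(len(user_factors)))
    for _ in range(budget):
        best_u, best_logdet = None, -np.inf
        for u in remaining:
            x = user_factors[u]
            _, logdet = np.linalg.slogdet(info + np.outer(x, x))
            if logdet > best_logdet:
                best_u, best_logdet = u, logdet
        selected.append(best_u)
        info += np.outer(user_factors[best_u], user_factors[best_u])
        remaining.remove(best_u)
    return selected
```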
In modern deployments, item-centric exploration reframes the focus: Bayesian Beta posteriors estimate each item's latent satisfaction; items are stratified by uncertainty and only offered to users whose personal satisfaction prediction exceeds the item’s empirical mean minus two standard deviations, facilitating adaptive exploitation and efficient cold-start corpus growth (Wang et al., 12 Jul 2025).
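The gating rule above admits a compact sketch: one Beta posterior per item, updated from binary satisfaction feedback, with eligibility decided by the mean-minus-two-standard-deviations threshold. Prior parameters and names are assumptions, not the cited system's exact configuration.

```python
import numpy as np

class ItemSatisfactionPosterior:
    """Beta posterior over an item's latent satisfaction rate."""

    def __init__(self, alpha: float = 1.0, beta: float = 1.0):
        self.alpha, self.beta = alpha, beta   # Beta(1, 1) = uniform prior

    def update(self, satisfied: bool) -> None:
        """Bayesian update from one binary satisfaction signal."""
        self.alpha += float(satisfied)
        self.beta += float(not satisfied)

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def std(self) -> float:
        a, b = self.alpha, self.beta
        return float(np.sqrt(a * b / ((a + b) ** 2 * (a + b + 1))))

    def eligible(self, predicted_satisfaction: float) -> bool:
        """Offer the item only to users predicted to exceed mean - 2 * std."""
        return predicted_satisfaction > self.mean() - 2.0 * self.std()
```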
4. LLMs and Knowledge-Guided Reasoning
LLM-based approaches now lead in zero-/few-shot cold item settings, particularly in domains where item side-information is rich but collaborative overlap is minimal:
- Retrieval-Augmented Generation (RAG): Structured knowledge graphs are dynamically constructed from item profiles; LLMs traverse these graphs via multi-hop reasoning, leveraging both semantic and relational evidence to assemble candidate pools and score items in user history–aware prompts. This strategy (e.g., ColdRAG) exceeds both zero-shot and fine-tuned LLM baselines in Recall@10 and NDCG@10 by up to 56% (Yang et al., 27 May 2025).
- LLM Interaction Simulation: LLMs are fine-tuned to predict the click potential between user histories and cold item content, using instruction tuning and a two-stage funnel (embedding-based top-$k$ retrieval, then prompt-based filtering; sketched after this list) to efficiently match cold items to likely users. The resulting synthetic interactions populate a standard collaborative filtering model, unifying warm and cold item representations. Gains of 20–40% in Recall@20 over prior methods are reported (Huang et al., 14 Feb 2024).
- LLM Reasoning with Fine-Tuning: To surpass off-the-shelf LLM chain-of-thought prompting, supervised and reinforcement learning fine-tuning (e.g., SFT, GRPO) are employed. Reasoning paths based on explicit user taste factors (actors, directors, genres) and soft self-consistency across diverse LLM outputs yield improved cold item recall. Fine-tuned models deliver up to 8% gains in recall over Netflix’s production ranker, specifically in discovery of truly novel content (Li et al., 23 Nov 2025).
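A schematic of the two-stage funnel mentioned above, assuming precomputed user and cold-item embeddings; `llm_click_judgment` is a hypothetical stand-in for the fine-tuned LLM call, not an API from the cited work.

```python
import numpy as np

def two_stage_funnel(cold_item_emb, user_embs, user_histories, item_text,
                     llm_click_judgment, top_k: int = 50) -> list:
    """Stage 1: embedding-based top-k user retrieval for a cold item.
       Stage 2: LLM prompt-based filtering of the retrieved candidates."""
    # Stage 1: cosine similarity between the cold item and every user.
    sims = user_embs @ cold_item_emb / (
        np.linalg.norm(user_embs, axis=1) * np.linalg.norm(cold_item_emb) + 1e-12)
    candidates = np.argsort(-sims)[:top_k]
    # Stage 2: keep candidates the LLM judges likely to click the item.
    matched = [int(u) for u in candidates
               if llm_click_judgment(user_histories[u], item_text)]
    return matched  # synthetic (user, cold-item) pairs for CF training
```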
Empirical evidence confirms that LLMs exploit item semantics in ways classical CF cannot and that fine-tuning on explicit explanation datasets further enhances the alignment between LLM outputs and user preference signals in cold-start settings.
5. Active Elicitation, User Selection, and Region-Based Inference
Rating elicitation and region estimation strategies optimize cold-start recommendations by informing which users/items to query and how to represent ensuing uncertainty:
- Personalized Embedding Region Elicitation (PERE): Rather than point-estimating user preference vectors, new users are localized to polyhedral regions in item embedding space through a two-phase active querying process: diverse seed selection via DPP, then information-driven adaptive queries maximizing region reduction (VOI). The Chebyshev center and associated ranking over region centers lead to 3–10% relative NDCG@10 improvements over previous elicitation methods (Nguyen et al., 3 Jun 2024); a minimal Chebyshev-center sketch follows this list.
- Optimal Design: The selection of user raters for a new item is formalized via supermodular minimization of the mean-squared error in latent factor inference. Backward greedy algorithms (BGS1/BGS2) offer theoretical $1-1/e$-style approximation guarantees. Empirical Netflix experiments demonstrate superiority over random, clustering, and purely variance-driven heuristics (Anava et al., 2014).
- Attribute-Driven Active Learning: Joint optimization of high-response probability, diversity, objectivity, and representativeness—cast as a convex/relaxed quadratic program—enables optimized user allocations per cold item, followed by fine-tuning of factorization machine (FM) predictors. Substantial RMSE and ranking metric improvements, both in simulated and live e-commerce platforms, validate these approaches (Zhu et al., 2018).
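At inference time, region-based elicitation reduces to locating a representative point of the polyhedral preference region $\{x : Ax \le b\}$, where each answered query contributes a halfspace row. Below is a minimal sketch of the Chebyshev-center computation as a linear program; this is the textbook formulation, not the cited paper's exact implementation.

```python
import numpy as np
from scipy.optimize import linprog

def chebyshev_center(A: np.ndarray, b: np.ndarray):
    """Center of the largest ball inscribed in {x : Ax <= b}.

    LP: maximize r subject to A x + ||a_i||_2 * r <= b_i, r >= 0.
    Variables are [x_1, ..., x_d, r]; we minimize -r.
    """
    m, d = A.shape
    row_norms = np.linalg.norm(A, axis=1, keepdims=True)
    c = np.zeros(d + 1)
    c[-1] = -1.0
    A_ub = np.hstack([A, row_norms])
    bounds = [(None, None)] * d + [(0.0, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b, bounds=bounds)
    return res.x[:d], res.x[-1]   # region center, inscribed radius
```

Items can then be ranked by the inner product of their embeddings with the returned center, matching the region-center ranking described above.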
6. Extensions: Context, Multimodality, and Industry Considerations
Modern cold-start approaches transcend purely static models:
- Context and Nonstationarity: In continuous cold-start (CoCoS) settings, time-dependent item and user factors track evolving preferences, context-aware popularity (clustered by immediate context) offsets temporal drifts, and contextual-bandit or MDP frameworks balance exploration–exploitation in nonstationary domains (Bernardi et al., 2015).
- Multimodal and General Representation Learning: Transformer-based models trained end-to-end solely on interaction data, with multi-scale architecture fusion of image, audio, video, and textual features, generalize to new domains and item types. Empirically, late- and cross-modality fusion consistently yield superior OOTB performance versus pre-extracted features or classification-proxy pretraining (Kim et al., 22 Apr 2024).
- Integration in Social and Streaming Systems: Hybrid frameworks, such as SocRipple, combine social-graph seeding of fresh items (high-precision distribution among the creator’s followers) with embedding-based KNN “ripple” expansion to maximize recall among latent-similar users through a scalable two-stage cascade (see the simplified sketch after this list). In video platform deployments, this yields a 120% improvement in Recall@200 for the coldest items and a 36% increase in cold-item share with no degradation in engagement rates (Jaspal et al., 10 Aug 2025).
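A simplified sketch of the SocRipple cascade, assuming a follower graph and precomputed user embeddings; the centroid-based ripple step is an illustrative stand-in for the paper's embedding-KNN expansion.

```python
import numpy as np

def socripple_candidates(creator_id, followers, seed_engagers,
                         user_embs, n_ripple: int = 200) -> list:
    """Stage 1: seed the fresh item to the creator's followers.
       Stage 2: 'ripple' to users embedded nearest the positive engagers."""
    stage1 = list(followers.get(creator_id, []))
    if not seed_engagers:                       # no positive feedback yet
        return stage1
    centroid = user_embs[seed_engagers].mean(axis=0)
    sims = user_embs @ centroid / (
        np.linalg.norm(user_embs, axis=1) * np.linalg.norm(centroid) + 1e-12)
    ripple = np.argsort(-sims)[:n_ripple]
    seen = set(stage1)
    return stage1 + [int(u) for u in ripple if int(u) not in seen]
```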
7. Summary Table: Method Families and Performance Contexts
| Method Family | Key Component | Empirical Findings / Domains |
|---|---|---|
| Content-to-Embedding CVAE | Side-info embedding | AUC/NDCG lifts on MovieLens, Taobao, Tencent News |
| Bounded-Delta Over Init. | Frozen content + small trainable shift | 21% NDCG gain on Amazon-M2 cold items, music datasets |
| Meta/PAM/Prompt Frameworks | Meta-tasks, soft-prompt, feedback enc. | 20–99% HR@10 lift across ML, Yelp, Book, streaming |
| LLM RAG / Simulator / Reasoning | KG, multi-hop, prompt simulation | +20–55% Recall for Amazon, Netflix, movie streaming |
| Active Elicitation/Design | D-optimal/VOI query, region LP | 3–10% NDCG lift, 2–8% RMSE reduction, Netflix, Amazon |
| SocRipple, Item-centric Expl. | Social/embedding expansion, Beta-Bayes | +36% impressions, +50% user satisfaction, live A/B |
This taxonomy highlights that state-of-the-art cold-start solutions are hybrid, adaptive, and often require multimodal or meta/LLM-driven strategies. Continuous adaptation, exploration mechanisms, and efficient use of available side-information remain active research areas. The field is trending toward knowledge-guided reasoning, agentic pipelines, and dynamic, context-aware inference at production scales.