
LLM-Driven Dual-Level Multi-Interest Modeling

Updated 16 July 2025
  • LLM-Driven Dual-Level Multi-Interest Modeling is a hierarchical recommendation framework that leverages LLMs to disentangle and align individual and crowd-level interest representations.
  • It integrates LLM-informed semantic clustering with capsule network-based collaborative learning to adapt granularity and address data sparsity in user behavior.
  • The framework employs contrastive learning and max covering optimization to boost prediction accuracy and robustness across diverse recommendation scenarios.

An LLM-Driven Dual-Level Multi-Interest Modeling Framework is a hierarchical architecture for recommendation systems that exploits the semantic knowledge and reasoning capabilities of LLMs to extract, fuse, and disentangle users’ multi-interest representations at both the individual and population (“crowd”) levels, improving prediction accuracy, granularity control, and robustness under data sparsity (Wang et al., 15 Jul 2025). The approach addresses fundamental limitations of traditional multi-interest learning (such as heuristic co-occurrence grouping and poor generalization of interest representations under real-world user behavior distributions) by unifying LLM-informed semantic clustering with collaborative multi-interest learning and population-level aggregation.

1. Framework Architecture and Workflow

LDMI (LLM-Driven Dual-Level Multi-Interest Modeling) operates on two levels:

  • User-Individual Level: Each user’s engagement sequence (e.g., a list of item titles) is processed by an LLM, which flexibly allocates items to semantic clusters representing distinct user interests. Since LLMs may produce overly fine or overly coarse clusters, an alignment module is introduced: it assigns LLM semantic clusters to collaboratively learned interests (extracted via a capsule network trained on global user–item co-interactions), allowing automatic granularity adjustment.
  • User-Crowd Level: To alleviate the sparsity of individual user data, the framework synthesizes “users” by aggregating cliques—groups of users with overlapping behavior. A max covering problem is formulated to optimally select synthesized users whose collective behavior is as representative and compact as possible. LLMs analyze these synthesized, richer behavior records to generate multi-interest clusters, which are further refined via contrastive learning to disentangle item representations between distinct interests.

Typical workflow steps:

  1. For a target user, gather the sequence of item interactions (including item titles).
  2. Apply an LLM with a dedicated prompt to segment items into semantic interest clusters.
  3. Independently, use a collaborative learning mechanism (e.g., dynamic routing in a capsule network) to extract global multi-interest vectors.
  4. Fuse LLM-derived clusters and capsule-based interests using attention-based alignment: semantic representations are softly assigned to collaborative interests, and fused vectors are formed per interest slot.
  5. Aggregate similar users into cliques, identify representative synthesized users with max covering optimization, and repeat the LLM-driven clustering and fusion.
  6. Apply contrastive learning to maximize intra-cluster similarity and inter-cluster separation across synthesized users’ interests.
  7. Use the resulting user interest representations as input for recommendation or ranking modules.
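Step 2 of the workflow hinges on the clustering prompt. The paper's exact prompt is not reproduced here; the function below is a hypothetical sketch of how such a prompt could be assembled from a user's item titles:

```python
# Hypothetical prompt construction for step 2 (illustrative; not the
# authors' actual prompt).
def build_clustering_prompt(item_titles):
    titles = "\n".join(f"- {t}" for t in item_titles)
    return (
        "Group the following items a user interacted with into a small "
        "number of semantically coherent interest clusters. Return one "
        "cluster per line as a comma-separated list of item titles.\n"
        + titles
    )

print(build_clustering_prompt(["The Martian", "Dune", "Sourdough Baking 101"]))
```

The LLM's line-per-cluster reply would then be parsed back into the clusters C_f used in the alignment step.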

2. User-Individual Level: LLM-Driven Semantic Clustering and Alignment

At the user-individual level, the LLM operates on the titles of items that a user has interacted with. This involves:

  • Constructing a prompt that asks the LLM to divide the item list into several semantically coherent clusters:

$$\mathcal{C}_1^i, \ldots, \mathcal{C}_F^i = \text{LLM}(\text{prompt}, t_{u_i})$$

where $t_{u_i}$ is the sequence of item titles for user $u_i$, and each $\mathcal{C}_f^i$ is a semantic interest cluster.

  • Aggregating items within each cluster into a representation using attention pooling:

$$h_f^i = \sum_{v_j \in \mathcal{C}_f^i} \alpha_j v_j \quad \text{where} \quad \alpha_j = \frac{\exp(w^T v_j + b)}{\sum_{v_k \in \mathcal{C}_f^i} \exp(w^T v_k + b)}$$

($v_j$ is the embedding of item $j$; $w$, $b$ are learnable.)

  • Extracting $K$ collaborative multi-interest representations using capsule networks with dynamic routing:

$$m_k^i = \sum_j b_{jk} W v_j^i$$

($b_{jk}$ are routing weights derived from an iterative softmax over agreement scores.)

  • Aligning semantic clusters and collaborative interests by computing attention between them:

$$z_k^i = \sum_{f=1}^{F} \alpha_{kf} h_f^i$$

with $\alpha_{kf}$ derived from comparing $m_k^i$ and $h_f^i$.

  • Forming the hybrid representation for each interest as $o_k^i = m_k^i + z_k^i$. This fusion adaptively corrects the granularity and leverages both LLM and collaborative perspectives.
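The individual-level pipeline above (attention pooling, dynamic routing, alignment, fusion) can be sketched end-to-end in NumPy. Dimensions, the number of routing iterations, and the squash-free normalization are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_items, F, K = 8, 6, 3, 2  # embed dim, items, semantic clusters, interests

V = rng.normal(size=(n_items, d))          # item embeddings v_j
cluster_of = np.array([0, 0, 1, 1, 2, 2])  # hypothetical LLM cluster per item

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Attention pooling per semantic cluster: h_f = sum_j alpha_j v_j
w, b = rng.normal(size=d), 0.0             # learnable in a real model
H = np.stack([softmax(V[cluster_of == f] @ w + b) @ V[cluster_of == f]
              for f in range(F)])

# Capsule-style dynamic routing for K collaborative interests m_k
W = rng.normal(size=(d, d))                # shared transform
U = V @ W                                  # transformed items W v_j
logits = np.zeros((n_items, K))
for _ in range(3):                         # routing iterations
    B = softmax(logits, axis=1)            # routing weights b_jk
    M = B.T @ U                            # m_k = sum_j b_jk W v_j
    M = M / (np.linalg.norm(M, axis=1, keepdims=True) + 1e-8)
    logits = logits + U @ M.T              # agreement update

# Alignment and fusion: alpha_kf compares m_k with h_f, then o_k = m_k + z_k
A = softmax(M @ H.T, axis=1)               # soft assignment of clusters to interests
Z = A @ H                                  # z_k = sum_f alpha_kf h_f
O = M + Z                                  # fused interest representations o_k

print(O.shape)  # → (2, 8): one fused vector per interest slot
```

In a trained model the weights `w`, `b`, and `W` would be learned jointly with the recommendation objective rather than sampled randomly.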

3. User-Crowd Level: Crowd Synthesis and Max Covering Optimization

To counteract the limitations imposed by data sparsity at the individual level, the framework synthesizes representative users by aggregating “cliques” of similar real users. The process is as follows:

  • Synthesize each user $u_i'$ from the union of behaviors of users in a clique $c(u_i')$ (users with similar preferences).
  • Formulate a max covering problem (MCP) to select a compact set of synthesized users that maximize coverage of distinct items:

$$\max_x \sum_j I\left(\sum_{i=1}^M x_i A_{ij} \geq 1\right) w_j$$

subject to $x \in \{0,1\}^M$ and $\|x\|_0 \leq Z$, where $A_{ij}=1$ if item $v_j$ appears in the record of $u_i'$, $w_j$ is an item weight (e.g., popularity), and $Z$ is a cardinality constraint.

  • Analyze these denser synthesized behaviors with LLMs, generating more robust multi-interest clusters at the population level.
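Max covering is NP-hard, so a practical solver typically approximates it. The greedy heuristic below, a standard (1 - 1/e)-approximation for weighted max coverage, is one plausible sketch of the selection step, not necessarily the authors' solver:

```python
# Greedy approximation for the max covering step. item_sets[i] holds the
# items of candidate synthesized user u_i'; weights follow w_j above.
def greedy_max_cover(item_sets, weights, Z):
    """Pick at most Z synthesized users maximizing total covered weight."""
    covered, chosen = set(), []
    for _ in range(Z):
        best, best_gain = None, 0.0
        for i, items in enumerate(item_sets):
            if i in chosen:
                continue
            gain = sum(weights[j] for j in items - covered)  # marginal gain
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # no candidate adds new coverage
            break
        chosen.append(best)
        covered |= item_sets[best]
    return chosen, covered

# Toy example: 3 candidate synthesized users over 5 items with popularity weights
sets = [{0, 1, 2}, {2, 3}, {3, 4}]
w = {0: 1.0, 1: 1.0, 2: 1.0, 3: 2.0, 4: 2.0}
print(greedy_max_cover(sets, w, Z=2))  # → ([2, 0], {0, 1, 2, 3, 4})
```

The first pick covers the heavily weighted items 3 and 4; the second adds the remaining three items, so two synthesized users already cover the whole catalog.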

4. Contrastive Learning and Interest Disentanglement

Contrastive learning is applied between clusters obtained from synthesized users to sharpen the distinction between interests:

  • For each synthesized user, encourage the item embeddings within the same cluster to be similar, and those from different clusters to be separated. The loss is:

$$\mathcal{L}^{\text{cst}}_{u_i'} = - \sum_{v_j \in c(u_i')} \log \frac{\exp(v_j^T v_{j^*} / \tau)}{\sum_{v_{j'} \notin c(u_i')} \exp(v_j^T v_{j'} / \tau)}$$

where $v_{j^*}$ is a positive sample from the same cluster and $\tau$ is a temperature parameter.

  • The final multi-task training objective is:

$$\mathcal{L} = \mathcal{L}^{\text{rec}} + \lambda \mathcal{L}^{\text{cst}}$$

Here, $\mathcal{L}^{\text{rec}}$ is a recommendation loss and $\lambda$ balances the contrastive term.
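A NumPy sketch of the contrastive objective, with positives drawn from the same cluster and negatives from other clusters as in the loss above; the toy embeddings, the value of lambda, and the dummy recommendation loss are illustrative assumptions:

```python
import numpy as np

def contrastive_loss(V, cluster_of, tau=0.1):
    """Sketch of the per-synthesized-user contrastive loss: positives
    share a cluster, negatives come from other clusters."""
    V = V / (np.linalg.norm(V, axis=1, keepdims=True) + 1e-8)
    sim = V @ V.T / tau                       # cosine similarities / temperature
    idx = np.arange(len(V))
    loss, n_terms = 0.0, 0
    for j in idx:
        pos = idx[(cluster_of == cluster_of[j]) & (idx != j)]
        neg = idx[cluster_of != cluster_of[j]]
        if len(pos) == 0 or len(neg) == 0:
            continue
        denom = np.exp(sim[j, neg]).sum()     # negatives only, as in the formula
        for p in pos:
            loss += -np.log(np.exp(sim[j, p]) / denom)
            n_terms += 1
    return loss / max(n_terms, 1)

rng = np.random.default_rng(1)
V = rng.normal(size=(6, 4))                   # toy item embeddings
labels = np.array([0, 0, 0, 1, 1, 1])         # two interest clusters
l_cst = contrastive_loss(V, labels)
l_total = 1.0 + 0.2 * l_cst                   # L = L_rec + lambda * L_cst (dummy L_rec)
print(np.isfinite(l_total))                   # → True
```

During training, gradients through this loss pull item embeddings within a cluster together and push embeddings in different clusters apart, which is the disentanglement effect described above.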

5. Addressing Granularity and Sparsity in Multi-Interest Modeling

Two main challenges are explicitly addressed:

  • Granularity Control: LLMs may generate interest clusters that are too fine- or too coarse-grained. The alignment module resolves this by mapping LLM clusters into the collaborative interest space via attention and adaptive fusion, yielding a granularity that matches the user’s behavioral context.
  • Data Sparsity: Since most users have limited interaction history, synthesized user generation via optimal clique aggregation (MCP) provides the LLMs with denser usage histories, improving robustness. This, augmented with contrastive learning, sharpens representations and reduces noise.

6. Experimental Evaluation and Empirical Findings

The LDMI framework was tested on multiple Amazon review datasets (Beauty, Books, Video Games), using recall, NDCG, and hit rate metrics at cutoffs 20 and 50:

  • LDMI consistently outperformed standard single-interest (GRU4Rec, Pop) and multi-interest (MIND, ComiRec, REMI, DisMIR, EIMF) methods, as well as other LLM-based multi-interest models, across all datasets evaluated.
  • Ablation studies confirmed that both the LLM-driven semantic clustering at the user-individual level and the max covering user synthesis at the crowd-level were essential to the observed performance gains.
  • The contrastive learning module facilitated effective interest disentanglement.

7. Significance and Future Implications

This dual-level, LLM-integrated methodology provides a mathematically grounded, scalable, and empirically validated paradigm for recommendation tasks involving nuanced user interests and large-scale, sparse behavioral datasets. Key properties of practical significance include:

  • The ability to adjust interest granularity per-user in a data-driven fashion.
  • Enhanced robustness and expressiveness via crowd-informing synthesis.
  • Improved disentanglement of interests through contrastive objectives.
  • Applicability to a wide range of real-world data regimes and potentially extensibility to further modalities (e.g., textual, multi-modal, or temporal signals).

This framework establishes a clear pathway for integrating LLMs as semantic engines in the next generation of interpretable and personalized recommendation systems (Wang et al., 15 Jul 2025).
