
HGLMRec Recommender Model

Updated 13 December 2025
  • HGLMRec is a recommender system architecture that integrates hypergraph convolution with specialized LLM agents to model complex multi-behavior user–item interactions.
  • It employs dynamic multi-agent routing and token-restricted retrieval to achieve significant cost savings and improved performance on large-scale e-commerce benchmarks.
  • Empirical evaluations on datasets like Taobao, IJCAI, and Tianchi demonstrate higher accuracy and efficiency compared to conventional single-LLM and graph-based models.

HGLMRec is a recommender system architecture that integrates a hypergraph convolutional encoder with a hierarchical multi-LLM agent architecture, designed to model complex multi-behavior user–item interactions while minimizing inference costs. It stands out by leveraging token-efficient inference and dynamic multi-agent LLM routing, outperforming prior baselines on multiple large-scale e-commerce benchmarks in both accuracy and computational efficiency (Mukande et al., 6 Dec 2025).

1. Hypergraph Encoder for Multi-Behavior Interaction

HGLMRec represents user–item interactions as a hypergraph, where each node corresponds to a user or item, and each hyperedge encodes a multi-way group interaction tied to a specific behavior (e.g., views, purchases) within a session or temporal window. Mathematically:

  • For node set $V = U \cup I$ (users $U$, items $I$), hyperedges $\mathcal{E}$ are constructed such that $e \in \mathcal{E}$ connects all items $v_k$ engaged by user $u$ under behavior $b$ in a given window.
  • The incidence matrix $H \in \{0,1\}^{|V| \times |\mathcal{E}|}$ encodes memberships: $H_{v,e} = 1$ if $v \in e$.
  • Degree matrices are $D_v$ (vertex) and $D_e$ (hyperedge), with an optional hyperedge-weight matrix $W$.
  • The normalized hypergraph Laplacian is $L = I - D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2}$ (a NumPy sketch of this construction follows this list).
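
The construction above can be sketched in a few lines of NumPy. The toy interaction log, the one-hyperedge-per-(user, behavior) grouping, and the uniform hyperedge weights are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

# Toy multi-behavior log: (user, item, behavior) triples in one window.
# Users and items share one node index space: nodes 0-1 are users, 2-5 items.
interactions = [(0, 2, "view"), (0, 3, "view"), (0, 3, "buy"),
                (1, 3, "view"), (1, 4, "view"), (1, 5, "buy")]
n_nodes = 6

# One hyperedge per (user, behavior) pair, connecting the user and every item
# they engaged under that behavior in the window (grouping rule assumed).
edges = {}
for u, v, b in interactions:
    edges.setdefault((u, b), {u}).add(v)

H = np.zeros((n_nodes, len(edges)))            # incidence matrix H
for e_idx, members in enumerate(edges.values()):
    for node in members:
        H[node, e_idx] = 1.0

W = np.eye(len(edges))                         # uniform hyperedge weights (assumed)
Dv = np.diag(H @ W @ np.ones(len(edges)))      # vertex degrees
De = np.diag(H.sum(axis=0))                    # hyperedge degrees

Dv_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(Dv)))
De_inv = np.diag(1.0 / np.diag(De))
# L = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}
L = np.eye(n_nodes) - Dv_inv_sqrt @ H @ W @ De_inv @ H.T @ Dv_inv_sqrt
```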

HGLMRec applies two layers of hypergraph convolution to update node representations:

$$h_v^{(l+1)} = \mathrm{LayerNorm}\left[\mathrm{ReLU}\left(\sum_{e \ni v} \frac{1}{|e|} \sum_{u \in e} h_u^{(l)} W^{(l)}\right)\right]$$

After convolution, adaptive attention-based pooling aggregates node embeddings into $k$-token summaries ("graph tokens"):

$$\alpha_v = \frac{\exp(a^T \tanh(W_a h_v))}{\sum_{u \in V} \exp(a^T \tanh(W_a h_u))}, \quad G = \mathrm{MLP}\left(\sum_v \alpha_v h_v\right) \in \mathbb{R}^{k \times d}$$

These graph tokens encode the structured user–item context for downstream token-based inference.
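
A hedged PyTorch sketch of one convolution layer and the attention pooling follows; expressing the per-edge mean aggregation as the matrix product $H D_e^{-1} H^T$, along with the hidden dimensions and the MLP shape, are implementation assumptions:

```python
import torch
import torch.nn as nn

class HypergraphConvLayer(nn.Module):
    """One layer: h^(l+1) = LayerNorm(ReLU(H De^-1 H^T h^l W^l)), which is the
    per-edge mean aggregation of the convolution formula in matrix form."""
    def __init__(self, dim: int):
        super().__init__()
        self.lin = nn.Linear(dim, dim, bias=False)    # W^(l)
        self.norm = nn.LayerNorm(dim)

    def forward(self, h, H):                          # h: [V, d], H: [V, E]
        De_inv = torch.diag(1.0 / H.sum(dim=0))       # 1/|e| per hyperedge
        agg = H @ De_inv @ H.T @ h                    # sum over edges of edge means
        return self.norm(torch.relu(self.lin(agg)))

class GraphTokenPooling(nn.Module):
    """Attention pooling of node embeddings into k graph tokens G in R^{k x d}."""
    def __init__(self, dim: int, k: int):
        super().__init__()
        self.Wa = nn.Linear(dim, dim, bias=False)
        self.a = nn.Linear(dim, 1, bias=False)        # scoring vector a
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                 nn.Linear(dim, k * dim))
        self.k, self.dim = k, dim

    def forward(self, h):                             # h: [V, d]
        alpha = torch.softmax(self.a(torch.tanh(self.Wa(h))), dim=0)  # [V, 1]
        pooled = (alpha * h).sum(dim=0)               # attention-weighted sum, [d]
        return self.mlp(pooled).view(self.k, self.dim)  # graph tokens G

# Usage with illustrative sizes (V=6 nodes, E=4 hyperedges, d=32, k=4):
H = torch.ones(6, 4)                                  # stand-in incidence matrix
h = torch.randn(6, 32)
layer = HypergraphConvLayer(32)
G = GraphTokenPooling(32, k=4)(layer(layer(h, H), H)) # G: [4, 32]
```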

2. Hierarchical Multi-LLM Mixture-of-Agents

HGLMRec’s inference pipeline uses a three-layer Mixture-of-Agents (MoA), where each layer $i$ holds $n$ frozen LLM “agents” $A_{i,1}, \dots, A_{i,n}$, each specialized for a role (e.g., retrieval, reasoning, ranking):

  • Each agent receives the current token embedding matrix $x_i$ and outputs $Z_{i,j} = A_{i,j}(x_i)$.
  • Cross-agent attention aggregates their contributions: $\beta_{i,j} = \mathrm{softmax}_j\left(\mathrm{Tr}[Z_{i,j} W_q (\bar{Z}_i W_k)^T] + b\right)$, where $\bar{Z}_i = \mathrm{concat}_j Z_{i,j}$.
  • Layer-wise progression: $y_i = \sum_j \beta_{i,j} Z_{i,j} + x_1$, with $x_{i+1} = y_i$ (see the sketch after this list).
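
A minimal PyTorch sketch of one MoA aggregation layer is given below. Treating the shared summary $\bar{Z}_i$ as the mean over agent outputs, so the trace term reduces to a Frobenius inner product, is a simplifying assumption of this sketch; the paper concatenates agent outputs:

```python
import torch
import torch.nn as nn

class CrossAgentAttention(nn.Module):
    """Aggregates the n agent outputs Z_{i,1..n} at one MoA layer. The trace
    score Tr[Z_j Wq (Z_bar Wk)^T] is computed as a Frobenius inner product;
    using the mean over agent outputs as Z_bar (instead of a concatenation)
    is a simplifying assumption of this sketch."""
    def __init__(self, dim: int):
        super().__init__()
        self.Wq = nn.Linear(dim, dim, bias=False)
        self.Wk = nn.Linear(dim, dim, bias=False)
        self.b = nn.Parameter(torch.zeros(1))

    def forward(self, Z, x1):            # Z: [n, T, d] agent outputs, x1: [T, d]
        Z_bar = Z.mean(dim=0)            # shared summary of all agents (assumed)
        q, k = self.Wq(Z), self.Wk(Z_bar)
        scores = (q * k).sum(dim=(1, 2)) + self.b     # one scalar per agent j
        beta = torch.softmax(scores, dim=0)           # beta_{i,j}
        return (beta[:, None, None] * Z).sum(dim=0) + x1  # y_i, residual to x_1

# Usage: n=3 agents, T=10 tokens, d=32; the output feeds layer i+1 as x_{i+1}.
y = CrossAgentAttention(32)(torch.randn(3, 10, 32), torch.randn(10, 32))
```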

Experimental configurations include Qwen2-7B LLMs at intermediate layers and a LLaMA-3-8B model at the output aggregation layer. The architecture enables dynamic reweighting and correction of agent outputs across layers, yielding refined candidate rankings.

3. Token-Efficient Retrieval and Inference

Unlike conventional LLM-based recommenders that rely on a full-vocabulary softmax, HGLMRec restricts inference-time attention to a compact set: the $k$ graph tokens $G$ plus $m$ prompt tokens $P$ (e.g., task description or user query). At each decoding step:

  • The active token pool is $G \cup P$, reducing the search space from $|\mathrm{Vocab}|$ to $k + m$.
  • Token scoring: $s(e \mid q_t) = \frac{q_t \cdot e}{\sqrt{d}}$, $\quad p(e \mid q_t) = \mathrm{softmax}_e(s(e \mid q_t))$.
  • Computational complexity is $\mathcal{O}((k+m)d)$ per step, a major reduction versus the traditional $\mathcal{O}(|\mathrm{Vocab}|d)$ (a minimal sketch of this step follows the list).
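
A minimal sketch of one restricted decoding step in PyTorch; the function name, tensor layout, and the sizes $k=8$, $m=16$, $d=64$ in the usage lines are illustrative assumptions:

```python
import torch

def restricted_decode_step(q_t, G, P):
    """Score only the k graph tokens and m prompt tokens at one decoding step,
    instead of the full vocabulary: s(e|q_t) = q_t . e / sqrt(d), softmaxed
    over the restricted pool. Function name and tensor layout are assumed."""
    pool = torch.cat([G, P], dim=0)              # active token pool, [(k+m), d]
    scores = pool @ q_t / q_t.shape[-1] ** 0.5   # O((k+m)d), not O(|Vocab|d)
    return torch.softmax(scores, dim=0)          # p(e|q_t) over k+m candidates

# Usage with illustrative sizes: k=8 graph tokens, m=16 prompt tokens, d=64.
G, P, q_t = torch.randn(8, 64), torch.randn(16, 64), torch.randn(64)
probs = restricted_decode_step(q_t, G, P)        # probs.shape == (24,)
```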

This selective attention framework substantially decreases TFLOPs and monetary cost at inference. Empirically, HGLMRec achieves an order-of-magnitude efficiency improvement (Mukande et al., 6 Dec 2025).

4. Training Objectives and Optimization

The main optimization consists of:

  • Recommendation accuracy loss (multi-behavior cross-entropy): $L_\mathrm{rec} = -\frac{1}{N} \sum_{u=1}^{N} \sum_{b \in B} \sum_{v \in I} r_{u,v}^b \log \hat{r}_{u,v}^b$, where $r_{u,v}^b$ is the ground truth for behavior $b$.
  • $L^2$ parameter regularization and an explicit computational cost penalty: $L = L_\mathrm{rec} + \lambda \|\Theta\|^2 + \mu \cdot \mathrm{Cost}$, where Cost is instantiated as TFLOPs or API-call expenditure for deployed systems (a hedged sketch of this combined loss follows the list).
  • The overall workflow is detailed in the provided inference pseudocode, supporting reproducibility for implementation (Mukande et al., 6 Dec 2025).
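
A hedged sketch of the combined objective in PyTorch; the coefficient values and tensor layout are assumptions for illustration:

```python
import torch

def hglmrec_loss(r_hat, r, theta, cost, lam=1e-4, mu=1e-3):
    """Multi-behavior cross-entropy plus L2 regularization and a cost penalty.
    r_hat, r: [N, |B|, |I|] predictions and ground-truth labels; theta:
    flattened trainable parameters; cost: measured TFLOPs or API spend.
    Coefficient values lam and mu are illustrative, not from the paper."""
    eps = 1e-12                                      # numerical stability
    l_rec = -(r * torch.log(r_hat + eps)).sum() / r.shape[0]
    return l_rec + lam * theta.pow(2).sum() + mu * cost

# Usage with illustrative shapes (N=4 users, |B|=3 behaviors, |I|=10 items):
r = torch.randint(0, 2, (4, 3, 10)).float()
loss = hglmrec_loss(torch.rand(4, 3, 10), r, torch.randn(100), cost=2.5)
```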

5. Empirical Performance and Ablation Analysis

HGLMRec demonstrates statistically significant improvement across three e-commerce datasets (Taobao, IJCAI, Tianchi):

| Dataset | HR@10 (HGLMRec) | NDCG@10 (HGLMRec) | Best Baseline NDCG@10 | Cost Savings (%) |
|---------|-----------------|-------------------|-----------------------|------------------|
| Taobao  | 0.865           | 0.726             | 0.708                 | 40–60 (vs GPT-4o) |
| IJCAI   | 0.932           | 0.812             | 0.798                 |                  |
| Tianchi | 0.814           | 0.613             | 0.601                 |                  |
  • At fixed TFLOPs or API budget, HGLMRec achieves the best performance, outperforming single-LLM (e.g., GPT-4o), TALLRec, and KDA (Mukande et al., 6 Dec 2025).
  • Removing the hypergraph module decreases NDCG@10 significantly (e.g., 0.726→0.645 on Taobao).
  • Collapsing MoA to a single expert reduces performance, and stacking more MoA layers steadily increases HR and NDCG.
  • Performance degrades when either core component is ablated, affirming the necessity of both the hypergraph encoder and the hierarchical multi-agent aggregation.

6. Comparison to Other Hypergraph-LLM Frameworks

HGLMRec contrasts with other hypergraph- and semantic-enhanced systems as follows:

  • LGHRec (also termed HGLMRec in earlier work (Luo et al., 18 May 2025)) uses LLM-generated semantic IDs and harmonized group policy optimization for contrastive learning, but unlike the dynamic, token-efficient LLM agent inference of HGLMRec, it does not deploy a MoA or retrieval-limited tokenization pipeline.
  • HERec (Ma et al., 21 Nov 2024) incorporates LLM-derived semantic profiles into hyperbolic graph collaborative filtering for exploration–exploitation tradeoffs, but does not exploit dynamic multi-agent LLM routing or token-limited inference.

This suggests HGLMRec uniquely combines higher-order behavioral modeling and efficient large-model integration, setting it apart from prior GNN, hyperbolic, or LLM-semantic models.

7. Architectural Significance and Future Implications

HGLMRec exemplifies a new class of recommender systems that synthesize lightweight hypergraph modeling (capturing session- and behavior-structured high-order relations) with scalable inference by modular, frozen LLM ensembles. Performance improvements are attributed to three factors:

  1. Hypergraph-aware graph tokenization yielding succinct interaction summaries.
  2. Layered, specialized multi-LLM agent pipelines allowing for division of labor without full fine-tuning.
  3. Token-restricted retrieval enabling sublinear inference scaling relative to vocabulary size.

A plausible implication is that future generative recommender systems will further optimize inference granularity and agent specialization, drawing on the HGLMRec design for scalable, context-rich decision-making (Mukande et al., 6 Dec 2025).
