Collaborative Topic Regression (CTR)

Updated 10 March 2026

Collaborative Topic Regression is a framework that combines latent topic modeling (via LDA) with probabilistic matrix factorization to jointly capture item content and user interactions.
It addresses cold-start and data sparsity by coupling content signals with collaborative user feedback, thereby enhancing recommendation precision.
Extensions incorporating social network influences and cognitive attention constraints further boost its performance in large-scale, dynamic environments.

Collaborative Topic Regression (CTR) is a probabilistic machine-learning framework that integrates latent topic modeling of item content with collaborative filtering based on user–item interactions. CTR addresses the cold-start and data sparsity problems in recommender systems by jointly modeling the generative process of user adoption events and item content. It extends traditional matrix factorization with LDA-style topic representations for items, and it can be further extended with social and cognitive factors such as social network influence and limited human attention.

1. Model Foundations

CTR builds upon two foundational components: probabilistic matrix factorization (PMF) and Latent Dirichlet Allocation (LDA). In the canonical CTR model, each item (e.g., document, media file) is represented by a topic-proportion vector $\theta_j$, drawn from a Dirichlet prior; user latent interest vectors $u_i$ are sampled from a zero-mean Gaussian prior, $u_i \sim \mathcal{N}(0,\lambda_u^{-1}I_K)$, with $K$ the number of topics. The item latent vector is defined as $v_j = \theta_j + \epsilon_j$, where $\epsilon_j \sim \mathcal{N}(0,\lambda_v^{-1}I_K)$ accounts for collaborative signal not captured by content. The user–item interactions (ratings/adoptions) are modeled as:

$r_{ij} \sim \mathcal{N}(u_i^\top v_j, c_{ij}^{-1})$

where $r_{ij}$ is the binary or continuous rating of item $j$ by user $i$, and $c_{ij}$ is a confidence weight that acts as the precision of that observation. Item content generation follows standard LDA:

For each word $m$ in item $j$, draw a topic assignment $z_{jm} \sim \mathrm{Multinomial}(\theta_j)$.
Draw the word $w_{jm} \sim \mathrm{Multinomial}(\beta_{z_{jm}})$, where $\beta$ is the topic-word distribution.

This joint formulation couples collaborative and content-based signals in the shared latent space.
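
The generative process above can be made concrete with a small sampler. The following is a minimal sketch, not a reference implementation: the function name `sample_ctr`, the default sizes, the single shared confidence `c`, and the symmetric Dirichlet used to draw the topic-word matrix `beta` are all assumptions made for illustration.

```python
import numpy as np

def sample_ctr(n_users=5, n_items=4, n_topics=3, vocab_size=20,
               words_per_item=30, lambda_u=1.0, lambda_v=10.0,
               alpha=1.0, c=1.0, seed=0):
    """Draw one synthetic data set from the CTR generative process."""
    rng = np.random.default_rng(seed)

    # Topic-word distributions beta_k (assumed drawn from a symmetric Dirichlet).
    beta = rng.dirichlet(np.ones(vocab_size), size=n_topics)

    # User vectors: u_i ~ N(0, lambda_u^{-1} I_K).
    U = rng.normal(0.0, lambda_u ** -0.5, size=(n_users, n_topics))

    # Item topic proportions: theta_j ~ Dirichlet(alpha).
    theta = rng.dirichlet(alpha * np.ones(n_topics), size=n_items)

    # Item vectors: v_j = theta_j + eps_j, eps_j ~ N(0, lambda_v^{-1} I_K).
    V = theta + rng.normal(0.0, lambda_v ** -0.5, size=(n_items, n_topics))

    # Item content: z_jm ~ Mult(theta_j), then w_jm ~ Mult(beta_{z_jm}).
    words = np.empty((n_items, words_per_item), dtype=int)
    for j in range(n_items):
        z = rng.choice(n_topics, size=words_per_item, p=theta[j])
        for m in range(words_per_item):
            words[j, m] = rng.choice(vocab_size, p=beta[z[m]])

    # Ratings: r_ij ~ N(u_i^T v_j, c^{-1}), with one shared confidence c.
    R = U @ V.T + rng.normal(0.0, c ** -0.5, size=(n_users, n_items))
    return U, V, theta, beta, words, R
```

A larger `lambda_v` keeps each $v_j$ close to its topic proportions $\theta_j$, so content dominates; a smaller value lets the item vector drift toward whatever the ratings suggest.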

2. Social and Attention-Based Extensions

CTR has been generalized to incorporate social network effects and psycho-cognitive constraints:

Social Matrix Factorization (CTR+SMF): Incorporates observed user–user links $q_{ik}$ via a social latent vector $s_k$ for each friend $k$, with an additional likelihood matching observed friendships as $p(q_{ik}|u_i,s_k) = \mathcal{N}(q_{ik}|g(u_i^\top s_k), d_{ik}^{-1})$, where $g(\cdot)$ is a logistic mapping. This coupling allows the model to infer the degree of 'peer influence' on each user's preferences (Purushotham et al., 2012).
Limited Attention CTR (LA-CTR): Models cognitive constraints by introducing an attention vector $\phi_{il}$ for user $i$'s allocation of attention to friend $l$ over topics. Each $\phi_{il}\sim \mathcal{N}(g_\phi(s_{il}u_i), (c_{il}^\phi\lambda_{\phi})^{-1}I_K)$, encoding both baseline influence and topical focus of attention. The user’s adoption of item $j$ via friend $l$ is modeled by $r_{ijl} \sim \mathcal{N}(\phi_{il}^\top v_j, (c_{ijl}^r)^{-1})$. This captures both social diffusion and cognitive processing constraints (Kang et al., 2013).
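
As an illustration of the CTR+SMF social term, the sketch below evaluates the Gaussian log-likelihood of observed links under the logistic mapping $g$. The function name `social_log_likelihood` and the dense matrix layout (`Q` for links, `D` for per-link precisions) are assumptions made for this example, not part of the cited work.

```python
import numpy as np

def social_log_likelihood(U, S, Q, D):
    """Sum over (i, k) of log N(q_ik | g(u_i^T s_k), d_ik^{-1}), g logistic.

    U: (n_users, K) user vectors; S: (n_friends, K) social vectors;
    Q: (n_users, n_friends) observed links; D: same shape, positive precisions.
    """
    pred = 1.0 / (1.0 + np.exp(-(U @ S.T)))      # g(u_i^T s_k)
    log_norm = 0.5 * np.log(D / (2.0 * np.pi))   # Gaussian normalizer
    return float(np.sum(log_norm - 0.5 * D * (Q - pred) ** 2))
```

In a full model this term would be added to the CTR objective, so that user vectors are pulled toward explaining both ratings and friendships.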

3. Objective Functions and Inference Methods

CTR and its extensions optimize a joint (log-)posterior, regularizing the solution through Gaussian and Dirichlet priors:

$\ell_{\mathrm{CTR}} = -\frac{\lambda_u}{2}\sum_i \|u_i\|^2 - \frac{\lambda_v}{2}\sum_j \|v_j-\theta_j\|^2 + \sum_{j,m}\log\Bigl(\sum_k \theta_{jk}\beta_{k,w_{jm}}\Bigr) - \frac{1}{2}\sum_{i,j}c_{ij}(r_{ij}-u_i^\top v_j)^2$

LA-CTR extends this with additional regularization and social/attention likelihoods, introducing terms penalizing deviations of attention from the product of influence and interest, and matching observed adoption paths via friends.
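
The four terms of $\ell_{\mathrm{CTR}}$ map directly onto array operations. The following sketch evaluates the base CTR objective only (no social or attention terms); the name `ctr_log_posterior`, the argument layout, and the representation of each item's text as a list of word indices are assumptions for illustration.

```python
import numpy as np

def ctr_log_posterior(U, V, theta, beta, docs, R, C, lambda_u, lambda_v):
    """Evaluate the CTR objective for given parameters.

    U: (I, K), V: (J, K), theta: (J, K), beta: (K, W) topic-word matrix,
    docs: list of J lists of word indices, R and C: (I, J).
    """
    # Gaussian prior on users: -(lambda_u / 2) * sum_i ||u_i||^2
    reg_u = -0.5 * lambda_u * np.sum(U ** 2)
    # Item vectors are tied to topic proportions: -(lambda_v / 2) * sum_j ||v_j - theta_j||^2
    reg_v = -0.5 * lambda_v * np.sum((V - theta) ** 2)
    # Content term: sum_{j,m} log( sum_k theta_jk * beta_{k, w_jm} )
    content = 0.0
    for j, words in enumerate(docs):
        content += np.sum(np.log(theta[j] @ beta[:, words]))
    # Confidence-weighted squared error on ratings
    fit = -0.5 * np.sum(C * (R - U @ V.T) ** 2)
    return float(reg_u + reg_v + content + fit)
```

Tracking this value across iterations is a simple way to check that an inference routine is making progress.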

Model estimation is performed using MAP–EM style algorithms, alternating between:

E-step: Variational approximations for topic assignments, e.g., $\psi_{jmk} \propto \theta_{jk}\, \beta_{k,w_{jm}}$ for LDA components.
M-step: Closed-form updates for user/item/attention/social vectors, e.g.,

$u_i\leftarrow(\lambda_u I+\sum_j c_{ij} v_jv_j^\top)^{-1} \sum_j c_{ij} r_{ij} v_j$

with analogous updates for $v_j$, $s_k$, and $\phi_{il}$.
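
The alternating updates can be sketched as follows for the base CTR model. The $u_i$ update is the closed form given above; the $v_j$ update is obtained by setting the gradient of $\ell_{\mathrm{CTR}}$ with respect to $v_j$ to zero, which adds a $\lambda_v\theta_j$ term to the right-hand side. Function names and the dense `R`/`C` layout are assumptions for illustration, and updates for social and attention vectors are omitted.

```python
import numpy as np

def update_users(V, R, C, lambda_u):
    """u_i <- (lambda_u I + sum_j c_ij v_j v_j^T)^{-1} sum_j c_ij r_ij v_j."""
    n_users, K = R.shape[0], V.shape[1]
    U = np.empty((n_users, K))
    for i in range(n_users):
        A = lambda_u * np.eye(K) + (V * C[i][:, None]).T @ V
        b = V.T @ (C[i] * R[i])
        U[i] = np.linalg.solve(A, b)
    return U

def update_items(U, R, C, theta, lambda_v):
    """v_j <- (lambda_v I + sum_i c_ij u_i u_i^T)^{-1} (sum_i c_ij r_ij u_i + lambda_v theta_j)."""
    n_items, K = R.shape[1], U.shape[1]
    V = np.empty((n_items, K))
    for j in range(n_items):
        A = lambda_v * np.eye(K) + (U * C[:, j][:, None]).T @ U
        b = U.T @ (C[:, j] * R[:, j]) + lambda_v * theta[j]
        V[j] = np.linalg.solve(A, b)
    return V

def topic_responsibilities(theta_j, beta, words):
    """E-step: psi_jmk proportional to theta_jk * beta_{k, w_jm}, normalized over k."""
    psi = theta_j[:, None] * beta[:, words]   # shape (K, M)
    return (psi / psi.sum(axis=0)).T          # shape (M, K), rows sum to 1
```

Note that an item with no ratings information (all $u_i$ zero, or all $c_{ij}$ zero) receives $v_j = \theta_j$, which is how content alone supplies predictions for cold-start items.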

Hyperparameters ($\lambda$ terms, confidence weights) are selected via held-out cross-validation.

4. Online and Jointly Coupled Learning

The original CTR is a batch method, computing topic proportions and rating factors separately. Online Bayesian CTR (OBCTR) proposes a streaming variational inference framework, updating posteriors incrementally with each incoming $(i,j,r_{ij},\mathbf{w}_j)$ tuple (Liu et al., 2016). Key properties include:

Mean-field posteriors for $u_i$, $v_j$ (Gaussian), topic distributions $\theta_j$, $\phi_k$ (Dirichlet), and word-topic assignments $z_{jn}$.
Per-sample updates using BayesPA-style minimization: each new sample nudges the latent factors, and rating signals directly influence topic assignments via the collaborative–content coupling term.
Constant per-sample memory and computation, suitable for streaming and large-scale contexts.
Empirically, OBCTR achieves improved RMSE and held-out likelihood compared to both batch CTR and less tightly coupled online variants.
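
To give a feel for constant-cost per-sample updating, the sketch below applies one stochastic gradient step to the per-observation CTR objective. This is a deliberately simplified stand-in and not the OBCTR algorithm: it uses point estimates rather than variational posteriors, it is not a BayesPA update, and it leaves the topic proportions `theta` fixed. The name `online_step` and the learning rate `lr` are assumptions.

```python
import numpy as np

def online_step(u, v, theta, r, c=1.0, lambda_u=0.01, lambda_v=0.01, lr=0.05):
    """One gradient-ascent step on
    -(lambda_u/2)||u||^2 - (lambda_v/2)||v - theta||^2 - (c/2)(r - u^T v)^2
    for a single incoming (user, item, rating) observation."""
    err = r - u @ v
    grad_u = c * err * v - lambda_u * u
    grad_v = c * err * u - lambda_v * (v - theta)
    return u + lr * grad_u, v + lr * grad_v
```

Each call touches only the two vectors involved in the observation, so memory and computation per sample do not grow with the size of the data set.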

5. Empirical Evaluation and Performance

CTR and its variants have been extensively benchmarked on public datasets:

Model	Lastfm (recall@250)	Delicious (recall@250)	Digg 2009 (recall@100)
PMF (CF)	0.42	0.36	—
CTR	0.45	0.39	~0.11
CTR+SMF	0.48	0.43	~0.17
LA-CTR$_U$	—	—	~0.19
LA-CTR$_\phi$	—	—	~0.22

On social bookmarking (Delicious), incorporating social structure yields 3–5% higher recall than relying on content alone (Purushotham et al., 2012); in the table above, recall@250 rises from 0.39 for CTR to 0.43 for CTR+SMF.
On music (Lastfm), content (tags) is more predictive, with social signals providing a complementary boost (0.45 for CTR versus 0.48 for CTR+SMF).
On social news voting (Digg), LA-CTR$_\phi$ demonstrates relative improvements of 20–30% over CTR+SMF (roughly 0.22 versus 0.17 recall@100), suggesting that modeling limited, non-uniform attention sharpens prediction of social adoptions (Kang et al., 2013).
In all reported cases, baselines using only collaborative or content signals underperform joint models.
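
For reference, recall@M is commonly computed per user as the fraction of that user's held-out adopted items that appear among the top M ranked predictions, then averaged over users. The sketch below assumes that definition; the function names and the dictionary-based input are illustrative, and the cited studies may differ in details such as how training items are excluded.

```python
def recall_at_m(scores, held_out, m):
    """Fraction of a user's held-out items found in the top-m ranked items.

    scores: dict mapping item id -> predicted score for one user.
    held_out: non-empty set of item ids the user actually adopted.
    """
    if not held_out:
        raise ValueError("held_out must contain at least one item")
    ranked = sorted(scores, key=scores.get, reverse=True)[:m]
    hits = sum(1 for item in ranked if item in held_out)
    return hits / len(held_out)

def relative_improvement(new, old):
    """Relative gain of `new` over `old`, e.g. 0.25 means 25% better."""
    return (new - old) / old
```

Applied to the approximate Digg figures in the table, `relative_improvement(0.22, 0.17)` is about 0.29, which sits at the upper end of the 20–30% range quoted for LA-CTR$_\phi$ over CTR+SMF.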

6. Limitations and Prospective Directions

Model Scalability: Extensions such as LA-CTR introduce $O(N^2D)$ latent variables in the worst case; in practice, sparsity of social graphs and interactions makes these models computationally