Group Soft-Impute SVD for Matrix Completion

Updated 21 November 2025
  • Group Soft-Impute SVD is a matrix completion technique that incorporates a pseudo-user to capture aggregated group preferences in sparse rating matrices.
  • It iteratively applies soft-thresholded SVD to recover low-rank structures while balancing fidelity to observed ratings with nuclear-norm regularization.
  • Empirical results on datasets like Goodbooks and Movielens demonstrate improved recall and efficient rank recovery compared to traditional group recommendation methods.

Group Soft-Impute SVD (GSI-SVD) is a nuclear-norm regularized matrix completion technique designed to enhance group recommendations by modeling collective user preferences within sparse, high-dimensional user–item rating datasets. This approach appends group-aggregated preferences as a weighted pseudo-user row to the rating matrix and iteratively performs singular value thresholding to recover low-rank structure, thereby providing robust recommendations for groups of varying sizes (Ibrahim et al., 14 Nov 2025).

1. Problem Formulation and Notation

GSI-SVD operates on the user–item rating matrix $R \in \mathbb{R}^{m \times n}$, where $m$ is the number of users $U = \{u_1, \ldots, u_m\}$ and $n$ is the number of items $V = \{v_1, \ldots, v_n\}$. Observed ratings reside in the index set $\Omega \subset \{1,\ldots,m\} \times \{1,\ldots,n\}$, with projection operator $P_\Omega(R)_{ij} = R_{ij}$ if $(i,j) \in \Omega$ and $0$ otherwise.

For a target group $G \subseteq U$ and recommendation size $k$, the system aims to select $V_G \subset V$ that best aligns with the group's collective taste. An aggregated group rating $r_{G,j}$ is computed for each item $j$ rated by any member,

$$r_{G,j} = \frac{1}{|G|} \sum_{i \in G} R_{i,j},$$

weighted by

$$w_{G,j} = \frac{|\{i \in G : R_{i,j} \neq 0\}|}{|G|} \cdot \frac{1}{1 + \sigma_{G,j}},$$

where $\sigma_{G,j}$ measures the dispersion (standard deviation) of the group members' observed ratings on item $j$, so that items the group disagrees on are down-weighted. This yields the entry $r_{G,j} \cdot w_{G,j}$. An extended matrix $X_{new} = [R ; r_G \odot w_G] \in \mathbb{R}^{(m+1) \times n}$ incorporates this pseudo-user row. The group recommendation problem is thus recast as a low-rank matrix completion task.
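Constructing this pseudo-user row is mechanical; below is a minimal NumPy sketch, assuming ratings of $0$ denote unobserved entries and interpreting $\sigma_{G,j}$ as the standard deviation of the group's observed ratings on item $j$ (the function name and conventions are illustrative, not from the paper).

```python
import numpy as np

def append_group_pseudo_user(R, group_idx):
    """Append the weighted group pseudo-user row r_G * w_G to R.

    R         : (m, n) rating matrix, 0 = unobserved
    group_idx : list of row indices forming the group G
    """
    G = R[group_idx, :]                        # |G| x n block of group ratings
    observed = G != 0
    cnt = observed.sum(axis=0)                 # |{i in G : R_ij != 0}| per item
    r_G = G.sum(axis=0) / len(group_idx)       # r_{G,j} = (1/|G|) sum_i R_ij
    sigma = np.zeros(R.shape[1])               # assumed: std of observed group ratings
    for j in np.flatnonzero(cnt > 0):
        sigma[j] = G[observed[:, j], j].std()
    w_G = (cnt / len(group_idx)) / (1.0 + sigma)
    return np.vstack([R, (r_G * w_G)[None, :]])   # X_new in R^{(m+1) x n}
```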

2. Nuclear-Norm Regularized Matrix Completion

The GSI-SVD objective seeks $X \in \mathbb{R}^{(m+1) \times n}$ minimizing

$$f(X) = \|P_\Omega(R) - P_\Omega(X)\|_F^2 + \lambda \|X\|_*,$$

where the squared Frobenius-norm term enforces fidelity to the known ratings and the nuclear norm $\|X\|_*$ (the sum of singular values) encourages low-rank solutions. The trade-off parameter $\lambda > 0$ controls the effective rank of the predicted rating matrix.

This convex program addresses the dual challenges of sparse observations and high ambient dimensions, leveraging the tightest convex relaxation of matrix rank for structure recovery. The aggregation of group preferences directly into XnewX_{new} allows individual and collective tastes to jointly inform completion.
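For concreteness, the objective can be evaluated directly; a short sketch follows (the helper name is hypothetical; `ord="nuc"` in NumPy computes the nuclear norm):

```python
import numpy as np

def gsi_objective(R, mask, X, lam):
    """Evaluate f(X) = ||P_Omega(R) - P_Omega(X)||_F^2 + lam * ||X||_*."""
    fidelity = np.sum((R[mask] - X[mask]) ** 2)   # squared error on observed entries
    nuclear = np.linalg.norm(X, ord="nuc")        # sum of singular values
    return fidelity + lam * nuclear
```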

3. Soft-Impute SVD Algorithm

The core iterative algorithm fills in missing values by alternately projecting onto the observed locations and shrinking singular values. At iteration tt:

  1. Construct $Y^{(t)} = P_\Omega(R) + P_{\Omega^\perp}(X^{(t)})$, where $P_{\Omega^\perp}$ fills the missing entries with the current estimate.
  2. Compute the truncated SVD $Y^{(t)} = U \Sigma V^T$ with $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_r)$, retaining only singular values above the current $\lambda$.
  3. Apply soft-thresholding: $D_\lambda(\Sigma) = \mathrm{diag}((\sigma_i - \lambda)_+)$, where $(x)_+ = \max(x, 0)$.
  4. Update $X^{(t+1)} = U D_\lambda(\Sigma) V^T$.

Convergence is declared when the relative Frobenius-norm change falls below a tolerance, $\|Z_{new} - Z\|_F^2 / \|Z\|_F^2 < \epsilon$. A decreasing grid of $\lambda$ values is used, from $\lambda_{max} = \sigma_{max}(P_\Omega(R))$ down to $\lambda_{min} \approx 1$, with warm starting to accelerate iterative refinement.
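A minimal sketch of this loop, assuming the extended matrix and mask from Section 1; a full SVD is used for clarity where a truncated/partial SVD would be preferred at scale:

```python
import numpy as np

def soft_impute(R, mask, lam_grid, eps=1e-5, max_iter=500):
    """Soft-Impute: alternate observed-entry projection and singular-value shrinkage.

    R        : (m, n) matrix; values at unobserved entries are ignored
    mask     : boolean (m, n) array, True where a rating is observed
    lam_grid : decreasing sequence of nuclear-norm weights (warm-started)
    """
    X = np.zeros_like(R, dtype=float)
    for lam in lam_grid:                          # warm start: X carries over
        for _ in range(max_iter):
            Y = np.where(mask, R, X)              # P_Omega(R) + P_Omega^perp(X)
            U, s, Vt = np.linalg.svd(Y, full_matrices=False)
            s_shrunk = np.maximum(s - lam, 0.0)   # D_lambda: soft-threshold
            X_new = (U * s_shrunk) @ Vt
            denom = max(np.linalg.norm(X, "fro") ** 2, 1e-12)
            delta = np.linalg.norm(X_new - X, "fro") ** 2 / denom
            X = X_new
            if delta < eps:                       # relative Frobenius criterion
                break
    return X
```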

| Step | Operation | Purpose |
|---|---|---|
| Aggregation | Compute $r_G$ and $w_G$; append to $R$ | Capture group collective taste |
| Imputation | Iterative SVD and shrinkage over extended matrix | Low-rank recovery |
| Convergence | Relative-norm criterion, warm start for each $\lambda$ | Efficient optimization |

4. Incorporation of Group Preferences

The group preference vector, embedded as a pseudo-user row weighted by $w_G$, integrates individual and aggregate ratings in a unified framework. The nuclear-norm completion exploits correlations across both users and groups to inform missing values.

Post-convergence, the final row of the completed matrix $Z$ contains predicted group ratings across all items, enabling high-fidelity group recommendations even when the data are extremely sparse or high-dimensional. The alignment of individual and group signals within the same low-rank structure is key to robustness.
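Extracting recommendations then reduces to ranking the pseudo-user row of $Z$; a hedged sketch (the function name and masking convention are illustrative):

```python
import numpy as np

def recommend_for_group(Z, group_row_observed, k=20):
    """Top-k items for the group from the last (pseudo-user) row of Z.

    Z                  : completed (m+1, n) matrix from Soft-Impute
    group_row_observed : boolean (n,) mask of items the group already rated
    """
    preds = Z[-1, :].copy()
    preds[group_row_observed] = -np.inf       # exclude already-rated items
    return np.argsort(-preds)[:k]             # indices of top-k predicted items
```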

5. Computational Complexity and Convergence Analysis

Each algorithm iteration involves a rank-$r$ partial SVD on an $(m+1) \times n$ matrix, costing $O((m+1)nr)$. The projections $P_\Omega$ and $P_{\Omega^\perp}$ scale as $O(|\Omega|)$. Overall per-iteration runtime is $O(|\Omega| + (m+1)nr)$, feasible for large $m$, $n$ when $r \ll \min(m, n)$.

Soft-thresholding of singular values is non-expansive in the Frobenius norm,

$$\|D_\lambda(A) - D_\lambda(B)\|_F \leq \|A - B\|_F,$$

which yields geometrically (linearly) fast convergence in practice once the iterates enter a neighborhood of the low-rank solution. Reaching accuracy $\epsilon$ then requires $O(\log(1/\epsilon))$ iterations, consistent with the exponential decay of the error observed on a log scale.
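The non-expansiveness property is easy to check numerically; the following small experiment (illustrative, not from the paper) should print `True`:

```python
import numpy as np

def svt(A, lam):
    """Singular-value soft-thresholding D_lambda(A)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vt

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 40))
B = rng.normal(size=(50, 40))
lhs = np.linalg.norm(svt(A, 2.0) - svt(B, 2.0), "fro")
rhs = np.linalg.norm(A - B, "fro")
print(lhs <= rhs)   # True: D_lambda is non-expansive in the Frobenius norm
```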

6. Empirical Evaluation and Results

Experiments utilized three datasets:

  • Goodbooks-10K: 2,000 users × 200 books, 90% sparsity.
  • Movielens (100K): 943 users × 500 items, 86% sparsity.
  • Synthetic: 2,000 users × 200 items, 25% observed.

Group sizes evaluated were 5, 15, 20, and 25. Baseline methods included WBF (“weighted before factorization” matrix-factorization aggregator) and AF (“after factorization” latent-factor aggregator). Metrics were precision@K, recall@K, and F1@K for $K = 20$.

| Dataset | Group Size | GSI-SVD Recall | Baseline Recall | Rank Recovery |
|---|---|---|---|---|
| Goodbooks | 5 | Higher | Lower | Much lower rank |
| Goodbooks | 15–25 | Comparable | Comparable | Lower rank |
| Movielens | 5 | Higher | Lower | Much lower rank |
| Synthetic | All | Highest | Lower | Lower rank |

GSI-SVD outperformed baselines in recall for small groups on real datasets, remained comparable for larger groups, and yielded the highest precision, recall, and F1 scores on synthetic data. A key observation is that GSI-SVD completes the matrix at substantially lower effective rank as $\lambda$ increases, whereas WBF/AF remain closer to full rank. This suggests improved capacity for structure recovery in high-dimensional, sparse scenarios.

7. Practical Considerations and Implementation Notes

Optimal performance of GSI-SVD depends on initialization and selection of hyperparameters:

  • $\lambda_{max} = \sigma_{max}(P_\Omega(R))$, $\lambda_{min} \approx 1$.
  • Grid size $K \approx 10$–$20$; tolerance $\epsilon \approx 10^{-5}$.

Warm starting across $\lambda$ values and tracking rank decay along a logarithmic grid are effective. Implementations benefit notably from partial SVDs via Lanczos-based packages (e.g., PROPACK, ARPACK) or randomized SVD, and from GPU acceleration (e.g., PyTorch). Retaining only the nonzero $(\sigma_i - \lambda)_+$ components at each step provides memory and computational efficiency.
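Tying these defaults together, a hedged driver sketch (reusing the `soft_impute` sketch from Section 3; the toy data are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
R = rng.integers(0, 6, size=(200, 100)).astype(float)   # toy ratings, 0 = unobserved
mask = R != 0

# lambda_max = sigma_max(P_Omega(R)); logarithmic grid down to lambda_min ~ 1
lam_max = np.linalg.svd(np.where(mask, R, 0.0), compute_uv=False)[0]
lam_grid = np.geomspace(lam_max, 1.0, num=15)           # K ~ 10-20 grid points

Z = soft_impute(R, mask, lam_grid, eps=1e-5)            # warm-started across the grid
```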

Batched singular-value thresholding and the integration of both individual and group-level aggregation make GSI-SVD well suited for settings exhibiting extreme sparsity and high dimensionality.

This suggests broader applicability in group recommendation domains where rating matrices are large and incomplete, and a plausible implication is enhanced rank adaptivity compared to standard MF approaches.

8. Summary and Context

Group Soft-Impute SVD offers a principled convex optimization method for group recommendation in sparse, high-dimensional environments, leveraging group-aggregated preference rows and nuclear-norm regularization to enable low-rank recovery and robust recommendations. Its empirical effectiveness is confirmed on multiple real-world and synthetic datasets, outperforming matrix factorization-based group aggregators in recall for small groups and achieving favorable trade-offs in recall and precision while automatically recovering lower matrix ranks (Ibrahim et al., 14 Nov 2025).
