Efficient Learning Algorithm (eALS)

Updated 26 March 2026
  • eALS is a matrix factorization framework that efficiently optimizes weighted squared error objectives with adaptive non-uniform weights for implicit feedback.
  • It employs an element-wise coordinate descent approach and advanced caching strategies to significantly reduce computational complexity compared to classical ALS.
  • Empirical results on datasets like Yelp and Amazon show eALS delivers improved recommendation accuracy and scalability for large-scale systems.

Efficient Learning Algorithm (eALS) is a matrix factorization (MF) framework designed to efficiently optimize weighted squared error objectives for implicit feedback under non-uniform weighting schemes, with particular emphasis on exploiting all missing data as negative signal. eALS extends classical Alternating Least Squares (ALS) methods to support per-entry, non-uniform weights, including adaptive schemes based on item popularity or side information, which improves both the fidelity of user-behavior modeling and computational scalability in large-scale recommendation systems. The framework uses an element-wise coordinate descent procedure, caching strategies, and compact low-rank representations of the missing-data weights to achieve computational costs competitive with or lower than those of uniform-weight alternatives (He et al., 2017; He et al., 2018).

1. Weighted Matrix Factorization with Non-Uniform Missing Data Weights

Let $R \in \mathbb{R}^{M \times N}$ denote a user–item interaction matrix whose observed entries are indexed by $(u,i) \in \mathcal{R}$, where $r_{ui}$ denotes implicit feedback (e.g., $r_{ui} = 1$ for observed $(u,i)$, $0$ otherwise). Conventional MF for implicit feedback uses a weighted squared error loss of the form:

$$J(P, Q) = \sum_{u=1}^M \sum_{i=1}^N w_{ui}(r_{ui} - p_u^\top q_i)^2 + \lambda \left( \sum_{u=1}^M \|p_u\|_2^2 + \sum_{i=1}^N \|q_i\|_2^2 \right)$$

where $p_u, q_i \in \mathbb{R}^K$ are the learned user and item factors, $w_{ui}$ are entry-specific non-negative weights, and $\lambda$ is the regularization strength.
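As a reference point, the objective can be evaluated densely on a toy problem. The sketch below is only illustrative: the data, the weights (1 on observed entries, 0.1 on missing ones), and the function name `weighted_mf_loss` are assumptions, not values from the cited papers. It touches all $M \times N$ entries, which is exactly the cost eALS is designed to avoid.

```python
import numpy as np

def weighted_mf_loss(R, W, P, Q, lam):
    """J(P, Q) = sum_ui w_ui (r_ui - p_u^T q_i)^2 + lam * (||P||_F^2 + ||Q||_F^2)."""
    err = R - P @ Q.T                       # dense M x N residual matrix
    return float(np.sum(W * err ** 2) + lam * (np.sum(P ** 2) + np.sum(Q ** 2)))

# Toy data (hypothetical): 3 users, 4 items, K = 2 latent factors.
R = np.array([[1, 0, 0, 1],
              [0, 1, 0, 0],
              [1, 1, 0, 0]], dtype=float)
W = np.where(R > 0, 1.0, 0.1)               # assumed weights: 1 observed, 0.1 missing
rng = np.random.default_rng(0)
P = rng.normal(scale=0.1, size=(3, 2))      # user factors p_u as rows
Q = rng.normal(scale=0.1, size=(4, 2))      # item factors q_i as rows
loss = weighted_mf_loss(R, W, P, Q, lam=0.01)
print(loss)
```

With all factors set to zero, the loss reduces to the summed weight of the observed entries, which is a convenient sanity check for an implementation.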

Non-uniform weighting strategies address two key issues: (1) most implicit-feedback entries are missing, so including them as negative signal with adaptive weights improves fidelity; (2) real-world exposure and popularity induce substantial heterogeneity that is poorly modeled by uniform priors. For example, missing-entry weights $c_i$ can be set proportional to item popularity $f_i$ via $c_i = c_0\,(f_i)^\alpha / \sum_j (f_j)^\alpha$, with $c_0 > 0$ and $\alpha \in [0,1]$ (He et al., 2017).
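The popularity-based scheme is a one-line normalization. A minimal sketch follows; the function name `popularity_weights`, the interaction counts, and the values of `c0` and `alpha` are chosen purely for illustration.

```python
def popularity_weights(item_counts, c0, alpha):
    """c_i = c0 * f_i^alpha / sum_j f_j^alpha for item popularities f_i."""
    powered = [f ** alpha for f in item_counts]
    total = sum(powered)
    return [c0 * p / total for p in powered]

# Hypothetical interaction counts for four items.
weights = popularity_weights([100, 25, 1, 0], c0=16.0, alpha=0.5)
print(weights)  # [10.0, 5.0, 1.0, 0.0]
```

Because of the normalization, raw counts and relative frequencies give the same weights, and the weights always sum to `c0`. Setting `alpha=0` recovers a uniform weight of `c0 / N` per item, while larger `alpha` shifts more of the negative weight onto popular items.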

More generally, missing-entry weights $w_{ui}$ can be parameterized as a low-rank product, $w_{ui} = a_u^\top b_i$, allowing the modeling of arbitrary patterns using compact SVD-based factors $A \in \mathbb{R}^{M \times Z}$ and $B \in \mathbb{R}^{N \times Z}$ (He et al., 2018).
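To see why the low-rank form helps, consider a weighted sum over all items of the kind that appears in the coordinate updates. Substituting $w_{ui} = a_u^\top b_i$ lets the item sum be precomputed once and shared across users. The derivation below is a sketch; the exact definition of the cached quantity (written here as $S^q_{t,f,k}$) is an assumption about how such a cache would be laid out.

```latex
\sum_{i=1}^{N} w_{ui}\, q_{i,f}\, q_{i,k}
  = \sum_{i=1}^{N} \Big( \sum_{t=1}^{Z} a_{u,t}\, b_{i,t} \Big) q_{i,f}\, q_{i,k}
  = \sum_{t=1}^{Z} a_{u,t} \underbrace{\sum_{i=1}^{N} b_{i,t}\, q_{i,f}\, q_{i,k}}_{S^q_{t,f,k}}
```

The inner sum does not depend on $u$, so it costs $O(N Z K^2)$ once per sweep, after which each user needs only $O(Z)$ work per $(f,k)$ pair.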

2. Element-wise ALS (eALS): Coordinate Descent Approach

eALS performs coordinate descent on individual scalar factors $p_{u,f}$ and $q_{i,f}$, unlike classical ALS, which updates full vectors via $K \times K$ linear solves. With the missing entries of item $i$ weighted by $c_i$ (so $w_{ui} = c_i$ for $(u,i) \notin \mathcal{R}$), the update for each user $u$ and latent dimension $f$ is:

$$p_{u,f} = \frac{ \sum_{i \in \mathcal{R}_u} \left[ w_{ui} r_{ui} - (w_{ui} - c_i)\, \hat{r}_{ui}^f \right] q_{i,f} - \sum_{k \neq f} p_{u,k}\, s^q_{kf} }{ \sum_{i \in \mathcal{R}_u} (w_{ui} - c_i)\, q_{i,f}^2 + s^q_{ff} + \lambda }$$

where $\mathcal{R}_u$ is the set of items observed for user $u$, $\hat{r}_{ui}^f = p_u^\top q_i - p_{u,f} q_{i,f}$ is the prediction without the contribution of dimension $f$, and $s^q_{kf}$ is an entry of the "cache" matrix $S^q = \sum_{i=1}^N c_i q_i q_i^\top$. The $(w_{ui} - c_i)$ terms arise because $S^q$ sums over all items, so the missing-entry weight already counted for observed items must be subtracted back out. Analogous forms apply for the item updates $q_{i,f}$ with $S^p = P^\top P$ (He et al., 2017). When missing-entry weights are expressed with SVD factors, all necessary sums involving $w_{ui}$ can be decomposed into inner products and tensor contractions, using cache tensors $S^q_{t,f,k}$ and $S^p_{t,f,k}$ for efficient recomputation (He et al., 2018).
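The user-side update can be sketched in NumPy as follows. This is an illustrative reading of the update rule with per-item missing weights $c_i$, not the authors' reference implementation; the function name `update_user`, the toy sizes, and the weight values are all assumptions.

```python
import numpy as np

def update_user(u, R, w_obs, c, P, Q, Sq, lam):
    """One coordinate-descent pass over p_u, in place. Sq = sum_i c_i q_i q_i^T."""
    items = np.flatnonzero(R[u])            # observed items R_u
    Qu = Q[items]                           # |R_u| x K slice of item factors
    r, w, cu = R[u, items], w_obs[u, items], c[items]
    pred = Qu @ P[u]                        # current predictions on observed items
    for f in range(P.shape[1]):
        q_f = Qu[:, f]
        pred_f = pred - P[u, f] * q_f       # prediction without dimension f
        numer = np.sum((w * r - (w - cu) * pred_f) * q_f)
        numer -= P[u] @ Sq[:, f] - P[u, f] * Sq[f, f]   # sum over k != f of p_uk * s_kf
        denom = np.sum((w - cu) * q_f ** 2) + Sq[f, f] + lam
        P[u, f] = numer / denom
        pred = pred_f + P[u, f] * q_f       # restore predictions with the new value

# Toy problem (hypothetical sizes and values).
rng = np.random.default_rng(0)
M, N, K, lam = 4, 6, 3, 0.1
R = (rng.random((M, N)) < 0.4).astype(float)
R[0, 0] = 1.0                               # ensure user 0 has an observation
w_obs = np.ones((M, N))                     # assumed weight 1 on observed entries
c = rng.uniform(0.1, 0.5, size=N)           # assumed missing-entry weights c_i
P = rng.normal(size=(M, K))
Q = rng.normal(size=(N, K))
Sq = Q.T @ (c[:, None] * Q)                 # cache S^q, built once in O(N K^2)
update_user(0, R, w_obs, c, P, Q, Sq, lam)
```

Each coordinate step exactly minimizes the user's loss in that coordinate, so repeated passes converge to the same vector that a full weighted least-squares solve for $p_u$ would give, without ever looping over the unobserved items.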

3. Computational Efficiency and Caching Strategies

After the caches have been built, updating a single scalar $p_{u,f}$ costs $O(K + |\mathcal{R}_u|)$ (and updating $q_{i,f}$ costs $O(K + |\mathcal{R}_i|)$), so a full pass over one user's or one item's factors costs $O(K^2 + K|\mathcal{R}_u|)$ or $O(K^2 + K|\mathcal{R}_i|)$. The cache matrices and tensors (e.g., $S^q$, $S^p$, $S^q_{t,f,k}$) aggregate contributions over missing entries without enumerating the full $M \times N$ space, exploiting sparsity and low-rank structure. For simple popularity-based weights ($Z=1$), the per-iteration cost is $O((M+N)K^2 + |\mathcal{R}| K)$ (He et al., 2017); for general low-rank weights, the cost is $O((M+N) K^2 Z + |\mathcal{R}| K Z)$ (He et al., 2018).
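The effect of a low-rank cache can be checked numerically. The sketch below assumes a cache tensor of the form `S[t, f, k] = sum_i B[i, t] * Q[i, f] * Q[i, k]` (an assumption about the layout, with hypothetical toy sizes) and verifies that a user's weighted Gram matrix can be recovered from it without looping over items.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, K, Z = 5, 7, 3, 2                     # hypothetical toy sizes
A = rng.random((M, Z))                      # user-side weight factors a_u
B = rng.random((N, Z))                      # item-side weight factors b_i
Q = rng.normal(size=(N, K))                 # item latent factors

# Assumed cache layout: S[t, f, k] = sum_i B[i, t] * Q[i, f] * Q[i, k].
S = np.einsum('it,if,ik->tfk', B, Q, Q)     # built once per sweep, O(N Z K^2)

u = 0
gram_cached = np.einsum('t,tfk->fk', A[u], S)   # O(Z K^2), no loop over items
w_u = B @ A[u]                                   # explicit weights w_ui = a_u^T b_i
gram_direct = Q.T @ (w_u[:, None] * Q)           # O(N K^2) for this user alone
assert np.allclose(gram_cached, gram_direct)
```

For simplicity this sums over all items; in an actual update the observed entries of user $u$ would be corrected separately, in the same way the $(w_{ui} - c_i)$ terms correct the popularity-based cache.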

Compared to vector-wise ALS (which requires $O((M+N) K^3 + |\mathcal{R}| K^2)$ per iteration) and to naïve element-wise approaches (which may require $O(MNK)$ per sweep), eALS achieves a significant reduction in both asymptotic and observed runtime. The analysis predicts a speed-up by a factor of $K$ over classical ALS and by orders of magnitude over naïve element-wise methods, and experiments report substantially lower training times while matching or improving recommendation quality.
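Plugging the approximate Yelp sizes quoted in the benchmark section of this article into these expressions gives a feel for the gap. The figures below are leading-order operation counts only; they ignore constants, memory traffic, and implementation details, so they are not runtime predictions.

```python
# Approximate Yelp sizes as quoted in this article's benchmark section.
M, N, n_obs, K = 25_000, 26_000, 730_000, 128

als_ops = (M + N) * K**3 + n_obs * K**2     # vector-wise ALS
eals_ops = (M + N) * K**2 + n_obs * K       # eALS with popularity weights (Z = 1)
naive_ops = M * N * K                       # element-wise sweep over all M x N entries

print(f"ALS   : {als_ops:.2e}")
print(f"eALS  : {eals_ops:.2e}")
print(f"naive : {naive_ops:.2e}")
print("ALS / eALS   =", als_ops // eals_ops)
print("naive / eALS =", round(naive_ops / eals_ops, 1))
```

The ALS-to-eALS ratio comes out to exactly $K$, since every term in the ALS count carries one extra factor of $K$. At these sizes the naive sweep is roughly 90 times more expensive than eALS, and the gap widens as the matrix becomes sparser.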

4. Online and Incremental Model Updates

eALS supports efficient online updates by refreshing only those user and item factors involved in new interactions, plus the caches required for coordinate updates. When a new interaction $(u,i)$ arrives, the following steps are performed:

  1. If $u$ or $i$ is new, randomly initialize $p_u$ or $q_i$.
  2. Update $p_u$ and $q_i$ via one (or a few) coordinate-descent passes, recomputing only the relevant cache entries.
  3. Refresh the associated elements in $S^p$ and $S^q$.

Each interaction is absorbed in $O(K^2 + K|\mathcal{R}_u|)$ (user) and $O(K^2 + K|\mathcal{R}_i|)$ (item) time, independent of $M$, $N$, and $|\mathcal{R}|$ (He et al., 2017). Empirically, one online iteration per new tuple suffices to maintain model quality.
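The cache refresh in step 3 can be done as a rank-one correction rather than a full rebuild, which is what keeps the online cost independent of $N$. The sketch below illustrates this for $S^q$; the helper name `refresh_item_cache` and the random stand-in for the updated factor are assumptions made for the example.

```python
import numpy as np

def refresh_item_cache(Sq, c_i, q_old, q_new):
    """Rank-one refresh of S^q = sum_i c_i q_i q_i^T after q_i changes, O(K^2)."""
    Sq -= c_i * np.outer(q_old, q_old)      # remove the stale contribution
    Sq += c_i * np.outer(q_new, q_new)      # add the updated contribution
    return Sq

rng = np.random.default_rng(2)
N, K = 6, 3                                 # hypothetical toy sizes
c = rng.uniform(0.1, 0.5, size=N)           # assumed missing-entry weights c_i
Q = rng.normal(size=(N, K))
Sq = Q.T @ (c[:, None] * Q)                 # full build, O(N K^2)

i = 4
q_old = Q[i].copy()
Q[i] = rng.normal(size=K)                   # stand-in for the coordinate-descent update of q_i
refresh_item_cache(Sq, c[i], q_old, Q[i])
print(np.allclose(Sq, Q.T @ (c[:, None] * Q)))
```

The same correction applies to $S^p = P^\top P$ with a weight of 1 when $p_u$ changes.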

5. Empirical Performance and Benchmark Results

On large implicit-feedback datasets (Yelp, Amazon-Movies), eALS demonstrates both superior recommendation accuracy and significant speedup. Key metrics include:

  • On Yelp ($M \approx 25{,}000$, $N \approx 26{,}000$, $|\mathcal{R}| \approx 7.3 \times 10^5$, $K=128$): eALS achieves HR@100 $\approx 0.242$ and NDCG@100 $\approx 0.144$, outperforming RCD, classical ALS, and BPR (He et al., 2017; He et al., 2018).
  • Training time per iteration: with $K=128$, ALS requires ~221 s, RCD ~10 s, and eALS ~13 s on Yelp; for Amazon with $M \approx 117{,}000$, $N \approx 75{,}000$, $|\mathcal{R}| \approx 5 \times 10^{6}$, eALS runs in ~72 s vs. 1260 s (ALS) and 42 s (RCD).
  • Non-uniform missing weights (item popularity) yield up to 10–20% relative improvement in HR@100 and NDCG@100 compared to uniform-weighted baselines; all accuracy gains are statistically significant at $p<0.01$.
  • For online protocols, eALS updates raise HR from ~0.08 (cold start) to ~0.22 after a single incremental pass, with the best online weighting $w_\text{new}$ improving NDCG by ~5% (He et al., 2017).
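For readers unfamiliar with the two metrics, the sketch below shows one common way to compute them per user. It assumes a leave-one-out protocol with a single held-out item per user, which this article does not spell out, so treat the protocol, function names, and item ids as illustrative.

```python
import math

def hit_ratio_at_k(ranked_items, held_out, k):
    """1.0 if the held-out item appears in the top k, else 0.0."""
    return 1.0 if held_out in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, held_out, k):
    """Single-relevant-item NDCG: 1 / log2(rank + 2) for a 0-based rank inside the top k."""
    top = ranked_items[:k]
    if held_out not in top:
        return 0.0
    rank = top.index(held_out)              # 0-based position in the ranking
    return 1.0 / math.log2(rank + 2)

ranked = [42, 7, 13, 99]                    # hypothetical item ids, best first
print(hit_ratio_at_k(ranked, 13, k=3))      # 1.0
print(ndcg_at_k(ranked, 42, k=3))           # 1.0
```

Dataset-level HR@k and NDCG@k are then the averages of these per-user values. HR only records whether the held-out item was retrieved, whereas NDCG also rewards placing it closer to the top of the list.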

6. Applicability, Extensions, and Implications

eALS allows MF to exploit all missing entries as informative negative signal with adaptive weighting, removing the need for negative sampling or uniformity constraints. The low-rank weight decomposition enables encoding of arbitrary patterns in missingness, including item popularity, user activity, and exposure information (He et al., 2018). The eALS caching and coordinate update strategies can be extended to other loss functions (e.g., weighted hinge) and incorporated into neural or higher-order factorization models. This approach offers a scalable, negative-aware MF solution for large-scale recommender systems, handling matrices with hundreds of millions of missing entries efficiently.

7. Summary Table: Cost and Functional Comparison

| Method | Missing Weights | Per-Iteration Complexity |
| --- | --- | --- |
| ALS | Uniform | $O((M+N)K^3 + \lvert\mathcal{R}\rvert K^2)$ |
| RCD | Uniform | $O((M+N)K^2 + \lvert\mathcal{R}\rvert K)$ |
| eALS | Non-uniform, low-rank | $O((M+N) K^2 Z + \lvert\mathcal{R}\rvert K Z)$ |

When $Z=1$ (popularity-based weighting), eALS matches the most efficient known solvers while modeling off-diagonal heterogeneity in missing entries (He et al., 2018; He et al., 2017). For higher-rank weighting schemes, the cost scales linearly in $Z$ but remains practical for small $Z$.


Efficient Learning Algorithm (eALS) thus provides a theoretically-grounded, computationally efficient, and empirically proven framework for large-scale matrix factorization on implicit feedback, supporting rich, non-uniform negative signal modeling and fast, incremental updates (He et al., 2017, He et al., 2018).

References (2)
