Domain Adaptation of Multilingual Sentence Embeddings for Cross-lingual News Recommendation
The paper presents a focused investigation into the challenges of recommending multilingual news articles with neural news recommender systems (NNRs) enhanced with multilingual sentence embeddings (SEs). The authors highlight two main challenges: performance degradation in zero-shot cross-lingual transfer (ZS-XLT) and the impracticality of fine-tuning the backbone language models (LMs) in low-data environments such as few-shot recommendation and cold-start setups.
Contributions
The key contributions are a news-adapted sentence encoder (NaSE), derived from a pretrained massively multilingual SE, and two multilingual news-specific corpora, PolyNews and PolyNewsParallel, used to adapt it. The authors also propose a simple yet strong baseline for news recommendation that uses frozen NaSE embeddings combined with late click-behavior fusion.
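The late-fusion baseline is conceptually simple: news texts are embedded once with a frozen encoder, a user is represented by aggregating the embeddings of clicked articles, and candidates are scored against that user vector. Below is a minimal sketch of this idea, not the authors' exact implementation; LaBSE stands in for NaSE, and mean pooling of the click history is an assumption.

```python
import torch
from sentence_transformers import SentenceTransformer

# Frozen multilingual sentence encoder (LaBSE here; a news-adapted encoder
# such as NaSE would be a drop-in replacement).
encoder = SentenceTransformer("sentence-transformers/LaBSE")

def score_candidates(clicked_titles, candidate_titles):
    """Late click-behavior fusion: embed news with the frozen encoder,
    aggregate clicked-news embeddings into a user vector (mean pooling
    is an assumption here), and rank candidates by dot product."""
    with torch.no_grad():
        clicked = encoder.encode(clicked_titles, convert_to_tensor=True)        # (H, d)
        candidates = encoder.encode(candidate_titles, convert_to_tensor=True)   # (C, d)
    user_vec = clicked.mean(dim=0)   # (d,)
    return candidates @ user_vec     # (C,) candidate scores

# Example: rank two candidates for a user with a short click history.
history = ["Central bank raises interest rates again",
           "Stock markets rally on tech earnings"]
candidates = ["Inflation eases as energy prices fall",
              "Local team wins championship final"]
print(score_candidates(history, candidates))
```

Because the encoder stays frozen, news embeddings can be precomputed once and reused across languages, which is what makes this baseline cheap in few-shot and cold-start settings.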
Methodology
News-Adapted Sentence Encoder (NaSE)
The authors initialize NaSE from a general-purpose multilingual SE, LaBSE, and specialize it for the news domain using denoising autoencoding (DAE) and machine translation (MT) objectives on the PolyNews and PolyNewsParallel corpora. Four training strategies are explored: DAE, MT, a combined DAE+MT, and sequential DAE followed by MT (NaSE_SEQ).
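Of these objectives, DAE is the simplest to reproduce: the encoder must allow reconstruction of the original sentence from a corrupted input through its sentence-embedding bottleneck. The sketch below uses the TSDAE-style utilities in sentence-transformers to illustrate such an adaptation step; the toy corpus, batch size, and single epoch are placeholders, not the paper's configuration.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, datasets, losses

# Start from the general-purpose multilingual encoder (LaBSE).
model_name = "sentence-transformers/LaBSE"
model = SentenceTransformer(model_name)

# In the paper these texts would come from PolyNews; a toy list stands in here.
news_sentences = [
    "The parliament approved the new budget on Tuesday.",
    "Heavy rainfall caused flooding in several coastal towns.",
]

# DenoisingAutoEncoderDataset corrupts each sentence (token deletion by default);
# the loss trains the encoder so that a tied decoder can reconstruct the
# original text from the sentence embedding alone.
train_data = datasets.DenoisingAutoEncoderDataset(news_sentences)
loader = DataLoader(train_data, batch_size=2, shuffle=True)
loss = losses.DenoisingAutoEncoderLoss(
    model, decoder_name_or_path=model_name, tie_encoder_decoder=True
)

model.fit(
    train_objectives=[(loader, loss)],
    epochs=1,
    scheduler="constantlr",
    optimizer_params={"lr": 3e-5},
    weight_decay=0.0,
    show_progress_bar=False,
)
```

The MT objective follows the same pattern but reconstructs the target-language sentence from the source-language embedding, which is where the parallel corpus PolyNewsParallel comes in.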
Training Data and Process
PolyNews consists of approximately 3.9 million multilingual news texts spanning 77 languages; PolyNewsParallel contains around 5.4 million news translations across 833 language pairs. The training data is resampled across languages to compensate for differing resource levels and avoid skew toward high-resource languages. The NaSE variants are trained for 50,000 steps with the AdamW optimizer at a learning rate of 3e-5; validation is performed on a cross-lingual news recommendation task using the xMIND dataset, a machine translation of the English MIND benchmark into 14 languages.
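The resampling across languages can be thought of as temperature-based sampling: each language is drawn with probability proportional to its corpus share raised to an exponent below one, which up-weights low-resource languages. A small illustrative sketch follows; the exponent 0.3 and the per-language counts are assumptions, not values reported in the paper.

```python
import random

# Hypothetical per-language article counts (not PolyNews statistics).
counts = {"en": 1_500_000, "de": 400_000, "sw": 20_000, "ha": 5_000}

def sampling_probs(counts, alpha=0.3):
    """Temperature-based resampling: p_i proportional to n_i ** alpha.
    With alpha < 1, low-resource languages are sampled more often than
    their raw share of the corpus would suggest."""
    weights = {lang: n ** alpha for lang, n in counts.items()}
    total = sum(weights.values())
    return {lang: w / total for lang, w in weights.items()}

probs = sampling_probs(counts)
print(probs)

# Draw the language for the next training batch from the smoothed distribution.
lang = random.choices(list(probs), weights=list(probs.values()), k=1)[0]
print(lang)
```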
Evaluation
Neural News Recommenders (NNRs)
Seven diverse NNR architectures are evaluated:
- NAML
- MINS
- CAUM
- MANNeR
- LFRec-CE
- LFRec-SCL
- CAT (a text-agnostic baseline)
Results
The evaluation on the small variants of MIND and xMIND shows that SE-based NNRs outperform the text-agnostic baseline, and that using NaSE as the news encoder (NE) yields better performance than fine-tuned LaBSE and non-specialized multilingual LMs, especially when the NE remains frozen.
Key numerical results include:
- NaSE achieves an nDCG@10 of 39.01% in English and 38.23% averaged across 14 xMIND languages in frozen NE configurations, illustrating the efficacy of news-specific domain adaptation.
- NaSE consistently shows smaller performance losses in ZS-XLT than LaBSE, with relative improvements in ranking metrics such as MRR and nDCG@10 (computed as in the sketch below).
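For reference, MRR and nDCG@k over a single ranked impression can be computed as follows; these are the standard definitions, not code from the paper.

```python
import math

def mrr(relevance):
    """Reciprocal rank for one ranked impression:
    1 / (rank of the first clicked item), 0 if nothing was clicked."""
    for rank, rel in enumerate(relevance, start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(relevance, k=10):
    """nDCG@k: discounted cumulative gain of the predicted ranking,
    normalised by the DCG of the ideal (relevance-sorted) ranking."""
    def dcg(rels):
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels[:k]))
    ideal = dcg(sorted(relevance, reverse=True))
    return dcg(relevance) / ideal if ideal > 0 else 0.0

# Candidate news sorted by predicted score; 1 = clicked, 0 = not clicked.
ranked_clicks = [0, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(mrr(ranked_clicks), ndcg_at_k(ranked_clicks, k=10))
```

Averaging these per-impression values over the test set gives the reported MRR and nDCG@10 figures.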
An assessment of few-shot learning scenarios (10, 50, and 100 shots) further underscores NaSE's robustness: it consistently outperforms LaBSE, with the largest gains in the most data-scarce setups.
Implications and Future Directions
Practically, this research shows that a frozen, domain-adapted SE can stand in for extensive and computationally expensive fine-tuning of the news encoder. Theoretically, it points to task-agnostic adaptation as a way to reuse pretrained multilingual models for domain-specific tasks.
Future research may explore expanding NaSE's language coverage and enhancing the domain adaptation process using larger and more diverse news corpora. Investigating the integration of external user behavior or contextual signals could also further improve the accuracy and relevance of multilingual news recommendations.
The findings set a precedent for next-generation multilingual news recommenders, emphasizing the efficiency and cross-lingual capability that real-world applications require when resource constraints and language diversity pose significant challenges.