- The paper demonstrates that cross-attention naturally performs orthogonal alignment by extracting complementary, non-redundant information across domains.
- The introduction of the Gated Cross-Attention module improves recommendation accuracy and parameter efficiency through effective latent space fusion.
- Experimental findings show significant improvements in NDCG@10 and AUC scores, underscoring the method’s robustness against noisy, cross-domain data.
This paper investigates the mechanisms underlying cross-attention in cross-domain sequential recommendation (CDSR) systems, arguing that cross-attention not only facilitates residual alignment but also discovers novel orthogonal information, termed "Orthogonal Alignment." This insight emerges naturally and is linked to improved parameter scaling in recommendation models.
Introduction
Cross-domain sequential recommendation (CDSR) systems leverage interaction sequences from different platforms to improve recommendation accuracy. Traditional models degrade when noisy and redundant signals from one domain are integrated into another. Cross-attention mechanisms have been widely adopted to address this, aligning and projecting representations from the different domains into a unified latent space.
The conventional understanding of cross-attention focuses on residual alignment, in which cross-attention predominantly refines its input by filtering out irrelevant information. This paper introduces the complementary concept of Orthogonal Alignment, in which cross-attention naturally discovers and integrates orthogonal, non-redundant information that significantly enhances model performance without relying on additional parameters.


Figure 1: Residual alignment and orthogonal alignment in cross-domain recommendation models.
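To make the distinction concrete, a cross-attention output can be split into a component parallel to its query input (residual alignment) and a component orthogonal to it (orthogonal alignment). The sketch below is illustrative only; the variable names and the per-token projection are assumptions, not the paper's analysis code.

```python
import torch

def decompose(X_A, X_A_prime, eps=1e-8):
    # Per-token projection of the cross-attention output X_A_prime onto the
    # input X_A (parallel part) and onto its orthogonal complement.
    unit = X_A / (X_A.norm(dim=-1, keepdim=True) + eps)
    parallel = (X_A_prime * unit).sum(dim=-1, keepdim=True) * unit
    orthogonal = X_A_prime - parallel
    return parallel, orthogonal
```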
Gated Cross-Attention Module (GCA)
The core innovation presented is the Gated Cross-Attention (GCA) module, designed to extract complementary orthogonal information when aligning sequences from different domains. The GCA module integrates novel information uncovered during cross-attention between domain sequences, thereby improving the model's representational capacity.
The GCA module can be formulated as follows (shown as a runnable PyTorch sketch; interpreting the bracketed input to `ff` as feature concatenation and the final `layernorm` as standard layer normalization are assumptions made to make the pseudocode executable):

```python
import torch
import torch.nn.functional as F

def gca(X_A, X_B, ff, ca):
    # Queries from domain A attend over keys/values from domain B.
    X_A_prime = ca(query=X_A, key=X_B, value=X_B)
    # Gate the attended output using a feed-forward network over both inputs,
    # then apply a residual connection and layer normalization.
    gated_output = ff(torch.cat([X_A, X_B], dim=-1)) * X_A_prime
    return F.layer_norm(X_A + gated_output, X_A.shape[-1:])
```
Where:
- `X_A`, `X_B` are the sequence representations from the two domains,
- `ca` is a cross-attention layer (queries from `X_A`, keys and values from `X_B`),
- `ff` is a feed-forward gating network applied to the combined inputs,
- the output is the gated, residually connected, layer-normalized representation for domain A.
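For concreteness, the sketch below shows one way to instantiate `ca` and `ff`; the specific layer choices (`nn.MultiheadAttention`, a sigmoid-gated linear layer) and the tensor shapes are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

d, batch, seq = 64, 32, 20  # illustrative dimensions (assumptions)

# Wrap MultiheadAttention so the callable returns a tensor rather than (output, weights).
mha = nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)
ca = lambda query, key, value: mha(query, key, value, need_weights=False)[0]

# Gating network: maps concatenated domain features to a gate in (0, 1).
ff = nn.Sequential(nn.Linear(2 * d, d), nn.Sigmoid())

X_A = torch.randn(batch, seq, d)  # domain-A sequence representations
X_B = torch.randn(batch, seq, d)  # domain-B sequence representations
fused = gca(X_A, X_B, ff, ca)     # shape: (batch, seq, d)
```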
Experimental Findings
The experimental results consistently show that GCA modules substantially enhance recommendation accuracy across a variety of baseline models. Introducing GCA improves NDCG@10 and AUC scores across multiple cross-domain recommendation tasks.
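As a reference for the ranking metric reported here, a standard NDCG@k computation looks like the sketch below (a generic implementation, not the paper's evaluation code).

```python
import numpy as np

def ndcg_at_k(relevances, k=10):
    # relevances: ground-truth relevance of items, listed in predicted ranking order.
    rel = np.asarray(relevances, dtype=float)
    gains = rel[:k] / np.log2(np.arange(2, min(k, rel.size) + 2))
    ideal = np.sort(rel)[::-1][:k]
    ideal_gains = ideal / np.log2(np.arange(2, ideal.size + 2))
    return float(gains.sum() / ideal_gains.sum()) if ideal_gains.sum() > 0 else 0.0
```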
Observation 2: Natural Emergence of Orthogonal Alignment
Analysis reveals a negative correlation between the cosine similarity of cross-attention inputs and outputs, underscoring the orthogonality phenomenon. This orthogonal alignment emerges naturally from GCA integration and yields robust performance improvements.
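A simple way to probe this is to measure the token-wise cosine similarity between the cross-attention input and its output; the helper below is a hedged sketch of such a probe, not the authors' analysis script.

```python
import torch.nn.functional as F

def mean_input_output_similarity(X_A, X_A_prime):
    # Average cosine similarity between query-side inputs and cross-attention
    # outputs; values near zero (or negative) indicate orthogonal alignment.
    return F.cosine_similarity(X_A, X_A_prime, dim=-1).mean().item()
```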
Figure 3: Placement of GCA modules within baseline architectures.
Observation 3: Parameter-Efficient Model Scaling
Incorporating GCA enables parameter-efficient scaling, outperforming parameter-matched baselines. This demonstrates that the orthogonal alignment mechanism offers a viable path to increasing effective model capacity without a linear increase in parameters.
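To verify parameter-matched comparisons of this kind, one typically counts the trainable parameters of the baseline with and without the added module; a minimal helper (an illustrative convention, not the paper's tooling) is shown below.

```python
def num_trainable_params(module):
    # Count trainable parameters, e.g. to match a baseline + GCA against a
    # widened baseline with the same budget.
    return sum(p.numel() for p in module.parameters() if p.requires_grad)
```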

Figure 4: CDSRNP models benefit from increased representation capacity via orthogonal alignment.
Conclusion
This paper uncovers a significant insight into cross-domain sequential recommendation systems by establishing Orthogonal Alignment as a natural phenomenon within cross-attention mechanisms. The findings challenge the prevailing residual-alignment interpretation of cross-attention and open pathways for scaling strategies that maintain performance without proportionately larger models. This work prompts further exploration of orthogonal alignment across models, with potential applications beyond recommendation systems in broader multi-modal and cross-domain settings, where integrating these insights may yield further gains in efficiency and effectiveness.