
Cross-attention Secretly Performs Orthogonal Alignment in Recommendation Models (2510.09435v1)

Published 10 Oct 2025 in cs.LG and cs.IR

Abstract: Cross-domain sequential recommendation (CDSR) aims to align heterogeneous user behavior sequences collected from different domains. While cross-attention is widely used to enhance alignment and improve recommendation performance, its underlying mechanism is not fully understood. Most researchers interpret cross-attention as residual alignment, where the output is generated by removing redundant information from the query input while preserving non-redundant information, by referencing data from another domain supplied as the key and value. Beyond this prevailing view, we introduce Orthogonal Alignment, a phenomenon in which cross-attention discovers novel information that is not present in the query input, and we argue that these two contrasting alignment mechanisms can co-exist in recommendation models. Across more than 300 experiments, we find that model performance improves when the query input and output of cross-attention are orthogonal. Notably, Orthogonal Alignment emerges naturally, without any explicit orthogonality constraints. Our key insight is that Orthogonal Alignment emerges naturally because it improves parameter scaling: baselines that additionally incorporate the cross-attention module outperform parameter-matched baselines, achieving superior accuracy per model parameter. We hope these findings offer new directions for parameter-efficient scaling in multi-modal research.

Summary

  • The paper demonstrates that cross-attention naturally performs orthogonal alignment by extracting complementary, non-redundant information across domains.
  • The introduction of the Gated Cross-Attention module improves recommendation accuracy and parameter efficiency through effective latent space fusion.
  • Experimental findings show significant improvements in NDCG@10 and AUC scores, underscoring the method’s robustness against noisy, cross-domain data.

Cross-attention Secretly Performs Orthogonal Alignment in Recommendation Models

This paper investigates the mechanisms underlying cross-attention in cross-domain sequential recommendation (CDSR) systems, arguing that cross-attention not only performs residual alignment but also discovers novel orthogonal information, a phenomenon termed "Orthogonal Alignment." This alignment emerges naturally and is linked to improved parameter scaling in recommendation models.

Introduction

Cross-domain sequential recommendation systems leverage interaction sequences across different platforms to enhance recommendation accuracy. Traditional models suffer from noisy and redundant data integration, leading to performance degradation. Cross-attention mechanisms have been widely adopted to address these challenges, facilitating the alignment and projection of representations from various domains into a unified latent space.

The conventional understanding of cross-attention focuses on residual alignment, in which cross-attention refines the query input chiefly by filtering out irrelevant information. This paper introduces the complementary concept of Orthogonal Alignment, in which cross-attention discovers and integrates non-redundant information that is absent from the query input, improving model performance without a proportional increase in parameters (Figure 1).

Figure 1: Residual alignment and orthogonal alignment in cross-domain recommendation models.
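
To make the distinction concrete, the following is a minimal sketch (not from the paper) of how a cross-attention output vector can be decomposed into a residual component that lies along the query input and an orthogonal component that carries information absent from the query:

import numpy as np

def decompose(query_in: np.ndarray, attn_out: np.ndarray):
    """Split an output vector into components parallel and orthogonal
    to the query input (illustrative only)."""
    # Projection onto the query direction: the "residual" part.
    parallel = (attn_out @ query_in) / (query_in @ query_in) * query_in
    # Remainder is orthogonal to the query input: the "novel" part.
    orthogonal = attn_out - parallel
    return parallel, orthogonal

q = np.array([1.0, 0.0, 0.0])
out = np.array([0.5, 0.8, 0.0])
par, orth = decompose(q, out)
print(par, orth)  # [0.5 0. 0.] [0. 0.8 0.]

Residual alignment corresponds to outputs dominated by the parallel component; Orthogonal Alignment corresponds to outputs with a large orthogonal component.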

Gated Cross-Attention Module (GCA)

The core innovation presented is the Gated Cross-Attention (GCA) module, designed to extract complementary, orthogonal information while aligning sequences from different domains. GCA gates the novel information uncovered by cross-attention between domain sequences into the query representation, thereby increasing the model's representational capacity.

Formulation

The GCA module can be formulated as follows:

def gca(X_A, X_B, ff, ca):
    # Cross-attend: query from domain A, key/value from domain B.
    X_A_prime = ca(query=X_A, key=X_B, value=X_B)
    # Gate the cross-attention output with a feedforward network
    # applied to both domain inputs.
    gated_output = ff([X_A, X_B]) * X_A_prime
    # Residual connection followed by layer normalization.
    return layernorm(X_A + gated_output)

Where:

  • X_A and X_B are the input sequences from domains A and B, respectively.
  • ff is a feedforward network that controls how the new information is integrated (the gate).
  • ca is the cross-attention operation with query X_A and key/value X_B.

Figure 2: In cross-domain sequential recommendation, various fusion structures form the backbone, where GCA is applied.
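
The paper's pseudocode above leaves the gate unspecified; the following is a minimal PyTorch-style sketch of GCA under the assumptions that the gate is a sigmoid-activated feedforward over the concatenated domain representations and that cross-attention is standard multi-head attention (module and parameter names are illustrative, not the authors' implementation):

import torch
import torch.nn as nn

class GatedCrossAttention(nn.Module):
    """Sketch of a GCA block: cross-attention from domain A to domain B,
    gated by a feedforward over both inputs, with residual + LayerNorm."""
    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.ca = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Assumption: the gate is a sigmoid over the concatenated inputs.
        self.ff = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.Sigmoid())
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x_a: torch.Tensor, x_b: torch.Tensor) -> torch.Tensor:
        # x_a, x_b: (batch, seq_len, d_model) sequences from domains A and B.
        # Assumes both sequences share a length, mirroring ff([X_A, X_B]).
        x_a_prime, _ = self.ca(query=x_a, key=x_b, value=x_b)
        gate = self.ff(torch.cat([x_a, x_b], dim=-1))
        return self.norm(x_a + gate * x_a_prime)

The sigmoid gate and equal sequence lengths are assumptions for illustration; the pseudocode only specifies that the feedforward output multiplies the cross-attention output elementwise before the residual connection.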

Experimental Findings

Observation 1: Performance Improvement with GCA

The experimental results consistently show that GCA modules substantially enhance recommendation accuracy across a variety of baseline models. Introducing GCA improves NDCG@10 and AUC scores across multiple recommendation domains.
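
As a reminder of what the reported ranking metric measures, a minimal sketch (not from the paper) of NDCG@10 for a single user, assuming binary relevance labels:

import numpy as np

def ndcg_at_k(relevance, k: int = 10) -> float:
    """NDCG@k for one ranked list of binary relevance labels."""
    rel = np.asarray(relevance[:k], dtype=float)
    dcg = float((rel / np.log2(np.arange(2, rel.size + 2))).sum())
    ideal = np.sort(np.asarray(relevance, dtype=float))[::-1][:k]
    idcg = float((ideal / np.log2(np.arange(2, ideal.size + 2))).sum())
    return dcg / idcg if idcg > 0 else 0.0

print(ndcg_at_k([0, 1, 0, 0, 1, 0, 0, 0, 0, 0]))  # ≈ 0.62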

Observation 2: Natural Emergence of Orthogonal Alignment

Analysis reveals a negative correlation between recommendation performance and the cosine similarity of the cross-attention query input and output: the more orthogonal the output is to the query input, the better the model performs. This orthogonal alignment emerges naturally from GCA integration, without any explicit orthogonality constraint, and yields robust performance improvements (Figure 3).

Figure 3: Placement of GCA modules within baseline architectures.
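
A minimal sketch (illustrative, not the paper's measurement code) of how the orthogonality between a cross-attention query input and its output could be quantified via cosine similarity:

import torch
import torch.nn.functional as F

def query_output_cosine(x_a: torch.Tensor, x_a_prime: torch.Tensor) -> float:
    """Mean cosine similarity between query-input and output token
    representations; values near zero indicate near-orthogonal alignment."""
    # x_a, x_a_prime: (batch, seq_len, d_model)
    cos = F.cosine_similarity(x_a, x_a_prime, dim=-1)  # (batch, seq_len)
    return cos.mean().item()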

Observation 3: Parameter-Efficient Model Scaling

Incorporating GCA enables parameter-efficient scaling: GCA-augmented models outperform parameter-matched baselines, achieving superior accuracy per model parameter. This suggests that the orthogonal alignment mechanism offers a viable path to increasing effective model capacity without a proportional increase in parameters (Figure 4).

Figure 4: CDSRNP models benefit from increased representation capacity via orthogonal alignment.
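
To illustrate the accuracy-per-parameter comparison, a sketch with hypothetical numbers (not results from the paper):

def accuracy_per_million_params(metric: float, n_params: int) -> float:
    """Efficiency ratio for comparing a GCA-augmented model against a
    parameter-matched baseline (all numbers below are made up)."""
    return metric / (n_params / 1e6)

baseline = accuracy_per_million_params(0.420, 12_000_000)  # hypothetical NDCG@10
with_gca = accuracy_per_million_params(0.445, 12_000_000)  # parameter-matched
print(baseline, with_gca)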

Conclusion

This paper establishes Orthogonal Alignment as a natural phenomenon within the cross-attention mechanisms of cross-domain sequential recommendation systems. The findings challenge the prevailing residual-alignment interpretation and open pathways for scaling strategies that maintain performance without proportionally larger models. The work invites further exploration of orthogonal alignment beyond recommendation, in broader multi-modal and cross-domain settings, where it may offer similar gains in parameter efficiency.
