
OneTrans: Unified Feature Interaction and Sequence Modeling with One Transformer in Industrial Recommender (2510.26104v1)

Published 30 Oct 2025 in cs.IR

Abstract: In recommendation systems, scaling up feature-interaction modules (e.g., Wukong, RankMixer) or user-behavior sequence modules (e.g., LONGER) has achieved notable success. However, these efforts typically proceed on separate tracks, which not only hinders bidirectional information exchange but also prevents unified optimization and scaling. In this paper, we propose OneTrans, a unified Transformer backbone that simultaneously performs user-behavior sequence modeling and feature interaction. OneTrans employs a unified tokenizer to convert both sequential and non-sequential attributes into a single token sequence. The stacked OneTrans blocks share parameters across similar sequential tokens while assigning token-specific parameters to non-sequential tokens. Through causal attention and cross-request KV caching, OneTrans enables precomputation and caching of intermediate representations, significantly reducing computational costs during both training and inference. Experimental results on industrial-scale datasets demonstrate that OneTrans scales efficiently with increasing parameters, consistently outperforms strong baselines, and yields a 5.68% lift in per-user GMV in online A/B tests.

Summary

  • The paper introduces OneTrans, merging sequence modeling and feature interaction into a unified Transformer for enhanced recommender performance.
  • It employs a unified tokenizer and mixed parameterization to efficiently process both sequential and static features, reducing computational overhead.
  • Experimental results reveal significant improvements in offline (AUC/UAUC) and online metrics (order/u, GMV/u) compared to conventional methods.

OneTrans: Unified Feature Interaction and Sequence Modeling with One Transformer in Industrial Recommender

Introduction

The paper "OneTrans: Unified Feature Interaction and Sequence Modeling with One Transformer in Industrial Recommender" introduces a novel approach for enhancing recommender systems by leveraging a unified Transformer architecture. Traditional recommender systems use distinct modules for sequence modeling and feature interaction, often limiting the ability for bidirectional information exchange and unified optimization. This paper proposes a transformative backbone called OneTrans that consolidates these tasks, which leads to improved performance and efficiency.

Motivation and Background

Industrial recommender systems typically follow a multi-stage pipeline consisting of recall and ranking stages. Conventional ranking models run sequence modeling and feature interaction as separate modules, and the paper identifies key limitations of this design, including restricted information flow and increased latency from the separate execution paths. Inspired by the scaling success of LLMs, as described by Kaplan et al. [kaplan2020scaling], the paper explores similar scaling principles in the context of recommender systems.

Methodology

OneTrans employs a unified Transformer framework that integrates sequence modeling and feature interaction into a single architecture. A unified tokenizer maps both sequential attributes (user behaviors) and non-sequential attributes (static features such as user-profile fields) into a single token sequence suitable for Transformer processing.
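As a rough illustration of this step (not the authors' code), the sketch below maps behavior embeddings and per-field static embeddings into one token sequence; the class name, dimensions, and per-field linear projections are assumptions.

```python
import torch
import torch.nn as nn

class UnifiedTokenizer(nn.Module):
    """Hypothetical sketch of unified tokenization: sequential features
    become S-tokens via one shared projection; each static field becomes
    an NS-token via its own projection."""

    def __init__(self, d_model: int, seq_dim: int, ns_dims: list):
        super().__init__()
        self.s_proj = nn.Linear(seq_dim, d_model)                 # shared over events
        self.ns_projs = nn.ModuleList(nn.Linear(d, d_model) for d in ns_dims)

    def forward(self, seq_feats, ns_feats):
        # seq_feats: (B, L, seq_dim) stacked behavior embeddings.
        s_tokens = self.s_proj(seq_feats)                         # (B, L, d_model)
        # ns_feats: list of F tensors, each of shape (B, ns_dims[i]).
        ns_tokens = torch.stack(
            [proj(f) for proj, f in zip(self.ns_projs, ns_feats)], dim=1
        )                                                         # (B, F, d_model)
        # One unified sequence: behavior tokens first, static tokens last.
        return torch.cat([s_tokens, ns_tokens], dim=1)            # (B, L+F, d_model)
```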

Architecture

The core architectural components of OneTrans include:

  • Unified Tokenizer: Converts inputs into token sequences for the Transformer to handle seamlessly. Sequential features are converted into S-tokens, while non-sequential attributes are mapped to NS-tokens.
  • Mixed Parameterization: The model assigns shared parameters to sequential tokens while providing token-specific parameters for non-sequential tokens, allowing it to capture diverse feature interactions effectively (see the sketch after this list).

    (Figure 1: System Architecture)

  • Causal Attention and Cross-Request KV Caching: Enables precomputation and intermediate result caching, significantly reducing computational costs during both training and inference.
  • Pyramid Stack: A hierarchical reduction mechanism where token sequences are progressively shortened, focusing computational resources on the most pertinent tokens.
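A hedged PyTorch sketch of one stacked block follows, combining causal self-attention over the unified sequence with mixed parameterization (one shared FFN for S-tokens, per-token FFNs for NS-tokens). Applying a single causal mask to all tokens, the 4x FFN width, and the pre-norm layout are simplifying assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

def ffn(d_model: int) -> nn.Sequential:
    return nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                         nn.Linear(4 * d_model, d_model))

class OneTransBlockSketch(nn.Module):
    """Illustrative block: shared attention over all tokens, a shared FFN
    for the n_seq sequential tokens, token-specific FFNs for the trailing
    non-sequential tokens."""

    def __init__(self, d_model: int, n_heads: int, n_ns_tokens: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.shared_ffn = ffn(d_model)                     # shared across S-tokens
        self.ns_ffns = nn.ModuleList(ffn(d_model) for _ in range(n_ns_tokens))

    def forward(self, x: torch.Tensor, n_seq: int) -> torch.Tensor:
        T = x.size(1)
        # Causal mask (True = blocked): earlier behavior tokens never attend
        # forward, which is what permits cross-request KV caching.
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + attn_out
        h = self.norm2(x)
        s_out = self.shared_ffn(h[:, :n_seq])              # shared parameters
        ns_out = torch.stack(
            [f(h[:, n_seq + i]) for i, f in enumerate(self.ns_ffns)], dim=1
        )                                                  # token-specific parameters
        return x + torch.cat([s_out, ns_out], dim=1)
```

In a pyramid stack, later blocks would receive a shortened sequence (for example, only the most recent S-tokens plus the NS-tokens), concentrating compute on the most pertinent tokens as the paper describes.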

Experimental Evaluation

The paper conducts comprehensive experiments on industrial-scale data, evaluating OneTrans against strong baselines and showing substantial improvements in both offline metrics (AUC and UAUC) and online business metrics (orders per user, order/u, and GMV per user, GMV/u).
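As context for these metrics, UAUC is commonly computed as the ROC AUC averaged over users. The minimal sketch below uses that common definition with scikit-learn; the paper's exact filtering and weighting may differ.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def uauc(user_ids, labels, scores):
    """Per-user AUC averaged over users that have both positive and
    negative examples (a common UAUC definition; the paper's exact
    weighting may differ)."""
    user_ids, labels, scores = map(np.asarray, (user_ids, labels, scores))
    aucs = []
    for uid in np.unique(user_ids):
        mask = user_ids == uid
        y = labels[mask]
        if y.min() == y.max():  # skip users with a single class: AUC undefined
            continue
        aucs.append(roc_auc_score(y, scores[mask]))
    return float(np.mean(aucs)) if aucs else float("nan")
```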

Key Findings

  • Performance Gains: OneTrans-L, the larger configuration of the model, outperformed strong baselines such as RankMixer+Transformer on AUC/UAUC, confirming its advantage in capturing complex feature interactions and behavior sequences.
  • Efficiency: Leveraging LLM optimizations such as FlashAttention and mixed-precision training significantly reduced FLOPs and memory usage, enabling practical deployment in high-throughput environments (see the sketch following Figure 2).

Figure 2: Trade-off between FLOPs and ΔUAUC.
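For illustration only, this is how such optimizations are commonly wired up in PyTorch, rather than the paper's code: torch.nn.functional.scaled_dot_product_attention dispatches to a fused, FlashAttention-style kernel when one is available, and torch.autocast enables mixed-precision execution. The tensor shapes below are arbitrary assumptions.

```python
import torch
import torch.nn.functional as F

# Illustrative sketch, not the paper's code. Shapes are arbitrary:
# (batch, heads, tokens, head_dim).
q = torch.randn(8, 4, 1024, 64, device="cuda")
k, v = torch.randn_like(q), torch.randn_like(q)

with torch.autocast("cuda", dtype=torch.bfloat16):
    # Dispatches to a fused (FlashAttention-style) kernel when available;
    # is_causal=True matches OneTrans-style causal attention.
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```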

Implications

The integration of sequence modeling and feature interaction within a single Transformer architecture not only simplifies the modeling pipeline but also enhances performance and scalability. The findings indicate a strong potential for adopting similar architectures in other domains where large-scale data and complex feature interactions are prevalent.

Conclusion

OneTrans presents a robust and scalable solution for industrial recommender systems by unifying sequence modeling and feature interaction into a single Transformer framework. This design not only achieves improved prediction accuracy but also aligns with contemporary computational optimizations, making it suitable for real-world deployment. Future work will likely explore adaptive mechanisms and further scalability improvements to augment OneTrans' capabilities across diverse application scenarios.
