
RXTX Algorithm for Efficient XXᵀ Computation

Updated 2 July 2025
  • RXTX Algorithm is an AI-derived recursive block algorithm that efficiently computes matrix-times-transpose products while reducing multiplications and additions.
  • It employs a 4×4 block partitioning and integrates MILP with reinforcement learning to optimize bilinear product selection and minimize redundant arithmetic.
  • Empirical results show up to 9% faster runtimes and consistent efficiency gains across a range of matrix sizes, demonstrating a practical advantage.

The RXTX algorithm is an AI-discovered recursive block algorithm for the efficient computation of the matrix-times-transpose product $XX^{t}$ for a real matrix $X \in \mathbb{R}^{n \times m}$. RXTX reduces both multiplications and total arithmetic operations (additions plus multiplications) by approximately 5% compared to previous state-of-the-art (SotA) approaches, with improvements holding across all matrix sizes, including small matrices ($n = 4$), and compounding at larger scales. RXTX was developed using a combination of machine-learning-guided search techniques and combinatorial optimization.

1. Algorithmic Structure and Recursion

RXTX proceeds by recursively partitioning the input matrix $X$ into a $4 \times 4$ grid of blocks:
$$X = \begin{pmatrix} X_1 & X_2 & X_3 & X_4 \\ X_5 & X_6 & X_7 & X_8 \\ X_9 & X_{10} & X_{11} & X_{12} \\ X_{13} & X_{14} & X_{15} & X_{16} \end{pmatrix}.$$
At each recursive step, RXTX computes $C = XX^{t}$ using 8 recursive calls on subblocks and 26 general multiplications on block submatrices, followed by a recombination through optimized additions.

The recurrence for RXTX is
$$R(n) = 8R(n/4) + 26M(n/4),$$
where $R(n)$ is the number of multiplications required by RXTX and $M(n) = n^{\log_2 7}$ is the number required by Strassen-Winograd general matrix multiplication.

The previous SotA recursive Strassen algorithm for $XX^{t}$ follows
$$S(n) = 4S(n/2) + 2M(n/2).$$
This block partitioning and customized recursion allow RXTX to exploit structure unique to products of the form $XX^{t}$.
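The two recurrences can be tabulated directly. A minimal sketch (function names are illustrative; $R$ is evaluated at powers of 4, $S$ and $M$ at powers of 2, with base case 1 multiplication for a scalar product):

```python
from functools import lru_cache

# M(n): multiplications used by Strassen-Winograd for an n x n product
@lru_cache(maxsize=None)
def M(n):
    return 1 if n == 1 else 7 * M(n // 2)

# S(n): previous SotA recursion for XX^T, S(n) = 4 S(n/2) + 2 M(n/2)
@lru_cache(maxsize=None)
def S(n):
    return 1 if n == 1 else 4 * S(n // 2) + 2 * M(n // 2)

# R(n): RXTX recursion, R(n) = 8 R(n/4) + 26 M(n/4)
@lru_cache(maxsize=None)
def R(n):
    return 1 if n == 1 else 8 * R(n // 4) + 26 * M(n // 4)

for n in [4, 16, 64, 256]:          # powers of 4 only for R
    print(n, R(n), S(n), round(R(n) / S(n), 4))
```

At $n = 4$ this reproduces the 34-versus-38 rank comparison, and the ratio drifts toward the asymptotic $\tfrac{26/41}{2/3} \approx 0.95$ as $n$ grows.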

2. Explicit Operation Counts and Base Case

The recurrences resolve to explicit formulas for both RXTX and the prior SotA:
$$R(n) = \frac{26}{41} M(n) + \frac{15}{41} n^{3/2} = \frac{26}{41} n^{\log_2 7} + \frac{15}{41} n^{3/2}$$

$$S(n) = \frac{2}{3} M(n) + \frac{1}{3} n^2 = \frac{2}{3} n^{\log_2 7} + \frac{1}{3} n^2$$

The leading coefficient for RXTX, $26/41 \approx 0.6341$, is approximately 5% lower than $2/3 \approx 0.6667$ for the previous SotA, reducing asymptotic operation counts.

At the $4 \times 4$ base case, RXTX computes 26 specific bilinear products $m_i$, such as
$$\begin{aligned} m_1 &= (-X_2 + X_3 - X_4 + X_8)\,(X_8 + X_{11})^T \\ m_2 &= (X_1 - X_5 - X_6 + X_7)\,(X_{15} + X_5)^T \\ &\;\;\vdots \\ m_{26} &= (X_6 + X_{10} + X_{12})\,X_{10}^T \end{aligned}$$
together with 8 symmetric (diagonal) block products $s_j = X_j\,X_j^T$ for $j = 1$ to $8$.

Recombination into the final output blocks, e.g.,
$$C_{11} = s_1 + s_2 + s_3 + s_4$$
$$C_{12} = m_2 - m_5 - m_7 + m_{11} + m_{12} + m_{13} + m_{19}$$
proceeds according to optimized addition schemes.
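The diagonal recombination $C_{11} = s_1 + s_2 + s_3 + s_4$ is an instance of the block identity $(XX^T)_{11} = \sum_j X_{1j} X_{1j}^T$, where $X_1, \dots, X_4$ form the first block row of the partition above. A quick numerical sanity check of that identity (a sketch, not the full 26-product scheme):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8                                # each block is (n/4) x (n/4), here 2 x 2
X = rng.standard_normal((n, n))
b = n // 4

# Partition X into a 4 x 4 grid of blocks; row 0 holds X_1..X_4.
blocks = [[X[i*b:(i+1)*b, j*b:(j+1)*b] for j in range(4)] for i in range(4)]

# s_j = X_j X_j^T over the first block row
s = [blocks[0][j] @ blocks[0][j].T for j in range(4)]

# The (1,1) block of C = X X^T equals s_1 + s_2 + s_3 + s_4
C11 = (X @ X.T)[:b, :b]
assert np.allclose(C11, sum(s))
print("C11 identity verified")
```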

3. Arithmetic and Computational Efficiency

RXTX requires only 34 block products at the $4 \times 4$ base case (26 general multiplications plus 8 symmetric products, versus 38 for recursive Strassen), and this advantage compounds through recursion, yielding for large $n$:
$$R(n) = \frac{26}{41} n^{\log_2 7} + \frac{15}{41} n^{3/2}$$
The addition scheme is also optimized; the number of additions required at each recursive step is reduced from 139 to 100 via common subexpression elimination.

The total operation count (additions plus multiplications) is
$$R_+(n) = \frac{156}{41} n^{\log_2 7} - \frac{615}{164} n^2 + \frac{155}{164} n^{3/2}.$$
By comparison, recursive Strassen for $XX^t$ requires
$$S_+(n) = 4 n^{\log_2 7} - \frac{7}{4} n^2 \log_2 n - 3 n^2.$$
RXTX thus reduces the multiplication count and the total operation count simultaneously.

4. Discovery via Machine Learning and Combinatorial Optimization

RXTX was discovered through an AI-driven approach integrating reinforcement learning (RL) and combinatorial optimization. The process consists of two main components:

  • RL-guided Large Neighborhood Search: An RL agent generates candidate sets of bilinear (rank-1) products.
  • Mixed-Integer Linear Programming (MILP): Two MILP stages:
    • MILP-A: For each target expression in $XX^{T}$, enumerate linear combinations of the candidate products that realize the target.
    • MILP-B: Find the minimal subset of candidates whose spans cover all targets.

Optimization proceeds by alternately sampling new candidate products and using MILP solvers to find compact representations. Restricting the search to bilinear forms shrinks the combinatorial space and makes optimization feasible on mid-sized matrices, in contrast with tensor-based searches (e.g., AlphaTensor), which operate over significantly larger spaces.
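MILP-B is, in essence, a minimum set-cover problem: pick the fewest candidate products whose spans cover every target. A toy brute-force illustration of that objective (the candidates and coverage sets here are invented for demonstration, not the actual RXTX candidates):

```python
from itertools import combinations

# Toy instance: each candidate product "covers" the targets it can help express.
targets = {"C11", "C12", "C13", "C22"}
candidates = {
    "m1": {"C11", "C12"},
    "m2": {"C12", "C13"},
    "m3": {"C13", "C22"},
    "m4": {"C11", "C22"},
    "m5": {"C22"},
}

def min_cover(targets, candidates):
    """Smallest subset of candidates whose union covers all targets."""
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            if set().union(*(candidates[c] for c in subset)) >= targets:
                return set(subset)
    return None

best = min_cover(targets, candidates)
print(best, len(best))
```

A real MILP formulation would replace the brute-force loop with binary selection variables and coverage constraints, which is what makes the search scale to the 4×4 RXTX instance.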

5. Empirical Performance and Practical Thresholds

RXTX yields theoretical and practical advantages:

  • For large nn, RXTX uses roughly 95% of the multiplications of the previous SotA.
  • On $6144 \times 6144$ real matrices, with a one-level RXTX application and subsequent BLAS block multiplications, RXTX achieved an average runtime of 2.524 s, approximately 9% faster than the baseline BLAS routine (2.778 s), outperforming it in 99% of runs.
  • Performance thresholds indicate:
    • RXTX outperforms recursive Strassen for $n \geq 256$.
    • RXTX overtakes the naive implementation for $n \geq 1024$.
    • With optimal recursive cutoffs, RXTX can outperform other methods at sizes as small as $n \approx 32$, though this is hardware- and implementation-dependent.
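The Strassen crossover is visible directly in the closed-form total-operation counts from Section 3; a small sketch evaluating both formulas (valid for $n$ a power of 4, constants taken verbatim from the formulas above):

```python
from math import log2

LOG27 = log2(7)

def R_plus(n):
    """Total operations (mults + adds) for RXTX, closed form."""
    return 156/41 * n**LOG27 - 615/164 * n**2 + 155/164 * n**1.5

def S_plus(n):
    """Total operations for recursive Strassen applied to XX^T, closed form."""
    return 4 * n**LOG27 - 7/4 * n**2 * log2(n) - 3 * n**2

for n in [64, 256, 1024, 4096]:
    print(n, round(R_plus(n) / S_plus(n), 4))
```

The ratio crosses below 1 at $n = 256$, matching the stated threshold against recursive Strassen.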
Algorithm        Base Recursion                 Asymptotic Constant     4×4 Rank
Previous SotA    S(n) = 4S(n/2) + 2M(n/2)       2/3 ≈ 0.6666            38
RXTX             R(n) = 8R(n/4) + 26M(n/4)      26/41 ≈ 0.6341          34

6. Enablers and Structural Innovations

Several key factors contribute to the efficiency of RXTX:

  • Structure Exploitation: RXTX is tailored to the symmetry of $XX^{T}$, distinguishing it from generic matrix multiplication methods.
  • Block Recursion: Employing $4 \times 4$ block splitting (versus $2 \times 2$ in previous methods) permits more flexible and efficient recombination of products.
  • Optimized Additions: Automated search for common subexpressions minimizes redundant arithmetic, reducing total additions required.
  • Hybrid AI/Optimization Discovery: The integration of RL sampling and MILP-based combinatorial optimization enables the discovery of schemes that surpass those achieved by exhaustive search or human design.
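As a minimal illustration of the symmetry being exploited: since $XX^T$ is symmetric, only the upper triangle need be computed explicitly and the lower triangle mirrored. A naive sketch of that saving (not RXTX itself, which exploits the structure far more deeply):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 6))
n = X.shape[0]

# Compute only the upper triangle of C = X X^T: n(n+1)/2 row inner
# products instead of n^2, then mirror across the diagonal.
C = np.zeros((n, n))
for i in range(n):
    for j in range(i, n):
        C[i, j] = X[i] @ X[j]
C = C + np.triu(C, k=1).T   # reflect the strict upper triangle

assert np.allclose(C, X @ X.T)
print("symmetric computation verified")
```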

RXTX is thus an AI-discovered, recursively defined algorithm that systematically reduces the arithmetic complexity of $XX^{t}$ computation, combining theoretical reductions with practical performance gains across a wide range of matrix sizes. The results demonstrate the productive intersection of machine-learning search and combinatorial optimization in algorithmic discovery.