RXTX Algorithm for Efficient XXᵀ Computation
- RXTX Algorithm is an AI-derived recursive block algorithm that efficiently computes matrix-times-transpose products while reducing multiplications and additions.
- It employs a 4×4 block partitioning and integrates mixed-integer linear programming (MILP) with reinforcement learning to optimize bilinear product selection and minimize redundant arithmetic.
- Empirical results show up to 9% faster runtimes and notable efficiency gains across a range of matrix sizes, demonstrating a practical advantage.
The RXTX algorithm is an AI-discovered recursive block algorithm for the efficient computation of the matrix-times-transpose product $XX^\mathsf{T}$ of a real matrix $X$. RXTX achieves a reduction of both multiplications and total arithmetic operations (additions plus multiplications) by approximately 5% compared to previous state-of-the-art (SotA) approaches, with improvements holding across all matrix sizes, including small matrices, and compounding at larger scales. RXTX was developed using a combination of machine-learning-guided search techniques and combinatorial optimization.
1. Algorithmic Structure and Recursion
RXTX proceeds by recursively partitioning the input matrix $X$ into a $4 \times 4$ grid of blocks. At each recursive step, RXTX computes $XX^\mathsf{T}$ using 8 recursive calls on block submatrices and 26 general multiplications of block submatrices, followed by a recombination through optimized additions.
The recurrence for RXTX is
$$R(n) = 8\,R(n/4) + 26\,M(n/4),$$
where $R(n)$ is the number of multiplications required by RXTX and $M(n)$ is the number required by Strassen–Winograd general matrix multiplication.
The previous SotA recursive Strassen algorithm for $XX^\mathsf{T}$ follows
$$S(n) = 4\,S(n/2) + 2\,M(n/2).$$
This block partitioning and customized recursion allow RXTX to exploit structure unique to products of the form $XX^\mathsf{T}$.
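The previous SotA recursion above is simple enough to sketch directly. The following is an illustrative NumPy sketch (not the paper's implementation): 4 recursive symmetric products and 2 general products per level, with the general products delegated to BLAS rather than Strassen–Winograd; the function name and cutoff are assumptions.

```python
import numpy as np

def xxt_2x2(X, cutoff=64):
    """Recursive 2x2-block computation of X @ X.T in the style of the
    previous SotA: 4 recursive symmetric products (A A^T, B B^T, C C^T,
    D D^T) plus 2 general products (A C^T, B D^T) per level.
    Illustrative sketch only; cutoff is where recursion falls back to BLAS."""
    n = X.shape[0]
    if n <= cutoff or n % 2:
        return X @ X.T
    h = n // 2
    A, B = X[:h, :h], X[:h, h:]
    C, D = X[h:, :h], X[h:, h:]
    top_left = xxt_2x2(A, cutoff) + xxt_2x2(B, cutoff)    # symmetric, recursive
    top_right = A @ C.T + B @ D.T                         # general products
    bottom_right = xxt_2x2(C, cutoff) + xxt_2x2(D, cutoff)
    return np.block([[top_left, top_right],
                     [top_right.T, bottom_right]])
```

The lower-left block is obtained for free as the transpose of the upper-right one, which is exactly the symmetry that generic matrix-multiplication algorithms cannot exploit.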
2. Explicit Operation Counts and Base Case
The recurrences resolve to explicit formulas for both RXTX and the prior SotA:
$$R(n) = \tfrac{26}{41}\,n^{\log_2 7} + \tfrac{15}{41}\,n^{3/2}, \qquad S(n) = \tfrac{2}{3}\,n^{\log_2 7} + \tfrac{1}{3}\,n^{2}.$$
The leading coefficient for RXTX, $\tfrac{26}{41} \approx 0.634$, is approximately 5% lower than the previous SotA's $\tfrac{2}{3} \approx 0.667$, reducing asymptotic operation counts.
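The closed forms can be checked against the recurrences with exact rational arithmetic. In this verification sketch the counting functions are indexed by recursion depth rather than by $n$ (so `R(L)` is the count for $n = 4^L$ and `S(l)`, `M(m)` for $n = 2^l$, $2^m$):

```python
from fractions import Fraction as F

def M(m):  # Strassen multiplications for n = 2**m: n^{log2 7} = 7^m
    return 7 ** m

def R(L):  # RXTX multiplications for n = 4**L: R(n) = 8 R(n/4) + 26 M(n/4)
    return 1 if L == 0 else 8 * R(L - 1) + 26 * M(2 * (L - 1))

def S(l):  # previous SotA for n = 2**l: S(n) = 4 S(n/2) + 2 M(n/2)
    return 1 if l == 0 else 4 * S(l - 1) + 2 * M(l - 1)

# Closed form R(n) = (26/41) n^{log2 7} + (15/41) n^{3/2}; with n = 4**L,
# n^{log2 7} = 49**L and n^{3/2} = 8**L.
for L in range(6):
    assert R(L) == F(26, 41) * 49 ** L + F(15, 41) * 8 ** L
# Closed form S(n) = (2/3) n^{log2 7} + (1/3) n^2 with n = 2**l.
for l in range(10):
    assert S(l) == F(2, 3) * 7 ** l + F(1, 3) * 4 ** l

print(F(26, 41) / F(2, 3))  # leading-coefficient ratio: 39/41
```

The printed ratio $39/41 \approx 0.951$ is the source of the roughly 5% multiplication savings.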
At the base case, RXTX computes 26 specific bilinear products $m_1, \dots, m_{26}$, each the product of one linear combination of the blocks of $X$ with the transpose of another, along with 8 symmetric (diagonal) block products of the form $X_iX_i^\mathsf{T}$ over the blocks $X_1, \dots, X_{16}$.
Recombination into the final output blocks, each formed as a signed sum of these products, proceeds according to optimized addition schemes.
3. Arithmetic and Computational Efficiency
RXTX requires only 34 multiplications at the $4 \times 4$ base-case level, 26 general products plus 8 recursive products (contrast: 38 for the Strassen-based SotA at the same level), and this advantage compounds through recursion, yielding for large $n$
$$\lim_{n\to\infty} \frac{R(n)}{S(n)} = \frac{26/41}{2/3} = \frac{39}{41} \approx 0.9512.$$
The addition scheme is also optimized: the number of required additions at each recursive step is reduced from 139 to 100 via common subexpression elimination.
The total operation count (additions plus multiplications) is
$$T_{\mathrm{RXTX}}(n) = \tfrac{156}{41}\,n^{\log_2 7} + \tfrac{155}{164}\,n^{3/2} - \tfrac{15}{4}\,n^{2}.$$
By comparison, recursive Strassen for $XX^\mathsf{T}$ requires
$$T_{\mathrm{SotA}}(n) = 4\,n^{\log_2 7} - \tfrac{7}{4}\,n^{2}\log_2 n - 3\,n^{2}.$$
RXTX demonstrates a simultaneous reduction in both multiplication and total operation counts, with leading coefficients again in the ratio $\tfrac{39}{41} \approx 0.951$.
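These totals follow from simple recurrences under two assumptions made explicit here: a Strassen–Winograd general block product costs $6m^{\log_2 7} - 5m^2$ total operations, each recursion level of RXTX performs 100 block additions, and the 2×2 SotA scheme performs 3 block additions per level (the last count is an assumption of this sketch). Exact rational arithmetic confirms the closed forms:

```python
from fractions import Fraction as F

def W(m):   # Strassen-Winograd total ops for n = 2**m: 6 n^{log2 7} - 5 n^2
    return 6 * 7 ** m - 5 * 4 ** m

def T_rxtx(L):  # n = 4**L: 8 recursive calls + 26 SW products + 100 block adds
    if L == 0:
        return 1
    return 8 * T_rxtx(L - 1) + 26 * W(2 * (L - 1)) + 100 * 16 ** (L - 1)

def T_sota(l):  # n = 2**l: 4 recursive calls + 2 SW products + 3 block adds
    if l == 0:
        return 1
    return 4 * T_sota(l - 1) + 2 * W(l - 1) + 3 * 4 ** (l - 1)

# Closed forms, with n = 4**L (so n^{log2 7} = 49**L, n^{3/2} = 8**L,
# n^2 = 16**L) and n = 2**l respectively.
for L in range(6):
    assert T_rxtx(L) == (F(156, 41) * 49 ** L
                         + F(155, 164) * 8 ** L - F(15, 4) * 16 ** L)
for l in range(12):
    assert T_sota(l) == 4 * 7 ** l - F(7, 4) * l * 4 ** l - 3 * 4 ** l
```

Evaluating both at the same size (e.g. $n = 4096 = 4^6 = 2^{12}$) shows RXTX's total already below the SotA's.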
4. Discovery via Machine Learning and Combinatorial Optimization
RXTX was discovered through an AI-driven approach integrating reinforcement learning (RL) and combinatorial optimization. The process consists of two main components:
- RL-guided Large Neighborhood Search: An RL agent generates candidate sets of bilinear (rank-1) products.
- Mixed-Integer Linear Programming (MILP): Two MILP stages:
- MILP-A: For each target expression in $XX^\mathsf{T}$, enumerate linear combinations of the candidate products that realize the target.
- MILP-B: Find the minimal subset of candidates whose spans cover all targets.
Optimization proceeds by alternately sampling new candidate products and using MILP solvers to extract compact representations. This approach restricts the search to a structured set of candidate bilinear forms, shrinking the combinatorial space and making optimization feasible for mid-sized matrices. This contrasts with tensor-based searches (e.g., AlphaTensor), which operate over significantly larger spaces.
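The shape of the MILP-B covering step can be illustrated with a toy greedy stand-in (the actual pipeline solves it exactly with a MILP solver, and the candidate-to-target map comes from MILP-A's enumeration; `cover_targets` and the instance below are hypothetical):

```python
def cover_targets(candidates, targets):
    """Greedy stand-in for MILP-B: pick a small subset of candidate
    products whose realizable targets together cover every target
    block of XX^T. candidates maps a product name to the set of
    targets it can help realize."""
    chosen, uncovered = [], set(targets)
    while uncovered:
        best = max(candidates, key=lambda c: len(candidates[c] & uncovered))
        if not candidates[best] & uncovered:
            raise ValueError("targets not coverable by these candidates")
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen

# Toy instance: which output blocks each candidate product contributes to.
cands = {"m1": {"C11", "C12"}, "m2": {"C12", "C22"},
         "m3": {"C11"}, "m4": {"C11", "C12", "C22"}}
print(cover_targets(cands, {"C11", "C12", "C22"}))  # ['m4']
```

Greedy set cover is only an approximation; the MILP formulation is what guarantees the minimal 26-product scheme.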
5. Empirical Performance and Practical Thresholds
RXTX yields theoretical and practical advantages:
- For large $n$, RXTX uses roughly 95% of the multiplications of the previous SotA.
- On real matrices, with a one-level RXTX application and subsequent BLAS block multiplications, RXTX achieved an average runtime of 2.524 s, approximately 9% faster than the baseline BLAS routine (2.778 s), outperforming it in 99% of runs.
- Performance thresholds indicate:
- RXTX outperforms recursive Strassen beyond a moderate matrix size.
- RXTX overtakes the naive implementation at a larger threshold.
- With optimal recursive cutoffs, RXTX can outperform other methods at still smaller sizes, though the exact crossover is hardware- and implementation-dependent.
| Algorithm | Base Recursion | Asymptotic Constant | Rank |
| --- | --- | --- | --- |
| Previous SotA | $S(n) = 4S(n/2) + 2M(n/2)$ | $2/3 \approx 0.667$ | 38 |
| RXTX | $R(n) = 8R(n/4) + 26M(n/4)$ | $26/41 \approx 0.634$ | 34 |
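The benchmark setup reported above (one blocking level, then BLAS for the block products) can be mimicked with the previous SotA's 2×2 split, since RXTX's 26 base-case products are not reproduced in this document; `one_level_2x2` and `best_time` are illustrative names, and real measurements would use large matrices:

```python
import time
import numpy as np

def one_level_2x2(X):
    """One blocking level for X @ X.T, then BLAS for the block products
    (illustrative of the benchmark methodology, not of RXTX's 4x4 scheme)."""
    h = X.shape[0] // 2
    A, B = X[:h, :h], X[:h, h:]
    C, D = X[h:, :h], X[h:, h:]
    tr = A @ C.T + B @ D.T
    return np.block([[A @ A.T + B @ B.T, tr],
                     [tr.T, C @ C.T + D @ D.T]])

def best_time(fn, X, reps=5):
    """Best-of-reps wall-clock time; taking the minimum reduces timing noise."""
    times = []
    for _ in range(reps):
        t0 = time.perf_counter()
        fn(X)
        times.append(time.perf_counter() - t0)
    return min(times)
```

A fair comparison would pin the thread count and call `best_time` on both `one_level_2x2` and a plain `X @ X.T` baseline over many trials.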
6. Enablers and Structural Innovations
Several key factors contribute to the efficiency of RXTX:
- Structure Exploitation: RXTX is tailored to the symmetry of $XX^\mathsf{T}$, distinguishing it from generic matrix multiplication methods.
- Block Recursion: Employing $4 \times 4$ block splitting (versus $2 \times 2$ in previous methods) permits more flexible and efficient recombination of products.
- Optimized Additions: Automated search for common subexpressions minimizes redundant arithmetic, reducing total additions required.
- Hybrid AI/Optimization Discovery: The integration of RL sampling and MILP-based combinatorial optimization enables the discovery of schemes that surpass those achieved by exhaustive search or human design.
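The "optimized additions" idea can be illustrated with a toy greedy common-subexpression pass over block sums (signs ignored for simplicity; this is a hypothetical simplification, as the paper's 139-to-100 reduction came from automated search):

```python
from collections import Counter
from itertools import combinations

def naive_adds(exprs):
    """Additions needed to evaluate each sum of blocks independently."""
    return sum(len(e) - 1 for e in exprs)

def cse_adds(exprs):
    """Greedy CSE: repeatedly replace the most frequent pair of operands
    with a fresh temporary, costing one addition per temporary."""
    exprs = [list(e) for e in exprs]
    extra = temp = 0
    while True:
        pairs = Counter()
        for e in exprs:
            for a, b in combinations(sorted(set(e)), 2):
                pairs[(a, b)] += 1
        if not pairs or pairs.most_common(1)[0][1] < 2:
            break
        (a, b), _ = pairs.most_common(1)[0]
        temp += 1
        extra += 1                      # one add to form the temporary
        for e in exprs:
            if a in e and b in e:
                e.remove(a); e.remove(b); e.append(f"t{temp}")
    return extra + sum(len(e) - 1 for e in exprs)

exprs = [("X1", "X2", "X3"), ("X1", "X2", "X4"), ("X1", "X2")]
print(naive_adds(exprs), cse_adds(exprs))  # 5 3
```

Here the shared sum $X_1 + X_2$ is computed once and reused, which is the same principle that trims RXTX's per-step addition count.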
RXTX is thus an AI-discovered, recursively defined algorithm that systematically reduces arithmetic complexity for $XX^\mathsf{T}$ computation, combining theoretical reductions with practical performance gains across a wide range of matrix sizes. The results demonstrate the productive intersection of machine-learning search and combinatorial optimization in algorithmic discovery.