Full automation of the tensor core model refinement algorithm

Develop a fully automated version of the approximation-and-refinement procedure for determining accurate models of NVIDIA GPU tensor core matrix multipliers (Algorithm 1 in Section 3.3). The automated version should eliminate the current manual step, in which an expert inspects mismatches between hardware and model outputs and modifies model features, so that the process iterates autonomously until bit-accurate agreement is achieved.

Background

The paper proposes an iterative method to approximate and refine software models of NVIDIA tensor cores. It begins with Generalised Numerical Feature Testing (GNFT) to identify key numerical characteristics, then uses an Input Space Search Method (ISSM) with randomized inputs to compare the model against hardware results. When mismatches occur, an expert manually inspects the failure cases and updates the model, and the process repeats until the model is sufficiently accurate.
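The comparison step of this loop can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's code: `Features`, `dot_model`, `find_mismatches`, the constant `DROP`, and the two toy features (guard bits kept per product, final rounding) are hypothetical, and `hardware` is a software stand-in for a real GPU call.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Features:
    """Hypothetical feature set of a toy fixed-point multiplier model."""
    guard_bits: int       # extra low-order bits kept per product while accumulating
    round_nearest: bool   # True: round half up on the final shift; False: truncate


DROP = 8  # toy value: number of product bits that lie below the output precision


def dot_model(a, b, c, f):
    """Toy dot product a.b + c whose behaviour depends on the features f."""
    shift = DROP - f.guard_bits
    acc = c << f.guard_bits
    for x, y in zip(a, b):
        acc += (x * y) >> shift            # truncate each product to guard_bits
    if f.round_nearest and f.guard_bits > 0:
        acc += 1 << (f.guard_bits - 1)     # add half a unit before the final shift
    return acc >> f.guard_bits


def find_mismatches(model, hardware, trials=1000, seed=0):
    """Feed random inputs to both callables and collect the disagreeing inputs."""
    rng = random.Random(seed)
    bad = []
    for _ in range(trials):
        a = [rng.randrange(1 << 11) for _ in range(4)]
        b = [rng.randrange(1 << 11) for _ in range(4)]
        c = rng.randrange(1 << 8)
        if model(a, b, c) != hardware(a, b, c):
            bad.append((a, b, c))
    return bad


# Stand-in for the GPU: in practice this would launch a kernel on real hardware.
HIDDEN = Features(guard_bits=2, round_nearest=False)


def hardware(a, b, c):
    return dot_model(a, b, c, HIDDEN)


first_guess = Features(guard_bits=3, round_nearest=False)
mismatches = find_mismatches(
    lambda a, b, c: dot_model(a, b, c, first_guess), hardware
)
print(len(mismatches), "mismatching inputs for the first guess")
```

In the manual procedure described above, the list returned by `find_mismatches` is what an expert would inspect before changing the model's features.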

While the GNFT step accelerates the initial approximation, the authors note that automating the modification step, in which hypotheses are formed and model features are adjusted based on discrepancies, remains challenging. Fully automating this procedure would enable rapid, repeatable determination of tensor core features across future GPU architectures without human intervention.
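One simple way to frame an automated modification step, assuming the candidate behaviours can be listed in advance, is to discard every candidate that disagrees with the hardware on some input. The sketch below is not the paper's method. The three rounding candidates, the `eliminate` function, and the use of `half_even` as a stand-in oracle are all hypothetical. It also sidesteps the hard part, which is forming hypotheses that are not already on a predefined list.

```python
import random


def truncate(x):
    """Drop the 4 low bits."""
    return x >> 4


def half_up(x):
    """Drop the 4 low bits, rounding ties upwards."""
    return (x + 8) >> 4


def half_even(x):
    """Drop the 4 low bits, rounding ties to the even neighbour."""
    q, r = divmod(x, 16)
    if r > 8 or (r == 8 and q % 2 == 1):
        q += 1
    return q


def eliminate(candidates, oracle, rounds=20, batch=64, seed=0):
    """Keep only the candidates that agree with the oracle on every random input."""
    rng = random.Random(seed)
    alive = dict(candidates)
    for _ in range(rounds):
        for _ in range(batch):
            x = rng.randrange(1 << 16)
            expected = oracle(x)
            alive = {name: f for name, f in alive.items() if f(x) == expected}
        if len(alive) <= 1:   # stop once the hypothesis space is exhausted
            break
    return sorted(alive)


CANDIDATES = {"truncate": truncate, "half_up": half_up, "half_even": half_even}

# The oracle is a software stand-in; a real run would query the GPU instead.
survivors = eliminate(CANDIDATES, oracle=half_even)
print(survivors)
```

If no candidate survives, the hypothesis list was incomplete. That is the situation in which the current procedure falls back on manual inspection.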

References

However, the full automation of Algorithm 1 is an open problem which we leave for future research.

Accurate Models of NVIDIA Tensor Cores (2512.07004 - Khattak et al., 7 Dec 2025) in Section 3.3 (Matrix Multiplier Model Approximation and Refinement), after Algorithm 1