
Generalized Numerical Feature Testing (GNFT)

Updated 17 December 2025
  • GNFT is a comprehensive framework defining numerical features in both high-dimensional statistical models and floating-point hardware.
  • It employs analytic test-vector generation, precise input parametrization, and deterministic sampling to reveal rounding modes, normalization, and accumulator precision.
  • In statistical settings, GNFT enables robust multiple hypothesis testing with permutation-based procedures, ensuring strict control of error rates.

Generalized Numerical Feature Testing (GNFT) denotes a rigorous class of methodologies for systematically ascertaining statistical or numerical properties of either high-dimensional data analyses or floating-point hardware, depending on domain context. In statistical settings, GNFT refers to a permutation-based, robust pipeline for multiple hypothesis testing over many generalized linear models. In hardware analysis, GNFT denotes a universal, architecture-agnostic approach to detecting and characterizing the numerical features of floating-point matrix multipliers, such as rounding, normalization, and internal accumulator precision. Both strands are unified by a common philosophy: feature-agnostic, high-throughput, and statistically robust test procedures.

1. Formal Definitions of Numerical Features

In the domain of floating-point hardware, the GNFT methodology formally defines the relevant features for matrix multiplier analysis as follows (Khattak et al., 3 Sep 2025):

  • Rounding Modes:
    • RM-BFMA: The rounding rule applied after block-FMA (Fused Multiply Add) or fused accumulation.
    • RM-MBFMA: The rounding rule for summing results from distinct block-FMAs.
    • All IEEE-754 rounding modes are tested: Round-to-Nearest-Even (RNE), Toward Zero (RZ), Toward +∞ (RU), and Toward −∞ (RD), along with simple truncation of extended bits.
  • Normalization Behavior:
    • Immediate normalization: Normalization occurs after every binary addition or FMA.
    • Deferred normalization: Intermediate sums are maintained unnormalized, accumulating carries in an extended format.
  • Internal Accumulator Precision:
    • n_eab: Number of extra alignment (guard) bits for aligning significands.
    • n_ecb: Number of extra carry bits beyond the input precision.
    • Block-FMA width (N): Maximum length of segments which are fused exactly in an extended accumulator before rounding.
    • The effective accumulator width is p_acc = p_in + n_eab + n_ecb.

Subnormal support, blockwise arithmetic behavior, and error metrics—absolute and relative error—complete the feature set.
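The accumulator-width relation above can be evaluated directly from a format's significand width and the detected extra bits. The sketch below is illustrative, not part of the published tooling; the n_eab and n_ecb values in the example comment are taken from the detected-feature table later in this article.

```python
# Standard significand widths (including the implicit hidden bit).
SIGNIFICAND_BITS = {"binary16": 11, "bfloat16": 8, "tf32": 11, "binary32": 24}

def effective_accumulator_width(fmt, n_eab, n_ecb):
    """Effective accumulator width p_acc = p_in + n_eab + n_ecb."""
    return SIGNIFICAND_BITS[fmt] + n_eab + n_ecb

# Example: binary16 inputs with 1 alignment bit and 3 carry bits give
# p_acc = 11 + 1 + 3 = 15.
```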

In statistical contexts with many generalized linear models (GLMs), numerical features pertain to the number of hypotheses (m), dependence structure, and control of error rates such as FWER or FDR under non-ideal variance specifications (Santis et al., 2024).

2. General Algorithms for Test-Vector Generation

The GNFT methodology for hardware characterization relies on a parameterized, analytically driven test vector generation algorithm that does not require device-specific constants or exhaustive brute-force enumeration (Khattak et al., 3 Sep 2025). The central steps are:

  • Input parametrization: Use only p_in and p_out (significand widths) to define all variable ranges and target behaviors.
  • Sampling strategy: For each test, analytic patterns of magnitudes and exponent offsets are constructed to deterministically expose underflow, overflow, tie, carry chain, or alignment events.
  • Algorithmic structure:
    • Test for subnormal support in input and output.
    • Deduce extra alignment bits n_eab via result patterns of aligned summands.
    • Identify block-FMA width N and extra carry bits n_ecb by observing deviation from exactness in constructed sums.
    • Detect normalization behavior by using overflow/underflow patterns that discriminate between immediate and deferred normalization.
    • Infer rounding modes through LSB pattern detection in block sum outputs.

Empirical error is always computed as ε = d − d_true, with relative error e_rel = |d − d_true| / |d_true| where needed.
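These error metrics are straightforward to compute once a device result and an exact reference are available. A minimal sketch (the helper name is ours, not from the paper):

```python
def empirical_error(d, d_true):
    """Absolute error eps = d - d_true and relative error |d - d_true| / |d_true|."""
    eps = d - d_true
    e_rel = abs(eps) / abs(d_true) if d_true != 0 else float("inf")
    return eps, e_rel
```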

The following table summarizes the main steps for hardware GNFT:

| Step | Purpose | Depends On |
|---|---|---|
| Subnormal support test | Detects support for denormals | p_in, p_out |
| Alignment bits (n_eab) | Measures guard-bit usage | p_in, prior tests |
| Block-FMA width (N), carry bits | Measures accumulation properties | p_in, p_out |
| Normalization behavior | Detects immediate/deferred policy | Constructed sum patterns |
| Rounding mode(s) | Identifies IEEE variant | Constructed tie inputs |
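The first step, subnormal support, can be probed on any arithmetic unit by driving a result below the smallest normal magnitude and checking whether it flushes to zero. The snippet below is a host-side binary64 analogue of that idea, not the GPU procedure itself (which operates on the low-precision input and output formats):

```python
import sys

def supports_subnormal_outputs():
    # Halving the smallest positive normal binary64 value yields a subnormal
    # if gradual underflow is supported; a flush-to-zero unit returns 0.0.
    smallest_normal = sys.float_info.min  # 2**-1022
    return smallest_normal / 2.0 != 0.0
```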

3. Detecting and Quantifying Numerical Features

Detection procedures for each feature are explicitly formulated in GNFT (Khattak et al., 3 Sep 2025):

  • Rounding mode: By constructing inputs that lie exactly on rounding ties or in bit patterns differentiable by each IEEE mode, the observed result uniquely identifies the mode employed.
  • Normalization: By creating sum patterns designed to overflow the accumulator under deferred normalization but not immediate normalization, observing the output exposes the policy in use.
  • Alignment Bits: By careful exponent manipulation and alignment of inputs, the precise number of extra alignment bits is deduced by examining the LSB of the output.
  • Carry Bits: After fixing N, carry-bit extent is measured by observing sum accuracy as a function of additive-term count, up to the theoretical limit n_ecb,max = ⌊log2(N · (2 − 2^(−(p_in−1))))⌋.

These tests are iteratively constructed: once n_eab is detected, subsequent tests for n_ecb and N are conditioned on it. All detection formulas are fully parameterized by p_in, p_out, and previously detected feature values.
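The carry-bit limit above is a direct formula in N and p_in, so it can be transcribed and cross-checked against the detected values reported later in this article:

```python
import math

def n_ecb_max(N, p_in):
    """Upper bound on extra carry bits: floor(log2(N * (2 - 2**-(p_in - 1))))."""
    return math.floor(math.log2(N * (2.0 - 2.0 ** -(p_in - 1))))

# binary16 inputs (p_in = 11) with block-FMA width N = 8 give n_ecb_max = 3,
# consistent with the ">= 3" detected on the tested GPUs; N = 4 gives 2.
```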

4. GNFT in High-Dimensional Statistical Testing

In settings where m numerical features are each modeled as a GLM, GNFT denotes a pipeline for robust multiple hypothesis testing using permutation-based standardized flip-scores (Santis et al., 2024):

  • Univariate test: For each feature, define residuals, score contributions, and form a standardized test statistic using estimated variance.
  • Permutation-based null: Generate w random sign-flip vectors to empirically estimate the null distribution of the test statistic, delivering robustness to variance misspecification.
  • Multiple testing correction: Use the same flips for all features, constructing a w × m "flip-score" matrix. Adjusted p-values obtained by single-step max-T yield strong FWER control (Westfall–Young).
  • Sequential/step-down procedure: Strong FWER control is maintained by iteratively removing the largest test statistic and recalculating quantiles.
  • Extension to multivariate/correlated tests: The procedure either uses a Mahalanobis-based global statistic (full covariance estimation, O(m^3) cost) or a scalable marginal score normalization followed by max-T, preserving asymptotic error control under arbitrary correlation.

A key feature is that because all features use identical flips, cross-feature dependencies are handled without explicit parametric modeling, allowing robust error rate control in the presence of arbitrary response correlation.
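A toy version of the shared-flip max-T idea fits in a few lines. Everything here (the function name, the standardization, and the (1 + count)/(w + 1) p-value convention) is a simplified sketch of the Westfall–Young construction under stated assumptions, not the authors' implementation:

```python
import random

def flip_score_max_t(scores, w=999, seed=0):
    """Single-step max-T adjusted p-values from sign-flip scores.

    scores: n rows x m columns of per-observation score contributions
    (each column has mean ~0 under its null). The same flips are used
    for every column, so cross-feature dependence is preserved.
    """
    rng = random.Random(seed)
    n, m = len(scores), len(scores[0])

    def standardized(signs):
        stats = []
        for j in range(m):
            s = sum(sg * scores[i][j] for i, sg in enumerate(signs))
            ss = sum(scores[i][j] ** 2 for i in range(n)) ** 0.5
            stats.append(abs(s) / ss if ss else 0.0)
        return stats

    observed = standardized([1] * n)  # the identity flip recovers the data
    null_max = [max(standardized([rng.choice((-1, 1)) for _ in range(n)]))
                for _ in range(w)]
    # Adjusted p-value: how often the null maximum reaches each observed T_j.
    return [(1 + sum(v >= t for v in null_max)) / (w + 1) for t in observed]
```

For instance, a constant-signal column yields a small adjusted p-value while a column whose observed statistic is zero yields exactly 1.0, since the null maximum is always nonnegative.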

5. Empirical Evaluations and Results

In hardware analysis, the GNFT methodology was applied to NVIDIA RTX-3060 (Ampere) and Ada RTX-1000 (Ada Lovelace) GPUs with input formats binary16, bfloat16, and TensorFloat32, and output formats binary16, binary32 (Khattak et al., 3 Sep 2025). Detected features are summarized below:

| Input → Output | Device | Subnormals (In/Out) | n_eab | n_ecb | Norm. | N | RM-BFMA | RM-MBFMA |
|---|---|---|---|---|---|---|---|---|
| binary16 → b16 | RTX-3060 | ✓/✓ | 1 | ≥3 | deferred | 8 | trunc | trunc |
| bfloat16 → b16 | RTX-3060 | ✓/✓ | 1 | ≥3 | deferred | 8 | trunc | trunc |
| TF32 → b16 | RTX-3060 | ✓/✓ | 1 | ≥2 | deferred | 4 | trunc | trunc |
| binary16 → b16 | Ada-1000 | ✓/✓ | 1 | ≥3 | deferred | 8 | trunc | trunc |
| TF32 → b16 | Ada-1000 | ✓/✓ | 1 | ≥2 | deferred | 4 | trunc | trunc |

For binary32 output, both RM-BFMA and RM-MBFMA remain truncation, while for binary16 they switch to RNE (Round-to-Nearest-Even). A plausible implication is that even in newly released or future architectures (Hopper, Blackwell), the methodology remains effective without modification.

In statistical applications, empirical studies have demonstrated that permutation GNFT often achieves greater power than Bonferroni while retaining strict error rate control, especially in correlated high-dimensional settings (Santis et al., 2024).

6. Scalability, Generality, and Practical Considerations

A major strength of GNFT, in both hardware verification and statistical testing, is its complete parameterization in terms of observable characteristics (p_in, p_out, and previously detected feature values), without recourse to device-specific constants or ad hoc thresholds (Khattak et al., 3 Sep 2025, Santis et al., 2024). This makes the approach both architecture-agnostic and format-agnostic: as new low-bit floating-point standards (e.g., 8-bit formats with p_in = 4) and emerging hardware platforms are introduced, the same GNFT formulas and detection logic remain valid.

For statistical testing pipelines, computational complexity is O(nm + wm) storage and O(wnm) runtime, which is tractable for m in the thousands and implementable in standard scientific computing environments (R, Python, MATLAB).

In summary, Generalized Numerical Feature Testing, whether in the context of multiple GLMs or floating-point matrix multiplication hardware, provides a universal, principled, and scalable methodology for feature detection and statistical control, robust against misspecification and rapidly adaptable to new formats and architectures (Khattak et al., 3 Sep 2025, Santis et al., 2024).
