Executable Function Accuracy (EFA)
- Executable Function Accuracy (EFA) is a framework for quantifying implementation-specific errors in numerical programs by measuring worst-case absolute errors and ULP discrepancies.
- It combines static error analysis, affine arithmetic, and empirical benchmarking to bound approximation and roundoff errors in floating-point computations.
- Automated tuning and regression-testing techniques in EFA preserve accuracy while optimizing performance in critical numerical and generative math applications.
Executable Function Accuracy (EFA) refers to rigorously quantified, implementation-specific error or correctness criteria for functions executed inside computer programs, often with a focus on floating-point arithmetic, elementary mathematical functions, or parameterized problem-generating abstractions. The term EFA spans: (a) the worst-case (absolute or relative) error introduced by finite-precision evaluation in numerical codes (Darulova et al., 2018, Mikaitis et al., 6 Sep 2025) and (b) the specification and validation properties of executable, parametric mathematical problem generators (Khan et al., 14 Apr 2025). Approaches to EFA combine static error analysis, empirical validation using high-precision arithmetic, program synthesis, and automated test-bench methodology. The sections below survey EFA’s formulations, error metrics, benchmarking strategies, tool architectures, and representative case studies.
1. Foundational Definitions and Error Metrics
EFA is most classically defined for a numerical program $\tilde{f} \approx f$, where $f$ denotes the ideal real-valued computation and $\tilde{f}$ its finite-precision implementation. The primary metric is the worst-case absolute error over a user-specified input domain $I$:

$$\max_{x \in I} \big| f(x) - \tilde{f}(x) \big|$$

This can be decomposed into
- Approximation error $\varepsilon_{\mathrm{approx}}$: due to replacing exact function calls (from libm or similar) by polynomial approximations.
- Roundoff error $\varepsilon_{\mathrm{round}}$: arising from floating-point operations (add, subtract, multiply, divide, sqrt) and from evaluating the polynomial approximations themselves.
The total is bounded as:

$$\max_{x \in I} \big| f(x) - \tilde{f}(x) \big| \le \varepsilon_{\mathrm{approx}} + \varepsilon_{\mathrm{round}}$$
Relative error, $\max_{x \in I} |f(x) - \tilde{f}(x)| / |f(x)|$, is handled by converting to an absolute error budget via a lower bound on $|f(x)|$ over $I$ (Darulova et al., 2018).
For transcendental or elementary functions ($\exp$, $\log$, $\sin$, etc.), where correct rounding is not generally guaranteed across implementations, EFA is also measured in ULPs (Units in the Last Place) as:

$$\mathrm{err}_{\mathrm{ULP}}(x) = \frac{\big| \hat{y} - \mathrm{RZ}(y) \big|}{u}$$

where $\hat{y}$ is the computed value in the target floating-point format, $y$ is the high-precision reference, $u$ is the ULP size in that format, and $\mathrm{RZ}$ denotes round-to-zero (Mikaitis et al., 6 Sep 2025). The global maximum over $x \in I$ is then taken as the EFA of the implementation.
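As a concrete illustration, a simplified variant of this ULP metric (dividing by the ULP size at the reference value, without the round-to-zero refinement) can be sketched in Python against a high-precision `decimal` reference; the choice of `math.exp` and the sampled domain are illustrative assumptions, not the benchmark's actual setup:

```python
import math
from decimal import Decimal, getcontext

getcontext().prec = 50  # reference precision far beyond binary64's ~16 digits

def ulp_error_exp(x: float) -> float:
    """ULP error of math.exp at x, against a high-precision Decimal reference."""
    computed = Decimal(math.exp(x))          # value from the implementation under test
    reference = Decimal(x).exp()             # high-precision reference value
    u = Decimal(math.ulp(float(reference)))  # ULP size of binary64 near the reference
    return float(abs(computed - reference) / u)

# Worst case over a coarse sample of the domain [-5, 5]
worst = max(ulp_error_exp(k / 100.0) for k in range(-500, 501))
```

A correctly rounded implementation would stay at or below 0.5 ULP under this measure; the sampling step here is far too coarse for a real benchmark and only demonstrates the computation.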
2. Static Error Analysis and Bound Calculation
Bounding EFA in approximation-augmented numerical programs involves two essential analyses:
- Range Analysis: For each intermediate expression, interval or affine arithmetic bounds its possible values over $I$.
- Error Propagation: Affine arithmetic is employed to bound the accumulation of floating-point roundoff error, using the standard model $\mathrm{fl}(x \circ y) = (x \circ y)(1 + \delta)$, with $|\delta| \le \epsilon_M$, for each operation $\circ \in \{+, -, \times, \div\}$.
This multiphase analysis is exemplified in frameworks such as Daisy, which statically determine both value ranges and sound worst-case error bounds for straight-line code that may include both standard arithmetic and calls to polynomial approximations whose error properties are known a priori (Darulova et al., 2018). Soundness of the error bounds is critical for formal verification and a priori guarantees.
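A toy version of this two-phase analysis, using plain intervals rather than affine forms and a first-order error model (a simplification of what Daisy actually does), can be sketched as:

```python
# Toy forward analysis: interval ranges plus first-order roundoff accumulation,
# following the standard model fl(x op y) = (x op y)(1 + delta), |delta| <= u.
U = 2.0 ** -53  # unit roundoff for binary64

class Val:
    """An interval [lo, hi] paired with an accumulated absolute-error bound."""
    def __init__(self, lo, hi, err=0.0):
        self.lo, self.hi, self.err = lo, hi, err

    def _mag(self):
        return max(abs(self.lo), abs(self.hi))

    def __add__(self, other):
        lo, hi = self.lo + other.lo, self.hi + other.hi
        # propagate incoming errors and add fresh roundoff, bounded by u * |result|
        err = self.err + other.err + U * max(abs(lo), abs(hi))
        return Val(lo, hi, err)

    def __mul__(self, other):
        prods = [self.lo * other.lo, self.lo * other.hi,
                 self.hi * other.lo, self.hi * other.hi]
        lo, hi = min(prods), max(prods)
        # first-order propagation (drops the err*err term) plus fresh roundoff
        err = (self.err * other._mag() + other.err * self._mag()
               + U * max(abs(lo), abs(hi)))
        return Val(lo, hi, err)

x = Val(1.0, 2.0)   # input range [1, 2], assumed exact
y = (x + x) * x     # analyze the expression 2x * x over [1, 2]
```

After the analysis, `y` carries both the range of `2x*x` over the input interval and a sound first-order bound on the accumulated roundoff; affine forms tighten the ranges by tracking correlations between subexpressions, which plain intervals lose.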
3. Automated Generation and Tuning of Approximations
A central application of EFA is the automatic replacement of costly libm-style elementary function calls with polynomial (or rational) approximations, tuned to satisfy user-supplied total error budgets. Tools such as Metalibm synthesize and verify polynomial approximations for univariate elementary functions over given intervals with a target relative error $\varepsilon$, using:
- Argument reduction leveraging symmetry, periodicity, or functional equations.
- Optional domain splitting and table-driven offset corrections.
- Remainder-polynomial evaluation (typically via Horner’s scheme).
- Rigorous error proofs, e.g., with Gappa.
Key tunable parameters are the polynomial degree $d$ and the target error $\varepsilon$, linked via the desired overall absolute or relative error budget at each function call site. Degree selection is performed via linear search, balancing accuracy against operation count (code cost) (Darulova et al., 2018).
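The degree search can be sketched in a few lines; the snippet below uses Taylor coefficients for `exp` as a stand-in for the minimax (Remez) coefficients a tool like Metalibm would compute, and sampled maximum error in place of a rigorous bound:

```python
import math

def horner(coeffs, x):
    """Evaluate a polynomial c0 + c1*x + ... via Horner's scheme."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc

def pick_degree(target_eps, lo=-0.5, hi=0.5, max_degree=20, samples=1000):
    """Linear search over the degree: return the smallest degree whose sampled
    max error against math.exp meets target_eps, plus that error.
    Taylor coefficients stand in for the minimax coefficients a real tool
    would synthesize, and sampling stands in for a rigorous error proof."""
    for d in range(1, max_degree + 1):
        coeffs = [1.0 / math.factorial(k) for k in range(d + 1)]
        err = 0.0
        for i in range(samples + 1):
            x = lo + (hi - lo) * i / samples
            err = max(err, abs(horner(coeffs, x) - math.exp(x)))
        if err <= target_eps:
            return d, err
    return None
```

Relaxing `target_eps` lets the search stop at a lower degree, trading accuracy for fewer operations per evaluation, which is exactly the cost/accuracy dial the tuning frameworks expose.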
4. Benchmarking, Regression Testing, and Empirical EFA Measurement
To empirically assess EFA, exhaustive or sampled test-benches compute per-function maximal error over the floating-point input domain. An example is the Julia MathBenchmark.jl suite, which implements:
- Exhaustive enumeration for small domains (binary16 univariates),
- Fixed-step sampling for larger formats (binary32, binary64),
- ULP error computation against MPFR-based high-precision references.
The framework supports all IEEE-754 binary floating-point types and documents, for each math function and format, the maximal ULP error, the input sample(s) at which it is achieved, and the number of samples taken. This approach exposes the range of possible ULP errors in real implementations:
- For binary16, all 24 univariate functions studied showed maximal errors of approximately $0.5$ ULP.
- For binary32, errors spanned $0.5$–$2.4$ ULPs.
- For binary64, errors up to approximately $4$ ULPs (not exhaustive due to input-space size) (Mikaitis et al., 6 Sep 2025).
This methodology underpins regression-testing pipelines and quantitative reproducibility for math-library implementations.
| Format | Correctly Rounded (≤0.5 ULP) | Max. Observed ULPs (Other Functions) |
|---|---|---|
| binary16 | sqrt, sinh, asin, cospi, sinpi, cbrt, atanh, log2, tanh | ~0.500–0.501 |
| binary32 | sqrt, cbrt | up to ~2.42 |
| binary64 | sqrt | up to ~4.0 |
5. Tradeoff Strategies: Accuracy, Performance, Budgeting
Automated EFA-tuning frameworks allow users to specify a total allowable error $\varepsilon_{\mathrm{total}}$, which is split between approximation and roundoff error. Error budgets are distributed among the approximated function calls by:
- Equal split: each of the $n$ approximated calls receives $\varepsilon_{\mathrm{total}} / n$.
- Derivative weighting: allocating larger error budgets to calls with smaller local-to-global magnification factors, as determined by symbolic differentiation of the overall expression with respect to each call's result.
Subsequently, local error targets guide the selection of polynomial degrees at each site. Resulting code can be inlined and linked, yielding end-to-end C programs with certified worst-case EFA (Darulova et al., 2018). Empirical results demonstrate tradeoffs:
- With relaxed accuracy budgets (a larger allowed total error), substantial speedups are realized without violating user-supplied accuracy constraints (e.g., the forwardk2jY benchmark shows increasing speedup as its budget is relaxed).
- In all cases, a second round of analysis verifies the achieved EFA does not exceed the budget.
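One way to realize derivative weighting (a sketch; the exact allocation rule in the cited tools may differ) is to size each call's local budget so that its magnified contribution to the global error is equal across sites:

```python
def distribute_budget(total_eps, magnifications):
    """Split a total error budget across n call sites so that each site's
    local error, scaled by its local-to-global magnification factor m_i,
    contributes equally to the total: eps_i = total_eps / (n * m_i).
    The uniform case m_i == 1 reduces to the equal split eps_i = total_eps / n."""
    n = len(magnifications)
    return [total_eps / (n * m) for m in magnifications]

# A call whose error is magnified 4x by the surrounding expression
# receives a proportionally smaller local budget (factors are illustrative)
budgets = distribute_budget(1e-6, [1.0, 4.0, 0.5])
```

By construction, summing each local budget times its magnification factor recovers the total budget, so the global worst-case bound is respected regardless of how the factors vary.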
6. EFA for Executable Functional Abstractions in Math Generation
In a distinct but related domain, EFA denotes "Executable Functional Abstraction," formalized as a triple $(P, Q, A)$ for families of parameterized math problems (Khan et al., 14 Apr 2025):
- $P$: parameter space.
- $Q$: parameter-to-question mapping.
- $A$: parameter-to-unique-answer mapping.
Validity of such EFAs is established through five invariant-checking unit tests: extractability, executability, degrees of freedom, single-valuedness, and original-seed consistency. Automated program synthesis pipelines (e.g., EFAGen) use these tests as reward signals for language-model-based code generation. This framework provides a systematic basis for generating, validating, and employing problem families in evaluation, adversarial testing, and data augmentation.
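These components can be made concrete with a deliberately small, hypothetical example (the problem family, parameter ranges, and helper names below are all invented for illustration, not taken from EFAGen):

```python
import random

# A toy EFA for the family "solve a*x + b = c":
# parameter space P, question map Q, and answer map A.
def sample_params(rng):
    a = rng.choice([n for n in range(-9, 10) if n != 0])  # a != 0 keeps the answer unique
    return a, rng.randint(-9, 9), rng.randint(-9, 9)

def question(params):
    a, b, c = params
    return f"Solve for x: {a}*x + {b} = {c}"

def answer(params):
    a, b, c = params
    return (c - b) / a

# Invariant-style unit checks in the spirit of the validity tests:
rng = random.Random(0)
drawn = [sample_params(rng) for _ in range(100)]
assert len(set(drawn)) > 1                       # degrees of freedom: parameters vary
for p in drawn:
    x = answer(p)                                # executability: the answer map runs
    a, b, c = p
    assert abs(a * x + b - c) < 1e-9             # single-valuedness/consistency check
```

In a synthesis pipeline, checks like these serve as reward signals: a candidate program that fails executability or produces multiple answers for one parameter setting is rejected or penalized before its problem family is used downstream.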
7. Practical Impact and Recommendations
Machine-checked EFA enables confident tuning and replacement of core numerical routines in performance-sensitive settings (e.g., simulation kernels, high-throughput analytics). It also supports maintainability, as regression tests on EFA can prevent accuracy regressions following code changes or optimizations, and can guide selection of correctly-rounded special functions critical for numerical reproducibility (Mikaitis et al., 6 Sep 2025). For generative math applications, formally verified EFAs systematically extend single instances into novel, difficulty-calibrated problems, supporting more robust learner or model evaluation (Khan et al., 14 Apr 2025).
The EFA paradigm thus unifies approaches to formal error budgeting in numerical computation and to executable programmatic abstractions in advanced math problem specification, with core principles rooted in explicit error quantification, mechanized validation, and empirically-grounded benchmarking.