Scaled MAD Global Envelope Test

Updated 19 December 2025

Scaled MAD GET is a non-parametric, simulation-based methodology that uses the maximum absolute deviation with pointwise scaling to assess global hypotheses in multivariate and functional data.
The test produces a graphical envelope that highlights regions of significant deviation, offering both rigorous family-wise error control and clear visual diagnostics.
It is widely applicable in spatial statistics, image analysis, and permutation-based inference, underpinned by robust computational techniques and simulation approaches.

The Scaled Maximum Absolute Deviation (MAD) Global Envelope Test (GET) is a non-parametric, simulation-based hypothesis testing framework for multivariate or functional data. It enables simultaneous global inference for an entire vector or curve by comparing an observed function or test-statistic vector to an ensemble of replicates simulated under a null reference model. By scaling the maximum absolute deviation pointwise, the procedure accommodates heteroskedasticity and spatiotemporal dependence, and provides both a global p-value with rigorous family-wise error control and a graphical envelope indicating the regions of significant deviation. The test has wide applicability in spatial statistics, image analysis, non-parametric function comparison, and permutation-based inference for general linear models (Myllymäki et al., 2013, Myllymäki et al., 2019, Mrkvicka et al., 2019, Tivenan et al., 17 Dec 2025).

1. Formal Hypothesis Framework

The core object is a set of multivariate or functional test-statistic vectors $\{T_i\}$, each of length $d$ (or functions evaluated on a grid of $d$ points). The test addresses the global hypothesis: $H_0: T_{\text{obs},k} \stackrel{\mathcal{D}}{=} T_k \quad \text{for all } k=1,\ldots,d,$ where $T_\text{obs}$ is the observed data vector or curve and $T$ is a draw from the null distribution (e.g., from a fitted model, a permutation, or a parametric bootstrap). The alternative is that this equality fails for at least one $k$.

There are two common scenarios:

Simple null: All $T_i$ are generated under a specified $H_0$ distribution.
Composite null: Plug-in parameter estimates are used for both the observed and the simulated curves, so that all curves pass through the same estimation step and remain (at least approximately) exchangeable.

For regime comparison, e.g., in spatial boundary change detection, $T_\text{obs}(x) = \hat{y}^{(A)}(x) - \hat{y}^{(B)}(x)$ can represent a difference of predicted functions from separate periods (Tivenan et al., 17 Dec 2025).

2. Scaled MAD Statistic and Envelope Construction

The procedure centers on the "maximum absolute deviation" (MAD), a reduction of each vector or function to a single number, and its scaled variants (Myllymäki et al., 2013, Myllymäki et al., 2019, Tivenan et al., 17 Dec 2025). The general scaled MAD statistic is: $M_i = \max_{k=1,\ldots,d} \frac{|T_{ik} - T_{0k}|}{s_k},$ where $T_{0k}$ is the pointwise center (often the Monte Carlo mean) and $s_k$ is the local scale, which can be chosen according to three main options:

Scaling Type	Scale Factor $s_k$	R Package Type
Unscaled	$1$	"unscaled"
Studentized	Monte Carlo standard deviation $\sigma_k$	"st"
Directional-quantile	$|\overline{Q}_k - T_{0k}|$ (above the center) or $|T_{0k} - \underline{Q}_k|$ (below the center)	"qdir"

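Studentized scaling: $s_k = \sqrt{\text{Var}(T_{ik})}$, robust against nonconstant variance.
Directional-quantile scaling: $s_k$ is the distance from the center to the upper Monte Carlo quantile $\overline{Q}_k$ where the curve lies above the center, and to the lower quantile $\underline{Q}_k$ where it lies below, capturing skew or tail behavior.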
Unscaled: rarely recommended, as it fails when variability is not constant (Myllymäki et al., 2019).
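
As a minimal base-R sketch of the three scaling options, the helper below computes the pointwise center, the scale, and the resulting $M_i$ for every curve. The function name `scaled_mad`, the choice to pool the observed curve into the center and scale, and the 2.5% and 97.5% quantiles used for directional scaling are assumptions made for illustration; this is not the GET implementation.

```r
# Scaled MAD statistics for a set of curves (illustrative helper, not part of GET).
# curves: d x s matrix; column 1 is the observed curve, columns 2..s are simulations.
scaled_mad <- function(curves, type = c("st", "qdir", "unscaled"),
                       probs = c(0.025, 0.975)) {
  type <- match.arg(type)
  center <- rowMeans(curves)            # pointwise center T_0k (assumed: mean of all curves)
  dev <- curves - center                # T_ik - T_0k; the vector is recycled down each column
  if (type == "unscaled") {
    scaled <- abs(dev)
  } else if (type == "st") {
    s_k <- apply(curves, 1, sd)         # pointwise Monte Carlo standard deviation
    scaled <- abs(dev) / s_k
  } else {
    q <- apply(curves, 1, quantile, probs = probs)  # 2 x d matrix: lower and upper quantiles
    lower <- center - q[1, ]            # |T_0k - lower quantile|
    upper <- q[2, ] - center            # |upper quantile - T_0k|
    # scale by the upper distance above the center, by the lower distance below it
    scaled <- ifelse(dev >= 0, dev / upper, -dev / lower)
  }
  apply(scaled, 2, max)                 # M_i = maximum over k for each curve
}

# Usage on synthetic curves whose variability grows along the grid
set.seed(1)
d <- 40
sd_k <- seq(0.5, 3, length.out = d)
curves <- matrix(rnorm(d * 500), nrow = d) * sd_k
M_st <- scaled_mad(curves, "st")
M_un <- scaled_mad(curves, "unscaled")
```

In the unscaled case the maximum tends to be attained at the high-variance end of the grid, which is the behavior the studentized and directional-quantile scalings are meant to avoid.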

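After computing the scaled MAD for all $s$ curves (the observed one and the $s-1$ simulated replicates), the critical value $c_\alpha$ is taken as the $(1-\alpha)$-quantile of $\{M_i\}$. The global envelope at each $k$ is $\left[T_{0k} - c_\alpha s_k,\ T_{0k} + c_\alpha s_k\right]$; under directional-quantile scaling the band is asymmetric, $\left[T_{0k} - c_\alpha |T_{0k} - \underline{Q}_k|,\ T_{0k} + c_\alpha |\overline{Q}_k - T_{0k}|\right]$.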

3. Monte Carlo Test and Family-Wise Error Rate Control

The test implements a Monte Carlo hypothesis test based on the exchangeability of the observed $M_1$ and simulated $M_i$ under $H_0$ (Myllymäki et al., 2013, Myllymäki et al., 2019, Mrkvicka et al., 2019):

Compute the MAD statistic $M_1$ for the data and $\{M_i\}_{i=2}^s$ for the null replicates.
Compute the (Monte Carlo) p-value:

$p = \frac{1}{s} \#\left\{i \in \{1,\ldots,s\} : M_i \ge M_1 \right\}$

Reject $H_0$ at level $\alpha$ if $p \le \alpha$, or equivalently, if $M_1 > c_\alpha$.
The procedure controls the (simultaneous) family-wise error rate (FWER) at level $\alpha$, exactly so when $\alpha s$ is an integer and ties among the $M_i$ have probability zero, even for highly dependent or heteroskedastic statistics.
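
The p-value and critical value can be sketched in a few lines of base R. The function name `mc_test` and the convention of taking $c_\alpha$ as the $(\lfloor \alpha s \rfloor + 1)$-th largest $M_i$ are assumptions chosen here so that the two rejection rules coincide; other quantile conventions exist.

```r
# Monte Carlo p-value and critical value from MAD statistics (illustrative sketch).
# M: numeric vector of length s; M[1] belongs to the observed curve.
mc_test <- function(M, alpha = 0.05) {
  s <- length(M)
  n_ge <- sum(M >= M[1])                  # the observed value counts itself, so n_ge >= 1
  m <- floor(alpha * s + 1e-9)            # largest count that still rejects (guarded against rounding)
  c_alpha <- sort(M, decreasing = TRUE)[m + 1]   # (m + 1)-th largest value (assumed convention)
  list(p = n_ge / s, c_alpha = c_alpha, reject = n_ge <= m)
}

# Usage: observed statistic 25 against 19 simulated values
res <- mc_test(c(25, 1:19), alpha = 0.05)
```

With this convention, `reject` is true precisely when `M[1] > c_alpha`, including when some of the $M_i$ are tied.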

For permutation-GLM applications, indexing follows the same pattern with $J$ permutations; the critical MAD value $Q$ is selected as the $(1-\alpha)$-quantile among $J+1$ exchangeable $M_i$ (Mrkvicka et al., 2019).
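
To illustrate how $J+1$ exchangeable statistics can arise from permutations, the sketch below uses a plain two-group comparison of mean curves with permuted group labels. This is a simplified stand-in, not the permutation-GLM scheme of Mrkvicka et al.; the function `perm_curves` and the synthetic data are hypothetical.

```r
# Permutation replicates for a two-group comparison of curves (simplified stand-in).
# Y: d x n matrix of curves (one column per subject, d > 1); group: two-level labels.
perm_curves <- function(Y, group, J = 999) {
  group <- as.factor(group)
  stopifnot(nlevels(group) == 2, length(group) == ncol(Y))
  diff_stat <- function(g) {
    # pointwise difference of group mean curves for labelling g
    rowMeans(Y[, g == levels(group)[1], drop = FALSE]) -
      rowMeans(Y[, g == levels(group)[2], drop = FALSE])
  }
  sims <- replicate(J, diff_stat(sample(group)))  # d x J matrix of permuted statistics
  cbind(diff_stat(group), sims)                   # column 1 holds the observed statistic
}

# Usage: 24 subjects, 30 grid points, group B shifted on part of the grid
set.seed(42)
Y <- matrix(rnorm(30 * 24), nrow = 30)
Y[11:20, 13:24] <- Y[11:20, 13:24] + 3
group <- rep(c("A", "B"), each = 12)
curves <- perm_curves(Y, group, J = 199)

# Studentized MAD statistics and Monte Carlo p-value over the J + 1 curves
center <- rowMeans(curves)
s_k <- apply(curves, 1, sd)
M <- apply(abs(curves - center) / s_k, 2, max)
p <- sum(M >= M[1]) / length(M)
```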

4. Graphical Envelope and Intrinsic Graphical Interpretation

A distinctive feature of the scaled MAD GET is the intrinsic graphical interpretation (IGI) property (Myllymäki et al., 2019): the observed vector or function leaves the global envelope at one or more locations $k$ if and only if the global test rejects at level $\alpha$. The envelope band is thus both a confidence region and a visual diagnostic highlighting the precise locations contributing to overall significance.

This graphical property means one can:

Identify which regions or coordinates drive rejection.
Visualize departures in multivariate, functional, or spatial settings (e.g., distances, spatial coordinates, time-points).
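
For the symmetric scalings, the equivalence behind this property can be written out from the definitions of $M_i$ and the envelope. Here $T_{1k}$ denotes the observed value at location $k$, a notation assumed to match $M_1$:

$M_1 > c_\alpha \iff \max_{k} \frac{|T_{1k} - T_{0k}|}{s_k} > c_\alpha \iff \exists\, k:\ |T_{1k} - T_{0k}| > c_\alpha s_k \iff \exists\, k:\ T_{1k} \notin \left[T_{0k} - c_\alpha s_k,\ T_{0k} + c_\alpha s_k\right]$

Every step is an equivalence, so the locations where the last condition holds are exactly those responsible for rejection.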

5. Algorithmic and Computational Considerations

The practical implementation follows:

Simulate $s-1$ null replicates $T_2,\dots,T_s$; the observed curve is $T_1$.
Calculate pointwise mean $T_{0k}$ and scale $s_k$.
For each $i$, compute $M_i$.
Sort $\{M_i\}$; set $c_\alpha$ as the $(1-\alpha)$ quantile.
Construct the envelope $\left[T_{0k}-c_\alpha s_k,\ T_{0k}+c_\alpha s_k\right]$.
Evaluate p-value; plot data vs. envelope for interpretation.
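
The steps above can be put together in a short base-R sketch with studentized scaling. The function name `scaled_mad_get`, the pooled center and scale, and the quantile convention for $c_\alpha$ are assumptions for illustration; the GET package remains the reference implementation.

```r
# End-to-end sketch of a studentized scaled MAD envelope test in base R.
# obs: numeric vector of length d; sim: d x (s - 1) matrix of null replicates.
scaled_mad_get <- function(obs, sim, alpha = 0.05) {
  curves <- cbind(obs, sim)                   # column 1 = observed curve T_1
  s <- ncol(curves)
  center <- rowMeans(curves)                  # T_0k
  s_k <- apply(curves, 1, sd)                 # studentized scale
  scaled <- abs(curves - center) / s_k
  M <- apply(scaled, 2, max)                  # M_i for every curve
  m <- floor(alpha * s + 1e-9)
  c_alpha <- sort(M, decreasing = TRUE)[m + 1]  # assumed quantile convention
  list(p = sum(M >= M[1]) / s,
       reject = sum(M >= M[1]) <= m,
       lo = center - c_alpha * s_k,
       hi = center + c_alpha * s_k,
       outside = which(scaled[, 1] > c_alpha))  # locations where the data leave the band
}

# Usage: heteroskedastic null curves and an observed curve perturbed at k = 25
set.seed(7)
d <- 50
sd_k <- seq(1, 3, length.out = d)
sim <- matrix(rnorm(d * 999), nrow = d) * sd_k
obs <- rnorm(d) * sd_k
obs[25] <- obs[25] + 20
res <- scaled_mad_get(obs, sim)
res$outside
```

The `outside` indices are computed on the scaled deviations rather than by comparing `obs` with `lo` and `hi`, which avoids a rounding disagreement when $M_1$ equals $c_\alpha$ exactly.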

Key computational properties:

Number of replicates: hundreds to thousands needed for stable quantiles, especially with $\alpha \ll 0.05$.
For high-dimensional data (e.g., imaging), running-sums and in-place accumulation enable large-scale application.
Each replicate's computational cost is dominated by simulation or permutation steps; envelope construction is $O(sd)$ (Mrkvicka et al., 2019).
The approach is implemented and optimized in the R package GET, as detailed in (Myllymäki et al., 2019).
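
One way to realize in-place accumulation is a streaming update of the pointwise mean and standard deviation, so that replicates need not all be held in memory. The sketch below uses a Welford-style update rather than raw running sums, for numerical stability. The function `stream_moments` and the user-supplied generator `simulate_one` are hypothetical names.

```r
# Streaming pointwise mean and standard deviation (Welford-style update, vectorized over k).
# simulate_one: function with no arguments returning one replicate of length d (user-supplied).
stream_moments <- function(simulate_one, n_rep) {
  mean_k <- simulate_one()                  # running mean, initialized with the first replicate
  m2_k <- numeric(length(mean_k))           # running sum of squared deviations
  if (n_rep > 1) for (i in 2:n_rep) {
    x <- simulate_one()
    delta <- x - mean_k
    mean_k <- mean_k + delta / i
    m2_k <- m2_k + delta * (x - mean_k)     # uses the updated mean
  }
  list(mean = mean_k, sd = sqrt(m2_k / (n_rep - 1)))
}

# Usage: 200 replicates of length 1000, never stored together
set.seed(3)
mom <- stream_moments(function() rnorm(1000), n_rep = 200)
```

Because each $M_i$ depends on the final center and scale, a second pass is still needed; one option is to regenerate the replicates from stored seeds and keep only the running maximum per replicate.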

6. Applications and Extensions

Scaled MAD GETs have been used in:

Spatial statistics: Goodness-of-fit for summary functions (Ripley's $K$, $L$, $J$), comparing spatial boundaries (Myllymäki et al., 2013, Tivenan et al., 17 Dec 2025).
Boundary change detection: Quantifying and testing the significance of shifts in estimated spatial boundaries under Gaussian process models (Tivenan et al., 17 Dec 2025).
Permutation-GLM and neuroimaging: Familywise error-corrected inference for functional regressors or contrasts at all voxels or nodes (Mrkvicka et al., 2019).
General functional data analysis: Non-parametric ANOVA, confidence region construction, regression central regions (Myllymäki et al., 2019).

A representative workflow is implemented in GET:

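library(GET)
# grid: argument values (length d); T1: observed curve; sim_mat: d x (s-1) matrix of null curves
cset <- curve_set(r = grid, obs = T1, sim = sim_mat)
res <- global_envelope_test(cset, type = "st", alpha = 0.05, alternative = "two.sided")
print(attr(res, "p")) # global Monte Carlo p-value, stored as an attribute of the result
plot(res) # IGI band + data curve highlighted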

7. Theoretical and Practical Considerations

Essential validity conditions include:

Exchangeability: Replicates and observed values must be exchangeable under $H_0$, satisfied by permutation or parametric simulation strategies.
Scale-robustness: Studentized or quantile-based scaling protects against inhomogeneity in variance, skew, or high-dimensional dependence.
Bandwidth/interval choice: The envelope protects over the entire grid or region of interest; choice of this region should reflect all plausible departures (Myllymäki et al., 2013).
Computational efficiency: For large $d$, per-location accumulation significantly reduces memory demands; in one-dimensional domains, Cholesky or state-space GP simulation enables rapid null curve generation (Tivenan et al., 17 Dec 2025).
Interpretation: IGI ensures that graphical envelope crossings correspond exactly to statistical rejection.
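
As a sketch of Cholesky-based null curve generation on a one-dimensional grid, the function below draws zero-mean Gaussian process curves. The squared-exponential kernel, its parameters, the jitter term, and the name `simulate_gp_curves` are illustrative assumptions and are not taken from the cited work.

```r
# Null curves from a zero-mean Gaussian process on a 1-D grid via Cholesky factorization.
# Kernel choice and parameter values are illustrative assumptions.
simulate_gp_curves <- function(x, n_sim, sigma = 1, ell = 0.2, jitter = 1e-6) {
  D <- abs(outer(x, x, "-"))                       # pairwise distances on the grid
  K <- sigma^2 * exp(-D^2 / (2 * ell^2)) +         # squared-exponential covariance
    diag(jitter, length(x))                        # small diagonal term for numerical stability
  U <- chol(K)                                     # upper triangular factor, K = t(U) %*% U
  Z <- matrix(rnorm(length(x) * n_sim), nrow = length(x))
  t(U) %*% Z                                       # d x n_sim; each column has covariance K
}

# Usage: 999 null curves on a grid of 50 points
set.seed(10)
grid_x <- seq(0, 1, length.out = 50)
sim_mat <- simulate_gp_curves(grid_x, n_sim = 999)
```

The factorization costs $O(d^3)$ once, after which each replicate is a single matrix-vector product; the resulting `d x (s-1)` matrix has the layout expected for the simulated curves in the workflow above.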

Common pitfalls include:

Unscaled MAD tests under heteroskedasticity.
Insufficient replicates for high-precision p-values at small $\alpha$.
Failure to use the same estimator or processing pipeline for both observed and null replicates.
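
The replicate-count pitfall can be made concrete with a small calculation. Under exchangeability and without ties, the rank of $M_1$ among the $s$ statistics is uniform, so the p-value moves in steps of $1/s$. The helper `mc_resolution` is a hypothetical name used only for this illustration.

```r
# Attainable significance levels with s curves in total (observed + s - 1 simulations).
mc_resolution <- function(s, alpha) {
  m <- floor(alpha * s + 1e-9)      # largest rank of M_1 that still leads to rejection
  list(min_p = 1 / s,               # smallest p-value the test can produce
       attained_level = m / s,      # actual size when ties have probability zero
       can_reject = m >= 1)         # FALSE if s is too small for this alpha
}

# Usage: alpha = 0.001 cannot be reached with 500 curves, but can with 2500
mc_resolution(500, 0.001)
mc_resolution(2500, 0.001)
```

When $\alpha s$ is not an integer the attained level falls below the nominal $\alpha$, so the test is conservative rather than exact.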

The scaled MAD GET offers a unified framework for global and robust hypothesis testing in high-dimensional, functional, or spatial contexts, combining family-wise error control with an envelope that shows where the data depart from the null model (Myllymäki et al., 2013, Myllymäki et al., 2019, Mrkvicka et al., 2019, Tivenan et al., 17 Dec 2025).