SAforest: Advanced Forest Inventory Estimation

Updated 27 December 2025

SAforest is a suite of small area estimation methodologies tailored for forest inventory, addressing zero inflation, hierarchical sampling, and spatial autocorrelation.
It employs a two-stage modeling framework that combines binary (occurrence) and continuous (magnitude) regressions to accurately capture forest attributes.
Both frequentist and Bayesian implementations, including multivariate spatial models, offer robust uncertainty quantification and enhanced precision in domain-level estimates.

SAforest refers to a suite of small area estimation (SAE) methodologies and associated implementations for forest inventory applications. They are designed to handle the main data complexities of large-scale forest monitoring programs: zero inflation, hierarchical sampling, spatial autocorrelation, and multivariate structure. Across its methodological variants, "SAforest" denotes frameworks that infer domain-level forest parameters (e.g., biomass, carbon, or volume by county or by species × county) from plot-level data, statistical modeling, and computational tools, with emphasis on both precision and uncertainty quantification. The term appears in recent statistical and environmental literature for univariate and multivariate frameworks, Bayesian and frequentist alike, for forest inventory estimation.

1. Core Modeling Frameworks

SAforest models fundamentally address the challenge of zero-inflation in forest inventory data, reflecting the large prevalence of zero-valued plots (e.g., non-forest, absence of a species, or structurally empty locations) and highly skewed positive values. The canonical model specification is the two-stage ("hurdle" or "zero-inflated") mixed-effects framework:

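Stage 1 (binary modeling): For plot $j$ in domain $d$ with response $y_{dj}$ and covariates $\mathbf{x}_{dj}$, the occurrence indicator $Z_{dj} = I(y_{dj} > 0)$ is modeled, typically via a mixed-effects logistic regression: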

$\operatorname{logit} \Pr(Z_{dj} = 1 \mid \mathbf{x}_{dj}, v_d) = \mathbf{x}_{dj}^\top \boldsymbol{\beta} + v_d, \qquad v_d \sim N(0, \sigma_v^2)$

Stage 2 (conditional positive modeling): For plots with $y_{dj} > 0$, a linear mixed-effects regression is applied:

$y_{dj} = \mathbf{x}_{dj}^\top \boldsymbol{\gamma} + u_d + e_{dj} \quad \text{for } Z_{dj} = 1, \qquad u_d \sim N(0, \sigma_u^2),\; e_{dj} \sim N(0, \sigma_e^2)$

This structure allows explicit modeling of both the occurrence (presence/absence) and positive magnitude of the forest attribute of interest, with random effects facilitating information pooling across small areas and robust uncertainty propagation (White et al., 2024, White et al., 28 Mar 2025).
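
As a rough illustration of how the two stages combine, the sketch below simulates data from the model above and forms unit-level predictions as the product of the estimated occurrence probability and the estimated conditional mean. It is a minimal base-R sketch under stated assumptions, not the implementation used in the cited studies: the domain effects $v_d$ and $u_d$ are simulated, but the fitted models are plain `glm`/`lm` without random effects (a deliberate simplification), the sample doubles as the population, and all variable names and parameter values are hypothetical.

```r
set.seed(42)

# Hypothetical setup: 20 domains with 100 plots each and one covariate
n_dom  <- 20
n_per  <- 100
domain <- rep(seq_len(n_dom), each = n_per)
tcc    <- runif(n_dom * n_per, 0, 100)   # e.g., tree canopy cover (%)

# Domain random effects, one per stage
v <- rnorm(n_dom, 0, 0.5)   # occurrence stage (v_d)
u <- rnorm(n_dom, 0, 3)     # magnitude stage (u_d)

# Stage 1: occurrence indicator from a logistic model
p <- plogis(-3 + 0.05 * tcc + v[domain])
z <- rbinom(length(p), 1, p)

# Stage 2: positive magnitude, observed only where z = 1
y_pos <- 50 + 0.8 * tcc + u[domain] + rnorm(length(p), 0, 5)
y     <- z * y_pos
dat   <- data.frame(domain, tcc, z, y)

# Fit each stage separately (no random effects here: a simplification)
fit_z <- glm(z ~ tcc, family = binomial, data = dat)
fit_y <- lm(y ~ tcc, data = dat[dat$z == 1, ])

# Combined unit-level prediction: Pr(Z = 1) * E[y | Z = 1]
dat$pred <- predict(fit_z, newdata = dat, type = "response") *
  predict(fit_y, newdata = dat)

# Domain-level estimates: average unit predictions within each domain
est <- tapply(dat$pred, dat$domain, mean)
obs <- tapply(dat$y, dat$domain, mean)
round(cbind(est, obs)[1:5, ], 1)
```

Replacing `glm` and `lm` with mixed-model fits would add the domain-specific shifts that this simplified version averages away.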

2. Hierarchical Bayesian Extensions and Multivariate Models

Recent developments generalize SAforest to species-specific and multivariate estimation. Doser et al. formalize a fully multivariate spatial Bayesian hurdle framework for fine-scale estimation of species-level biomass, with the following properties (Doser et al., 10 Mar 2025):

Hierarchical structure: For each plot $i$ (at location $s_i$) and species $j$, the model specifies a Bernoulli occurrence model for $z_j(s_i)$ and a log-normal positive-part model for $y_j(s_i)$.
Spatial and cross-species covariance: Multivariate spatial dependence is induced via low-rank factor models (Linear Model of Coregionalization; LMC), embedding both spatial autocorrelation and species correlation structurally in the random intercepts.
Zero handling: Absence is represented by a normal distribution centered at zero with near-zero variance, which stabilizes the likelihood.
Unit-level inference: All predictions—including for arbitrary user-defined domains—are generated by aggregating posterior draws over prediction grids.
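
The low-rank factor construction in the list above can be sketched in a few lines. The example below is a generic illustration of a linear model of coregionalization, not code from the cited work: it draws a small number of independent spatial factors and mixes them through a species loading matrix, so that every species shares the same spatial building blocks. The exponential correlation function, the decay value `phi`, and all dimensions are assumptions chosen for illustration.

```r
set.seed(1)

n_sites   <- 50   # plot locations
n_species <- 4
q         <- 2    # number of latent spatial factors (q < n_species)

# Random plot coordinates and an exponential spatial correlation matrix
coords <- cbind(runif(n_sites), runif(n_sites))
phi    <- 3                                  # hypothetical spatial decay
C      <- exp(-phi * as.matrix(dist(coords)))

# q independent spatial factors, each a draw from N(0, C)
L_chol  <- t(chol(C))                        # lower-triangular Cholesky factor
factors <- L_chol %*% matrix(rnorm(n_sites * q), n_sites, q)

# Species-specific loadings on the shared factors
Lambda <- matrix(rnorm(n_species * q), n_species, q)

# Spatial random effects: one column per species
W <- factors %*% t(Lambda)

# Cross-species covariance implied at a single location (rank <= q)
cross_cov <- tcrossprod(Lambda)
```

Because `cross_cov` has rank at most `q`, the species effects are tied together through far fewer parameters than a full species-by-species covariance would need, which is what makes the approach tractable for many species.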

Key advantages include borrowing statistical strength across both space and species, accommodating strong zero inflation, and direct propagation of uncertainty to small-area estimates.
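
To make the aggregation step concrete, the following sketch shows one way posterior draws on a prediction grid could be turned into domain-level summaries. It does not call any of the packages named in this article: the "posterior draws" are synthetic stand-ins, and the object names (`p_draws`, `mu_draws`, `cell_domain`), the residual standard deviation, and the three domains are all hypothetical.

```r
set.seed(7)

n_draws <- 500   # retained posterior draws (hypothetical)
n_cells <- 300   # prediction grid cells
# Hypothetical domain label (e.g., county) for each grid cell
cell_domain <- sample(c("A", "B", "C"), n_cells, replace = TRUE)

# Stand-ins for posterior draws from a fitted hurdle model (draws x cells):
# occurrence probability and log-scale mean of positive biomass
p_draws  <- matrix(rbeta(n_draws * n_cells, 2, 3), n_draws, n_cells)
mu_draws <- matrix(rnorm(n_draws * n_cells, 3, 0.3), n_draws, n_cells)
sigma    <- 0.5  # assumed log-scale residual SD

# Posterior predictive draws: Bernoulli occurrence times log-normal magnitude
z_draws <- matrix(rbinom(n_draws * n_cells, 1, p_draws), n_draws, n_cells)
y_draws <- z_draws * exp(mu_draws + sigma * rnorm(n_draws * n_cells))

# Aggregate within each draw first, then summarise across draws
domains   <- sort(unique(cell_domain))
dom_draws <- sapply(domains, function(d) {
  rowMeans(y_draws[, cell_domain == d, drop = FALSE])
})  # draws x domains

summary_tab <- data.frame(
  domain = domains,
  mean   = colMeans(dom_draws),
  lower  = apply(dom_draws, 2, quantile, probs = 0.025),
  upper  = apply(dom_draws, 2, quantile, probs = 0.975)
)
summary_tab
```

Aggregating inside each draw before summarising is what carries cell-level uncertainty through to the domain interval; any other grouping of cells can be summarised the same way by changing `cell_domain`.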

3. Estimation Procedures and Computational Implementation

Fitting of SAforest models relies on mixed-effects inference, with estimation routes depending on the specific formulation:

Frequentist (univariate): Both stages are estimated via (restricted) maximum likelihood (ML/REML), with empirical BLUPs for area-specific random effects (White et al., 2024).
Parametric bootstrap MSE: Uncertainty in small-area predictors is quantified using synthetic finite populations and resampling, following a multi-step simulation and reestimation protocol.
Bayesian (unit/multivariate): Full joint posteriors are targeted via MCMC (often with NNGP for scalable spatial random effects) and hierarchical priors (White et al., 28 Mar 2025, Doser et al., 10 Mar 2025). Model convergence is diagnosed via traceplots, effective sample size, and $\widehat{R}$ statistics.
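
For readers unfamiliar with the $\widehat{R}$ diagnostic mentioned above, a basic version can be computed by hand. The function below is a minimal sketch of the classic Gelman-Rubin statistic for a single parameter; the name `rhat_basic` is hypothetical, and MCMC software typically reports refined variants (for example, with split chains and rank normalization), so values may differ slightly from package output.

```r
# Basic Gelman-Rubin statistic for one parameter.
# chains: matrix with one column per chain, one row per retained iteration
rhat_basic <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))      # mean within-chain variance
  B <- n * var(colMeans(chains))        # between-chain variance
  var_plus <- (n - 1) / n * W + B / n   # pooled variance estimate
  sqrt(var_plus / W)
}

set.seed(1)
good <- matrix(rnorm(4000), ncol = 4)                 # four chains, same target
bad  <- good + rep(c(0, 0, 0, 3), each = nrow(good))  # fourth chain shifted by 3

rhat_basic(good)  # close to 1: chains agree
rhat_basic(bad)   # well above 1: chains disagree
```

Values near 1 indicate that between-chain and within-chain variability agree; a chain stuck in a different region inflates the between-chain term and pushes the statistic upward.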

R software packages implement these methods for operational use. Specifically, the saeczi package provides the two-stage zero-inflated frequentist estimator, its mean squared error estimator, and prediction utilities with parallel computing support (White et al., 2024). Multivariate Bayesian frameworks leverage packages such as spOccupancy, spAbundance, and rFIA for data extraction and model fitting (Doser et al., 10 Mar 2025, Stanke et al., 2021).

Illustrative R Implementation

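# Install from CRAN and load
install.packages("saeczi")
library(saeczi)

# df_sample: plot-level sample with the domain ID, response y, and covariates
# df_pop:    population-level auxiliary data with the domain ID and covariates
# Argument names below follow the saeczi documentation; confirm with ?saeczi
# for the installed version.
res <- saeczi(
  samp_dat     = df_sample,
  pop_dat      = df_pop,
  lin_formula  = y ~ tcc + elev,   # Stage 2: positive-part linear model
  log_formula  = y ~ tcc + elev,   # Stage 1: occurrence (logistic) model
  domain_level = "domain",
  B            = 200,              # bootstrap replicates for the MSE estimator
  mse_est      = TRUE,
  parallel     = FALSE             # set TRUE after configuring a parallel backend
)

# Domain-level estimates (and bootstrap MSEs when mse_est = TRUE)
print(res$res)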
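
The parametric bootstrap MSE procedure described in Section 3 can be outlined independently of any package. The sketch below is a simplified base-R version under stated assumptions: the two stages are fitted without random effects, the same sample indices are reused in every replicate, and the helper names (`fit_two_stage`, `predict_domains`, `simulate_pop`) and all parameter values are hypothetical. It is meant to show the generate, re-estimate, and compare loop, not to reproduce a published estimator.

```r
set.seed(123)

# Hypothetical population: 5 domains x 400 units, one covariate
pop <- data.frame(domain = rep(1:5, each = 400), x = runif(2000))
z0 <- rbinom(nrow(pop), 1, plogis(-1 + 2 * pop$x))
pop$y <- z0 * (10 + 5 * pop$x + rnorm(nrow(pop)))

# Sample 30 units per domain
samp_idx <- unlist(lapply(split(seq_len(nrow(pop)), pop$domain),
                          sample, size = 30))

# Two-stage fit (no random effects: a simplification)
fit_two_stage <- function(samp) {
  list(
    z = glm(as.integer(y > 0) ~ x, family = binomial, data = samp),
    y = lm(y ~ x, data = samp[samp$y > 0, ])
  )
}

# Domain means of Pr(Z = 1) * E[y | Z = 1] over all population units
predict_domains <- function(fit, pop) {
  pred <- predict(fit$z, pop, type = "response") * predict(fit$y, pop)
  tapply(pred, pop$domain, mean)
}

# Synthetic response for every population unit, drawn from the fitted model
simulate_pop <- function(fit, pop) {
  z <- rbinom(nrow(pop), 1, predict(fit$z, pop, type = "response"))
  z * (predict(fit$y, pop) + rnorm(nrow(pop), 0, sigma(fit$y)))
}

fit <- fit_two_stage(pop[samp_idx, ])

B <- 50
sq_err <- replicate(B, {
  pop_b   <- pop
  pop_b$y <- simulate_pop(fit, pop)             # synthetic finite population
  truth_b <- tapply(pop_b$y, pop_b$domain, mean)
  fit_b   <- fit_two_stage(pop_b[samp_idx, ])   # re-estimate on the bootstrap sample
  (predict_domains(fit_b, pop_b) - truth_b)^2
})

mse_hat <- rowMeans(sq_err)   # one bootstrap MSE estimate per domain
sqrt(mse_hat)
```

Because the "truth" is known in each synthetic population, the squared errors can be averaged directly; the cost is one full refit per replicate, which is why the number of replicates and parallel execution matter in practice.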

4. Performance Evaluation and Comparative Results

Performance is consistently evaluated by Monte Carlo simulation and cross-validation over realistic forest inventory scenarios, typically at the county level.

Metric	SAforest	Post-stratified	Area-level EBLUP	Unit-level EBLUP
Percent relative bias (PRB)	Smallest	Higher	Higher	Higher
Empirical RMSE	Lowest	Higher	Higher	Higher
Coverage (95% interval)	Near-nominal	Typically lower	Lower	Near-nominal

In the Nevada and Washington studies, SAforest achieves markedly lower bias and RMSE and improved interval coverage, especially under strong zero inflation (58–98% of plots per county being zero). Comparative studies further indicate that including both spatial random effects and county-specific error variance components in a two-stage Bayesian model gives the best balance of bias, RMSE, and coverage among the models compared (White et al., 28 Mar 2025).
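
The three metrics in the table can be computed from simulation output as follows. This is a minimal sketch assuming normal-theory 95% intervals and a replicates-by-domains layout; the function name `sim_metrics` and the synthetic inputs are hypothetical and are not taken from the cited studies.

```r
# est:   replicates x domains matrix of point estimates
# se:    matching matrix of estimated standard errors
# truth: vector of true domain values (known in a simulation)
sim_metrics <- function(est, se, truth) {
  err     <- sweep(est, 2, truth)               # estimate minus truth
  lower   <- est - qnorm(0.975) * se
  upper   <- est + qnorm(0.975) * se
  covered <- sweep(lower, 2, truth, "<=") & sweep(upper, 2, truth, ">=")
  data.frame(
    prb      = 100 * colMeans(err) / truth,     # percent relative bias
    rmse     = sqrt(colMeans(err^2)),           # empirical RMSE
    coverage = colMeans(covered)                # share of intervals covering truth
  )
}

# Synthetic check: unbiased estimators with correctly specified standard errors
set.seed(2025)
truth <- c(10, 50, 200)
sds   <- c(1, 4, 10)
R     <- 2000
est   <- sapply(seq_along(truth), function(d) rnorm(R, truth[d], sds[d]))
se    <- matrix(sds, R, length(truth), byrow = TRUE)

m <- sim_metrics(est, se, truth)
m
```

With unbiased estimates and correct standard errors, PRB should sit near zero, RMSE near the true standard deviation, and coverage near 0.95; departures from that pattern are what the comparison table summarises qualitatively.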

5. Practical Application Domains

SAforest frameworks are deployed in multiple operational and research contexts:

US FIA and NFI studies: Estimation of county-level and species-specific forest biomass, including covariates such as NDVI, EVI, TCC, elevation, aspect, and climate normals (White et al., 2024, Doser et al., 10 Mar 2025).
Species-level mapping: Simultaneous small-area estimation for 20+ species, demonstrating high correlation ($r = 0.85$–$0.96$) with design-based and kNN estimators, with ~91% of domain-level estimates showing higher precision (Doser et al., 10 Mar 2025).
Uncertainty quantification: Posterior variance, RMSE and coverage rates are provided at granular spatial scales.
Influence on inventory protocols: Facilitates more reliable inventory at management-relevant resolutions without increasing the field data burden.

6. Extensions, Limitations, and Research Directions

SAforest continues to evolve:

Multivariate and spatial generalizations allow explicit modeling of species interactions, more flexible spatial dependence, and extension to arbitrary domains.
Design-compatibility: Interfaces such as rFIA streamline integration of design-based and model-based inference pipelines (Stanke et al., 2021).
Model assumptions: Valid inference depends on the appropriateness of the model structure (e.g., treatment of zero inflation, random effects). For very small domains, variance estimation can become unstable if sample counts are minimal.
Computational demands: Advanced spatial models (NNGP, multivariate LMC) require significant computational resources and careful diagnostic checking for MCMC convergence and mixing (White et al., 28 Mar 2025, Doser et al., 10 Mar 2025).
A plausible implication is that future advancements may focus on scaling Bayesian spatial models for ultra-large inventories and integrating remote sensing with field plot data.

Complementary SAE methodologies include area-level spatial Fay–Herriot models and temporal mixed-effects models, often prepared using rFIA for data ingestion and direct estimation. While such approaches yield efficient inference for large domains and timespans, they generally underperform for heavily zero-inflated, species-specific, or fine-domain targets compared to the two-stage unit-level (SAforest) strategies (Stanke et al., 2021, White et al., 2024).
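
For contrast with the unit-level two-stage model in Section 1, the basic (non-spatial) area-level Fay–Herriot model can be written as below. This is the textbook formulation rather than a specification taken from the cited studies; the symbols $\hat{\theta}_d$ (direct estimate for domain $d$), $\psi_d$ (its sampling variance, treated as known), $\boldsymbol{\delta}$, $w_d$, and $\kappa_d$ are introduced here for illustration only.

$\hat{\theta}_d = \theta_d + \varepsilon_d, \qquad \theta_d = \mathbf{x}_d^\top \boldsymbol{\delta} + w_d, \qquad w_d \sim N(0, \sigma_w^2),\; \varepsilon_d \sim N(0, \psi_d)$

$\tilde{\theta}_d = \kappa_d\, \hat{\theta}_d + (1 - \kappa_d)\, \mathbf{x}_d^\top \hat{\boldsymbol{\delta}}, \qquad \kappa_d = \frac{\hat{\sigma}_w^2}{\hat{\sigma}_w^2 + \psi_d}$

The predictor $\tilde{\theta}_d$ shrinks the direct estimate toward the regression prediction, more strongly where $\psi_d$ is large. Since the model works only with domain-level summaries, it has no component for the plot-level point mass at zero, which is one way to read the weaker performance on heavily zero-inflated targets noted above.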

In summary, SAforest represents the current state-of-the-art for small-area estimation in zero-inflated forest inventory contexts, blending hierarchical modeling, spatial statistics, and accessible software implementations to support precision forestry and resource assessment at actionable scales.