Size Distribution Modeling: Methods & Applications

Updated 4 December 2025

Size Distribution Modeling is the probabilistic representation of physical size variables, capturing key characteristics like tail behavior and skewness.
Analytical solutions, including closed-form quantile functions, enable efficient parameter estimation and robust Monte Carlo simulation.
These models are applied in fields such as environmental studies, clinical data analysis, and risk assessment.

Size Distribution (SD) Modeling

A size distribution (SD) describes the probabilistic or deterministic distribution of a physical “size” variable (length, mass, radius, volume, or a related metric) over a collection of objects, particles, droplets, clusters, or aggregate structures in natural, engineered, or theoretical systems. SD modeling encompasses functional forms, inference methods, random number generation, parameter fitting, statistical and dynamical origins, and application-specific constraints. Modern SD models range from flexible parametric families that capture a wide range of shapes and tail behaviors to physically motivated or mechanistically derived distributions reflecting fragmentation, aggregation, or growth processes.

1. Mathematical Foundations and Classical Families

SDs are formally described by probability density functions (PDFs), cumulative distribution functions (CDFs), and associated quantile and hazard functions. Widely used classical families include the Normal, Log-normal, Weibull, Gamma, Pareto, and power-law distributions, each with characteristic tail behavior, support, and moments. Selecting an appropriate SD is non-trivial: different families can fit empirical data comparably well yet differ subtly in how they represent skewness, truncation, or physical constraints (Hernández-Bermejo et al., 2019).

The S-distribution, introduced by Voit (1992) and analyzed by Hernández-Bermejo and Sorribas, is defined in terms of the CDF $F(x)=P(X\leq x)$ via the differential equation $\frac{dF}{dx} = f(x) = a\left(F^g - F^h\right)$, with $a>0$, $h>g$, and $F(x_0)=F_0$ as the location (initial) condition. This family spans an extensive range of shapes, including heavy or truncated tails and variable skewness, and can represent both well-known distributions (via parameter choices) and entirely novel forms not reducible to classical families.
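As a minimal sketch of how the defining equation generates a CDF, the ODE can be integrated numerically with a fixed-step Runge-Kutta scheme. The function names and the step count below are illustrative assumptions, not part of the cited work. With $g=1$, $h=2$ the equation reduces to the logistic ODE $F' = aF(1-F)$, which provides an exact reference for comparison.

```python
import math

def s_cdf_rk4(x, a, g, h, x0, F0, steps=2000):
    """Integrate dF/dx = a (F**g - F**h) from (x0, F0) to x with classical RK4."""
    def rhs(F):
        return a * (F**g - F**h)

    dx = (x - x0) / steps  # negative when x < x0, i.e. backward integration
    F = F0
    for _ in range(steps):
        k1 = rhs(F)
        k2 = rhs(F + 0.5 * dx * k1)
        k3 = rhs(F + 0.5 * dx * k2)
        k4 = rhs(F + dx * k3)
        F += dx * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return F

def logistic_cdf(x, a, x0):
    """Exact solution for g = 1, h = 2, F(x0) = 0.5."""
    return 1.0 / (1.0 + math.exp(-a * (x - x0)))

a, x0 = 1.5, 2.0
for x in (0.0, 2.0, 3.0, 5.0):
    print(x, s_cdf_rk4(x, a, 1.0, 2.0, x0, 0.5), logistic_cdf(x, a, x0))
```

For other $(g, h)$ pairs no exact reference is generally available, so the step count should be checked by halving the step and comparing results.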

The Interpolating Family (IF) of Sinner et al. is a five-parameter model defined for $x\geq x_0$: $f(x;p,b,c,q,x_0) = \frac{|b|q}{c}\left(\frac{x-x_0}{c}\right)^{b-1} G_p(x)^{-q-1} \left[1-\frac{1}{p+1}G_p(x)^{-q}\right]^p$, with $G_p(x)=(p+1)^{-1/q} + \left(\frac{x-x_0}{c}\right)^b$. Here, $p$ serves as a continuous interpolation parameter, tuning the SD between pure power-law (Pareto-like) and exponential-cutoff (Weibull-like) behaviors (Sinner et al., 2016).
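The IF density integrates in closed form, which is what makes direct inversion possible. The following worked step is derived here from the density as stated (it is not quoted from the source) and assumes $q>0$, $c>0$, $p\geq 0$. Writing $t=(x-x_0)/c$ and $H(x)=1-\frac{1}{p+1}G_p(x)^{-q}$, for $b>0$:

```latex
\frac{dH}{dx} = \frac{q}{p+1}\,G_p(x)^{-q-1}\,\frac{b}{c}\,t^{\,b-1}
\quad\Longrightarrow\quad
f(x) = (p+1)\,H(x)^{p}\,\frac{dH}{dx},
\qquad
F(x) = H(x)^{p+1} = \left[1-\frac{1}{p+1}\,G_p(x)^{-q}\right]^{p+1}.
```

The boundary values check out: at $x=x_0$, $G_p=(p+1)^{-1/q}$ so $H=0$, and as $x\to\infty$, $H\to 1$. For $b<0$ the roles of the endpoints swap ($H$ decreases from 1 to 0), giving $F(x)=1-H(x)^{p+1}$. Setting $p=0$ yields $F(x)=1-(1+t^b)^{-q}$, the Burr XII form.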

2. Analytical Solutions and Quantile Function Methods

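Unlike many classical distributions, both the S-distribution and the IF family admit analytical solutions for the quantile function $Q(p)=F^{-1}(p)$ (here $p$ denotes a probability level, not the IF interpolation parameter), enabling direct inversion sampling and quantile-based fitting. For the S-distribution, Hernández-Bermejo and Sorribas obtain the quantile by integrating $dx = dF/[a(F^g-F^h)]$ term by term: $Q(p) = x_0 + \frac{1}{a\,y} \left\{ p^{1-g}\, \Phi\!\left(p^y,1,\tfrac{1-g}{y}\right) - F_0^{1-g}\, \Phi\!\left(F_0^y,1,\tfrac{1-g}{y}\right) \right\}$, where $y=h-g$ and $\Phi(z,s,\alpha)=\sum_{k\geq 0} z^k/(k+\alpha)^s$ denotes the Lerch transcendent. The expression requires that $(1-g)/y$ is neither zero nor a negative integer; in the case $g=1$ the leading term integrates to a logarithm instead. This closed form facilitates efficient Monte Carlo generation of S-distributed random variates and quantile-matching regression (Hernández-Bermejo et al., 2019).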
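A numerical cross-check helps guard against sign or argument slips in the quantile. The sketch below evaluates the Lerch-series expression that results from integrating the defining ODE $dF/dx = a(F^g-F^h)$ term by term, and compares it with direct quadrature of $dx = dF/[a(F^g-F^h)]$. The helper names, the truncated-series implementation of $\Phi$ (valid only for $0\le z<1$), and the test parameters are assumptions made for illustration. For $g=0.5$, $h=1.5$ the integral also has the elementary form $(2/a)\,\mathrm{artanh}\sqrt{F}$, which gives a third reference.

```python
import math

def lerch_phi_series(z, s, alpha, tol=1e-15, max_terms=200000):
    """Truncated series sum_k z**k / (k + alpha)**s, for 0 <= z < 1."""
    total, zk = 0.0, 1.0
    for k in range(max_terms):
        term = zk / (k + alpha) ** s
        total += term
        # geometric tail bound: remaining sum is at most about term / (1 - z)
        if abs(term) < tol * (1.0 - z) * abs(total):
            break
        zk *= z
    return total

def s_quantile(p, a, g, h, x0, F0):
    """Quantile from term-by-term integration of dx = dF / (a (F^g - F^h))."""
    y = h - g
    alpha = (1.0 - g) / y
    def delta(F):
        return F ** (1.0 - g) * lerch_phi_series(F**y, 1, alpha)
    return x0 + (delta(p) - delta(F0)) / (a * y)

def s_quantile_quadrature(p, a, g, h, x0, F0, n=20000):
    """Reference value by the midpoint rule on the same integral."""
    dF = (p - F0) / n
    total = 0.0
    for i in range(n):
        F = F0 + (i + 0.5) * dF
        total += dF / (a * (F**g - F**h))
    return x0 + total

a, g, h, x0, F0, p = 2.0, 0.5, 1.5, 1.0, 0.5, 0.8
series_value = s_quantile(p, a, g, h, x0, F0)
quad_value = s_quantile_quadrature(p, a, g, h, x0, F0)
exact_value = x0 + (2.0 / a) * (math.atanh(math.sqrt(p)) - math.atanh(math.sqrt(F0)))
print(series_value, quad_value, exact_value)
```

The plain series converges slowly as $p\to 1$; a library implementation of the Lerch transcendent is preferable in production use.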

The IF family provides explicit inverse-CDF formulas for all values of its interpolation parameter $p$, distinguishing the $b>0$ and $b<0$ cases. Sampling is achieved by drawing $U\sim\mathrm{Uniform}(0,1)$ and computing $X=Q_p(U)$, with no need to solve transcendental equations numerically. This analytical tractability is an advantage over generalized beta-type families such as GB2, whose CDFs involve the incomplete beta function (Sinner et al., 2016).
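The inversion step can be sketched in a few lines for the $b>0$ case. The CDF used here, $F(x)=\left[1-\frac{1}{p+1}G_p(x)^{-q}\right]^{p+1}$, is obtained by integrating the IF density given in Section 1; the function names and the parameter restrictions ($p\geq 0$, $q>0$, $c>0$, $b>0$) are assumptions of this sketch rather than the source's notation.

```python
import random

def if_cdf(x, p, b, c, q, x0):
    """IF CDF for b > 0: F = H**(p+1), with H = 1 - G**(-q) / (p+1)."""
    t = (x - x0) / c
    G = (p + 1) ** (-1.0 / q) + t**b
    return (1.0 - G ** (-q) / (p + 1)) ** (p + 1)

def if_quantile(u, p, b, c, q, x0):
    """Inverse of if_cdf for 0 <= u < 1 and b > 0."""
    H = u ** (1.0 / (p + 1))
    G = ((p + 1) * (1.0 - H)) ** (-1.0 / q)
    # clamp guards against a tiny negative value from rounding near u = 0
    base = max(G - (p + 1) ** (-1.0 / q), 0.0)
    return x0 + c * base ** (1.0 / b)

def if_sample(n, p, b, c, q, x0, seed=None):
    """Draw n variates by inversion: X = Q(U), U ~ Uniform[0, 1)."""
    rng = random.Random(seed)
    return [if_quantile(rng.random(), p, b, c, q, x0) for _ in range(n)]

params = dict(p=2.0, b=1.5, c=3.0, q=2.5, x0=1.0)
print(if_sample(5, seed=0, **params))
```

A round trip `if_cdf(if_quantile(u, ...), ...)` should return `u` up to rounding, which is a quick sanity check for any reimplementation.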

3. Parametric Roles, Skewness, Tails, and Constraints

Model parameters in advanced SD families serve distinct roles:

| Parameter | S-Distribution | IF Family | Common Effect |
| --- | --- | --- | --- |
| Scale | $a$ (inverse spread) | $c$ | Shrinks/stretches overall width |
| Location | $x_0, F_0$ | $x_0$ | Shifts median/minimum support |
| Tail exponent | $y=h-g$ (difference of shape exponents) | $q$ | Controls right-tail heaviness |
| Shape/skew | $g$ | $b$ | Left-tail truncation/skewness |
| Interpolant | n/a | $p$ | Interpolates power-law ↔ Weibull limit |

Choosing $g<1$ in the S-distribution guarantees a finite lower bound $Q(0)$, so a model with $P(X\leq X_c)=0$ can be obtained by solving $Q(0)=X_c$ for $x_0$ or $a$ (Hernández-Bermejo et al., 2019). Similarly, $x_0$ in the IF family sets the minimum size, and $p$ tunes tail decay. Both models can therefore enforce physical lower cutoffs or support constraints that families supported on the whole real line, such as the Normal, cannot respect.

Tail behavior and skewness influence representation of rare large sizes and truncation at small sizes. For broad unimodal SDs, skewness is essential in capturing empirical asymmetries, while tail parameters determine fit accuracy in high-impact domains (e.g., insurance, survival analysis) (Sinner et al., 2016).

4. Model Fitting, Estimation, and Empirical Performance

SD parameter estimation is typically performed via nonlinear least squares, maximum likelihood, or quantile-based matching. In the S-distribution framework, Hernández-Bermejo and Sorribas recommend a two-stage procedure:

Initial estimate: Fit the differential form $f_i \approx a(F_i^g-F_i^h)$ to histogram densities $f_i$ and empirical CDF values $F_i$.
Quantile-matching: Sort the observed sample, assign the empirical probabilities $p_j=(j-0.5)/N$ to the order statistics $X_{(j)}$, then minimize $\sum_j [X_{(j)}-Q(p_j)]^2$ over the adjustable parameters ($a$, $x_0$).

This approach converges rapidly, fitting unimodal SDs of diverse origin. Empirical case studies include truncated fish-length distributions and clinical ICU measurement histograms, where visual overlay and QQ-plot provide fit quality assessment (Hernández-Bermejo et al., 2019).
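The quantile-matching stage has a convenient structure when $g$, $h$, and $F_0$ are held fixed: the quantile can be written $Q(p)=x_0+D(p)/a$ with $D(p)=\int_{F_0}^{p} dF/(F^g-F^h)$, so the objective is ordinary least squares in $(x_0, 1/a)$. The sketch below exploits this; it evaluates $D$ by midpoint quadrature rather than through the Lerch transcendent, which is a simplification chosen here and not the authors' implementation. All function names are hypothetical. The demonstration uses exact logistic quantiles ($g=1$, $h=2$) as synthetic data.

```python
import math

def unit_quantile(p, g, h, F0, n=4000):
    """D(p) = integral_{F0}^{p} dF / (F**g - F**h), by the midpoint rule."""
    dF = (p - F0) / n
    total = 0.0
    for i in range(n):
        F = F0 + (i + 0.5) * dF
        total += dF / (F**g - F**h)
    return total

def fit_scale_location(sample, g, h, F0):
    """Least-squares fit of X_(j) ~ x0 + D(p_j) / a; returns (a, x0)."""
    xs = sorted(sample)
    n = len(xs)
    # plotting positions p_j = (j - 0.5) / N for j = 1..N (0-based index here)
    ds = [unit_quantile((j + 0.5) / n, g, h, F0) for j in range(n)]
    d_mean = sum(ds) / n
    x_mean = sum(xs) / n
    sxy = sum((d - d_mean) * (x - x_mean) for d, x in zip(ds, xs))
    sxx = sum((d - d_mean) ** 2 for d in ds)
    slope = sxy / sxx  # slope estimates 1 / a
    return 1.0 / slope, x_mean - slope * d_mean

# Synthetic check: exact logistic quantiles with a = 1.5, x0 = 2.0
true_a, true_x0, n_obs = 1.5, 2.0, 50
probs = [(j + 0.5) / n_obs for j in range(n_obs)]
sample = [true_x0 + math.log(p / (1.0 - p)) / true_a for p in probs]
a_hat, x0_hat = fit_scale_location(sample, 1.0, 2.0, 0.5)
print(a_hat, x0_hat)
```

When $g$ and $h$ must also be estimated, this linear step can serve as the inner loop of an outer search over the shape parameters.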

The IF family uses an explicit log-likelihood, obtained by summing the log of the density over the $n$ observations: $\ell(p,b,c,q,x_0) = n\ln\frac{|b|q}{c} + (b-1)\sum_i \ln\frac{x_i-x_0}{c} - (q+1)\sum_i\ln G_p(x_i) + p\sum_i\ln\Bigl[1-\frac{1}{p+1}G_p(x_i)^{-q}\Bigr]$, which is maximized numerically to estimate the parameters. Standard optimizers are reported to converge reliably, and the closed-form CDF and quantile formulas simplify the handling of censored or truncated data. In the comparisons reported by Sinner et al. (2016), likelihood-ratio tests and AIC/BIC favor IF over GB2, Weibull, and pure Pareto models in diverse domains.
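A minimal sketch of the log-likelihood as code, written as the sum of log densities from the IF definition in Section 1. The function names, the parameter guards ($c>0$, $q>0$, $p\geq 0$, all data above $x_0$), and the small AIC helper are assumptions for illustration; the resulting function can be passed (negated) to any general-purpose optimizer.

```python
import math

def if_logpdf(x, p, b, c, q, x0):
    """Log of the IF density at a single point x > x0."""
    t = (x - x0) / c
    G = (p + 1) ** (-1.0 / q) + t**b
    H = 1.0 - G ** (-q) / (p + 1)
    return (math.log(abs(b) * q / c)
            + (b - 1) * math.log(t)
            - (q + 1) * math.log(G)
            + p * math.log(H))

def if_loglik(data, p, b, c, q, x0):
    """Sum of log densities; -inf outside the assumed parameter region."""
    if c <= 0 or q <= 0 or p < 0 or b == 0 or min(data) <= x0:
        return -math.inf
    return sum(if_logpdf(x, p, b, c, q, x0) for x in data)

def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * loglik

data = [1.8, 2.4, 3.1, 4.0, 5.5, 7.2]
ll = if_loglik(data, p=2.0, b=1.5, c=3.0, q=2.5, x0=1.0)
print(ll, aic(ll, n_params=5))
```

Returning `-inf` for infeasible parameters lets a derivative-free optimizer reject such points without special-casing.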

5. Random Variate Generation and Simulation

Both families support direct inversion-based random variate generation. For the S-distribution:

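import numpy as np
import mpmath  # provides the Lerch transcendent as mpmath.lerchphi(z, s, a)

def draw_S(a, g, h, x0, F0, rng=None):
    """Draw one S-distributed variate by CDF inversion (a > 0, h > g, (1-g)/(h-g) not 0, -1, ...)."""
    rng = np.random.default_rng() if rng is None else rng
    y = h - g
    alpha = (1 - g) / y  # from integrating dx = dF / (a * (F**g - F**h)) term by term

    def delta(F):
        return F**(1 - g) * float(mpmath.re(mpmath.lerchphi(F**y, 1, alpha)))

    u = float(rng.uniform(0.0, 1.0))
    return x0 + (delta(u) - delta(F0)) / (a * y)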

where `lerchphi` is the Lerch transcendent (available, for example, as `LerchPhi` in Mathematica and `lerchphi` in mpmath). The procedure scales efficiently to $10^4$ samples per second (Hernández-Bermejo et al., 2019).

IF family sampling uses the explicit quantile $Q_p(u)$ for $u\in(0,1)$ , enabling efficient simulation, including under censored or bounded regimes. The simple form contrasts with general beta-type distributions, where normalization and inversion are more involved (Sinner et al., 2016).

6. Practical Recommendations and Case Studies

For size-distribution modeling tasks, modern flexible parametric families offer multiple advantages:

Explicit control over lower bounds, skew and tails.
Analytical quantile and inverse-CDF formulas enable robust fitting and simulation.
Capability to nest standard SDs (Pareto, Weibull, Burr XII, Fréchet, etc.) and switch behavior via parameters.
Empirical validation on insurance, environmental, clinical, and survival data sets is reported to show improved fit, particularly in tail regions (Sinner et al., 2016; Hernández-Bermejo et al., 2019).

Recommended workflow:

Fit the full S-distribution or IF model to observed or simulated data; check optimizer convergence and inspect a QQ-plot overlay.
Examine parameter estimates: for the IF family, if $p$ is near zero, a simpler power-law model may suffice; if $p\to\infty$, an exponential-cutoff model applies.
Use closed-form formulas for risk and survival probabilities, or for random sampling in Monte Carlo simulation.
For censored/truncated samples, exploit direct quantile or CDF inversion, avoiding numerical integration.
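The last step of the workflow can be illustrated with a generic helper: when both the CDF and the quantile function are available in closed form, a sample truncated to $[\text{lo}, \text{hi}]$ is obtained by drawing $U$ uniformly on $[F(\text{lo}), F(\text{hi})]$ and inverting, with no rejection step or numerical integration. The helper name is hypothetical, and the Weibull functions stand in for any family with closed-form $F$ and $Q$ (the S-distribution or IF quantiles could be substituted).

```python
import math
import random

def sample_truncated(cdf, quantile, lo, hi, n, seed=None):
    """Draw n variates conditioned on lo <= X <= hi by CDF inversion."""
    rng = random.Random(seed)
    u_lo, u_hi = cdf(lo), cdf(hi)
    return [quantile(u_lo + (u_hi - u_lo) * rng.random()) for _ in range(n)]

# Weibull with shape k and scale lam, used only as a closed-form example
k, lam = 1.8, 2.0

def weibull_cdf(x):
    return 1.0 - math.exp(-((x / lam) ** k))

def weibull_quantile(u):
    return lam * (-math.log(1.0 - u)) ** (1.0 / k)

draws = sample_truncated(weibull_cdf, weibull_quantile, 1.0, 4.0, 1000, seed=0)
print(min(draws), max(draws))
```

Every draw is used, so the cost does not grow when the truncation window covers only a small probability mass, unlike rejection sampling.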

The flexibility, analytic tractability, and quantitative fit provided by the S-distribution and the Interpolating Family make them practical general-purpose frameworks for unimodal size-distribution modeling that remain easy to use and extend (Hernández-Bermejo et al., 2019; Sinner et al., 2016).

References

Hernández-Bermejo B., Sorribas A., "Analytical Quantile Solution for the S-distribution, Random Number Generation and Statistical Data Modeling" (Hernández-Bermejo et al., 2019)
Sinner C., Dominicy Y., Ley C., Trufin J., Weber P., "An Interpolating Family of Size Distributions" (Sinner et al., 2016)
Voit E. O., Biom. J. 34(7):855-878 (1992)
