Size Distribution Modeling: Methods & Applications
- Size Distribution Modeling is the probabilistic representation of physical size variables, capturing key characteristics like tail behavior and skewness.
- Analytical solutions, including closed-form quantile functions, enable efficient parameter estimation and robust Monte Carlo simulation.
- These models are applied in diverse fields such as environmental studies, clinical data analysis, and risk assessment for accurate data fitting.
Size Distribution (SD) Modeling
A size distribution (SD) describes the probabilistic or deterministic distribution of a physical “size” variable (length, mass, radius, volume, or a related metric) over a collection of objects, particles, droplets, clusters, or aggregate structures in natural, engineered, or theoretical systems. SD modeling encompasses functional forms, inference methods, random number generation, parameter fitting, statistical and dynamical origins, and application-specific constraints. Modern SD models range from flexible parametric families capable of capturing a wide range of shapes and tail behaviors to physically motivated or mechanistically derived distributions reflecting fragmentation, aggregation, or growth processes.
1. Mathematical Foundations and Classical Families
SDs are formally described by probability density functions (PDFs), cumulative distribution functions (CDFs), and associated quantile and hazard functions. Widely used classical families include Normal, Log-normal, Weibull, Gamma, Pareto and power-law distributions, each with characteristic tail behaviors, support, and moments. The selection of an appropriate SD is non-trivial: Different families can fit empirical data comparably well but exhibit subtle differences in representing skewness, truncation, or physical constraints (Hernández-Bermejo et al., 2019).
The S-distribution introduced by Voit (1992) and analyzed by Hernández–Bermejo and Sorribas is defined in terms of the CDF $F(x)$, via the differential equation:

$$\frac{dF}{dx} = \alpha\left(F^{g} - F^{h}\right)$$

with $\alpha > 0$, $g < h$, and $F(x_0) = F_0$ as location/initial condition. This family spans an extensive range of shapes, including heavy or truncated tails and variable skewness, and can represent both well-known distributions (via parameter choices) and entirely novel forms not reducible to classical families.
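Because the S-distribution is specified through a differential equation for its CDF, one way to build intuition is to integrate that equation numerically. The sketch below assumes the form dF/dx = a*(F**g - F**h) with initial condition F(x0) = F0, uses a classical fourth-order Runge-Kutta step, and checks the result against the logistic distribution, which corresponds to g = 1, h = 2. The function name `s_cdf_rk4` and the step count are illustrative choices, not part of the cited work.

```python
import math

def s_cdf_rk4(x, a, g, h, x0, F0, steps=2000):
    """Integrate dF/dx = a*(F**g - F**h) from (x0, F0) to x with classical RK4."""
    rhs = lambda F: a * (F ** g - F ** h)
    dx = (x - x0) / steps          # negative when x < x0, which integrates backwards
    F = F0
    for _ in range(steps):
        k1 = rhs(F)
        k2 = rhs(F + 0.5 * dx * k1)
        k3 = rhs(F + 0.5 * dx * k2)
        k4 = rhs(F + dx * k3)
        F += dx * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return F

# g = 1, h = 2 gives dF/dx = a*F*(1 - F), whose solution is the logistic CDF
approx = s_cdf_rk4(1.5, a=2.0, g=1.0, h=2.0, x0=0.0, F0=0.5)
exact = 1.0 / (1.0 + math.exp(-2.0 * 1.5))
print(approx, exact)
```

Numerical integration like this is mainly useful as a cross-check; for sampling and fitting, a closed-form quantile avoids the step-size trade-off entirely.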
The Interpolating Family (IF) of Sinner et al. is a five-parameter model defined for $x \ge x_0$ by a closed-form density. Its parameter $p$ serves as a continuous interpolation parameter, tuning the SD between pure power-law (Pareto-like) and exponential-cutoff (Weibull-like) behaviors (Sinner et al., 2016).
2. Analytical Solutions and Quantile Function Methods
Unlike most classical distributions, both the S-distribution and IF families admit analytical solutions for the quantile function $x(F)$, enabling direct inversion sampling and quantile-based fitting. For the S-distribution, Hernández–Bermejo and Sorribas derive:

$$x(F) = x_0 + \frac{1}{\alpha}\left[\Delta(F) - \Delta(F_0)\right]$$

where $\Delta(F) = \dfrac{F^{1-g}}{h-g}\,\Phi\!\left(F^{h-g},\, 1,\, \dfrac{1-g}{h-g}\right)$ and $\Phi$ denotes the Lerch transcendent. This closed form facilitates efficient Monte Carlo generation of S-distributed random variates and quantile-matching regression (Hernández-Bermejo et al., 2019).
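As a consistency check on the Lerch-transcendent form, the quantile can be recovered by separating variables in the defining equation. The sketch below assumes the S-distribution equation $dF/dx = \alpha(F^{g} - F^{h})$ with $F(x_0) = F_0$ and $h > g$, and further assumes that $(1-g)/(h-g)$ is not zero or a negative integer (for instance $g < 1$). It is a term-by-term integration offered for illustration, not a reproduction of the authors' derivation.

$$
\begin{aligned}
x - x_0 &= \frac{1}{\alpha}\int_{F_0}^{F} \frac{dt}{t^{g} - t^{h}}
        = \frac{1}{\alpha}\int_{F_0}^{F} \frac{t^{-g}}{1 - t^{\,h-g}}\,dt
        = \frac{1}{\alpha}\int_{F_0}^{F} \sum_{k=0}^{\infty} t^{\,-g + k(h-g)}\,dt \\
        &= \frac{1}{\alpha}\left[\sum_{k=0}^{\infty} \frac{t^{\,1-g+k(h-g)}}{1-g+k(h-g)}\right]_{F_0}^{F}
        = \frac{1}{\alpha}\left[\frac{t^{\,1-g}}{h-g}\,\Phi\!\left(t^{\,h-g},\,1,\,\frac{1-g}{h-g}\right)\right]_{F_0}^{F}
\end{aligned}
$$

Here $\Phi(z, s, a) = \sum_{k \ge 0} z^{k}/(k+a)^{s}$ for $|z| < 1$. For $g = 0$, $h = 1$ the expression reduces to $x = x_0 + \alpha^{-1}\ln\frac{1 - F_0}{1 - F}$, the quantile of an exponential distribution, because $\Phi(z, 1, 1) = -\ln(1 - z)/z$.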
The IF family provides explicit inverse-CDF formulas for all admissible parameter values, treated case by case. Sampling is achieved by drawing $u \sim \mathrm{Uniform}(0,1)$ and computing $x = F^{-1}(u)$, with no need for numerically solving transcendental equations. This analytical tractability is particularly advantageous over generalized Beta-type families (Sinner et al., 2016).
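The inversion recipe described above is generic: any distribution with a closed-form quantile can be sampled the same way. The sketch below illustrates it with a Weibull quantile as a stand-in, since the IF inverse-CDF expressions are not reproduced here. The helper names `weibull_quantile` and `sample_by_inversion`, the seed, and the parameter values are illustrative assumptions.

```python
import math
import random

def weibull_quantile(u, k, lam):
    """Inverse CDF of a Weibull(shape k, scale lam) for 0 <= u < 1."""
    return lam * (-math.log1p(-u)) ** (1.0 / k)

def sample_by_inversion(quantile, n, rng):
    """Draw n variates by pushing Uniform(0,1) draws through a quantile function."""
    return [quantile(rng.random()) for _ in range(n)]

rng = random.Random(0)                      # fixed seed for reproducibility
draws = sample_by_inversion(lambda u: weibull_quantile(u, 1.5, 2.0), 20000, rng)

# Compare the sample median with the exact median Q(0.5)
draws.sort()
sample_median = draws[len(draws) // 2]
true_median = weibull_quantile(0.5, 1.5, 2.0)
print(sample_median, true_median)
```

Swapping in another family only requires replacing the quantile callable, which is the practical benefit of having the inverse CDF in closed form.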
3. Parametric Roles, Skewness, Tails, and Constraints
Model parameters in advanced SD families serve distinct roles:
| Parameter | S-Distribution | IF Family | Common Effect |
|---|---|---|---|
| Scale | $\alpha$ (inverse spread) | $c$ | Shrinks/stretches overall width |
| Location | $x_0$ | $x_0$ | Shifts median/minimum support |
| Tail exponent | $h$ (diff. shape) | $q$ | Controls right-tail heaviness |
| Shape/skew | $g$ | $b$ | Left-tail truncation/skewness |
| Interpolant | n/a | $p$ | Interpolates power-law ↔ Weibull limit |
Cases with $g < 1$ in the S-distribution guarantee a hard lower bound $x_{\min}$, so a required minimum size can be enforced by solving for $x_0$ or $F_0$ (Hernández-Bermejo et al., 2019). Similarly, $x_0$ in the IF family sets the minimum size, and $q$ tunes tail decay. Both models can enforce physical lower cutoffs or support constraints that many standard families do not accommodate.
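To make the lower-bound condition concrete, the following worked example assumes the S-distribution form $dF/dx = \alpha(F^{g} - F^{h})$ with $F(x_0) = F_0$ and writes $\Delta(F) = \frac{F^{1-g}}{h-g}\,\Phi\!\left(F^{h-g}, 1, \frac{1-g}{h-g}\right)$ for the antiderivative term of its quantile. The symbol $x_{\min}$ and the numerical values are chosen here for illustration.

$$
\begin{aligned}
x_{\min} &= \lim_{F \to 0^{+}} x(F) = x_0 - \frac{\Delta(F_0)}{\alpha} \quad (g < 1), \qquad \text{since } \Delta(F) \to 0 \text{ as } F \to 0^{+}, \\
x_{\min} = 0 \;&\Longrightarrow\; x_0 = \frac{\Delta(F_0)}{\alpha}, \\
g = 0,\ h = 1:\quad \Delta(F) &= -\ln(1 - F), \qquad F_0 = \tfrac{1}{2},\ \alpha = 1 \;\Longrightarrow\; x_0 = \ln 2 .
\end{aligned}
$$

This agrees with the unit-rate exponential distribution, which starts at zero and has median $\ln 2$.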
Tail behavior and skewness influence representation of rare large sizes and truncation at small sizes. For broad unimodal SDs, skewness is essential in capturing empirical asymmetries, while tail parameters determine fit accuracy in high-impact domains (e.g., insurance, survival analysis) (Sinner et al., 2016).
4. Model Fitting, Estimation, and Empirical Performance
SD parameter estimation is typically performed via nonlinear least squares, maximum likelihood, or quantile-based matching. In the S-distribution framework, Hernández–Bermejo and Sorribas recommend a two-stage procedure:
- Initial estimate: Fit the differential form to histogram or empirical CDF.
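- Quantile-matching: Order the observed sample, assign empirical probabilities $p_i$ to the order statistics $x_{(i)}$, then minimize the mismatch between $x_{(i)}$ and the model quantiles $x(p_i)$ over the adjustable parameters.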
This approach converges rapidly, fitting unimodal SDs of diverse origin. Empirical case studies include truncated fish-length distributions and clinical ICU measurement histograms, where visual overlays and QQ-plots are used to assess fit quality (Hernández-Bermejo et al., 2019).
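The quantile-matching step can be sketched in a simplified setting. The code below assumes the S-distribution quantile x(F) = x0 + (Delta(F) - Delta(F0)) / a, holds the shape parameters g and h fixed, and exploits the fact that x is then linear in Delta(F), so a and x0 follow from ordinary least squares. This is an illustrative reduction, not the authors' two-stage procedure; the plotting positions (i + 0.5)/n, the series evaluation of the Lerch transcendent, and all function names are assumptions made for the example.

```python
def lerch_phi_s1(z, a, tol=1e-13):
    """Series sum_{k>=0} z**k / (k + a) for 0 <= z < 1 and a > 0 (Lerch Phi, s = 1)."""
    total, zk, k = 0.0, 1.0, 0
    # Stop once a geometric bound on the remaining tail drops below tol
    while zk / ((k + a) * (1.0 - z)) > tol:
        total += zk / (k + a)
        zk *= z
        k += 1
    return total

def delta(F, g, h):
    """Antiderivative of 1 / (F**g - F**h), assuming g < 1 and g < h."""
    y = h - g
    return F ** (1.0 - g) / y * lerch_phi_s1(F ** y, (1.0 - g) / y)

def s_quantile(F, a, g, h, x0, F0):
    return x0 + (delta(F, g, h) - delta(F0, g, h)) / a

def fit_scale_location(sample, g, h, F0=0.5):
    """Least-squares estimate of (a, x0) with g and h held fixed."""
    xs = sorted(sample)
    n = len(xs)
    d = [delta((i + 0.5) / n, g, h) for i in range(n)]  # Delta at plotting positions
    d_mean, x_mean = sum(d) / n, sum(xs) / n
    sxy = sum((di - d_mean) * (xi - x_mean) for di, xi in zip(d, xs))
    sxx = sum((di - d_mean) ** 2 for di in d)
    slope = sxy / sxx                      # slope equals 1 / a
    intercept = x_mean - slope * d_mean    # intercept equals x0 - Delta(F0) / a
    return 1.0 / slope, intercept + slope * delta(F0, g, h)

# Synthetic check: exact model quantiles at the plotting positions
probs = [(i + 0.5) / 50 for i in range(50)]
sample = [s_quantile(p, 2.0, 0.5, 1.5, 1.0, 0.5) for p in probs]
a_hat, x0_hat = fit_scale_location(sample, g=0.5, h=1.5)
print(a_hat, x0_hat)
```

In a full fit, g and h would also be optimized, for example by wrapping `fit_scale_location` in an outer search over the shape parameters and keeping the pair with the smallest residual.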
The IF family uses an explicit log-likelihood, $\ell(\theta) = \sum_{i=1}^{n} \log f(x_i; \theta)$ for density $f$ and parameter vector $\theta$, to estimate parameters by numerical maximization. Standard methods converge reliably, and the closed-form nature of CDF and quantile formulas simplifies handling of censored or truncated data. Comparative likelihood ratio and AIC/BIC tests favor IF over GB2, Weibull, and pure Pareto in diverse domains (Sinner et al., 2016).
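The likelihood-based comparison can be illustrated on two simple candidates whose maximum-likelihood estimates are available in closed form: a Pareto law and a shifted exponential, both with a known lower bound. This is only a sketch of the AIC/BIC mechanics under those assumptions; it does not implement the IF likelihood, and the function names, seed, and synthetic data are hypothetical.

```python
import math
import random

def fit_pareto(xs, xmin):
    """Closed-form MLE of the Pareto exponent with xmin known; returns (alpha, loglik)."""
    n = len(xs)
    s = sum(math.log(x / xmin) for x in xs)
    alpha = n / s
    loglik = n * math.log(alpha) - n * math.log(xmin) - (alpha + 1.0) * s
    return alpha, loglik

def fit_shifted_exponential(xs, xmin):
    """Closed-form MLE of the rate for density rate*exp(-rate*(x - xmin))."""
    n = len(xs)
    rate = n / sum(x - xmin for x in xs)
    loglik = n * math.log(rate) - n
    return rate, loglik

def aic(loglik, k):
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    return k * math.log(n) - 2 * loglik

# Synthetic Pareto data generated by inversion (alpha = 1.5, xmin = 1)
rng = random.Random(42)
data = [1.0 * (1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(2000)]

alpha_hat, ll_par = fit_pareto(data, xmin=1.0)
rate_hat, ll_exp = fit_shifted_exponential(data, xmin=1.0)
print("Pareto AIC:", aic(ll_par, 1), "Exponential AIC:", aic(ll_exp, 1))
```

Lower AIC or BIC indicates the preferred model; with one free parameter each, the comparison here reduces to the log-likelihoods, whereas a five-parameter family would pay a larger complexity penalty.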
5. Random Variate Generation and Simulation
Both families support direct inversion-based random variate generation. For the S-distribution:
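```python
import numpy as np
from mpmath import lerchphi

def draw_S(a, g, h, x0, F0):
    """Draw one S-distributed variate by inversion (assumes a > 0, g < 1, g < h)."""
    y = h - g
    c = (1 - g) / y  # third Lerch argument from integrating 1 / (F**g - F**h)
    u = np.random.uniform(0, 1)
    Delta0 = F0**(1 - g) / y * lerchphi(F0**y, 1, c)
    Delta = u**(1 - g) / y * lerchphi(u**y, 1, c)
    x = x0 + (1 / a) * float(Delta - Delta0)
    return x
```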
IF family sampling uses the explicit quantile function evaluated at $u \sim \mathrm{Uniform}(0,1)$, enabling efficient simulation, including under censored or bounded regimes. The simple form contrasts with general beta-type distributions, where normalization and inversion are more involved (Sinner et al., 2016).
6. Practical Recommendations and Case Studies
For size-distribution modeling tasks, modern flexible parametric families offer multiple advantages:
- Explicit control over lower bounds, skew and tails.
- Analytical quantile and inverse-CDF formulas enable robust fitting and simulation.
- Capability to nest standard SDs (Pareto, Weibull, Burr XII, Fréchet, etc.) and switch behavior via parameters.
- Empirical validation in insurance, environmental, clinical and survival data sets demonstrates superior fit, particularly in tail regions (Sinner et al., 2016, Hernández-Bermejo et al., 2019).
Recommended workflow:
- Fit the full S-distribution or IF model to observed or simulated data; check optimizer convergence and inspect a QQ-plot overlay.
- Examine parameter estimates: If $p$ is near zero, a simpler power-law model may suffice; if $p \to \infty$, an exponential-cutoff model applies.
- Use closed-form formulas for risk and survival probabilities, or for random sampling in Monte Carlo simulation.
- For censored/truncated samples, exploit direct quantile or CDF inversion, avoiding numerical integration.
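The truncated-sample step of the workflow can be sketched as follows. When both the CDF and the quantile are available in closed form, sampling conditioned on an interval only requires restricting the uniform draw to the matching probability range. A Pareto law is used here as a stand-in because its formulas are elementary; the helper names, seed, and parameter values are illustrative assumptions.

```python
import random

def pareto_cdf(x, alpha, xmin):
    return 1.0 - (xmin / x) ** alpha

def pareto_quantile(u, alpha, xmin):
    return xmin * (1.0 - u) ** (-1.0 / alpha)

def sample_truncated(cdf, quantile, lo, hi, n, rng):
    """Sample conditioned on lo <= x <= hi by restricting u to [F(lo), F(hi)]."""
    u_lo, u_hi = cdf(lo), cdf(hi)
    return [quantile(u_lo + (u_hi - u_lo) * rng.random()) for _ in range(n)]

rng = random.Random(1)
alpha, xmin, lo, hi = 1.5, 1.0, 2.0, 10.0
draws = sample_truncated(lambda x: pareto_cdf(x, alpha, xmin),
                         lambda u: pareto_quantile(u, alpha, xmin),
                         lo, hi, 10000, rng)

# Median of the truncated law: quantile at the midpoint of the probability range
mid_u = 0.5 * (pareto_cdf(lo, alpha, xmin) + pareto_cdf(hi, alpha, xmin))
cond_median = pareto_quantile(mid_u, alpha, xmin)
print(min(draws), max(draws), cond_median)
```

No rejection step or numerical integration is needed, which is the advantage the workflow attributes to closed-form CDF and quantile expressions.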
The flexibility, analytic tractability, and quantitative fit provided by the S-distribution and the Interpolating Family make them practical general-purpose frameworks for unimodal size-distribution modeling that remain easy to use and extend (Hernández-Bermejo et al., 2019, Sinner et al., 2016).
References
- Hernández–Bermejo B., Sorribas A., "Analytical Quantile Solution for the S-distribution, Random Number Generation and Statistical Data Modeling" (Hernández-Bermejo et al., 2019)
- Sinner A., Stephanou M., Blanchard G., "An Interpolating Family of Size Distributions" (Sinner et al., 2016)
- Voit E. O., Biom. J. 7:855-878 (1992)