Flexible Dirichlet Models
- Flexible Dirichlet (FD) is a class of multivariate distributions that generalizes the classical Dirichlet by integrating custom baseline distributions and generator functions.
- It supports tailor-made marginal behaviors and enables both positive and negative correlations, overcoming limits of standard Dirichlet models.
- FD models are applied in compositional data analysis and mixture modeling, offering improved fit and interpretability in real-world datasets.
The Flexible Dirichlet (FD) family refers to a class of multivariate probability distributions that generalize the classical Dirichlet, introducing greater flexibility in capturing marginal behaviors, dependence structure, and support. This is achieved through two principal frameworks: (1) Dirichlet-generated models—embedding baseline distributions in the Dirichlet generator; and (2) stochastic diffusion processes whose stationary laws are generalized Dirichlet distributions. Both frameworks expand modeling capacity for compositional data, densities on simplices, and multivariate phenomena constrained by a conservation principle (Bakosi et al., 2013, Arashi et al., 2019).
1. Formal Constructions and Definitions
Two distinct but closely related constructions underlie FD families.
Dirichlet-generated (Beta-generated-multivariate) construction:
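Let $\mathbf{X} = (X_1, \dots, X_p)$. For each $i = 1, \dots, p$, let $G_i$ be the CDF of a baseline distribution (commonly Gamma($a_i$, $b_i$)), and $F_D$ the CDF of a $p$-dimensional Dirichlet($\alpha_1, \dots, \alpha_{p+1}$) on the simplex $\{u \in [0,1]^p : \sum_{i=1}^{p} u_i \le 1\}$. The FD cumulative distribution function is
$$F(x_1, \dots, x_p) = F_D\big(G_1(x_1), \dots, G_p(x_p)\big).$$
The joint PDF is
$$f(x_1, \dots, x_p) = \frac{1}{B(\boldsymbol{\alpha})} \prod_{i=1}^{p} g_i(x_i)\, G_i(x_i)^{\alpha_i - 1} \Big(1 - \sum_{i=1}^{p} G_i(x_i)\Big)^{\alpha_{p+1} - 1},$$
where $g_i$ is the PDF of the baseline and $B(\boldsymbol{\alpha}) = \prod_{i=1}^{p+1} \Gamma(\alpha_i) \big/ \Gamma\big(\sum_{i=1}^{p+1} \alpha_i\big)$ the multivariate Beta function. The support is $\{x \in \mathbb{R}^p : \sum_{i=1}^{p} G_i(x_i) \le 1\}$ (Arashi et al., 2019).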
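The construction above suggests a direct sampling recipe: draw a Dirichlet vector and push its first $p$ coordinates through the baseline quantile functions. The following is a minimal sketch, not the authors' implementation. It assumes an Exponential baseline (a Gamma with shape 1) because its quantile function has a closed form; the function names and parameter values are illustrative.

```python
import math
import random

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) vector by normalizing independent Gamma variates."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]

def exp_cdf(x, rate):
    """CDF of the assumed Exponential(rate) baseline (Gamma with shape 1)."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

def exp_quantile(u, rate):
    """Inverse CDF of the assumed Exponential(rate) baseline."""
    return -math.log1p(-u) / rate

def sample_fd(alpha, rates, rng):
    """Return X_i = G_i^{-1}(U_i), with (U_1, ..., U_p) the first p Dirichlet coordinates."""
    u = sample_dirichlet(alpha, rng)[: len(rates)]
    return [exp_quantile(ui, r) for ui, r in zip(u, rates)]

rng = random.Random(0)
alpha = [2.0, 3.0, 4.0]  # p + 1 = 3 Dirichlet shapes (illustrative values)
rates = [1.0, 0.5]       # one baseline rate per component (illustrative values)
draws = [sample_fd(alpha, rates, rng) for _ in range(1000)]
```

Every draw satisfies $\sum_i G_i(x_i) \le 1$ by construction, which is the support constraint of the Dirichlet-generated model.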
Stochastic diffusion construction (Generalized Dirichlet diffusion):
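Let $\mathbf{Y}(t) = (Y_1, \dots, Y_N)$ be a vector-valued Itô diffusion that always stays on the simplex $Y_i \ge 0$, $\sum_{i=1}^{N} Y_i = 1$. The Fokker–Planck operator is constructed so that its unique stationary distribution is Lochner’s generalized Dirichlet law,
$$\mathcal{G}(Y_1, \dots, Y_{N-1}) = \prod_{i=1}^{N-1} \frac{\Gamma(\alpha_i + \beta_i)}{\Gamma(\alpha_i)\,\Gamma(\beta_i)}\, Y_i^{\alpha_i - 1}\, \mathcal{Y}_i^{\gamma_i}, \qquad \mathcal{Y}_i = 1 - \sum_{j=1}^{i} Y_j,$$
with parameter coupling $\gamma_i = \beta_i - \alpha_{i+1} - \beta_{i+1}$ for $i < N-1$ and $\gamma_{N-1} = \beta_{N-1} - 1$, and $\alpha_i$, $\beta_i$ functions of the drift and diffusion coefficients of the SDE (Bakosi et al., 2013).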
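For intuition about the stationary law, one can sample the generalized Dirichlet directly through its standard stick-breaking representation with independent Beta fractions. This is a swapped-in technique, not the diffusion process of Bakosi et al.; the sketch below assumes the parameterization in which the $i$-th fraction is Beta($\alpha_i$, $\beta_i$), and the function name is illustrative.

```python
import random

def sample_generalized_dirichlet(alpha, beta, rng):
    """Return (Y_1, ..., Y_N), N = len(alpha) + 1, via stick-breaking."""
    remaining = 1.0
    y = []
    for a, b in zip(alpha, beta):
        z = rng.betavariate(a, b)  # independent Beta(alpha_i, beta_i) fraction
        y.append(z * remaining)    # Y_i = Z_i * prod_{j<i} (1 - Z_j)
        remaining *= 1.0 - z
    y.append(remaining)            # last component closes the unit-sum constraint
    return y

rng = random.Random(1)
sample = sample_generalized_dirichlet([1.0, 5.0, 2.0], [0.1, 5.0, 2.0], rng)
```

The returned vector has non-negative entries that sum to one, so it lies on the same simplex as the diffusion.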
2. Distributional Properties and Parameterizations
Both frameworks provide substantial modeling flexibility by decoupling marginal shape and dependence constraints present in classical Dirichlet models.
- Parameters:
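- Dirichlet-generated model: baseline distribution parameters (for example, the shape and rate $(a_i, b_i)$ of a Gamma baseline, per component), Dirichlet shape parameters $\alpha_1, \dots, \alpha_{p+1}$.
- Generalized Dirichlet: $2N$ parameters $(\alpha_i, \beta_i)$, with $2(N-1)$ degrees of freedom due to the simplex constraint.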
- Marginals:
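Each marginal is a univariate beta-generated (e.g., beta-Gamma) distribution:
$$f_{X_i}(x) = \frac{g_i(x)\, G_i(x)^{\alpha_i - 1} \big(1 - G_i(x)\big)^{\alpha_+ - \alpha_i - 1}}{B(\alpha_i,\, \alpha_+ - \alpha_i)}, \qquad \alpha_+ = \sum_{k=1}^{p+1} \alpha_k,$$
where $G_i$ and $g_i$ are the baseline CDF and PDF. This enables control over skewness, tails, and modality not available in the fixed Beta($\alpha_i$, $\alpha_+ - \alpha_i$) marginals of the standard Dirichlet.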
- Dependence:
Both positive and negative covariances are achievable. The sign and strength of $\mathrm{Cov}(X_i, X_j)$ depend on the baseline and Dirichlet parameters (or on $\alpha_i$, $\beta_i$ in the generalized Dirichlet), whereas all classical Dirichlet off-diagonal covariances are strictly negative. In the generalized Dirichlet, sequential partitions admit a lower-triangular dependence structure, supporting covariances of either sign (Bakosi et al., 2013; Arashi et al., 2019).
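A beta-generated marginal can be evaluated with nothing more than the baseline CDF and PDF. The sketch below is illustrative rather than taken from the cited papers: it assumes a unit-rate Exponential baseline and Beta parameters $(a, b) = (\alpha_i, \alpha_+ - \alpha_i)$ with illustrative values, and checks numerically that the resulting density integrates to approximately one.

```python
import math

def beta_generated_pdf(x, a, b, cdf, pdf):
    """Density g(x) G(x)^(a-1) (1 - G(x))^(b-1) / B(a, b) of a beta-generated variable."""
    u = cdf(x)
    if u <= 0.0 or u >= 1.0:
        return 0.0  # outside the interior of the baseline support
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    log_kernel = (a - 1.0) * math.log(u) + (b - 1.0) * math.log(1.0 - u)
    return pdf(x) * math.exp(log_kernel - log_beta)

def exp_cdf(x):
    """CDF of the assumed Exponential(1) baseline."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def exp_pdf(x):
    """PDF of the assumed Exponential(1) baseline."""
    return math.exp(-x) if x > 0 else 0.0

a, b = 2.0, 7.0          # e.g. alpha_i = 2 and alpha_+ = 9 (illustrative values)
n, upper = 20000, 20.0   # trapezoid grid on [0, upper]
h = upper / n
values = [beta_generated_pdf(k * h, a, b, exp_cdf, exp_pdf) for k in range(n + 1)]
area = h * (sum(values) - 0.5 * (values[0] + values[-1]))
```

Swapping `exp_cdf` and `exp_pdf` for another baseline changes the marginal shape without touching the Beta parameters, which is the decoupling described above.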
3. Moment Structure and Correlations
Closed forms are available for expectations, variances, and covariances in both constructions.
- First moments (Dirichlet-generated):
The mean $E[X_i]$ is expressed through an auxiliary multiple Beta-integral (Arashi et al., 2019).
- Second moments and covariances:
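- Generalized Dirichlet explicit formulae: For $i = 1, \dots, N-1$ and $j > i$,
$$E[Y_i] = \frac{\alpha_i}{\alpha_i + \beta_i} \prod_{k=1}^{i-1} \frac{\beta_k}{\alpha_k + \beta_k}, \qquad \mathrm{Var}(Y_i) = E[Y_i] \left( \frac{\alpha_i + 1}{\alpha_i + \beta_i + 1}\, M_{i-1} - E[Y_i] \right),$$
$$\mathrm{Cov}(Y_i, Y_j) = E[Y_j] \left( \frac{\alpha_i}{\alpha_i + \beta_i + 1}\, M_{i-1} - E[Y_i] \right),$$
where $M_{i-1} = \prod_{k=1}^{i-1} \frac{\beta_k + 1}{\alpha_k + \beta_k + 1}$ (with $M_0 = 1$) and $\alpha_i$, $\beta_i$ are the generalized Dirichlet parameters (Bakosi et al., 2013).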
- Possible correlation patterns:
$Y_1$ is always negatively correlated with the rest; for $i, j > 1$, $\mathrm{Cov}(Y_i, Y_j)$ is sign-switchable via hyperparameter selection.
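The correlation patterns can be checked numerically from the standard closed-form moments of the generalized Dirichlet. The sketch below assumes the stick-breaking parameterization, in which $E[Y_i] = \frac{\alpha_i}{\alpha_i+\beta_i}\prod_{k<i}\frac{\beta_k}{\alpha_k+\beta_k}$ and $\mathrm{Cov}(Y_i, Y_j) = E[Y_j]\big(\frac{\alpha_i}{\alpha_i+\beta_i+1} M_{i-1} - E[Y_i]\big)$ for $i < j$, with $M_{i-1} = \prod_{k<i}\frac{\beta_k+1}{\alpha_k+\beta_k+1}$. The function name and parameter values are illustrative.

```python
def gd_moments(alpha, beta):
    """Means and covariance matrix of (Y_1, ..., Y_K) for a generalized Dirichlet."""
    k = len(alpha)
    mean, m_prev = [], []
    prod_mean, prod_m = 1.0, 1.0
    for a, b in zip(alpha, beta):
        mean.append(a / (a + b) * prod_mean)
        m_prev.append(prod_m)  # M_{i-1}, with M_0 = 1
        prod_mean *= b / (a + b)
        prod_m *= (b + 1.0) / (a + b + 1.0)
    cov = [[0.0] * k for _ in range(k)]
    for i in range(k):
        a, b = alpha[i], beta[i]
        # Variance on the diagonal
        cov[i][i] = mean[i] * ((a + 1.0) / (a + b + 1.0) * m_prev[i] - mean[i])
        for j in range(i + 1, k):
            # Covariance for i < j, mirrored to keep the matrix symmetric
            c = mean[j] * (a / (a + b + 1.0) * m_prev[i] - mean[i])
            cov[i][j] = cov[j][i] = c
    return mean, cov

# Illustrative parameters: Y_1 is negatively correlated with Y_2 and Y_3,
# while Cov(Y_2, Y_3) comes out positive.
mean, cov = gd_moments([1.0, 5.0, 2.0], [0.1, 5.0, 2.0])
```

With these values the first row of the covariance matrix is negative off the diagonal, while the $(2, 3)$ entry is positive, matching the sign pattern stated above.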
4. Parameter Estimation and Model Fitting
Maximum likelihood estimation is the standard approach for both FD constructions.
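- Log-likelihood: for the Dirichlet-generated model with $n$ independent observations $x_k = (x_{k1}, \dots, x_{kp})$,
$$\ell = -n \log B(\boldsymbol{\alpha}) + \sum_{k=1}^{n} \left[ \sum_{i=1}^{p} \Big( \log g_i(x_{ki}) + (\alpha_i - 1) \log G_i(x_{ki}) \Big) + (\alpha_{p+1} - 1) \log\Big( 1 - \sum_{i=1}^{p} G_i(x_{ki}) \Big) \right],$$
where $G_i$ and $g_i$ are the baseline CDF and PDF and $B(\boldsymbol{\alpha})$ is the multivariate Beta function.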
- Optimization:
Score equations involve digamma functions and derivatives of the baseline CDF/PDF; no closed-form solutions are available, but standard optimization routines (Newton–Raphson, quasi-Newton, optim() in R) yield reliable convergence for moderate $p$ and sample size (Arashi et al., 2019).
- Identifiability:
The SDE-to-parameter mapping in the diffusion construction is many-to-one: different SDE coefficients may yield the same stationary parameters $(\alpha_i, \beta_i)$, but each SDE determines a unique stationary law.
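As a rough illustration of the objective that such optimizers would maximize, the sketch below evaluates the log-likelihood of the Dirichlet-generated model for arbitrary baselines supplied as callables. It assumes the density $f(x) = B(\boldsymbol{\alpha})^{-1} \prod_i g_i(x_i) G_i(x_i)^{\alpha_i - 1} (1 - \sum_i G_i(x_i))^{\alpha_{p+1} - 1}$; the function name and signature are hypothetical, not an API from the cited papers.

```python
import math

def fd_log_likelihood(data, alpha, cdfs, pdfs):
    """Log-likelihood of the Dirichlet-generated model.

    data  : list of observations, each a sequence of length p
    alpha : p + 1 Dirichlet shape parameters
    cdfs  : p baseline CDF callables G_i
    pdfs  : p baseline PDF callables g_i
    """
    p = len(cdfs)
    log_beta = sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))
    total = 0.0
    for x in data:
        u = [cdfs[i](x[i]) for i in range(p)]
        rest = 1.0 - sum(u)
        if rest <= 0.0 or min(u) <= 0.0:
            return float("-inf")  # observation outside (or on the boundary of) the support
        total -= log_beta
        total += sum(
            math.log(pdfs[i](x[i])) + (alpha[i] - 1.0) * math.log(u[i])
            for i in range(p)
        )
        total += (alpha[p] - 1.0) * math.log(rest)
    return total

# Sanity check with uniform baselines on [0, 1], where the model reduces to a Dirichlet.
uniform_cdfs = [lambda x: x, lambda x: x]
uniform_pdfs = [lambda x: 1.0, lambda x: 1.0]
value = fd_log_likelihood([[0.2, 0.3]], [2.0, 1.0, 1.0], uniform_cdfs, uniform_pdfs)
```

In practice the negative of this function, with the baseline parameters folded into the callables, would be handed to a numerical optimizer; returning `-inf` outside the support keeps the search inside the admissible region.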
5. Flexibility, Support, and Special Cases
The FD family allows for custom support, tail behavior, and degenerate cases.
- Support:
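- Classical Dirichlet: Simplex $\{u \in [0,1]^p : \sum_{i=1}^{p} u_i \le 1\}$.
- FD: $\{x \in \mathbb{R}^p : \sum_{i=1}^{p} G_i(x_i) \le 1\}$, a wedge in $\mathbb{R}^p$. Choice of the baseline CDFs $G_i$ allows custom marginal supports.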
- Tail behavior:
- With a Gamma baseline, each $X_i$ exhibits an exponential tail modulated by a polynomial factor.
- Using Pareto, Weibull, Fréchet, etc., as the baseline $G_i$ allows modeling of heavy tails.
- Special cases:
- Uniform baselines, $G_i(x) = x$ on $[0, 1]$, recover the classical Dirichlet.
- Decoupling: and yields independent marginals.
- Relations to other families:
- Unifies classical Dirichlet, Liouville, beta-generated univariate families, and the generalized Dirichlet as special or limiting cases.
6. Applications and Empirical Performance
FD models are notably suitable for compositional data analysis and mixture modeling where flexible marginal and joint behavior is required.
- Real data examples:
- Pekin duck serum proteins (p=3): FD model captures extreme outliers, with lower AIC/BIC and KS distance than the Dirichlet (Arashi et al., 2019).
- White-cell counts (p=3): With negative correlations, FD model provides superior fit (assessed by QQ-plots, contours, and model selection metrics).
- Model testing:
New empirical-cdf KS test techniques are available for goodness-of-fit assessment (Arashi et al., 2019).
- Interpretation:
FD models handle both negative and positive dependencies as they arise in real-world compositional, phase-fraction, and biological data—features unattainable by standard Dirichlet models or Dirichlet diffusion processes (Bakosi et al., 2013, Arashi et al., 2019).
7. Conditional Distributions and Extensions
- Marginals:
Each $X_i$ is a beta-generated baseline variable (Arashi et al., 2019).
- Conditionals:
Conditionals are again beta-generated, with updated Dirichlet parameters depending on the observed values.
- Generalization:
The constructions are framework-agnostic: any continuous baseline with a tractable CDF $G_i$ can be inserted, and the generator need not be limited to Dirichlet forms if more complex correlation structures are required.
- Stochastic modeling:
In the diffusion framework, the FD law arises as the unique long-time law for a class of multidimensional Itô processes, providing an avenue for physical or biological systems modeling with conservation constraints and custom dependence patterns (Bakosi et al., 2013).
In summary, the Flexible Dirichlet family—encompassing Dirichlet-generated and generalized Dirichlet diffusion models—extends classical compositional modeling to a general class parameterized by both baseline distribution and generator characteristics. This enables closed-form marginals and higher moments, support for both signs of correlation, user-controlled tail behavior, and tractable parameter estimation for applications in compositional data, mixture models, and systems governed by conservation principles (Bakosi et al., 2013, Arashi et al., 2019).