Predictive Diversity Score (PDS)
- Predictive Diversity Score (PDS) is a metric, derived from the Jensen-Shannon divergence, that quantifies statistical dissimilarity between probability distributions and is controlled by a single tunable parameter α.
- It is constructed as a monoparametric family of distances that are true metrics for α in (0, 1/2], allowing fine control over sensitivity.
- PDS finds practical applications in tasks such as time series segmentation and quantum state discrimination by detecting subtle changes and anomalies.
The Predictive Diversity Score (PDS) is a metric derived from the Jensen-Shannon divergence (JSD) family, designed to quantify statistical dissimilarity between probability distributions with tunable sensitivity. The PDS is constructed as a monoparametric family of metrics from the classical JSD, enabling fine-grained control over the metric properties and their sensitivity in discriminative, clustering, or segmentation tasks. The formalism supports both classical and quantum extensions and is particularly advantageous for applications involving time series segmentation, symbolic sequences, and quantum state discrimination.
1. Jensen-Shannon Divergence: Foundation for Predictive Diversity
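The Jensen-Shannon divergence between two probability distributions $P = (p_1, \dots, p_n)$ and $Q = (q_1, \dots, q_n)$ over a finite alphabet is defined as
$$\mathrm{JSD}(P, Q) = \frac{1}{2} D_{\mathrm{KL}}(P \,\|\, M) + \frac{1}{2} D_{\mathrm{KL}}(Q \,\|\, M), \qquad M = \frac{P + Q}{2},$$
where the Kullback-Leibler divergence is $D_{\mathrm{KL}}(P \,\|\, Q) = \sum_i p_i \log \frac{p_i}{q_i}$. Alternatively, in terms of the Shannon entropy $H(P) = -\sum_i p_i \log p_i$,
$$\mathrm{JSD}(P, Q) = H\!\left(\frac{P + Q}{2}\right) - \frac{1}{2} H(P) - \frac{1}{2} H(Q).$$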
JSD is symmetric, bounded, always finite, and well-defined even for zero-probability events.
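The two equivalent forms of the JSD (the Kullback-Leibler form and the entropy form) can be checked numerically. The following is a minimal sketch, assuming natural logarithms (the text does not fix a base) and NumPy; the function names are illustrative and not taken from the source.

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy in nats; terms with p_i = 0 contribute 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log(p[nz])))

def kl_divergence(p, q):
    """D_KL(p || q) in nats.

    Assumes q_i > 0 wherever p_i > 0, which always holds when q is the
    mixture (p + q) / 2 used by the JSD.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / q[nz])))

def jsd_from_kl(p, q):
    """JSD as the average KL divergence to the mixture."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def jsd_from_entropy(p, q):
    """JSD as the entropy of the mixture minus the mean entropy."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    return shannon_entropy(m) - 0.5 * shannon_entropy(p) - 0.5 * shannon_entropy(q)

# Two distributions with partially disjoint support: the JSD stays finite.
p = [0.5, 0.5, 0.0]
q = [0.0, 0.5, 0.5]
print(jsd_from_kl(p, q), jsd_from_entropy(p, q))
```

Both functions should agree up to floating-point rounding. The zero entries in `p` and `q` illustrate why the JSD remains well-defined where a direct KL divergence between `p` and `q` would diverge.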
2. Monoparametric Family of Metrics: Definition and Properties
The core result underpinning the PDS is the existence of a monoparametric family of distance measures based on the JSD:
$$d_\alpha(P, Q) = \mathrm{JSD}(P, Q)^{\alpha},$$
where $\alpha \in (0, 1]$. This family includes the JSD itself ($\alpha = 1$) and its square root ($\alpha = 1/2$); only exponents in the interval $\alpha \in (0, 1/2]$ are proven to produce true metrics (satisfying all four metric axioms: non-negativity, symmetry, identity of indiscernibles, and triangle inequality).
- Metric Validity: For any $\alpha \in (0, 1/2]$, $d_\alpha = \mathrm{JSD}^{\alpha}$ is a metric.
- For $\alpha = 1$, $d_1 = \mathrm{JSD}$ is not a metric (violates triangle inequality).
- For $\alpha \in (1/2, 1)$, $d_\alpha = \mathrm{JSD}^{\alpha}$ is conjectured not to be a metric; construction of counterexamples supports this claim, though a general proof is open.
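Explicitly, for distributions $P$ and $Q$,
$$d_\alpha(P, Q) = \left[ H\!\left(\frac{P + Q}{2}\right) - \frac{1}{2} H(P) - \frac{1}{2} H(Q) \right]^{\alpha},$$
with $\alpha \in (0, 1/2]$ and $H$ the Shannon entropy.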
Summary of Metric Exponents
| Exponent $\alpha$ | Metric Property | Notes |
|---|---|---|
| $0 < \alpha \le 1/2$ | True metric | Main proven result |
| $1/2 < \alpha < 1$ | Conjectured not a metric | Supported by counterexamples |
| $\alpha = 1$ | Not a metric | Analytically proven |
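The role of the exponent in the triangle inequality can be illustrated with a single triple of distributions. The sketch below assumes the family is the JSD raised to the power α, uses natural logarithms, and picks the triple `P`, `Q`, `R` purely for illustration. A non-negative gap on one triple does not prove metricity; it only shows where this particular triple fails.

```python
import numpy as np

def jsd(p, q):
    """Jensen-Shannon divergence in nats (entropy form)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)

    def h(x):
        x = x[x > 0]
        return -np.sum(x * np.log(x))

    value = h(0.5 * (p + q)) - 0.5 * h(p) - 0.5 * h(q)
    # Guard against tiny negative values from rounding before taking powers.
    return max(float(value), 0.0)

def d_alpha(p, q, alpha):
    """Member of the family: the JSD raised to the power alpha."""
    return jsd(p, q) ** alpha

# Illustrative triple: two point masses and the uniform distribution between them.
P = [1.0, 0.0]
Q = [0.5, 0.5]
R = [0.0, 1.0]

def triangle_gap(alpha):
    """d(P,Q) + d(Q,R) - d(P,R); a negative value means the inequality fails."""
    return d_alpha(P, Q, alpha) + d_alpha(Q, R, alpha) - d_alpha(P, R, alpha)

for alpha in (0.25, 0.5, 1.0):
    print(f"alpha={alpha}: gap={triangle_gap(alpha):+.4f}")
```

For this triple the gap is negative at α = 1 and positive at α = 1/2 and α = 1/4, in line with the summary table.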
3. Sensitivity and Tunability of the Predictive Diversity Score
The parameter $\alpha$ in the PDS allows fine control over the sensitivity of the score:
- Lowering $\alpha$: Increases the magnitude and sensitivity to differences between distributions, making PDS particularly responsive to small or subtle changes.
- Choosing $\alpha$: Should be based on task-specific requirements; for example, for sequence segmentation, lower values within the valid range can enhance detection performance.
This tunability is practically significant for tasks where the detection of distributional changes or anomalies is crucial and where simple pointwise distances might lack sufficient discriminative power.
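The effect of the exponent on a small distributional shift can be sketched as follows. The two distributions and the grid of exponents are illustrative assumptions, and natural logarithms are used, so the JSD is bounded by log 2 < 1 and raising it to a smaller power increases the score.

```python
import numpy as np

def jsd(p, q):
    """Jensen-Shannon divergence in nats (entropy form)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)

    def h(x):
        x = x[x > 0]
        return -np.sum(x * np.log(x))

    return max(float(h(0.5 * (p + q)) - 0.5 * h(p) - 0.5 * h(q)), 0.0)

# A subtle change: two percentage points of probability mass moved.
baseline = [0.50, 0.50]
shifted = [0.52, 0.48]

# Exponents inside the proven metric range (0, 1/2].
alphas = (0.5, 0.25, 0.1)
scores = {a: jsd(baseline, shifted) ** a for a in alphas}

for a in alphas:
    print(f"alpha={a}: score={scores[a]:.4f}")
```

The same underlying divergence yields a larger score as α decreases, which is the sense in which lower exponents make small differences more visible.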
4. Application to Symbolic Sequence Segmentation
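In symbolic sequence segmentation, the PDS serves as a flexible statistic for detecting change-points or nonstationary behavior. The main computation involves a moving-window version of the metric:
$$d_\alpha(n) = \left[ H\!\left(\pi_L P_L + \pi_R P_R\right) - \pi_L H(P_L) - \pi_R H(P_R) \right]^{\alpha},$$
where $P_L$, $P_R$ are symbol frequencies on either side of the segmentation point $n$, $\pi_L$, $\pi_R$ (with $\pi_L + \pi_R = 1$) are relative window sizes, and $H$ is the Shannon entropy. The position $n$ maximizing $d_\alpha(n)$ estimates the change-point.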
Empirical results show that using lower $\alpha$ yields more pronounced segmentation statistics, providing practical guidance for parameter selection in diverse segmentation scenarios.
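A change-point scan of this kind can be sketched in a few lines. The version below is an illustration under stated assumptions, not the source's implementation: symbols are integers `0..n_symbols-1`, the two windows are the full left and right portions of the sequence, the weights are the relative segment lengths, and natural logarithms are used. The function names and the `min_size` parameter are hypothetical.

```python
import numpy as np

def entropy_rows(counts):
    """Shannon entropy (nats) of each row of a matrix of symbol counts."""
    probs = counts / counts.sum(axis=1, keepdims=True)
    # Replace zeros by 1 inside the log so that 0 * log 0 contributes 0.
    logs = np.log(np.where(probs > 0, probs, 1.0))
    return -(probs * logs).sum(axis=1)

def segmentation_curve(seq, n_symbols, alpha=0.5, min_size=1):
    """Weighted JSD ** alpha for every cut n (left = seq[:n], right = seq[n:])."""
    seq = np.asarray(seq, dtype=int)
    total_len = len(seq)
    one_hot = np.eye(n_symbols)[seq]           # shape (N, n_symbols)
    cum = np.cumsum(one_hot, axis=0)           # cum[i] = counts in seq[:i + 1]
    cuts = np.arange(min_size, total_len - min_size + 1)
    left = cum[cuts - 1]                       # counts left of each cut
    right = cum[-1] - left                     # counts right of each cut
    pi_left = cuts / total_len
    pi_right = 1.0 - pi_left
    # The weighted mixture of both sides equals the overall symbol frequencies.
    h_mix = entropy_rows(cum[-1][None, :])[0]
    jsd = h_mix - pi_left * entropy_rows(left) - pi_right * entropy_rows(right)
    return cuts, np.clip(jsd, 0.0, None) ** alpha

# Toy sequence with an abrupt change of composition at position 100.
sequence = [0] * 100 + [1] * 100
cuts, curve = segmentation_curve(sequence, n_symbols=2, alpha=0.25)
estimated_change_point = int(cuts[np.argmax(curve)])
print(estimated_change_point)
```

Because raising to the power α is monotone, the location of the maximum does not depend on α in this sketch; the exponent changes the height and shape of the curve around it. Cumulative counts keep the scan linear in the sequence length.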
5. Quantum Generalization of the Predictive Diversity Score
The PDS framework extends naturally to quantum information:
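- Quantum JSD metric:
$$d_\alpha^{Q}(\rho, \sigma) = \max_{\{E_i\}} \left[ \mathrm{JSD}(P, Q) \right]^{\alpha},$$
where $p_i = \mathrm{Tr}(\rho E_i)$, $q_i = \mathrm{Tr}(\sigma E_i)$, and the maximization is over all POVMs $\{E_i\}$.
- The family $d_\alpha^{Q}$ defines a quantum metric for $\alpha \in (0, 1/2]$.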
- The quantum PDS thus serves as a bona fide metric for distinguishing density operators, with direct applicability to convergence testing, algorithm performance boundaries, and sensitivity analysis in quantum state space.
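Maximizing over all POVMs is not attempted here. As a hedged illustration, the sketch below evaluates the classical score for a small, hand-picked set of projective measurements on a qubit and keeps the largest value, which gives only a lower bound on the maximum. The states, the candidate measurements, and all function names are assumptions made for the example.

```python
import numpy as np

def jsd(p, q):
    """Jensen-Shannon divergence in nats (entropy form)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)

    def h(x):
        x = x[x > 0]
        return -np.sum(x * np.log(x))

    return max(float(h(0.5 * (p + q)) - 0.5 * h(p) - 0.5 * h(q)), 0.0)

def outcome_probs(rho, povm):
    """Born-rule probabilities p_i = Tr(rho E_i) for each POVM element."""
    return np.array([np.trace(rho @ element).real for element in povm])

def score_for_povm(rho, sigma, povm, alpha):
    """Classical JSD ** alpha between the outcome distributions of one POVM."""
    return jsd(outcome_probs(rho, povm), outcome_probs(sigma, povm)) ** alpha

def projective_povm(basis):
    """Rank-one projectors built from the columns of a unitary matrix."""
    return [np.outer(basis[:, k], basis[:, k].conj()) for k in range(basis.shape[1])]

# Two orthogonal pure qubit states, |0><0| and |1><1|.
rho = np.array([[1, 0], [0, 0]], dtype=complex)
sigma = np.array([[0, 0], [0, 1]], dtype=complex)

# Candidate measurements: computational (Z) basis and Hadamard (X) basis.
z_basis = np.eye(2, dtype=complex)
x_basis = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
candidates = {"Z": projective_povm(z_basis), "X": projective_povm(x_basis)}

alpha = 0.5
scores = {name: score_for_povm(rho, sigma, povm, alpha) for name, povm in candidates.items()}
lower_bound = max(scores.values())
print(scores, lower_bound)
```

The Z-basis measurement separates the two states perfectly, while the X-basis measurement gives identical outcome statistics and a score of zero, which shows why the choice of measurement matters and why the definition takes a maximum.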
6. Implementation Considerations and Limitations
Computational Aspects
- Cost: The computation involves histogram-based probability estimation and Shannon entropy evaluations; for moderate alphabet sizes or sequence segmentation, complexity scales linearly in sequence length and polynomially (linearly in most practical regimes) in alphabet size.
- Parameter Selection: Explicit choice of $\alpha$ is critical, requiring validation or domain knowledge.
- Domain of Applicability: While the PDS is well-defined for any pair of probability distributions on a finite alphabet, its quantum extension requires evaluation over all POVMs, which can be computationally demanding in high-dimensional Hilbert spaces.
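The classical pipeline described above (histogram-based probability estimates followed by entropy evaluations) can be sketched as below. This is a minimal illustration assuming integer-coded symbols, plug-in frequency estimates, and natural logarithms; `empirical_distribution` and `pds` are hypothetical names.

```python
import numpy as np

def empirical_distribution(seq, n_symbols):
    """Plug-in estimate of symbol probabilities; one pass over the sequence."""
    counts = np.bincount(np.asarray(seq, dtype=int), minlength=n_symbols)
    return counts / counts.sum()

def jsd(p, q):
    """Jensen-Shannon divergence in nats (entropy form)."""
    def h(x):
        x = x[x > 0]
        return -np.sum(x * np.log(x))

    return max(float(h(0.5 * (p + q)) - 0.5 * h(p) - 0.5 * h(q)), 0.0)

def pds(seq_a, seq_b, n_symbols, alpha=0.5):
    """JSD ** alpha between the empirical distributions of two sequences."""
    p = empirical_distribution(seq_a, n_symbols)
    q = empirical_distribution(seq_b, n_symbols)
    return jsd(p, q) ** alpha

# Two short integer-coded sequences over a 4-symbol alphabet.
seq_a = [0, 0, 1, 2, 0, 1, 0, 0]
seq_b = [3, 2, 3, 3, 1, 3, 2, 3]
score = pds(seq_a, seq_b, n_symbols=4)
print(score)
```

Counting costs time linear in the sequence length and each entropy costs time linear in the alphabet size. With sequences as short as these, the plug-in frequencies are noisy, so the resulting score should be read with that in mind.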
Limitations
- Outside the proven range $\alpha \in (0, 1/2]$: The triangle inequality is not guaranteed (it fails for the JSD itself, $\alpha = 1$), so one loses metric-space guarantees.
- Applications with severely limited data: Reliability of empirical probability estimates may be compromised, potentially affecting the sensitivity and stability of the PDS.
7. Significance and Impact in Statistical Data Analysis
The introduction of the Predictive Diversity Score as a monoparametric JSD-based metric offers a modular, interpretable, and tunable measure of distributional dissimilarity. It grounds applications such as symbolic sequence segmentation, detection of nonstationarity, and quantum state discrimination in a rigorous metric framework, with the advantage of tunable sensitivity. The quantum extension corroborates the general applicability of the construction to broader informational and physical contexts, including quantum information processing.
Overall, the PDS (i.e., the family $d_\alpha = \mathrm{JSD}^{\alpha}$ with $\alpha \in (0, 1/2]$) provides a theoretically sound, practically robust alternative to single-parameter divergences in scenarios demanding accurate, flexible quantification of statistical diversity (Osán et al., 2017).