Aggregated Score Ratios (ASR)
- Aggregated Score Ratios (ASR) are a theoretical concept aimed at combining multiple forensic similarity scores into a composite likelihood ratio, though no formal methodology exists.
- The critique in [1910.05240] highlights that current score-based likelihood ratios rely solely on single scores, exposing limitations in addressing evidential strength.
- Future research on ASR must overcome challenges in joint-density modeling and coherence to align with Bayesian principles for effective forensic analysis.
The term Aggregated Score Ratios (ASR) does not occur in "Defence Against the Modern Arts: the Curse of Statistics 'Score-based likelihood ratios'" (Neumann et al., 2019) or in the literature it surveys. The paper provides no formal construction or definition under this designation, and it gives no methodology for aggregating multiple similarity scores into a composite ratio. All methods and analyses in the work focus exclusively on single-score likelihood ratios, which are constructed from a scalar-valued similarity or distance measure between forensic samples. Nevertheless, the critique of score-based likelihood ratios in (Neumann et al., 2019) offers insight into the limitations and theoretical issues that would apply to any multi-score aggregation strategy, should one be proposed in future work. The following sections describe the relevant background and critique, with emphasis on the factual content of (Neumann et al., 2019).
1. Overview of Score-Based Likelihood Ratios
Score-based likelihood ratios (score-based LRs) form a family of methods in which a similarity or difference score between pieces of forensic evidence is mapped to a likelihood ratio that quantifies the evidential value. In all cases covered by (Neumann et al., 2019), the comparison between a questioned sample and a reference sample is reduced to a single scalar score, typically denoted as $\delta(x, y)$ or kernel $\kappa(x, y)$, where $x$ and $y$ are feature vectors representing the samples. The likelihood ratio is then constructed as a function of this score:
$$\mathrm{SLR} = \frac{f(\delta(x, y) \mid H_1)}{f(\delta(x, y) \mid H_2)},$$
where $f(\cdot \mid H)$ denotes the density of the score under hypothesis $H$, and $H_1$ and $H_2$ are the two competing hypotheses.
No aggregation of multiple scores is described.
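As a rough illustration of the single-score construction described above, the following Python sketch maps a scalar score to a ratio of two fitted score densities. Everything specific here is an assumption made for illustration and is not taken from (Neumann et al., 2019): the Euclidean-distance score, the normal fit to the score samples, the synthetic two-dimensional features, and the helper names `score`, `fit_normal`, and `score_lr`.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def score(x, y):
    """Hypothetical scalar score: Euclidean distance between two feature vectors."""
    return math.dist(x, y)

def fit_normal(scores):
    """Fit a normal density to observed scores (a crude modeling assumption)."""
    return NormalDist(mean(scores), stdev(scores))

def score_lr(s, same_source_scores, diff_source_scores):
    """Ratio of the fitted score densities under the two hypotheses."""
    numerator = fit_normal(same_source_scores).pdf(s)
    denominator = fit_normal(diff_source_scores).pdf(s)
    return numerator / denominator

# Synthetic data (assumed): source means ~ N(0, 3^2) and measurements
# ~ N(source mean, 1) in each of two coordinates.
rng = random.Random(0)

def draw_source():
    return [rng.gauss(0.0, 3.0) for _ in range(2)]

def draw_item(source):
    return [rng.gauss(m, 1.0) for m in source]

same_scores, diff_scores = [], []
for _ in range(500):
    a, b = draw_source(), draw_source()
    same_scores.append(score(draw_item(a), draw_item(a)))  # same-source pair
    diff_scores.append(score(draw_item(a), draw_item(b)))  # different-source pair

lr_close = score_lr(1.0, same_scores, diff_scores)  # small distance
lr_far = score_lr(8.0, same_scores, diff_scores)    # large distance
print(lr_close, lr_far)
```

In this toy setting a small distance yields a ratio above 1 and a large distance yields a ratio below 1. The sketch only shows the mechanics of the mapping; it says nothing about whether the resulting value approximates the Bayes factor.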
2. Families of Single-Score Methods
The paper identifies and critiques four common families of single-score LR methods:
- Common-Source Methods: The score is computed under the assumption that both items come from the same source.
- Suspect-Centred Methods: The score reflects the similarity of the questioned item to the suspect’s known material.
- Trace-Centred Methods: The score centers the metric on the questioned (trace) material.
- Asymmetric Methods: The score does not symmetrically treat the compared items.
Each approach seeks to estimate the probability or frequency of the observed score under two alternative hypotheses. However, (Neumann et al., 2019) demonstrates that, regardless of the mapping used, these methods typically fail to converge to the true Bayes factor and often violate the desired properties of coherence and evidential sufficiency.
3. Absence of Multi-Score Aggregation
Nowhere in (Neumann et al., 2019) is a general multi-score aggregator, such as a ratio of joint densities of multiple scores $\delta_1, \dots, \delta_k$,
$$\frac{f(\delta_1, \dots, \delta_k \mid H_1)}{f(\delta_1, \dots, \delta_k \mid H_2)},$$
presented or advocated. Every method studied in the work collapses the forensic comparison to a single scalar, and all discussion of validation, convergence, and geometric intuition pertains exclusively to univariate score-based models.
Multi-score fusion, joint-density modeling, and any similar aggregated scoring construction therefore remain outside the scope of the methodologies and analyses contained in (Neumann et al., 2019).
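To make the idea of a joint-density ratio concrete, the sketch below compares a hypothetical two-score joint ratio with the naive product of two single-score ratios. It is purely illustrative and does not come from (Neumann et al., 2019): the bivariate normal score model, the parameter values, and the function names are all assumptions. The only general fact it relies on is that a joint density factorizes into a product of marginals when the scores are independent under the hypothesis in question.

```python
import math

def bivariate_normal_pdf(s1, s2, mu1, mu2, sd1, sd2, rho):
    """Density of a bivariate normal with correlation rho at the point (s1, s2)."""
    z1 = (s1 - mu1) / sd1
    z2 = (s2 - mu2) / sd2
    q = (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / (1.0 - rho * rho)
    norm = 2.0 * math.pi * sd1 * sd2 * math.sqrt(1.0 - rho * rho)
    return math.exp(-0.5 * q) / norm

def joint_ratio(s1, s2, h1, h2):
    """Ratio of joint score densities under two hypotheses (assumed models)."""
    return bivariate_normal_pdf(s1, s2, **h1) / bivariate_normal_pdf(s1, s2, **h2)

def naive_product_ratio(s1, s2, h1, h2):
    """Product of the two single-score ratios, i.e. the joint ratio with rho forced to 0."""
    return joint_ratio(s1, s2, dict(h1, rho=0.0), dict(h2, rho=0.0))

# Assumed score models: the two scores are strongly correlated under both hypotheses.
h1 = dict(mu1=1.0, mu2=1.0, sd1=1.0, sd2=1.0, rho=0.9)
h2 = dict(mu1=4.0, mu2=4.0, sd1=1.0, sd2=1.0, rho=0.9)

joint = joint_ratio(1.0, 1.0, h1, h2)          # roughly 114
naive = naive_product_ratio(1.0, 1.0, h1, h2)  # roughly 8103
print(joint, naive)
```

With these assumed parameters the naive product overstates the joint ratio by a large factor, because the second score carries little information beyond the first. Any real aggregation scheme would have to model that dependence explicitly.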
4. Validation and Critique of Single-Score Approaches
Extensive validation in the paper relies on simulation experiments in which score-based LRs are plotted against true LRs, using hierarchical normal models for forensic features. These experiments highlight limitations, including:
- Systematic incoherence and broad departures from the Bayes factor.
- Pathologies in both the central tendency and the variability of the proxy score-based likelihood ratios.
- Lack of general convergence to the ideal Bayesian answer, even under favorable statistical conditions.
At no point are there protocols, performance plots, or cost analyses dedicated to multiple-score ratios, aggregations, or fusions. All figures and recommendations exclusively address single-score validation.
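The kind of comparison described above can be imitated in a few lines. The sketch below is a toy version under stated assumptions, not a reproduction of the paper's experiments: it uses a univariate hierarchical normal model (source mean drawn from N(0, TAU^2), measurements from N(mean, SIGMA^2)), an absolute-difference score, and arbitrarily chosen parameter values. Under that model both the true LR and the score-based LR have closed forms, so the two can be compared directly.

```python
import math
import random
from statistics import NormalDist

TAU, SIGMA = 2.0, 1.0  # assumed between-source and within-source standard deviations
TOTAL_SD = math.sqrt(TAU**2 + SIGMA**2)
RHO = TAU**2 / (TAU**2 + SIGMA**2)  # correlation of two items from the same source

def true_lr(x, y):
    """Joint density of (x, y) under same source over the product of marginals."""
    marginal = NormalDist(0.0, TOTAL_SD)
    zx, zy = x / TOTAL_SD, y / TOTAL_SD
    q = (zx * zx - 2.0 * RHO * zx * zy + zy * zy) / (1.0 - RHO**2)
    joint_same = math.exp(-0.5 * q) / (
        2.0 * math.pi * TOTAL_SD**2 * math.sqrt(1.0 - RHO**2)
    )
    return joint_same / (marginal.pdf(x) * marginal.pdf(y))

def score_lr(x, y):
    """Density ratio of the score |x - y|; the half-normal factor 2 cancels."""
    s = abs(x - y)
    same = NormalDist(0.0, math.sqrt(2.0) * SIGMA)
    diff = NormalDist(0.0, math.sqrt(2.0) * TOTAL_SD)
    return same.pdf(s) / diff.pdf(s)

# Two pairs with the same score but different rarity of the measured values.
print(score_lr(0.0, 0.5), true_lr(0.0, 0.5))
print(score_lr(3.0, 3.5), true_lr(3.0, 3.5))

# Small simulation: average absolute gap in log10 LR over same-source pairs.
rng = random.Random(1)
gaps = []
for _ in range(2000):
    mu = rng.gauss(0.0, TAU)
    x, y = rng.gauss(mu, SIGMA), rng.gauss(mu, SIGMA)
    gaps.append(abs(math.log10(score_lr(x, y)) - math.log10(true_lr(x, y))))
mean_gap = sum(gaps) / len(gaps)
print(mean_gap)
```

In this toy model the two pairs receive identical score-based LRs because their scores coincide, while the true LR is larger for the pair located in the tail of the population. The score discards the information about how rare the observed values are.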
5. Practical Implications and Limitations
All practical recommendations in (Neumann et al., 2019) pertain to the substantial theoretical and practical limitations of single-score LR methods: the necessity of correctly specifying case-specific likelihoods, the dangers of ad hoc heuristics, and the pitfalls arising from unprincipled conditioning. Attention is drawn to the impossibility of attaining both coherence and convergence to the Bayes factor when the comparison is reduced to a single score.
A plausible implication is that, should an aggregated score ratio be developed, it would need to directly address these foundational issues of incoherence and lack of convergence. The paper does not enumerate procedures for combining or fusing multiple scores, nor does it provide a validation framework for such constructs.
6. Context and Future Directions
Although the term "Aggregated Score Ratio (ASR)" is absent from (Neumann et al., 2019), the general challenge of reliably quantifying evidential strength using derived scores remains central. For any future method pursuing the aggregation of multiple scores into a likelihood-ratio framework, the analyses and critiques in the paper serve as a caution, emphasizing the necessity of rigorous probabilistic modeling of the evidential process.
Researchers seeking principled aggregation would therefore plausibly need to go beyond the score-based LR family studied in (Neumann et al., 2019), possibly by incorporating joint-density modeling or learning-based fusion methods from allied fields such as speaker recognition or machine learning.
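As one hedged example of what learning-based fusion can look like, the sketch below fits a linear logistic-regression fusion of two scores, an approach commonly used for score fusion in speaker recognition. It is not a method from (Neumann et al., 2019). The synthetic scores, the plain gradient-descent fit, and the names `fit_logistic_fusion` and `fused_log_lr` are assumptions made for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_fusion(score_vectors, labels, lr=0.1, epochs=500):
    """Fit weights w and bias b by batch gradient descent on the log-loss."""
    k = len(score_vectors[0])
    n = len(labels)
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for s, t in zip(score_vectors, labels):
            err = sigmoid(b + sum(wi * si for wi, si in zip(w, s))) - t
            grad_b += err
            for j in range(k):
                grad_w[j] += err * s[j]
        b -= lr * grad_b / n
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
    return w, b

def fused_log_lr(s, w, b, prior_log_odds):
    """Fitted posterior log-odds minus training prior log-odds: a log-LR proxy."""
    return b + sum(wi * si for wi, si in zip(w, s)) - prior_log_odds

# Synthetic training scores (assumed): higher means more similar.
rng = random.Random(2)
train, labels = [], []
for _ in range(200):
    train.append([rng.gauss(1.0, 1.0), rng.gauss(1.0, 1.0)])    # same source
    labels.append(1)
    train.append([rng.gauss(-1.0, 1.0), rng.gauss(-1.0, 1.0)])  # different sources
    labels.append(0)

w, b = fit_logistic_fusion(train, labels)
n_same = sum(labels)
prior_log_odds = math.log(n_same / (len(labels) - n_same))  # 0.0 for balanced data

print(fused_log_lr([2.0, 2.0], w, b, prior_log_odds))
print(fused_log_lr([-2.0, -2.0], w, b, prior_log_odds))
```

A fused value of this kind is still a function of scores rather than of the underlying features, so the concerns about coherence and convergence to the Bayes factor raised for single scores would need to be examined for it as well.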
7. Summary Table of Methods (as in Neumann et al., 2019)
| Method | Score Input | Aggregation of Scores |
|---|---|---|
| Common-Source | Single scalar | None |
| Suspect-Centred | Single scalar | None |
| Trace-Centred | Single scalar | None |
| Asymmetric | Single scalar | None |
No method in (Neumann et al., 2019) aggregates multiple scores or defines an ASR. All approaches reduce evidence comparison to a single score and then map that to a likelihood-ratio proxy.
No definition, methodology, or empirical evaluation of Aggregated Score Ratios (ASR) is provided in "Defence Against the Modern Arts: the Curse of Statistics 'Score-based likelihood ratios'" (Neumann et al., 2019). The paper’s comprehensive critique applies exclusively to single-score-based likelihood ratios, and any attempt to extend these ideas to multi-score aggregation must address the foundational shortcomings identified in the single-score case.