Explore alternative similarity metrics for DaMCL

Investigate probability-similarity metrics for Distribution-aware Minimal Context Length (DaMCL) beyond Jensen–Shannon Distance. Determine whether such metrics yield additional insights or complementary advantages when comparing next-token distributions under short versus full context across decoding strategies.

Background

DaMCL measures the minimal prefix length at which the decoding-based next-token distribution under a short context matches that under the full context, according to a chosen similarity metric. The paper primarily adopts Jensen–Shannon Distance (JSD) for its metric properties and robustness, but it also presents preliminary comparisons with total variation distance (TVD), Kullback–Leibler (KL) divergence, and an F1-based set metric.
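
To make the role of the metric concrete, here is a minimal sketch of how a DaMCL-style search could swap one metric for another. It is an illustration under stated assumptions, not the paper's implementation: the function names, the matching threshold, and the rule "return the first prefix length whose distance falls at or below the threshold" are all hypothetical. JSD is computed as the square root of the base-2 Jensen–Shannon divergence, TVD as half the L1 distance, and KL in bits. The F1-based set metric is omitted because its exact definition is not given here.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; infinite if q is zero where p is positive."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi == 0.0:
                return math.inf
            total += pi * math.log2(pi / qi)
    return total


def js_distance(p, q):
    """Jensen-Shannon distance: square root of the base-2 JS divergence."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    divergence = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(divergence, 0.0))  # guard against tiny negative rounding


def total_variation(p, q):
    """Total variation distance: half the L1 distance between p and q."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def minimal_context_length(short_dists, full_dist, metric, threshold):
    """Smallest prefix length whose distribution is within `threshold`
    of the full-context distribution under `metric`.

    `short_dists` maps prefix length -> next-token distribution.
    Returns None if no candidate length matches. The first-match rule
    and the threshold are assumptions made for this sketch.
    """
    for length in sorted(short_dists):
        if metric(short_dists[length], full_dist) <= threshold:
            return length
    return None


# Toy distributions over a three-token vocabulary (made-up values).
full = [0.7, 0.2, 0.1]
short = {
    1: [0.1, 0.1, 0.8],
    4: [0.6, 0.3, 0.1],
    16: [0.7, 0.2, 0.1],
}

for name, metric in [("JSD", js_distance), ("TVD", total_variation), ("KL", kl_divergence)]:
    print(name, minimal_context_length(short, full, metric, threshold=0.15))
```

Because each metric has its own scale and sensitivity to tail mass, a single shared threshold is unlikely to be comparable across metrics. A systematic comparison would probably need per-metric thresholds or a rank-based analysis, which is one of the questions this problem leaves open.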

The authors note that while JSD works well and other metrics show related behavior, more systematic exploration of alternative metrics could reveal complementary advantages or new insights into distributional convergence and context dependence.

References

Nonetheless, we acknowledge that further exploration of alternative metrics may reveal additional insights or complementary advantages. We leave this to future work.

Short-Context Dominance: How Much Local Context Natural Language Actually Needs? (2512.08082 - Vakilian et al., 8 Dec 2025) in Appendix, Section "DaMCL", Subsection "Additional Metrics for DaMCL"