Assess computational implications of adaptive short-prefix lengths in LSDS detection

Ascertain the computational implications and trade-offs of using adaptive short-prefix lengths (for example, setting the short context to 0.1|s|) in the Long–Short Distribution Shift (LSDS)–based long-context detection pipeline, and compare their performance and costs against fixed-length prefixes such as 32 or 64 tokens.

Background

LSDS is defined as the Jensen–Shannon Distance between next-token distributions obtained from a short context (default 32 tokens) and the full context under a given decoding method (typically nucleus sampling). The detector labels a sequence as long-context when LSDS exceeds a threshold.
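The LSDS computation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the distribution inputs, the `is_long_context` helper, and the threshold value are all assumptions for demonstration, and the Jensen–Shannon distance is computed with base-2 logarithms so it is bounded in [0, 1].

```python
import numpy as np

def jensen_shannon_distance(p, q, eps=1e-12):
    """Jensen-Shannon distance (square root of the base-2 JS divergence)
    between two next-token probability distributions."""
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    p = p / p.sum()  # normalize defensively
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / (b[mask] + eps)))

    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return float(np.sqrt(max(jsd, 0.0)))

def is_long_context(p_short, p_full, threshold=0.5):
    """Hypothetical detector: flag a sequence as long-context when the
    LSDS between short-context and full-context next-token distributions
    exceeds a threshold (0.5 is illustrative, not the paper's value)."""
    return jensen_shannon_distance(p_short, p_full) > threshold
```

In practice `p_short` and `p_full` would come from two forward passes of the same model, one conditioned on the short prefix (e.g., 32 tokens) and one on the full context, under the chosen decoding method.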

The authors evaluate fixed short-prefix lengths and also an adaptive strategy where the short prefix is a fraction of the input length (e.g., 0.1|s|), observing consistent separability. However, they note that the computational implications of such adaptive schemes require further study.
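The core cost asymmetry can be made concrete with a back-of-envelope helper. This is a hypothetical sketch (the function name and parameters are assumptions, not from the paper): a fixed short prefix adds a constant number of extra prefill tokens per sequence, while an adaptive 0.1|s| prefix adds a cost that grows linearly with input length (and faster than linearly in attention FLOPs).

```python
def short_prefix_extra_tokens(seq_len, mode="fixed", fixed_len=32, frac=0.1):
    """Extra prefill tokens incurred by the second (short-context) forward
    pass of the LSDS detector, under a fixed or adaptive prefix scheme.

    mode='fixed'    -> constant cost, capped at the sequence length.
    mode='adaptive' -> cost scales as frac * |s|.
    """
    if mode == "fixed":
        return min(fixed_len, seq_len)
    return max(1, int(frac * seq_len))

# The gap widens with sequence length: at |s| = 100k tokens, the adaptive
# scheme's short pass is ~300x the fixed-32 cost in prefill tokens alone.
for n in (1_000, 10_000, 100_000):
    print(n, short_prefix_extra_tokens(n, "fixed"), short_prefix_extra_tokens(n, "adaptive"))
```

A full accounting would also weigh attention's superlinear scaling in prefix length against any cross-dataset generalization gains the adaptive scheme provides.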

References

Such adaptive schemes may provide better cross-dataset generalization, although their computational implications require further study. We leave exploration of such direction for future work.

Short-Context Dominance: How Much Local Context Natural Language Actually Needs? (2512.08082 - Vakilian et al., 8 Dec 2025) in Appendix, Section "Long-Context Sequence Detection", Subsection "Ablation on Short-Context Length", paragraph "Adaptive Short-Context Length"