Temperature Effects on Ensemble-Based Variance Reduction

Determine why, for large language models, ensembling multiple responses sampled at a non-zero decoding temperature reduces cross-lingual response variance and improves target-language accuracy more effectively than ensembling at zero temperature, and evaluate how model-specific temperature handling (e.g., silently mapping temperature 0 to a non-zero value) influences this effect.
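
For concreteness, here is a minimal toy sketch of the comparison. Everything in it is an assumption for illustration: `sample_response` and its answer distribution stand in for real LLM calls, not the paper's setup. It shows one candidate mechanism: token-level greedy decoding need not return the answer with the most total probability mass over full responses, so a majority vote over samples drawn at T > 0 can recover an answer that zero-temperature decoding misses.

```python
import random
from collections import Counter

def sample_response(temperature: float) -> str:
    """Hypothetical stand-in for a single LLM call at a given temperature.

    Toy assumption: greedy decoding (temperature 0) returns "B", even though
    "A" carries the most total probability mass over complete responses --
    token-by-token argmax need not yield the modal full answer.
    """
    if temperature == 0.0:
        return "B"  # nominally deterministic greedy decode
    # The toy ignores the exact temperature value beyond the zero/non-zero split.
    return random.choices(["A", "B", "C"], weights=[0.50, 0.35, 0.15])[0]

def ensemble_answer(temperature: float, k: int = 25) -> str:
    """Majority vote over k independent samples at one temperature."""
    votes = Counter(sample_response(temperature) for _ in range(k))
    return votes.most_common(1)[0][0]

random.seed(0)
print(ensemble_answer(temperature=0.0))  # "B": k identical copies of the greedy sample
print(ensemble_answer(temperature=0.7))  # typically "A": voting over diverse samples
```

Whether this mechanism, the temperature-remapping hypothesis below, or something else accounts for the observed gap is exactly what the question leaves open.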

Background

In their response-ensembling experiments, the authors observe that the default non-zero decoding temperature yields better variance reduction and target-language accuracy than zero temperature, contrary to the expectation that zero temperature, being nominally deterministic, should suppress variance more.

They suggest that model-specific implementations may map zero temperature to a non-zero value, but explicitly state that the cause of the observed behavior is unclear, motivating targeted investigation.
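
The temperature-remapping hypothesis is directly testable. The probe below is a hedged sketch (`sample_fn` is a hypothetical wrapper around one model call, not an API from the paper): if a backend silently remaps temperature 0 to a small positive value, repeated "zero-temperature" calls will occasionally disagree, which would by itself leave variance for ensembling to reduce.

```python
import random

def appears_deterministic(sample_fn, prompt: str, trials: int = 8) -> bool:
    """Probe whether "temperature = 0" behaves as true greedy decoding.

    sample_fn(prompt, temperature) is a hypothetical single-call wrapper
    around whichever backend is under test. If the backend remaps
    temperature 0 to a non-zero value, the set of outputs will typically
    contain more than one distinct response. Caveat: even genuine greedy
    decoding can show rare nondeterminism from batched GPU kernels, so
    disagreement here is suggestive rather than conclusive.
    """
    outputs = {sample_fn(prompt, temperature=0.0) for _ in range(trials)}
    return len(outputs) == 1

# Demo against a stub backend that ignores temperature entirely,
# mimicking a serving stack that remaps 0 to some non-zero default:
def noisy_backend(prompt: str, temperature: float) -> str:
    return random.choice(["yes", "no"])

random.seed(1)
print(appears_deterministic(noisy_backend, "2 + 2 = ?"))  # likely False
```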

References

"It is unclear why we obtained better target accuracy when ensembling at non-zero temperature: 70% (at non-zero temperature) vs 65% (at zero temperature)."

Rethinking Cross-lingual Gaps from a Statistical Viewpoint (arXiv:2510.15551, Piratla et al., 17 Oct 2025), Appendix: Ensembling with a single LLM (Section: appendix:single_model_ensemble)