Temperature Effects on Ensemble-Based Variance Reduction
Determine why, for large language models, ensembling multiple responses sampled at non-zero decoding temperature reduces cross-lingual response variance and improves target-language accuracy more than ensembling at zero temperature does. Also evaluate how model-specific temperature handling (e.g., mapping temperature 0 to a non-zero value) influences this effect.
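One candidate mechanism comes from self-consistency decoding, not from the source. Under fully deterministic greedy decoding (temperature 0), every ensemble member is identical, so a majority vote is a no-op. Greedy decoding also commits to the single most likely intermediate step, which need not lead to the most probable final answer. With sampling, a vote approximates the answer that is most probable after marginalizing over intermediate steps. The sketch is a hypothetical two-stage toy (choose a "reasoning path", then an answer). All names (`PATH_PROBS`, `ANSWER_GIVEN_PATH`, `zero_maps_to`, ...) and probabilities are invented for illustration. It does not model languages or reproduce the paper's setup or its 65%/70% numbers. `zero_maps_to` mimics a backend that replaces temperature 0 with a small positive value.

```python
import numpy as np

# Hypothetical toy model, NOT the setup of Piratla et al.: a response first commits to
# one of three "reasoning paths", then emits one of two final answers given that path.
PATH_PROBS = np.array([0.4, 0.3, 0.3])          # path 0 is the single most likely path
ANSWER_GIVEN_PATH = np.array([[0.95, 0.05],     # path 0 almost always yields answer 0
                              [0.05, 0.95],     # paths 1 and 2 almost always yield answer 1
                              [0.05, 0.95]])
CORRECT_ANSWER = 1  # marginal P(answer 1) = 0.59 > P(answer 0) = 0.41


def sample_index(probs, temperature, rng):
    """Sample from softmax(log(probs) / temperature); temperature 0 means greedy argmax."""
    if temperature == 0:
        return int(np.argmax(probs))
    logits = np.log(probs) / temperature
    scaled = np.exp(logits - logits.max())     # numerically stable softmax
    return int(rng.choice(len(probs), p=scaled / scaled.sum()))


def sample_response(temperature, rng):
    """Decode one response: pick a path, then an answer conditioned on that path."""
    path = sample_index(PATH_PROBS, temperature, rng)
    return sample_index(ANSWER_GIVEN_PATH[path], temperature, rng)


def ensemble_answer(temperature, k, rng, zero_maps_to=None):
    """Majority vote over k responses (k odd, so two answers cannot tie).

    zero_maps_to (hypothetical) mimics a backend that replaces
    temperature 0 with a small positive value.
    """
    if temperature == 0 and zero_maps_to is not None:
        temperature = zero_maps_to
    answers = [sample_response(temperature, rng) for _ in range(k)]
    votes = np.bincount(answers, minlength=2)
    return int(np.argmax(votes))


def ensemble_accuracy(temperature, k=51, trials=200, seed=0, zero_maps_to=None):
    """Fraction of independent trials whose majority vote equals CORRECT_ANSWER."""
    rng = np.random.default_rng(seed)
    hits = [ensemble_answer(temperature, k, rng, zero_maps_to) == CORRECT_ANSWER
            for _ in range(trials)]
    return float(np.mean(hits))


if __name__ == "__main__":
    for label, t, z in [("T=0 (greedy)", 0.0, None),
                        ("T=0 mapped to 0.1", 0.0, 0.1),
                        ("T=1.0", 1.0, None)]:
        print(f"{label:>18}: ensemble accuracy {ensemble_accuracy(t, zero_maps_to=z):.2f}")
```

Under these made-up numbers, the greedy vote always returns answer 0. The T=1 vote recovers the marginally more likely answer 1 in most trials. Mapping temperature 0 to 0.1 keeps roughly 90% of responses on the greedy path, so that ensemble still behaves close to greedy. Whether a mechanism of this kind explains the reported gap is the open question.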
References
It is unclear why we obtained better target accuracy when ensembling at non-zero temperature: 70% (at non-zero temperature) vs 65% (at zero temperature).
                — Rethinking Cross-lingual Gaps from a Statistical Viewpoint
                
                (2510.15551 - Piratla et al., 17 Oct 2025) in Appendix: Ensembling with a single LLM (Section: appendix:single_model_ensemble)