Replicability of o1 outputs under default temperature and the role of stochasticity
Determine whether the results that OpenAI's o1 model produces on a fixed planning instance at its default temperature of 1.0 are driven primarily by stochastic sampling, and characterize what this implies for the replicability and interpretability of reported results.
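One way to make the question concrete is to stop relying on a single temperature-1.0 run. Instead, re-run the same planning instance many times and report the per-instance success rate together with a confidence interval. The following is a minimal Python sketch under stated assumptions; it is not the evaluation procedure of Valmeekam et al. The names `solve_once`, `estimate_replicability` and `fake_solver` are hypothetical. `solve_once` stands in for one model query followed by a correctness check on the returned plan, and neither the real API call nor the checker is shown. The Wilson score interval is one reasonable choice of binomial interval, not something the source prescribes.

```python
import math
import random
from typing import Callable


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (about 95% when z = 1.96)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p_hat + z * z / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials))
    return max(0.0, centre - half), min(1.0, centre + half)


def estimate_replicability(solve_once: Callable[[], bool], trials: int = 20) -> dict:
    """Re-run a (hypothetical) stochastic solver on one fixed instance and summarize outcomes."""
    outcomes = [bool(solve_once()) for _ in range(trials)]
    successes = sum(outcomes)
    lo, hi = wilson_interval(successes, trials)
    return {
        "trials": trials,
        "successes": successes,
        "success_rate": successes / trials,
        "ci95": (lo, hi),
        # The outcome is reproducible across repeats only if every run agrees.
        "all_agree": successes in (0, trials),
    }


# Usage with a simulated solver (assumption: a real experiment would query the model here).
rng = random.Random(0)


def fake_solver() -> bool:
    # Hypothetical stand-in for one temperature-1.0 model call on a fixed instance;
    # it "succeeds" with probability 0.6 purely to illustrate run-to-run variability.
    return rng.random() < 0.6


summary = estimate_replicability(fake_solver, trials=50)
print(summary)
```

A wide interval, or `all_agree` being false, indicates that any single run on that instance could be either a success or a failure by chance. This is exactly the ambiguity the quoted footnote describes.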
References
The current model is also set to a default temperature of 1.0, which further reduces replicability and interpretability--for any given problem, it is never clear whether the result is merely the result of stochasticity.
                — LLMs Still Can't Plan; Can LRMs? A Preliminary Evaluation of OpenAI's o1 on PlanBench
                
                (2409.13373 - Valmeekam et al., 20 Sep 2024) in Section 3, Accuracy/Cost Tradeoffs and Guarantees, footnote on temperature and stability