
Cost-effectiveness of OpenAI’s o1-preview for planning tasks

Determine whether employing OpenAI’s o1-preview Large Reasoning Model for planning problems is cost-effective under its reasoning-token pricing scheme compared to alternative approaches such as classical planners and LLM-Modulo systems.


Background

Unlike typical LLM APIs that charge only for input and output tokens, o1's pricing also bills for opaque "reasoning tokens" generated during inference, at the higher output-token rate. The authors report substantial costs for evaluating o1 on PlanBench, whereas classical planning baselines such as Fast Downward run at a fraction of the dollar and time cost while also guaranteeing correctness.

Given the hidden nature of o1’s internal traces and the lack of control over reasoning tokens, establishing whether o1 delivers sufficient accuracy gains to justify its costs is crucial for practical deployment in planning domains.
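A minimal sketch of the cost calculation at issue: because reasoning tokens are hidden and uncontrollable, the per-query cost can be dominated by a term the caller never sees. The per-million-token rates and the assumption that reasoning tokens are billed at the output rate are illustrative assumptions here, not verified pricing.

```python
# Hypothetical cost model for an o1-style API call.
# Rates (USD per million tokens) are illustrative assumptions,
# not official OpenAI pricing.

def o1_query_cost(input_tokens: int, output_tokens: int,
                  reasoning_tokens: int,
                  input_rate: float = 15.0,
                  output_rate: float = 60.0) -> float:
    """Estimated dollar cost of one call; hidden reasoning tokens
    are assumed to be billed at the output rate."""
    return (input_tokens * input_rate
            + (output_tokens + reasoning_tokens) * output_rate) / 1e6

# A PlanBench-style prompt whose hidden reasoning trace dwarfs
# the visible output -- the trace, not the answer, drives the bill.
cost = o1_query_cost(input_tokens=1_500, output_tokens=400,
                     reasoning_tokens=8_000)
print(f"${cost:.4f}")  # reasoning tokens account for most of this
```

Multiplying such per-query costs across a full benchmark (and comparing against a classical planner's near-zero marginal cost with guaranteed correctness) is exactly the tradeoff the question asks to settle.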

References

While o1-preview may provide higher accuracy than LLMs, it still fails to provide any correctness guarantees, and it is unclear that it is at all cost-effective.

Valmeekam et al., "LLMs Still Can't Plan; Can LRMs? A Preliminary Evaluation of OpenAI's o1 on PlanBench" (arXiv:2409.13373, 20 Sep 2024), Section 3, "Accuracy/Cost Tradeoffs and Guarantees".