Quantifying energy–fidelity trade-offs in text-to-video diffusion

Determine the quantitative relationship between energy consumption and output video fidelity in text-to-video diffusion generation, specifying how changes in spatial resolution, temporal length, denoising steps, and model architecture affect the energy–quality trade-off.

Background

The source paper (Delavande et al., 2025) characterizes latency and energy consumption but does not assess perceptual quality, even though practitioners must balance output quality against computational cost in deployment.

By explicitly stating that the energy–fidelity question remains open, the authors highlight the need for systematic evaluation linking energy profiles to objective or subjective quality metrics across typical generation settings.
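One way such an evaluation could be organized is to record, for each generation setting, a measured energy figure and a quality score, and then keep only the settings that are not dominated on both axes (a Pareto front). The sketch below is a minimal illustration under stated assumptions: the `Run` record, the `pareto_front` helper, the setting labels, and all numeric values are hypothetical placeholders, not measurements from the paper, and the quality score stands in for whatever higher-is-better metric a study chooses.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Run:
    """One generation setting with its cost and quality (hypothetical record)."""
    label: str
    energy_wh: float  # energy per generated video, in watt-hours
    quality: float    # any higher-is-better fidelity score


def pareto_front(runs: List[Run]) -> List[Run]:
    """Return the non-dominated runs, sorted by increasing energy.

    A run is dominated when another run uses no more energy and reaches at
    least the same quality, and is strictly better on one of the two.
    Exact duplicates (same energy and quality) are kept once.
    """
    # Sort by energy ascending; on ties, put the higher-quality run first.
    ordered = sorted(runs, key=lambda r: (r.energy_wh, -r.quality))
    front: List[Run] = []
    best_quality = float("-inf")
    for run in ordered:
        # A run survives only if it beats every cheaper run on quality.
        if run.quality > best_quality:
            front.append(run)
            best_quality = run.quality
    return front


if __name__ == "__main__":
    # Placeholder values for illustration only, NOT measured results.
    runs = [
        Run("480p, 25 steps", 10.0, 0.60),
        Run("480p, 50 steps", 20.0, 0.70),
        Run("720p, 25 steps", 25.0, 0.65),
        Run("720p, 50 steps", 50.0, 0.80),
    ]
    for run in pareto_front(runs):
        print(f"{run.label}: {run.energy_wh} Wh, quality {run.quality}")
```

With these placeholder inputs, the "720p, 25 steps" setting is dropped because a cheaper setting already reaches a higher score. In a real study the same filter would be applied to measured energy and to an objective or subjective quality metric for each combination of resolution, temporal length, denoising steps, and model.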

References

We deliberately excluded perceptual quality from our scope, leaving open the question of energy–fidelity tradeoffs.

— Video Killed the Energy Budget: Characterizing the Latency and Power Regimes of Open Text-to-Video Models (2509.19222 - Delavande et al., 23 Sep 2025) in Section: Limitations and Conclusion — Limitations paragraph
