
Measuring the energy contribution of audio generation in production T2V systems

Ascertain the energy cost attributable to audio generation components within production text-to-video systems that also synthesize audio, and quantify their contribution relative to the video generation pipeline under realistic usage.


Background

The paper benchmarks energy and latency for video generation but does not measure audio generation, even though many production systems generate synchronized audio alongside video.

The authors explicitly note that the energy contribution of audio remains unexplored. Closing this gap requires measuring audio's share of the energy budget and comparing it with the GPU-dominated cost of video inference, so that the end-to-end energy budget is complete.
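One way to frame the measurement is as a simple energy attribution: integrate sampled power over each stage's runtime to get energy, then take the audio stage's fraction of the total. The sketch below assumes you already have timestamped power samples (in watts) for the audio and video stages separately, for example from a GPU power-monitoring tool; the function names, the optional idle-power subtraction, and the numbers in the usage example are hypothetical illustrations, not measurements or the paper's method.

```python
from typing import Sequence


def energy_joules(
    timestamps_s: Sequence[float],
    power_w: Sequence[float],
    idle_w: float = 0.0,
) -> float:
    """Integrate sampled power (W) over time (s) with the trapezoidal rule.

    idle_w is a hypothetical baseline power subtracted from every sample,
    so the result approximates energy attributable to the workload itself.
    """
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have equal length")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        p_prev = power_w[i - 1] - idle_w
        p_curr = power_w[i] - idle_w
        total += 0.5 * (p_prev + p_curr) * dt  # area of one trapezoid
    return total


def audio_share(e_audio_j: float, e_video_j: float) -> float:
    """Fraction of end-to-end generation energy attributable to audio."""
    total = e_audio_j + e_video_j
    if total <= 0:
        raise ValueError("total energy must be positive")
    return e_audio_j / total


if __name__ == "__main__":
    # Hypothetical traces: video stage at a constant 300 W for 10 s,
    # audio stage at a constant 100 W for 2 s (illustrative values only).
    video_t = [float(t) for t in range(11)]
    video_p = [300.0] * 11
    audio_t = [0.0, 1.0, 2.0]
    audio_p = [100.0] * 3

    e_video = energy_joules(video_t, video_p)  # 3000 J
    e_audio = energy_joules(audio_t, audio_p)  # 200 J
    print(f"audio share: {audio_share(e_audio, e_video):.4f}")  # 0.0625
```

Attributing energy per stage like this only works if the audio and video components can be timed or run separately; in a pipeline where they overlap on the same device, the traces would need to be split by stage boundaries or measured in isolation.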

References

Finally, many production T2V systems (e.g., Veo) also generate audio, whose contribution to energy cost remains unexplored.

Delavande et al., "Video Killed the Energy Budget: Characterizing the Latency and Power Regimes of Open Text-to-Video Models," arXiv:2509.19222, 23 Sep 2025. Section: Limitations and Conclusion (Limitations paragraph).