Computer Graphics Laboratory

Spatiotemporal Diffusion Priors for Extreme Video Compression

Lucas Relic, A. Emmenegger, R. Azevedo, Y. Zhang, M. Gross, C. Schroers

To appear: Picture Coding Symposium (Aachen, Germany, December 8-11, 2025)

Abstract

Diffusion models have recently demonstrated impressive results in image compression, where the strong spatial prior enables the synthesis of fine details rather than allocating bits to transmit them. In this work, we extend this paradigm to video compression by utilizing a generative spatiotemporal prior and present the first codec based on a video diffusion model. Our method operates by performing long-context interpolation guided by sparse inter-frame predictions, thus requiring minimal motion information. To this end, we develop a sparse, bidirectional optical flow representation that serves as a bitrate-efficient motion conditioning signal in the diffusion decoding process. The resulting codec can compress videos to extremely low rates (as low as 0.01 bits per pixel) while maintaining realistic textures and motion, and outperforms both neural and traditional baselines on several benchmark datasets. Our method shows state-of-the-art performance in perceptually oriented distortion metrics, and, when considering rate-realism, we achieve an improvement in FID score of up to 73.3 at the same bitrate compared to the leading traditional video codec, VTM. Overall, we present an important first work examining spatiotemporal diffusion priors for video compression.
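The sparse motion conditioning described above can be pictured as subsampling dense forward and backward optical flow fields on a coarse grid, keeping only a small set of flow vectors to transmit. The sketch below is purely illustrative: the function name, the uniform-grid sampling strategy, and the array layout are assumptions for exposition, not the construction used in the paper.

```python
import numpy as np

def sparse_bidirectional_flow(flow_fwd, flow_bwd, grid_step=16):
    """Subsample dense bidirectional optical flow on a coarse uniform grid.

    flow_fwd, flow_bwd: (H, W, 2) dense flow fields (forward / backward).
    Returns the sampled pixel coordinates and the flow vectors at those
    points in both directions -- a rough stand-in for a bitrate-efficient
    sparse motion signal.
    """
    H, W, _ = flow_fwd.shape
    # Sample one location per grid_step x grid_step cell, at the cell center.
    ys = np.arange(grid_step // 2, H, grid_step)
    xs = np.arange(grid_step // 2, W, grid_step)
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    points = np.stack([yy.ravel(), xx.ravel()], axis=1)  # (N, 2) as (y, x)
    fwd = flow_fwd[points[:, 0], points[:, 1]]           # (N, 2) flow vectors
    bwd = flow_bwd[points[:, 0], points[:, 1]]           # (N, 2) flow vectors
    return points, fwd, bwd

# Example: a 64x64 frame with a 16-pixel grid keeps only 4x4 = 16 vectors
# per direction, a large reduction over the dense field.
points, fwd, bwd = sparse_bidirectional_flow(
    np.zeros((64, 64, 2)), np.zeros((64, 64, 2)), grid_step=16
)
```

At these settings the sparse signal stores 16 coordinates plus 16 vectors per direction instead of 64 x 64 dense vectors, which is the kind of rate saving that makes flow-based conditioning affordable at very low bitrates.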

Downloads

Download Paper [PDF]