Computer Graphics Laboratory ETH Zurich


Rendering with Style: Combining Traditional and Neural Approaches for High-Quality Face Rendering

P. Chandran, S. Winberg, G. Zoss, J. Riviere, M. Gross, P. Gotardo, D. Bradley

Proceedings of ACM SIGGRAPH Asia (Tokyo, Japan, Dec. 14-17, 2021), ACM Transactions on Graphics, vol. 40, no. 6, pp. 223:1-223:14


For several decades, researchers have been advancing techniques for creating and rendering 3D digital faces, with much of the effort going into geometry and appearance capture, modeling and rendering techniques. This body of research has largely focused on facial skin, with much less attention devoted to peripheral components like hair, eyes and the interior of the mouth. As a result, even with the best technology for facial capture and rendering, in most high-end productions a lot of artist time is still spent modeling the missing components and fine-tuning the rendering parameters to combine everything into photo-real digital renders. In this work we propose to combine incomplete, high-quality renderings showing only facial skin with recent methods for neural rendering of faces, in order to automatically and seamlessly create photo-realistic full-head portrait renders from captured data without the need for artist intervention. Our method begins with traditional face rendering, where the skin is rendered with the desired appearance, expression, viewpoint, and illumination. These skin renders are then projected into the latent space of a pre-trained neural network that can generate arbitrary photo-real face images (StyleGAN2). The result is a sequence of realistic face images that match the identity and appearance of the 3D character at the skin level, but are completed naturally with synthesized hair, eyes, inner mouth and surroundings. Notably, we present the first method for multi-frame consistent projection into this latent space, allowing photo-realistic rendering and preservation of the identity of the digital human over an animated performance sequence, which can depict different expressions, lighting conditions and viewpoints.
Our method can be used in new face rendering pipelines and, importantly, in other deep learning applications that require large amounts of realistic training data with ground-truth 3D geometry, appearance maps, lighting, and viewpoint.
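
The projection step described above can be illustrated with a toy sketch. It is not the authors' implementation: a small fixed linear map (the hypothetical matrix `A`) stands in for the pre-trained StyleGAN2 generator, a flat list of numbers stands in for an image, and plain gradient descent stands in for whatever optimizer the paper uses. The only idea it shows is that the latent code is fitted against the skin pixels alone (via `mask`), so the generator is free to fill in the remaining pixels.

```python
def generate(A, w):
    """Toy linear 'generator' (assumption): maps latent w to a flat image A @ w."""
    return [sum(a * x for a, x in zip(row, w)) for row in A]


def masked_error(A, w, target, mask):
    """Squared error between generated and target image, counted on masked pixels only."""
    image = generate(A, w)
    return sum(m * (p - t) ** 2 for m, p, t in zip(mask, image, target))


def project_masked(A, target, mask, steps=500, lr=0.05):
    """Gradient descent on w that minimises the masked squared error."""
    w = [0.0] * len(A[0])
    for _ in range(steps):
        image = generate(A, w)
        residual = [m * (p - t) for m, p, t in zip(mask, image, target)]
        grad = [2.0 * sum(r * row[j] for r, row in zip(residual, A))
                for j in range(len(w))]
        w = [x - lr * g for x, g in zip(w, grad)]
    return w


# Hypothetical 8-pixel "image" driven by a 3-dimensional latent code.
A = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 1.0],   # non-skin pixel
    [1.0, 1.0, 1.0],   # non-skin pixel
    [0.5, -0.5, 1.0],  # non-skin pixel
]
mask = [1, 1, 1, 1, 1, 0, 0, 0]  # 1 = skin pixel present in the render

# Build an incomplete "skin render": non-skin pixels carry no data (set to 0).
w_true = [0.5, -1.0, 2.0]
full_image = generate(A, w_true)
skin_render = [p if m else 0.0 for p, m in zip(full_image, mask)]

w_fit = project_masked(A, skin_render, mask)
completed = generate(A, w_fit)  # non-skin pixels are filled in by the generator
```

In this toy setting the fitted code reproduces the skin pixels and the generator supplies values for the pixels the render left empty. A multi-frame variant would additionally have to tie the codes of different frames together to preserve identity; that constraint is not modelled here.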


Download Paper