filename : Pra21b.pdf
entry : article
conference : SIGGRAPH Asia 2021
pages :
year : 2021
month : December
title : Rendering with Style: Combining Traditional and Neural Approaches for High-Quality Face Rendering
subtitle :
author : Prashanth Chandran, Sebastian Winberg, Gaspard Zoss, Jérémy Riviere, Markus Gross, Paulo Gotardo, Derek Bradley
booktitle : ACM Trans. Graph.
ISSN/ISBN : 0730-0301
editor :
publisher : ACM
publ.place :
volume : 40
issue : 6
language : English
keywords : Rendering, Digital humans, Inpainting, StyleGAN, GAN
abstract : For several decades, researchers have been advancing techniques for creating and rendering 3D digital faces, with much of the effort devoted to geometry and appearance capture, modeling, and rendering. This body of research has largely focused on facial skin, with far less attention devoted to peripheral components like hair, eyes and the interior of the mouth. As a result, even with the best technology for facial capture and rendering, in most high-end productions a lot of artist time is still spent modeling the missing components and fine-tuning the rendering parameters to combine everything into photo-real digital renders. In this work we propose to combine incomplete, high-quality renderings showing only facial skin with recent methods for neural rendering of faces, in order to automatically and seamlessly create photo-realistic full-head portrait renders from captured data without the need for artist intervention. Our method begins with traditional face rendering, where the skin is rendered with the desired appearance, expression, viewpoint, and illumination. These skin renders are then projected into the latent space of a pre-trained neural network that can generate arbitrary photo-real face images (StyleGAN2). The result is a sequence of realistic face images that match the identity and appearance of the 3D character at the skin level, but are completed naturally with synthesized hair, eyes, inner mouth and surroundings. Notably, we present the first method for multi-frame consistent projection into this latent space, allowing photo-realistic rendering and preservation of the identity of the digital human over an animated performance sequence, which can depict different expressions, lighting conditions and viewpoints. Our method can be used in new face rendering pipelines and, importantly, in other deep learning applications that require large amounts of realistic training data with ground-truth 3D geometry, appearance maps, lighting, and viewpoint.
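
note : The projection step described in the abstract is, at its core, GAN inversion by latent optimization. The sketch below illustrates only that general idea, not the authors' actual method; the placeholder Generator class, the crude skin-mask heuristic, the masked L2 loss, and the temporal-smoothness weight lam are all assumptions for illustration. The paper's multi-frame consistent projection into StyleGAN2's latent space is more involved than this per-frame loop.

    # Hypothetical sketch: optimize a latent code w so G(w) reproduces an
    # incomplete skin-only render on the masked skin region, letting the
    # pre-trained generator fill in hair, eyes, inner mouth and background.
    import torch

    class Generator(torch.nn.Module):
        """Placeholder standing in for a pre-trained StyleGAN2 generator (w -> image)."""
        def __init__(self, w_dim=512, img_size=256):
            super().__init__()
            self.net = torch.nn.Linear(w_dim, 3 * img_size * img_size)
            self.img_size = img_size

        def forward(self, w):
            return self.net(w).view(-1, 3, self.img_size, self.img_size)

    def project(G, target, mask, w_init, w_prev=None, steps=300, lr=0.01, lam=0.1):
        """Fit w to `target` on `mask`; optionally pull w toward the previous
        frame's latent (a simple stand-in for multi-frame consistency)."""
        w = w_init.clone().requires_grad_(True)
        opt = torch.optim.Adam([w], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss = ((G(w) - target).pow(2) * mask).mean()      # masked L2 on skin pixels
            if w_prev is not None:
                loss = loss + lam * (w - w_prev).pow(2).mean()  # temporal smoothness prior
            loss.backward()
            opt.step()
        return w.detach()

    G = Generator().eval()
    for p in G.parameters():
        p.requires_grad_(False)  # generator stays frozen; only w is optimized

    frames = [torch.rand(1, 3, 256, 256) for _ in range(3)]  # dummy skin-only renders
    w_prev = None
    for target in frames:
        mask = (target.mean(1, keepdim=True) > 0.1).float()   # crude skin mask (assumed)
        w_init = w_prev if w_prev is not None else torch.zeros(1, 512)
        w_prev = project(G, target, mask, w_init, w_prev)
        completed = G(w_prev)  # full portrait: skin matched, rest synthesized

Initializing each frame from the previous frame's latent and penalizing latent drift is one common way to stabilize per-frame inversion over a performance sequence; the paper's stated contribution is a method that achieves this consistency while preserving identity across expressions, lighting and viewpoints.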