High-quality passive facial performance capture using anchor frames

T. Beeler, F. Hahn, D. Bradley, B. Bickel, P. Beardsley, C. Gotsman, M. Gross

Proceedings of ACM SIGGRAPH (Vancouver, Canada, August 7-11, 2011), ACM Transactions on Graphics, vol. 30, no. 4, pp. 75:1-75:10

Abstract

We present a new technique for passive and markerless facial performance capture based on anchor frames. Our method starts with high resolution per-frame geometry acquisition using state-of-the-art stereo reconstruction, and proceeds to establish a single triangle mesh that is propagated through the entire performance. Leveraging the fact that facial performances often contain repetitive subsequences, we identify anchor frames as those which contain similar facial expressions to a manually chosen reference expression. Anchor frames are automatically computed over one or even multiple performances. We introduce a robust image-space tracking method that computes pixel matches directly from the reference frame to all anchor frames, and thereby to the remaining frames in the sequence via sequential matching. This allows us to propagate one reconstructed frame to an entire sequence in parallel, in contrast to previous sequential methods. Our anchored reconstruction approach also limits tracker drift and robustly handles occlusions and motion blur. The parallel tracking and mesh propagation offer low computation times. Our technique will even automatically match anchor frames across different sequences captured on different occasions, propagating a single mesh to all performances.

Sample Dataset

We provide a sample dataset for research purposes.

What the archive contains
The archive contains the captured image data, the calibrated cameras, and the reconstructed per-frame meshes. These meshes are in full vertex correspondence. The sequence consists of 346 frames captured at 42 fps.
The file structure is:

cameras
    cam0.cam
    cam1.cam
    ...
data
    cam1
        17843_imgxxx.bmp
        ...
    cam2
DISCLAIMER.txt
INFO.txt
meshes
    17843_imgxxx.ply
    ...
videos
    darthMaul.mp4
    geometry.mp4
    sideBySide.mp4

The file formats

Images
The images are provided in 8-bit RGB BMP format. These are the original images used by our algorithm.

Reconstructed Geometry
The reconstructed geometry is provided in PLY format, with a single mesh per frame. All meshes are in full vertex correspondence, so the same indices are replicated in every mesh. The meshes can be read and converted with MeshLab.
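Because vertex i refers to the same surface point in every frame, per-vertex motion between two frames reduces to an index-wise comparison. The following is a minimal sketch rather than part of the dataset tooling: it assumes the vertex positions of two frames have already been loaded into Python lists of (x, y, z) tuples (for example after converting the PLY files with MeshLab), and the function name `vertex_displacements` is hypothetical.

```python
import math


def vertex_displacements(verts_a, verts_b):
    """Return the Euclidean distance each vertex moved between two frames.

    ASSUMPTION: verts_a and verts_b are sequences of (x, y, z) tuples that
    were loaded elsewhere; reading the PLY files is not shown here.
    """
    if len(verts_a) != len(verts_b):
        # Full vertex correspondence implies identical vertex counts.
        raise ValueError("meshes are not in vertex correspondence")
    # Vertex i in frame A corresponds to vertex i in frame B.
    return [math.dist(p, q) for p, q in zip(verts_a, verts_b)]


# Toy data standing in for two frames of the sequence.
frame_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame_b = [(0.0, 3.0, 4.0), (1.0, 0.0, 0.0)]
moved = vertex_displacements(frame_a, frame_b)
```

The distances are in whatever units the meshes use, which this page does not specify.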

Cameras
The camera format is our own. It consists of two lines: a header and the actual parameters. The header describes the content of the parameters; possible values are:

Property		Description
name		Name of the camera
fx,fy		Focal length
cx,cy		Principal point
alpha		Skew
nx,ny		Image size
k1..k5		Distortion parameters as described by Bouguet
tx,ty,tz		Extrinsic translation
rx,ry,rz		Extrinsic rotation (given in Rodrigues notation)
zn,zf		Near and far planes of the working volume
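A camera file of this kind could be read roughly as sketched below. This is an illustration under stated assumptions, not the authors' loader: the token separator is not specified on this page, so the sketch accepts whitespace, commas, or semicolons, and it assumes each header token names exactly one value on the second line and that the camera name contains no separator characters. It also assumes that "Rodrigues notation" means an axis-angle vector whose length is the rotation angle in radians, as in Bouguet's toolbox. The function names `parse_cam` and `rodrigues_to_matrix` are hypothetical.

```python
import math
import re


def _tokens(line):
    # ASSUMPTION: tokens are separated by whitespace, commas, or semicolons.
    return [t for t in re.split(r"[\s,;]+", line.strip()) if t]


def parse_cam(text):
    """Parse a two-line camera description (header, parameters) into a dict."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) != 2:
        raise ValueError("expected a header line and a parameter line")
    keys, values = _tokens(lines[0]), _tokens(lines[1])
    if len(keys) != len(values):
        raise ValueError("header and parameter counts differ")
    # 'name' stays a string; every other property is treated as a number.
    return {k: (v if k == "name" else float(v)) for k, v in zip(keys, values)}


def rodrigues_to_matrix(rx, ry, rz):
    """Convert an axis-angle (Rodrigues) vector to a 3x3 rotation matrix."""
    theta = math.sqrt(rx * rx + ry * ry + rz * rz)
    if theta < 1e-12:
        # Zero rotation: return the identity.
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = rx / theta, ry / theta, rz / theta
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]


# Made-up example content; real files may list more properties.
example = "name fx fy rx ry rz\ncamA 1000.0 1001.5 0 0 1.5707963267948966\n"
cam = parse_cam(example)
R = rodrigues_to_matrix(cam["rx"], cam["ry"], cam["rz"])
```

Whether the extrinsic rotation and translation map world coordinates to camera coordinates or the reverse is not stated here, so that convention should be checked against INFO.txt in the archive before the matrix is used for projection.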

Obtaining the data

The human face is very personal, so we decided not to publish the data online. On the other hand, high-quality reconstruction data is very valuable to many researchers. As a compromise, we offer to send the data directly to approved researchers. To request the data, please send an email to dbeeler at inf dot ethz dot ch stating:

your name, title or position, and institution or affiliation
your intended use of the images and/or reconstructed geometry
a statement saying that you accept the following terms of licensing (please copy the licensing text into your email):
The rights to copy, distribute, and use the 3D computer models and image data (henceforth called "data") you are being given access to are under the control of Markus Gross, director of the Computer Graphics Lab, ETH Zurich. You are hereby given permission to copy this data in electronic or hardcopy form for your own scientific use and to distribute it for scientific use to colleagues within your research group. Inclusion of rendered images or video made from this data in a scholarly publication (printed or electronic) is also permitted. In this case, credit must be given to the publication: High-quality passive facial performance capture using anchor frames. However, the data may not be included in the electronic version of a publication, nor placed on the Internet. These restrictions apply to any representations (other than images or video) derived from the data, including but not limited to simplifications, remeshings, and the fitting of smooth surfaces. The making of physical replicas of this data is also prohibited, and the data may not be distributed to students - also not in connection with a class. For any other use, including distribution outside your research group, written permission is required from Markus Gross. Any commercial use also requires written permission from Markus Gross. Commercial use includes but is not limited to sale of the data, derivatives, replicas, images, or video, inclusion in a product for sale, or inclusion in advertisements (printed or electronic), on commercially-oriented web sites, or in trade shows.

Inappropriate use

Please remember that faces are of a very personal nature. Keep your renderings and other uses of the data in good taste. Don't put the faces in degrading or tasteless contexts and don't simulate nasty things happening to them (like breaking, exploding, melting, etc.). Choose another model for these sorts of experiments. Also, exercise reasonable caution to prevent the data from wandering beyond your research group.