Computer Graphics Laboratory ETH Zurich

Human Shape from Silhouettes using Generative HKS Descriptors and Cross-Modal Neural Networks

E. Dibra, H. Jain, C. Öztireli, R. Ziegler, M. Gross

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, July 21-26, 2017

Abstract

In this work, we present a novel method for capturing human body shape from a single scaled silhouette. We combine deep correlated features capturing different 2D views with embedding spaces based on 3D cues in a novel convolutional neural network (CNN) architecture. We first train a CNN to find a richer body shape representation space from pose-invariant 3D human shape descriptors. Then, we learn a mapping from silhouettes to this representation space, with the help of a novel architecture that exploits the correlation of multi-view data at training time to improve prediction at test time. We extensively validate our results on synthetic and real data, demonstrating significant improvements in accuracy compared to the state of the art, and providing a practical system for detailed human body measurements from a single image.
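
The pose-invariant shape descriptors named in the title are Heat Kernel Signatures (HKS). As a rough illustration only, the following sketch computes the standard HKS, HKS(x, t) = sum_i exp(-lambda_i t) phi_i(x)^2, from the eigenpairs (lambda_i, phi_i) of a mesh Laplacian. It is not the authors' implementation, and it makes several assumptions:

- It uses a combinatorial graph Laplacian (L = D - A) as a stand-in for the cotangent Laplacian normally used on meshes.
- It uses a dense eigensolver.
- The helpers graph_laplacian and heat_kernel_signature are hypothetical names.

```python
import numpy as np


def graph_laplacian(num_vertices, faces):
    """Hypothetical helper: combinatorial Laplacian L = D - A from triangle edges.

    Assumption: stands in for the cotangent Laplacian usually used on meshes.
    """
    adjacency = np.zeros((num_vertices, num_vertices))
    for a, b, c in faces:
        # Mark the three undirected edges of each triangle.
        for i, j in ((a, b), (b, c), (c, a)):
            adjacency[i, j] = adjacency[j, i] = 1.0
    return np.diag(adjacency.sum(axis=1)) - adjacency


def heat_kernel_signature(laplacian, times, k=None):
    """Hypothetical helper: HKS(x, t) = sum_i exp(-lambda_i * t) * phi_i(x)**2.

    Returns an array of shape (num_vertices, len(times)). If k is given, only
    the k smallest eigenpairs are used (the usual low-frequency truncation).
    """
    times = np.asarray(times, dtype=float)
    # eigh returns ascending eigenvalues and orthonormal eigenvectors (columns).
    evals, evecs = np.linalg.eigh(laplacian)
    if k is not None:
        evals, evecs = evals[:k], evecs[:, :k]
    # (n, k) squared eigenvectors times (k, T) heat decay factors -> (n, T).
    return (evecs ** 2) @ np.exp(-np.outer(evals, times))


if __name__ == "__main__":
    # Tetrahedron: four vertices, four triangular faces.
    L = graph_laplacian(4, [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)])
    print(heat_kernel_signature(L, [0.0, 0.5]))
```

**How the example works.** On the tetrahedron, L = 4I - J has eigenvalues 0 (once) and 4 (three times), so every vertex gets the same signature, 1/4 + 3/4 exp(-4t). This equals 1 at t = 0.

**Limitations of the sketch.** Because a graph Laplacian depends only on connectivity, this version is trivially invariant to any pose change that preserves mesh topology. However, it also ignores geometry entirely. For real body scans, a cotangent Laplacian with a sparse eigensolver such as scipy.sparse.linalg.eigsh would be the more usual choice.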

@InProceedings{DibraJOZG17,
Title = {Human Shape from Silhouettes using Generative {HKS} Descriptors and Cross-Modal Neural Networks},
Author = {Endri Dibra and Himanshu Jain and A. Cengiz {\"{O}}ztireli and Remo Ziegler and Markus H. Gross},
Booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, July 21-26, 2017},
Year = {2017},
Abstract = {In this work, we present a novel method for capturing human body shape from a single scaled silhouette. We combine deep correlated features capturing different 2D views with embedding spaces based on 3D cues in a novel convolutional neural network (CNN) architecture. We first train a CNN to find a richer body shape representation space from pose-invariant 3D human shape descriptors. Then, we learn a mapping from silhouettes to this representation space, with the help of a novel architecture that exploits the correlation of multi-view data at training time to improve prediction at test time. We extensively validate our results on synthetic and real data, demonstrating significant improvements in accuracy compared to the state of the art, and providing a practical system for detailed human body measurements from a single image.}
}