D. Giger, J.C. Bazin, C. Kuster, T. Popa, M. Gross
Proceedings of IEEE ICME 2014 (Chengdu, China, July 14-18, 2014), pp. 1-6
Abstract
Eye contact is a critical aspect of human communication. However, when talking over a video conferencing system, such as Skype, it is not possible for users to have eye contact when looking at the conversation partner’s face displayed on the screen. This is due to the location disparity between the video conferencing window and the camera. This issue has been tackled by expensive high-end systems or hybrid depth+color cameras, but such equipment is still largely unavailable at the consumer level and on platforms such as laptops or tablets. In contrast, we propose a gaze correction method that needs just a single webcam. We apply recent shape deformation techniques to generate a 3D face model that matches the user’s face. We then render a gaze-corrected version of this face model and seamlessly insert it into the original image. Experiments on real data and various platforms confirm the validity of the approach and demonstrate that the visual quality of our results is at least equivalent to those obtained by state-of-the-art methods requiring additional equipment.
Overview
With the wide availability of broadband Internet, video conferencing is becoming increasingly popular for both professional and private use, and is gradually replacing traditional audio calls. However, when talking over a traditional video conferencing system such as Skype or Apple’s FaceTime, conversation partners do not have eye contact. Concretely, when the camera is at the top of the screen, each user has the impression that the conversation partner is looking down.
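The size of this effect follows from simple geometry. The sketch below is an illustration rather than part of the published method: it estimates the angle between the user's line of sight to the on-screen face and the line from the eyes to the camera. The function name and the example distances (a 10 cm camera offset and a 60 cm viewing distance) are assumptions chosen for illustration only.

```python
import math


def gaze_offset_degrees(camera_offset_cm: float, viewing_distance_cm: float) -> float:
    """Angle (in degrees) between looking at the on-screen face and looking at the camera.

    camera_offset_cm: distance on the screen plane between the camera and the
        displayed face of the conversation partner (assumed value).
    viewing_distance_cm: distance from the user's eyes to the screen (assumed value).
    """
    return math.degrees(math.atan2(camera_offset_cm, viewing_distance_cm))


# Illustrative numbers only: camera 10 cm above the displayed face, user 60 cm away.
angle = gaze_offset_degrees(10.0, 60.0)
print(f"apparent gaze offset: {angle:.1f} degrees")
```

Under these assumed distances the offset is a little under ten degrees. A larger screen or a shorter viewing distance increases it.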
In this work, we present a practical gaze correction system that relies on only a single camera and can therefore be used on a variety of platforms, ranging from desktop computers to laptops and from professional video conferencing systems to tablets. In the absence of any geometric information and with only one view available, our method fits a generic template to the image in real time, preserving the facial expression of the participant, and uses this geometric proxy to synthesize a gaze-corrected version of the head that is then transferred seamlessly into the original image.
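The render-and-insert step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the fitted face model is available as a list of 3D points in camera coordinates, approximates the gaze correction as a rotation about a horizontal axis through a pivot point, uses a simple pinhole projection, and stands in for the seamless transfer with a plain per-pixel alpha blend. All function names, parameters, and numeric values are hypothetical.

```python
import math


def rotate_about_x(points, angle_deg, pivot):
    """Rotate 3D points about a horizontal (x) axis passing through `pivot`.

    The sign of the angle that tilts the head toward the camera depends on
    the coordinate convention in use; none is assumed here.
    """
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    _, py, pz = pivot
    rotated = []
    for x, y, z in points:
        dy, dz = y - py, z - pz
        rotated.append((x, py + c * dy - s * dz, pz + s * dy + c * dz))
    return rotated


def project(points, focal_px, cx, cy):
    """Pinhole projection of camera-space points (z > 0) to pixel coordinates."""
    return [(cx + focal_px * x / z, cy + focal_px * y / z) for x, y, z in points]


def blend(original, rendered, alpha):
    """Linear blend of two pixel values; alpha = 0 keeps the original pixel."""
    return (1.0 - alpha) * original + alpha * rendered


# Hypothetical template vertices (cm, camera coordinates) and head pivot.
template = [(0.0, 1.0, 50.0), (3.0, -2.0, 52.0)]
pivot = (0.0, 0.0, 50.0)

corrected = rotate_about_x(template, 9.5, pivot)            # assumed correction angle
pixels = project(corrected, focal_px=500.0, cx=320.0, cy=240.0)
mixed = blend(original=10.0, rendered=20.0, alpha=0.25)     # value near the mask border
```

In a full system the blend weight would fall off smoothly toward the boundary of the face region so that no visible seam appears. The constant weight used here only shows the arithmetic.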
Results
Top: frames recorded during video conferencing. Bottom: our real-time gaze correction results.