Computer Graphics Laboratory ETH Zurich


Computational Stereo Camera System with Programmable Control Loop

S. Heinzle, P. Greisen, D. Gallup, C. Chen, D. Saner, A. Smolic, A. Burg, W. Matusik, M. Gross

Proceedings of ACM SIGGRAPH (Vancouver, Canada, August 7-11, 2011), ACM Transactions on Graphics, vol. 30, no. 4, pp. 94:1-94:10


Stereoscopic 3D has gained significant importance in the entertainment industry. However, production of high quality stereoscopic content is still a challenging art that requires mastering the complex interplay of human perception, 3D display properties, and artistic intent. In this paper, we present a computational stereo camera system that closes the control loop from capture and analysis to automatic adjustment of physical parameters. We develop intuitive interaction metaphors, based on a touch screen interface with 3D visualization, that replace cumbersome handling of rig parameters. Our system is designed to make stereoscopic 3D production as easy, intuitive, flexible, and reliable as possible. Captured signals are processed and analyzed in real-time on a stream processor. Stereoscopy and user settings define programmable control functionalities, which are executed in real-time on a control processor. Computational power and flexibility are enabled by a dedicated software and hardware architecture. We show that even traditionally


author = {Heinzle, Simon and Greisen, Pierre and Gallup, David and Chen, Christine and Saner, Daniel and Smolic, Aljoscha and Burg, Andreas and Matusik, Wojciech and Gross, Markus},
title = {Computational stereo camera system with programmable control loop},
journal = {ACM Trans. Graph.},
issue_date = {July 2011},
volume = {30},
issue = {4},
month = {August},
year = {2011},
issn = {0730-0301},
pages = {94:1--94:10},
articleno = {94},
numpages = {10},
url = {},
doi = {},
acmid = {1964989},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {camera system, programmable, stereoscopy},

Figure 1: Our custom beam-splitter stereo camera rig provides motorized control of the lenses, interaxial distance, and convergence. A programmable high-performance computational unit controls the motors. User input is provided through a stereoscopic touch screen.


The entertainment industry is steadily moving towards stereoscopic 3D (S3D) movie production, and the number of movie titles released in S3D is continuously increasing. The production of stereoscopic movies, however, is more demanding than that of traditional movies, as S3D relies on a sensitive illusion created by projecting two different images to the viewer’s eyes. It therefore requires careful attention to achieve a pleasant depth experience. Any imperfections, especially when accumulated over time, can cause wrong depth perception and adverse effects such as eye strain, fatigue, or even motion sickness. The main difficulty of S3D is the complex interplay of human perception, 3D display properties, and content composition. The last of these, in particular, reflects the artistic intent to use depth as an element of storytelling, which often conflicts with the perceptual problems that inconsistent depth cues can cause. From a production perspective, content creation thus becomes a highly complex problem that has to satisfy all of these technical, perceptual, and artistic aspects.

Unfortunately, shooting high-quality stereoscopic live video content remains an art that has been mastered only by a small group of individuals. More specifically, the difficulty arises because, in addition to setting traditional camera parameters (such as zoom, shutter speed, aperture, and focus), camera interaxial distance and convergence have to be set correctly to create the intended depth effect. Adjusting all these parameters for complex, dynamically changing scenes poses additional challenges. Furthermore, scene cuts and shot framing have to be handled appropriately in order to provide a perceptually pleasing experience. These problems become even more pronounced for live broadcast of stereo content, such as in sports applications. Capturing high-quality stereo 3D footage therefore requires very sophisticated equipment along with the craftsmanship of an experienced stereographer, all of which makes S3D production inherently difficult and expensive. The cost of S3D movie productions is estimated to be 10%-25% higher than that of traditional productions.

We propose a computational stereo camera system that features a closed control loop from analysis to automatic adjustments of the physical camera and rig properties. Our freely programmable architecture comprises a high-performance computational unit that analyzes the scene in real-time (e.g., by computing 3D structure or by tracking scene elements) and that implements knowledge from stereography in our control loop algorithms to capture quality S3D video. Since stereography is still a wide-open field with a continuously evolving conception of S3D cinematography, we designed our camera architecture as a freely reprogrammable set of processing units. This enables us to utilize different algorithms for different scenes, shots, or artistic intentions. In addition, we support scripting of complex operations to develop and optimize shots within the actual movie production. Thus, some of the postproduction is shifted back into the production cycle. In a live broadcast scenario, scripts may be predefined and executed on demand.
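The closed-loop idea can be illustrated with a minimal sketch. It is not the control algorithm of the system described here. It assumes a simple pinhole model in which the disparity range of a scene scales linearly with the interaxial distance, and it uses a plain proportional update to steer that range towards a target budget. All names (`Rig`, `disparity_range_px`, `control_step`, `run_loop`), the rig limits, and the gain are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rig:
    """Hypothetical rig state; field names and limits are illustrative only."""
    interaxial_mm: float
    min_interaxial_mm: float = 0.0
    max_interaxial_mm: float = 90.0


def disparity_range_px(interaxial_mm, focal_px, z_near_mm, z_far_mm):
    """Disparity spread between nearest and farthest scene point (pinhole model).

    Stands in for the real-time scene analysis step; a real system would
    measure this from the captured image pair.
    """
    return focal_px * interaxial_mm * (1.0 / z_near_mm - 1.0 / z_far_mm)


def control_step(rig, measured_range_px, target_range_px, gain=0.5):
    """Apply one proportional update to the interaxial distance."""
    if measured_range_px <= 0.0:
        return rig.interaxial_mm  # nothing measurable, hold the current setting
    # Under the linear assumption, the interaxial that would hit the target
    # is the current value scaled by target / measured.
    ideal = rig.interaxial_mm * target_range_px / measured_range_px
    # Move only a fraction `gain` of the way to avoid abrupt rig motion.
    updated = rig.interaxial_mm + gain * (ideal - rig.interaxial_mm)
    # Respect the (assumed) mechanical limits of the rig.
    rig.interaxial_mm = min(max(updated, rig.min_interaxial_mm),
                            rig.max_interaxial_mm)
    return rig.interaxial_mm


def run_loop(rig, focal_px, z_near_mm, z_far_mm, target_range_px, steps=20):
    """Alternate analysis and adjustment for a static simulated scene."""
    for _ in range(steps):
        measured = disparity_range_px(rig.interaxial_mm, focal_px,
                                      z_near_mm, z_far_mm)
        control_step(rig, measured, target_range_px)
    return disparity_range_px(rig.interaxial_mm, focal_px, z_near_mm, z_far_mm)


if __name__ == "__main__":
    rig = Rig(interaxial_mm=60.0)
    final_range = run_loop(rig, focal_px=2000.0, z_near_mm=2000.0,
                           z_far_mm=10000.0, target_range_px=30.0)
    print(f"interaxial: {rig.interaxial_mm:.2f} mm, range: {final_range:.2f} px")
```

Because the update is a separate function, a different strategy (for example, one that also adjusts convergence or follows a tracked object) could be swapped in per shot, which is the kind of flexibility a reprogrammable control unit is meant to offer.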

For efficient camera operation, we devise a set of interaction metaphors that abstract the actual camera rig operations into intuitive gestures. The operator controls the camera using a multi-touch stereoscopic user interface. In addition, the interface enables instant monitoring of the S3D content as well as the related stereo parameters. To achieve real-time performance and a low-latency control loop, we implemented a custom computational architecture that combines FPGA, GPU, and CPU processing close to the sensor. To summarize, the contributions of our paper are as follows:

  • A computational stereo camera system for stereography and video analysis with a closed control loop model for automatic adjustment of the stereo and camera parameters,
  • A programming environment for the computational unit of our stereoscopic camera rig for scripting of complex shots,
  • A re-programmable control unit for adapting rig parameters to different shots, scenes, and user preferences,
  • A multi-touch stereoscopic user interface including scene preview and intuitive interaction metaphors,
  • An advanced system architecture combining FPGA, GPU, and CPU processing to provide a high-performance platform for full HD (1920x1080) video processing in real-time.

