The entertainment industry is steadily moving towards stereoscopic 3D (S3D) movie production, and the number of movie titles released in S3D is continuously increasing. The production of stereoscopic movies, however, is more demanding than that of traditional movies, as S3D relies on a sensitive illusion created by projecting two different images to the viewer’s eyes. It therefore requires careful attention to achieve a pleasant depth experience. Any imperfections, especially when accumulated over time, can cause incorrect depth perception and adverse effects such as eye strain, fatigue, or even motion sickness. The main difficulty of S3D lies in the complex interplay of human perception, 3D display properties, and content composition. Content composition in particular reflects the artistic intent to use depth as an element of storytelling, which often conflicts with the problems that inconsistent depth cues can cause. From a production perspective, this makes content creation a highly complex problem, as it has to satisfy technical, perceptual, and artistic requirements at the same time.
Unfortunately, shooting high-quality stereoscopic live video content remains an art that has been mastered by only a small group of individuals. The difficulty arises from the fact that, in addition to setting traditional camera parameters (such as zoom, shutter speed, aperture, and focus), the camera interaxial distance and convergence have to be set correctly to create the intended depth effect. Adjusting all these parameters for complex, dynamically changing scenes poses additional challenges. Furthermore, scene cuts and shot framing have to be handled appropriately to provide a perceptually pleasing experience. These problems become even more pronounced for live broadcasts of stereo content, such as in sports applications. Capturing high-quality S3D footage therefore requires very sophisticated equipment along with the craftsmanship of an experienced stereographer, all of which makes S3D production inherently difficult and expensive. The cost of S3D movie productions is estimated to be 10% to 25% higher than that of traditional productions.
We propose a computational stereo camera system that features a closed control loop from scene analysis to automatic adjustment of the physical camera and rig properties. Our freely programmable architecture comprises a high-performance computational unit that analyzes the scene in real time (e.g., by computing 3D structure or by tracking scene elements) and whose control loop algorithms implement knowledge from stereography to capture high-quality S3D video. Since stereography is still a largely open field with a continuously evolving conception of S3D cinematography, we designed our camera architecture as a freely reprogrammable set of processing units. This enables us to use different algorithms for different scenes, shots, or artistic intentions. In addition, we support scripting of complex operations to develop and optimize shots during the actual movie production. Thus, part of the postproduction work is shifted back into the production cycle. In a live broadcast scenario, scripts may be predefined and executed on demand.
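The closed control loop is described here only at the architectural level. As an illustration, the following is a minimal sketch of what one iteration of such a loop could look like. It is not the control law of the described system. It assumes a parallel rig with horizontal sensor shift and a pinhole model, in which the screen parallax of a point at depth Z is p(Z) = f·b·(1/Z_c - 1/Z), where f is the focal length in pixels, b the interaxial distance, and Z_c the convergence depth. It also assumes that the scene-analysis stage already supplies the near and far depths of the scene. The names `RigState`, `fit_rig_to_budget`, and `control_step`, the pixel disparity budget, and the smoothing factor `alpha` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RigState:
    interaxial_m: float   # camera baseline b in metres (hypothetical rig model)
    convergence_m: float  # depth Z_c that maps to zero screen parallax


def parallax_px(z_m: float, f_px: float, rig: RigState) -> float:
    """Screen parallax in pixels of a point at depth z (positive = behind screen).

    Assumes a parallel rig with horizontal sensor shift:
        p(Z) = f * b * (1/Z_c - 1/Z)
    """
    return f_px * rig.interaxial_m * (1.0 / rig.convergence_m - 1.0 / z_m)


def fit_rig_to_budget(z_near_m: float, z_far_m: float, f_px: float,
                      p_min_px: float, p_max_px: float) -> RigState:
    """Choose b and Z_c so that [z_near, z_far] maps exactly onto [p_min, p_max]."""
    if not (0.0 < z_near_m < z_far_m):
        raise ValueError("require 0 < z_near < z_far")
    if not (p_min_px < p_max_px):
        raise ValueError("require p_min < p_max")
    # Parallax range is linear in b: p(far) - p(near) = f * b * (1/near - 1/far).
    inv_span = 1.0 / z_near_m - 1.0 / z_far_m
    b = (p_max_px - p_min_px) / (f_px * inv_span)
    # Solve p(far) = p_max for 1/Z_c.
    inv_zc = p_max_px / (f_px * b) + 1.0 / z_far_m
    if inv_zc <= 0.0:
        raise ValueError("budget would place the convergence plane at infinity or beyond")
    return RigState(interaxial_m=b, convergence_m=1.0 / inv_zc)


def control_step(rig: RigState, z_near_m: float, z_far_m: float, f_px: float,
                 p_min_px: float, p_max_px: float, alpha: float = 0.2) -> RigState:
    """One loop iteration: move the rig a fraction alpha toward the budget-fitting target.

    alpha < 1 limits frame-to-frame jumps in the rig parameters.
    """
    target = fit_rig_to_budget(z_near_m, z_far_m, f_px, p_min_px, p_max_px)
    b = rig.interaxial_m + alpha * (target.interaxial_m - rig.interaxial_m)
    # Blend convergence in inverse depth, since parallax is linear in 1/Z_c.
    inv_zc = (1.0 - alpha) / rig.convergence_m + alpha / target.convergence_m
    return RigState(interaxial_m=b, convergence_m=1.0 / inv_zc)


if __name__ == "__main__":
    # Example: f = 1000 px, scene spans 2 m to 10 m, budget of -20 px to +30 px.
    rig = RigState(interaxial_m=0.065, convergence_m=5.0)
    for _ in range(30):
        rig = control_step(rig, 2.0, 10.0, 1000.0, -20.0, 30.0)
    print(rig, parallax_px(2.0, 1000.0, rig), parallax_px(10.0, 1000.0, rig))
```

For the example values, the exact fit gives b = 0.125 m and Z_c ≈ 2.94 m, which places the nearest point at -20 px and the farthest at +30 px. Rate-limiting the update via `alpha` is one simple way to keep depth changes gradual, since the imperfections mentioned above become more harmful when they accumulate over time.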
For efficient camera operation, we devise a set of interaction metaphors that abstract the actual camera rig operations into intuitive gestures. The operator controls the camera using a multitouch stereoscopic user interface, which also enables monitoring of the S3D content and the related stereo parameters in real time. To achieve real-time performance with a low-latency control loop, we implemented a custom computational architecture that combines FPGA, GPU, and CPU processing close to the sensor. To summarize, the contributions of our paper are as follows: