The camera pans evenly to follow the figure skater, smoothly zooming in as she pulls away and slowing down as she changes direction. A skilled hand with the camera? Yes. However, the hand isn’t human. The camera movements are generated by an intelligent robotic system.
The system is built on technology developed at Seervision, an ETH Zurich spin-off that produces automated video production systems capable of learning. The core of these systems is their image-analysis software, which recognizes and classifies people and makes sure that the cameras follow their movements. There are plans to extend the algorithm to other subjects.
Collaborating at a human level
The software is quite sophisticated. Using what is referred to as Visual Position Tracking, the system sets multiple reference points for each item in the image. These are used to generate movement patterns that are supplied continuously to the controller, which uses them to dynamically optimize the field of view. In addition, various image design modules make sure that the requirements for professional image composition are met. Conrad von Grebel, Business Developer and co-founder of Seervision, explains: “Our software adjusts camera movements in real time. What’s unique about our process is that the cameras operate at a human level of proficiency.” The system still allows the producer to intervene at any time. “If I command the software to do a closeup, I can adjust it manually when needed,” says von Grebel. This makes the technology a perfect symbiosis between an autonomous system and human artistic skill. It is compatible with all commonly used cameras and works with a web browser.
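To make the idea of turning reference points into camera movements more concrete, here is a minimal sketch in Python. It is not Seervision's algorithm, which the article does not describe in detail. It assumes a simple proportional controller: the reference points of one subject are averaged, the offset of that average from the image center is converted into an approximate angle, and the result is scaled into a pan and tilt speed. All names (`FrameGeometry`, `pan_tilt_command`), the gain, and the speed limit are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) position of a reference point, in pixels


@dataclass
class FrameGeometry:
    """Hypothetical description of the camera image and its field of view."""
    width: int        # image width in pixels
    height: int       # image height in pixels
    hfov_deg: float   # horizontal field of view in degrees
    vfov_deg: float   # vertical field of view in degrees


def centroid(points: Sequence[Point]) -> Point:
    """Average position of all reference points belonging to one subject."""
    if not points:
        raise ValueError("at least one reference point is required")
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def clamp(value: float, limit: float) -> float:
    """Restrict value to the range [-limit, +limit]."""
    return max(-limit, min(limit, value))


def pan_tilt_command(
    points: Sequence[Point],
    frame: FrameGeometry,
    gain: float = 1.5,              # assumed proportional gain, in 1/s
    max_speed_deg_s: float = 30.0,  # assumed speed limit of the pan/tilt head
) -> Tuple[float, float]:
    """Return (pan_speed, tilt_speed) in degrees per second.

    Positive pan turns right, positive tilt turns up. The angular error is
    approximated linearly from the pixel offset, which is adequate only for
    moderate fields of view.
    """
    cx, cy = centroid(points)
    # Offset of the subject from the image center as a fraction of the frame.
    dx = cx / frame.width - 0.5
    dy = 0.5 - cy / frame.height  # image y grows downward, tilt grows upward
    pan_error_deg = dx * frame.hfov_deg
    tilt_error_deg = dy * frame.vfov_deg
    return (
        clamp(gain * pan_error_deg, max_speed_deg_s),
        clamp(gain * tilt_error_deg, max_speed_deg_s),
    )


if __name__ == "__main__":
    frame = FrameGeometry(width=1920, height=1080, hfov_deg=60.0, vfov_deg=34.0)
    # Two reference points on a subject to the right of the image center.
    skater = [(1400.0, 500.0), (1480.0, 580.0)]
    print(pan_tilt_command(skater, frame))
```

Calling such a function once per video frame would yield a continuous stream of speed commands for the controller. A production system would additionally have to smooth the motion, anticipate changes of direction, and respect the composition rules mentioned above.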
The number of cameras used in a typical TV production ranges from three to eight in studio setups to more than 50 during the soccer World Cup. This can quickly become quite costly. However, Seervision is not primarily focused on cutting costs: The company wants to give film producers a tool that makes their work easier to plan and improves the quality of live transmissions. This changes the nature of the job: The camera operator becomes a multi-camera coordinator who is in charge of the visual language and is therefore responsible for the style of the transmission. The benefits include more attractive and immediately adjustable camera angles and movements, as well as fewer human errors.
The software is continuously learning
The varied functions of the system are based on artificial intelligence: To program the software, the experts in Zurich continuously analyze existing film material to develop the world’s first artificial neural network for video productions, a concept that originated in brain research. The software independently learns from the data of past productions to optimize visual recognition and comprehension and to determine the field of view and the movements of the camera. The developers even plan to use artificial intelligence to automate the editing process.
Seervision equips its camera robots with brushless EC 45 flat motors and EPOS4 compact positioning controllers from maxon, which move the camera noiselessly with a precision of 0.0002 degrees. Additional drives mounted on the lens ensure optical accuracy and control the camera focus. Interestingly, the technology company grew out of the original idea of automatically filming a university lecture. Seervision has been part of the maxon Young Engineers Program since 2016, won the Swiss Technology Award in 2018, and has been nominated for this year’s Digital Economy Award.
Author: Luca Meister
Young Engineers Program
With its Young Engineers Program (YEP), maxon supports innovative projects with discounted products and technical advice. Find out more and apply now.