Human-Centered Multimedia: making remote togetherness possible

After two years of COVID-19, we all know that communicating through a flat screen is exhausting. Since long before the pandemic, Professor Pablo Cesar has been focussing his research on highly realistic volumetric video conferencing that will allow smooth, natural communication and collaboration, making remoteness a thing of the past.

Let’s cut to the chase. A good friend is getting married, but you are on the other side of the world, or perhaps your city is in lockdown. You have decent cell phone reception, so you can receive pictures of the event and even some streaming video. But what if, using 3D goggles, you could really immerse yourself in the party? Wouldn’t this make you feel much closer to the people you care about?

“My research into real-time volumetric video is about facilitating the way people communicate and the way people access media,” says Pablo Cesar, group leader at the National Research Institute for Mathematics and Computer Science (CWI) and professor of Human-Centered Multimedia Systems at TU Delft. “Yes, it is about algorithms and optimisation, but we approach these from a human perspective rather than a technology perspective. We aim to understand, and transmit, the essence of communication.”

Not a panda

Born in Spain, having studied in Finland and with a Japanese wife, Cesar has long been accustomed to a sense of remoteness as well as to an urge to maintain real connections. And, to him, the Metaverse with its avatars just doesn’t measure up. “If I want to use the internet to visit a cultural heritage museum with my father, I don’t want to be looking at a panda-version of him,” he says. “I do want to be in virtual reality, as it allows three-dimensional visualisations and interactions beyond the realm of the normal world. But I want to have a highly realistic representation of the surroundings and, especially, of my loved ones.”

Context and intention

Of course, there is a catch to capturing and transmitting highly realistic 3D representations in real time. “Managing finite resources is a core issue of computer science,” Cesar says. “We want to provide the best quality of experience, adapted to any technological conditions.” So the approach is not to wait around for faster processors and 6G mobile communication, but to optimise based on human-centric principles and methodologies.

“I do a lot of communication with my hands,” Cesar says. “In most situations, only my face and hands need to be represented life-like. The wrinkles in my shirt can be ignored, as can my feet. Unless I’m dancing at a wedding, of course.” Whereas the human brain is primed to ignore non-essential information in a given context, computers are not. One of the core challenges of his research is therefore to develop context-aware algorithms that can decide which information is important to send and which can be safely ignored. “To optimise what we send or not, we need to understand what is happening, what you are looking at, where you are moving to, even the emotional state of the people communicating.”

Domain experts

Cesar has made it a personal mission to put human-computer interaction at the core of computer science. “We run a lot of experiments with users, for example on how to represent interpersonal distances in virtual reality,” he says. “For our experiments we collaborate with domain experts, such as sociologists and psychologists, but also people from opera and theatre.”

He also fosters collaborations in the Netherlands and Europe regarding the technical aspects of volumetric video conferencing. “Our own focus is primarily on the networking aspects, on the communication itself, on the understanding of humans and on the optimisation of all the signals related to that,” he says. “We collaborate with experts in multi-sensory experiences and, for actual and efficient 3D-rendering, with experts in computer graphics.”

More personal than a director’s cut

To see where this research is going, it helps to look at some examples of what Cesar has already achieved in his many projects. For one, he simulated a chance passer-by having a volumetric video consultation with a doctor regarding an injured cyclist. It involved a mobile phone camera with depth perception, and all data was transmitted over the public 5G mobile network. “The quality of the video was very low,” he says. “It served as a proof of concept.”

He also transmitted multiple feeds from a soccer match to a pub full of supporters. “The idea behind transmitting multiple streams, rather than just the director’s cut, is that it allows personalisation of what to ‘broadcast’ on the receiver side, based on personal preferences.”

Transformative

Cesar’s hope is that volumetric video, allowing very natural communication, will help in all aspects of our lives. With applications from videoconferencing and healthcare to museums and cars, it certainly has the power to be transformative. But we’re not there yet. “If you were born in the ’80s, you will remember what 2D video was like in the ’90s and what has been achieved since then,” Cesar says. “That is pretty much where we are right now with volumetric video. We are able to provide demonstrations, so that people can start imagining what is possible. That is the purpose. It will help us to really build the technology that they need!”
