Human-Centered Multimedia Analysis


Involved faculty members:

Pablo Cesar, Alan Hanjalic


Digitalization of our society is a reality, radically reshaping the way we live and communicate. We are moving towards a connected, intelligent world in which always-on sensing and monitoring will enable rich immersive media experiences (e.g. remote working, medical consultation, online cultural heritage experience, entertainment). The MMC group addresses the future challenges in the design, development, and evaluation of multimedia systems that result from this societal transformation.

The vision is a future in which multimedia systems are human-centered and thus capable of understanding their users and the environment, providing empathic and highly customized (multi)media experiences that are immersive and interactive. The research therefore focuses on facilitating and improving the way people access media and communicate with others and with the environment, addressing key problems that result from limited communication bandwidth and the dense connectivity of content, people, and devices.

The MMC group approaches this research area by combining artificial intelligence and data science techniques with a strong human-centric, empirical approach to understanding and modelling the experiences of users and maximizing the quality of this experience (QoE). This research empowers computer science in Delft to design and develop next-generation intelligent and empathic real-life multimedia systems, which in turn make it possible to assess the true impact of the artificial intelligence and data science methods and algorithms underlying such systems. Grounding this design and development in realistic testing grounds and data sets will lead to valuable computer science solutions in a number of domains, such as healthcare and wellbeing, education, smart cities, and creative industries.

Some research challenges include:

  • Analysis of multimedia signal quality: understanding, monitoring, and prediction of the informativeness and perceptual quality of media signals in a real-life context (e.g. wearables, mobile video), which undergo various kinds of processing, such as lossy compression and streaming;
  • User behaviour understanding from multimedia signals: modelling user experience, behaviour, and navigation patterns in immersive media environments.


Representative publications

  1. J. Williamson, J. Li, V. Vinayagamoorthy, D.A. Shamma, and P. Cesar, “Proxemics and Social Interactions in an Instrumented Virtual Reality Workshop,” in Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (ACM CHI 2021), 2021.
  2. S. Subramanyam, I. Viola, A. Hanjalic, and P. Cesar, “User Centered Adaptive Streaming of Dynamic Point Clouds with Low Complexity Tiling,” in Proceedings of the ACM Multimedia Conference (ACM MM 2020), Seattle, USA, October 12-16, 2020.
  3. T. Zhang, A. El Ali, C. Wang, A. Hanjalic, and P. Cesar, “RCEA: Real-time, Continuous Emotion Annotation for Collecting Precise Mobile Video Ground Truth Labels,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (ACM CHI 2020), Honolulu, USA, April 25-30, 2020.
  4. S. Schwarz, M. Preda, V. Baroncini, M. Budagavi, P. Cesar, et al., “Emerging MPEG Standards for Point Cloud Compression,” IEEE Journal on Emerging and Selected Topics in Circuits and Systems (IEEE JETCAS), 9(1):133-148, 2019.
  5. M. Schmitt, J. Redi, D.C.A. Bulterman, and P. Cesar, “Towards individual QoE for multi-party video conferencing,” IEEE Transactions on Multimedia (TMM), 20(7):1781-1795, 2018.
  6. R. Mekuria, K. Blom, and P. Cesar, “Design, Implementation and Evaluation of a Point Cloud Codec for Tele-Immersive Video,” IEEE Transactions on Circuits and Systems for Video Technology (IEEE TCSVT), 27(4):828-842, 2017.