Validation and validity in multimedia information systems


Involved faculty member:

C.C.S. Liem


In the current era of big data, we can acquire and analyze more data than ever, but this data is often unstructured and messy, and measurement procedures may not have been optimal. Moreover, in many human-focused use cases, we may not be able to fully articulate what and where to measure, even though we have a good sense of what is an intended or unintended outcome. Music is one example in which we frequently encounter such measurement challenges. Music information can be described digitally in many ways and through many modalities, but the success of a song is typically determined by implicit human responses and subjective human interpretation. This makes the collection of ecological, yet valid and reliable, data observations (and associated labels) a non-trivial matter.

Furthermore, while much of our expertise concerns applying machine learning techniques to multimedia data, these techniques will be part of larger systems. Many of our techniques are ‘working as intended’ according to traditional metrics and evaluation procedures, yet in actual systems they often may not be, and they may even cause societally problematic outcomes.

Therefore, under this research line, we focus on methodological frameworks to better assess whether our multimedia processing technologies really achieve what they are intended to achieve. Inspired by work in metrology, psychometric validity and metascientific practices, we seek to gain more confidence in computational measurement procedures. Our ambition is to also do this for ecological use cases ‘in the wild’, outside of fully controlled lab settings. As researchers, we also seek to offer more transparency with regard to researcher degrees of freedom in our own work, and to more consciously balance the methodologies of design, science and engineering, depending on the problems we target.

Inspired by work in software testing, we seek to treat multimedia information systems truly as systems, involving multiple stages, components, and levels of complexity, each requiring dedicated and different testing and validation strategies. In doing so, we seek to more concretely operationalize many of the key requirements of the EU’s Ethics Guidelines for Trustworthy AI.

Representative publications

  1. A. Panichella and C.C.S. Liem, “What Are We Really Testing in Mutation Testing for Machine Learning? A Critical Reflection,” in Proceedings of the 43rd International Conference on Software Engineering - New Ideas and Emerging Results track (ICSE NIER), May 2021.
  2. C.C.S. Liem and C. Mostert, “Can’t Trust the Feeling? How Open Data Reveals Unexpected Behavior of High-Level Music Descriptors,” in Proceedings of the 21st International Society for Music Information Retrieval Conference (ISMIR), October 2020.
  3. C.C.S. Liem and A. Panichella, “Oracle Issues in Machine Learning and Where to Find Them,” in Proceedings of the 8th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE), June 2020.
  4. C.J. König, A.M. Demetriou, P. Glock, A.M.F. Hiemstra, D. Iliescu, C. Ionescu, M. Langer, C.C.S. Liem, A. Linnenbürger, R. Siegel, and I. Vartholomaios, “Some Advice for Psychologists Who Want to Work With Computer Scientists on Big Data,” Personnel Assessment and Decisions, vol. 6, iss. 1, 2020.
  5. J. Kim, J. Urbano, C.C.S. Liem and A. Hanjalic, “Are Nearby Neighbors Relatives? Diagnosing Deep Music Embedding Spaces,” Frontiers in Applied Mathematics and Statistics, vol. 5, November 2019.
  6. J. Kim, A.M. Demetriou, S. Manolios and C.C.S. Liem, “Beyond Explicit Reports: Comparing Data-Driven Approaches to Studying Underlying Dimensions of Music Preference,” in Proceedings of the 27th ACM Conference on User Modeling, Adaptation and Personalization (UMAP), June 2019.
  7. C.C.S. Liem, M. Langer, A. Demetriou, A.M.F. Hiemstra, Achmadnoer Sukma Wicaksana, M.Ph. Born and C.J. König, “Psychology Meets Machine Learning: Interdisciplinary Perspectives on Algorithmic Job Candidate Screening,” in H.J. Escalante, S. Escalera, I. Guyon, X. Baró, Y. Güçlütürk, U. Güçlü and M. van Gerven (eds.), Explainable and Interpretable Models in Computer Vision and Machine Learning, The Springer Series on Challenges in Machine Learning, Springer, pp. 197-253, 2018.