Cross-modal retrieval, generation and captioning has emerged as a challenging new research direction within the MMC Group. It builds on two decades of the group's research on video content analysis and retrieval, including seminal publications on affective video content representation and modelling, and it gained momentum through collaboration with the University of Electronic Science and Technology, Chengdu, China, and with the University of Illinois at Urbana-Champaign, IL, USA. Our research targets innovative methodological and algorithmic concepts for automatically inferring semantic links between pieces of information conveyed by different modalities. Such concepts can be used in cross-modal retrieval scenarios, in which, for example, an image is found based on a textual or spoken description of its semantic content, and also in scenarios that involve “translating” information from one modality into another. Examples of the latter are image and video captioning (in text and speech), visual question generation, and image generation from spoken descriptions.

The MMC Group has rapidly built up a strong track record in cross-modal retrieval, generation and captioning, marked by numerous impactful publications in the major conference and journal venues of the field. These venues include the ACM International Conference on Multimedia (ACM Multimedia), IEEE Transactions on Image Processing, IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE Transactions on Audio, Speech and Language Processing, and IEEE Transactions on Neural Networks and Learning Systems. These publications have already earned two Best Paper Awards, in 2017 and 2019.


