Martha Larson

Visit my website for details of my ongoing activities.

For a comprehensive list of publications see the publications list on my website; for a selected list, scroll down.


Martha Larson works in the area of multimedia retrieval and recommendation with a focus on speech, language and meaning. She is an expert in multimedia analysis techniques that make use of automatic speech recognition and audio analysis. Her more recent work involves multimedia in social networks and human computation, including crowdsourcing.

Within the Multimedia Computing Group, she focuses on algorithms designed to meet user information needs that go beyond topic. Such aspects are often deeply hidden and can be revealed only by analyzing large quantities of data, aided by crowdsourcing. Examples of her work on such orthogonal-to-topic aspects of multimedia include novel highlights in spoken content, user intent in multimedia search, affective multimedia indexing, user trust, and context-aware recommender systems.

Larson is highly engaged in the international research community. She is co-founder of the MediaEval international multimedia benchmarking initiative. In 2012 and 2013, she served as Area Chair in Crowdsourcing for Multimedia at ACM Multimedia. She has been involved in the organization of multiple workshops, including Crowdsourcing for Multimedia (ACM Multimedia 2012 and 2013) and Searching Spontaneous Conversational Speech (SIGIR 2008, ACM Multimedia 2009-2010).

She has served on the program committees of numerous conferences in the areas of information retrieval (SIGIR, ECIR, CIKM), multimedia (MMM, CBMI, ACM Multimedia, IEEE ICME, ACM ICMR), speech technology (Interspeech), and computational linguistics (COLING). She also reviews for a variety of journals, including IEEE Transactions on Multimedia; IEEE Transactions on Audio, Speech, and Language Processing; ACM Transactions on Multimedia Computing, Communications and Applications; and ACM Transactions on Interactive Intelligent Systems.

Martha Larson started working in the area of speech recognition and multimedia retrieval in 2000. Her formative experiences included time spent as a visiting researcher at the Technical Informatics department at the University of Duisburg and as an intern at IBM Watson Research Center. Before joining Delft University of Technology, she researched and lectured in the area of audio-visual retrieval at Fraunhofer IAIS and at the University of Amsterdam. She has participated as both researcher and research coordinator in a number of projects, including the EU projects CrowdRec, CUbRIK, PetaMedia, MultiMatch, and SHARE. Her most influential early work focused on developing vocabulary-independent speech-based access for large radio archives within an industry project during her time at Fraunhofer.

Martha Larson holds an MA and a PhD in theoretical linguistics from Cornell University and a BS in Mathematics (concentration in Electrical Engineering) from the University of Wisconsin. She is a member of the Tau Beta Pi Engineering Honor Society.

Selected Scientific Contributions

Larson, M., Melenhorst, M., Menendez, M. and Peng Xu. Using Crowdsourcing to Capture Complexity in Human Interpretations of Multimedia Content. In Fusion in Computer Vision – Understanding Complex Visual Content (Springer Advances in Computer Vision and Pattern Recognition), to appear.

Soleymani, M., Larson, M., Pun, T., Hanjalic, A. Corpus Development for Affective Video Indexing. IEEE Transactions on Multimedia, to appear.

Larson, M. and Jones, G.J.F. Spoken Content Retrieval: A Survey of Techniques and Technologies. Foundations and Trends in Information Retrieval, Vol. 5, No. 4-5, pp. 235-422, 2012.

Rudinac, S., Larson, M., Hanjalic, A. Learning Crowdsourced User Preferences for Visual Summarization of Image Collections. IEEE Transactions on Multimedia, vol. 15, no. 6, pp. 1231-1243, Oct. 2013.

Hanjalic, A., Kofler, C. and Larson, M. (alphabetical) Intent and its discontents: the user at the wheel of the online video search engine. ACM Multimedia 2012, pp. 1239-1248.

Larson, M., Soleymani, M., Eskevich, M., Serdyukov, P., Ordelman, R. and Jones, G.J.F. The Community and the Crowd: Multimedia Benchmark Dataset Development. IEEE MultiMedia, vol. 19, no. 3, pp. 15-23, July 2012.

Larson, M., Kofler, C., Hanjalic, A. Reading between the Tags to Predict Real-World Size-Class for Visually Depicted Objects in Images. ACM Multimedia 2011, pp. 273-282, 2011.

Larson, M. et al. Automatic Tagging and Geotagging in Video Collections and Communities. ACM International Conference on Multimedia Retrieval (ICMR 2011), pp. 51:1-51:8, 2011.

Tsagkias, M., Larson, M. and de Rijke, M. Predicting Podcast Preference: An Analysis Framework and its Application, Journal of the American Society for Information Science and Technology, Vol. 61, No. 2, pp. 374-391, February 2010.

Larson, M., Eickeler, S. and Kohler, J. Supporting Radio Archive Workflows with Vocabulary Independent Spoken Keyword Search. Proceedings of SIGIR 2007 Workshop Searching Spontaneous Conversational Speech. 2007.

Leopold, E., Kindermann, J., Paass, G., Volmer, S., Cavet, R., Larson, M., Eickeler, S., Kastner, T. Integrated Classification of Audio, Video and Speech using Partitions of Low-Level Features, ECML Workshop on Multimedia Discovery and Mining, 2003.

Larson, M., Willett, D., Kohler, J., Rigoll, G. Compound splitting and lexical unit recombination for improved performance of a speech recognition system for German parliamentary speeches, International Conference on Spoken Language Processing, 2000.