The Academic Fringe Festival - Matthias Boehm: System Infrastructure for Data-centric ML Pipelines - Balancing Automation and Manual Control

03 April 2023 16:00 - Location: Online

by Matthias Boehm | Technische Universität Berlin


Data-centric machine learning (ML) pipelines include - besides the training and hyper-parameter tuning of ML models - primitives for data cleaning, data augmentation, data validation, and model debugging in order to construct high-quality datasets with good coverage. Interestingly, state-of-the-art techniques for data integration, cleaning, and augmentation as well as model debugging are often based on machine learning themselves, which motivates their integration into ML systems. In this talk, we make a case for optimizing compiler infrastructure in Apache SystemDS, an open-source ML system for the end-to-end data science lifecycle. However, instead of full automation - which is rather unrealistic - we aim to automate the mechanical aspects of various tasks in data-centric ML pipelines while retaining manual control. As two concrete examples, we discuss SAGA for automatically enumerating data cleaning pipelines, and SliceLine for model debugging with regard to sub-groups of the input dataset.

Speaker Biography

Matthias Boehm is a full professor for large-scale data engineering at Technische Universität Berlin and the BIFOLD research center. His cross-organizational research group focuses on high-level, data science-centric abstractions as well as systems and tools to execute these tasks in an efficient and scalable manner. From 2018 through 2022, Matthias was a BMK-endowed professor for data management at Graz University of Technology, Austria, and a research area manager for data management at the co-located Know-Center GmbH. Prior to joining TU Graz in 2018, he was a research staff member at IBM Research - Almaden, CA, USA, with a major focus on compilation and runtime techniques for declarative, large-scale machine learning in Apache SystemML. Matthias received his Ph.D. from Dresden University of Technology, Germany in 2011 with a dissertation on cost-based optimization of integration flows. His previous research also includes systems support for time series forecasting as well as in-memory indexing and query processing.


About the Academic Fringe Festival

The Academic Fringe Festival (TAFF) is an exciting concoction of invited talks and panel discussions around important themes of research and innovation in Computer Science. This fourth edition is on "Human-Centered AI: Knowledge and Language". The series features prominent researchers and practitioners whose work has made fundamental contributions in these fields.

Artificial Intelligence is used more and more in society, from healthcare to government decisions and recruitment. Along with the rapid increase in AI adoption come increased concerns about the inherent shortcomings of such technologies (e.g., robustness) and their social and ethical implications. To create AI systems that can properly serve humans, it is crucial to put humans at the center of the process so that the resulting system behaves in a way that fits the values and needs of people. This poses new challenges to technological development: how can we build AI systems that can be understood by humans and that can align their behaviour with human values? Tackling these challenges requires new ways of looking at AI systems, e.g., machine learning models as knowledge bases and as autonomous agents that people can query, interact with, and influence.


Join us

To receive announcements of upcoming presentations and events organized by TAFF and get the Zoom link to join the presentations, join our mailing list.


Visit the website of The Academic Fringe Festival