Dick Epema

Emeritus Professor in Distributed Systems of the Faculty of Electrical Engineering, Mathematics and Computer Science (EEMCS) of Delft University of Technology.

Dick H.J. Epema obtained an MSc in mathematics with a minor in Spanish in 1979, and a PhD in mathematics (algebraic geometry) in 1983, both from Leiden University in the Netherlands. His PhD thesis is entitled Surfaces with Canonical Hyperplane Sections. In 1988 he also obtained an MSc in computer science from Delft University of Technology.

Currently he is full professor of Distributed Systems at Delft University of Technology. From September 2011 until 2016, he was a part-time full professor of Decentralized Distributed Systems in the System Architecture and Networking group of Eindhoven University of Technology.

His research interests are in the areas of scheduling in distributed computing systems (grids, clusters, clouds, datacenters) and cooperative systems (peer-to-peer systems, online social networks, blockchain).

In the area of scheduling, his current focus is on resource allocation to data-processing frameworks such as MapReduce and Spark. An important topic has been processor co-allocation, that is, the distribution of single (parallel) applications across multiple clusters. His scheduling research centers on the KOALA grid scheduler, which has been deployed on the DAS system and which had processor co-allocation as one of its initial main features. KOALA was later extended to deal with many application types, such as workflows, bags-of-tasks, and data-processing frameworks.

In the area of cooperative systems, his research was on measurements and modeling of the BitTorrent P2P system, on all aspects of video distribution (recorded, live, and VoD) in swarm-based P2P systems, and on reputation mechanisms and resilience against sybil attacks, as part of the research and development of the Tribler P2P system. His research in this area recently moved to trust and the blockchain.

Previously, he did research in performance analysis, and he has investigated many different types of priority and fair queuing systems, ranging from theoretical single-server queuing models to the decay-usage scheduling policy in UNIX multiprocessors.

Dick Epema has obtained many research grants from NWO, the EU, and the Dutch government (BSIK, COMMIT). He was involved in the VL-e (grid computing) and I-Share/Freeband (virtual communities in the Internet and P2P computing) BSIK projects, and he participates in the Infrastructure Virtualization for e-Science project of the Dutch national COMMIT program. He has authored over 140 scientific papers, and has been on numerous program committees in grids, clouds, and P2P computing. He is an associate editor of the IEEE Trans. on Parallel and Distributed Systems and the IEEE Trans. on Cloud Computing. He was General Co-Chair of the Euro-Par 2009 and the IEEE P2P 2010 conferences, and he was General Chair of the 21st ACM Symp. on High-Performance Parallel and Distributed Computing in 2012 and of the 13th IEEE/ACM Symp. on Cluster, Cloud and Grid Computing in 2013. He was Program Committee Co-Chair of the 22nd ACM Symp. on High-Performance Parallel and Distributed Computing in 2013.

Main research interests

Distributed systems: design, operation and performance analysis
Resource management and scheduling in distributed computing systems: grids, clusters, clouds, datacenters
Cooperative Systems: modeling and analysis, trust and reputation mechanisms, blockchain

Current PhD students

Vincent van Beek (scheduling business-critical workloads in clouds, with Alexandru Iosup)
Masoud Ghiassi (data processing for machine learning, with Lydia Chen)
Bulat Nasrulin (risk-based approach to blockchain, with Johan Pouwelse)
Satwik Prabhu Kumble (anonymity, with Stefanie Roos)
Quinten Stokkink (blockchain-based identity management, with Johan Pouwelse)
Gill (Jiyue) Huang (incentives and attacks in federated learning, with Lydia Chen and Stefanie Roos)
Chi Hong (optimization of machine learning systems, with Lydia Chen)
Bart Cox (practical federated learning systems, with Jérémie Decouchant)

Previous PhD students

Jan de Jongh, Share Scheduling in Distributed Systems, February 2002
Anca Bucur, Performance Analysis of Processor Co-Allocation Policies in Multicluster Systems, March 2004
Hashim Mohamed, The Design and Implementation of the KOALA Grid Resource Management System, November 2007
Pawel Garbacki, Improving P2P Applications by Breaking the Architecture Symmetry, December 2008
Alexandru Iosup, A Framework for the Study of Grid Inter-operation Mechanisms, January 2009
Jan David Mol, Free-riding Resilient Video Streaming in Peer-to-Peer Networks, January 2010
Ozan Sonmez, Application-Oriented Scheduling in Multicluster Grids, June 2010
Michel Meulpolder, Managing Supply and Demand of Bandwidth in Peer-to-Peer Communities, March 2011
Nezih Yigitbasi, Understanding and Improving the Performance Consistency of Distributed Computing Systems, December 2012
Rahim Delaviz Aghbolagh, A Robust Reputation Mechanism for Peer-to-Peer Systems, October 2013 (with Johan Pouwelse)
Adele Lu Jia, Online Networks as Societies: User Behaviors and Contribution Incentives, October 2013 (with Johan Pouwelse)
Dimitra Gkorou, Exploiting Graph Properties for Decentralized Reputation Systems, November 2014 (with Johan Pouwelse)
Siqi Shen, Massivizing Networked Virtual Environments on Clouds, April 2015 (with Alex Iosup)
Mihai Capota, User Contribution in Peer-to-Peer Communities, July 2015 (with Johan Pouwelse)
Riccardo Petrocco, Improving Peer-to-Peer Video Streaming, April 2016 (with Johan Pouwelse)
Yong Guo, Distributed Heterogeneous Systems for Large-Scale Graph Processing, May 2016 (with Alex Iosup)
Bogdan Ghit, Optimizing the Performance of Data Analytics Frameworks, May 2017
Alexey Ilyushkin, Scheduling Workloads of Workflows in Clusters and Clouds, December 2019 (with Alex Iosup)
Martijn de Vos, Decentralization and Disintermediation in Blockchain-based Marketplaces, June 2021

Research highlights

Condor Flocking
Decay-usage scheduling in multiprocessors
Processor co-allocation in multicluster systems
Measuring and analyzing grid and cloud workloads
Balancing resources among frameworks in datacenters
2Fast: Collaborative downloading in BitTorrent
Measuring and modeling swarm-based P2P systems
Reputation systems in online social networks

Editorships

Associate editor of IEEE Trans. on Parallel and Distributed Systems (2009-2014)
Associate editor of IEEE Trans. on Cloud Computing

Chairmanships

General and Program Committee Co-Chair of LSAP 2009 in Munich
General and Program Committee Co-Chair of Euro-Par 2009 in Delft
Vice Program Committee Chair of the 10th IEEE/ACM Int'l Symp. on Cluster, Cloud and Grid Computing (CCGrid) 2010 in Melbourne
General Co-Chair of the 10th IEEE Conference on Peer-to-Peer Computing (P2P) 2010 in Delft
General and Program Committee Co-Chair of LSAP 2010 in Chicago
General and Program Committee Co-Chair of LSAP 2011 in San Jose
General Chair of the 21st Int'l ACM Symp. on High-Performance Parallel and Distributed Computing (HPDC) 2012 in Delft
General Chair of the 13th IEEE/ACM Int'l Symp. on Cluster, Cloud and Grid Computing (CCGrid) 2013 in Delft
Program Committee Co-Chair of the 22nd Int'l ACM Symp. on High-Performance Parallel and Distributed Computing (HPDC) 2013 in New York City
Area Chair of Clouds and Distributed Computing, Supercomputing 2016 in Salt Lake City

Program Committee member for

HPDC 2022, Minneapolis, June 2022
HPDC 2021, Stockholm, June 2021
ICDCS 2020, Singapore, July 2020
HPDC 2020, Stockholm, June 2020
CCGrid 2020, Melbourne, May 2020
Euro-Par 2019, Göttingen, August 2019
ICDCS 2019, Dallas, Texas, July 2019
28th ACM Symp. on High-Performance Parallel and Distributed Computing (HPDC'19), Phoenix, AZ, USA, June 2019
CCGrid 2019, Cyprus, May 2019
27th ACM Symp. on High-Performance Parallel and Distributed Computing (HPDC'18), Tempe, AZ, USA, June 2018
ICDCS 2018, Vienna, July 2018
CCGrid 2018, Washington, DC, May 2018
IEEE Int'l Conference on Big Data, December 2017
Supercomputing, Denver, USA, November 2017
IEEE 25th Int'l Symp. on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), Banff, Canada, September 2017
21st Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP), Orlando, USA, June 2017
26th ACM Symp. on High-Performance Parallel and Distributed Computing (HPDC'17), Washington DC, USA, June 2017
IEEE 24th Int'l Symp. on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), London, September 2016
1st Workshop on Edge Computing (WEC'16), in conjunction with ICDCS, Nara, Japan, June 2016
25th ACM Symp. on High-Performance Parallel and Distributed Computing (HPDC'16), Kyoto, Japan, June 2016
CCGrid 2016, Cartagena, Colombia, May 2016
20th Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP), Chicago, USA, May 2016

2015

IEEE 23rd Int'l Symp. on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), Atlanta, October 2015
24th ACM Symp. on High-Performance Parallel and Distributed Computing (HPDC'15), Portland, June 2015
Workshop on The Science of Cyberinfrastructure: Research, Experience, Applications and Models (SCREAM'15, in conjunction with HPDC'15), Portland, June 2015
8th Workshop on Virtualization Technologies in Distributed Computing (VTDC, in conjunction with HPDC'15), Portland, June 2015
19th Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP), Hyderabad, India, May 2015

2014
IEEE Cluster 2014, Madrid, September 2014
23rd ACM Symp. on High-Performance Parallel and Distributed Computing (HPDC'14), Vancouver, June 2014
CCGrid 2014, Chicago, May 2014
JSSPP 2014, Phoenix, May 2014

2013
17th Int'l Conf. on Principles of Distributed Systems (OPODIS), December 2013
SuperComputing 2013, Denver, November 2013
2013 IEEE Int'l Conference on Big Data (IEEE BigData 2013), October, Silicon Valley
1st Int'l Workshop on Optimization Techniques for Resource Management in Clouds (ORMaCloud, with HPDC-13), New York, June 2013
JSSPP 2013, Boston, May 2013
4th ACM/SPEC Int'l Conference on Performance Engineering, Prague, April 21-24, 2013

2012
5th IEEE/ACM Int'l Conference on Utility and Cloud Computing (UCC 2012), Chicago, November 2012
IEEE P2P Computing 2012, Tarragona, Spain, September 2012
JSSPP 2012, Shanghai, May 2012

Before 2012
ParCo 2011, Gent, Belgium, Aug-Sept 2011
4th Annual Int'l Systems and Storage Conference (SYSTOR), Haifa, Israel, May-June 2011
HPDC 2011, San Jose, CA, USA, June 2011
CCGrid 2011, Newport Beach, CA, USA, May 2011
Grid2010, Brussels, Belgium, October 2010
JSSPP 2010, Atlanta, USA, April 2010
CCGrid 2010, Melbourne, Australia, May 2010 (vice chair Performance Modeling and Evaluation)
IEEE P2P Computing 2009, Seattle, USA, September 2009
Euro-Par 2009, Delft, the Netherlands, August 2009 (co-chair)
HPDC 2009, Garching, Germany, June 2009
CCGRID 2009, Shanghai, China, May 2009
IEEE P2P Computing 2008, Aachen, Germany, September 2008
Grid2008, Tsukuba, Japan, September 2008
Euro-Par 2008, Canary Islands, Spain, August 2008 (global chair of the topic peer-to-peer systems)
HPDC 2008, Boston, USA, June 2008
IPDPS 2008, Miami, Florida, USA, April 2008
IPTPS 2008, Tampa Bay, Florida, USA, February 2008
IEEE P2P Computing 2007, Galway, Ireland, September 2007
Euro-Par 2007, Rennes, France, August 2007
ICDCS 2007, Toronto, Canada, 25-29 June 2007
CCGrid 2007, Rio de Janeiro, Brazil, May 2007
HPDC-15, Paris, France, 19-23 June 2006
The Sixth International Workshop on Global and Peer-to-Peer Computing organized at the IEEE/ACM International Symposium on Cluster Computing and the Grid 2006 (IEEE/ACM CCGRID 2006), Singapore, May 2006
Second Workshop on System Management Tools for Large-Scale Parallel Systems, in conjunction with the 2006 Int'l Parallel and Distributed Processing Symp., April 29, 2006, Rhodos, Greece
Grid 2005 - 6th IEEE/ACM Int'l Workshop on Grid Computing, November 12, 2005, in conjunction with SuperComputing 2005, Seattle, Washington, USA
The Second Grid Resource Management Workshop (GRMW-2005), in conjunction with the Sixth Int'l Conference on Parallel Processing and Applied Mathematics, September 11-14, 2005, Poznan, Poland
The Fifth International Workshop on Global and Peer-to-Peer Computing organized at the IEEE/ACM International Symposium on Cluster Computing and the Grid 2005 (IEEE/ACM CCGRID 2005), Cardiff, UK, May 2005
The European Grid Conference 2005, Amsterdam, the Netherlands, 14-16 February 2005
ICPP 2003, Kaohsiung, Taiwan, October 2003
Performance 2002, The IFIP WG 7.3 Int'l Symposium on Computer Performance Modeling, Measurement and Evaluation, Rome, September 23-27, 2002
CCGrid 2002, Berlin, Germany, May 2002
The First Euroglobus Workshop in Lecce, Italy, 16-23 June 2001
The 2nd Workshop on MAthematical (performance) Modeling and Analysis (MAMA2000) in conjunction with Sigmetrics 2000, June 17-18, 2000, in Santa Clara, CA, USA
The Distributed Computing and Metacomputing Workshop as part of HPCN'99

Resource Management and Scheduling in Distributed Processing Systems

The KOALA Multicluster Scheduler

KOALA is a scheduler that we have designed and implemented in the PDS group, and that has been deployed on the DAS system. KOALA is our vehicle for research in scheduling and resource management in multicluster systems, grids, and clouds. Its main original feature was processor co-allocation, but it now supports many more application types, such as Bags-of-Tasks, workflows, and MapReduce applications. KOALA development has been an ongoing effort in several research projects.

The Distributed ASCI Supercomputer (DAS)

The DAS is a six-cluster computer-science infrastructure funded by NWO (the Dutch National Science Foundation) and installed and maintained by the ASCI Research School. One of the clusters is located at TU Delft. The DAS is very important for the research of the PDS group. The KOALA scheduler has been developed for and installed on the DAS.

Infrastructure Virtualization for e-Science (IV-e, part of the national Dutch COMMIT programme, 2011-2017).

This project is a sequel to the VL-e project (see below) on resource management, e-Science applications, workflows and data management in large-scale distributed computing systems such as clouds. The two research topics of the PDS group in this project are further development of the KOALA scheduler and application-specific scheduling. In particular, we currently focus on scheduling data-intensive frameworks such as MapReduce and on workflow scheduling.

PhD students: Bogdan Ghit and Alexey Ilyushkin

GUARD-G: Guaranteed Delivery in Grids (2007-2012)

The goal of this project on grid computing is to design and analyze techniques for delivering guaranteed service to applications in grids. The GUARD-G project is part of the GLANCE programme funded by NWO, and is performed jointly with Leiden University.

PhD student: Nezih Yigitbasi
Postdoc: Hashim Mohamed

ALEAE: Handling Uncertainties in Large-Scale Distributed Systems (2009-2010)

The goal of ALEAE is to provide models and algorithmic solutions in the field of resource management that cope with uncertainties in large-scale distributed systems. ALEAE is a joint project of Delft University of Technology, INRIA in France, Osaka University in Japan, and the Zuse Institute in Berlin, Germany. One of the main achievements of the ALEAE project is the Failure Trace Archive (FTA), which is a centralized public repository of availability traces of parallel and distributed systems, and tools for their analysis. The purpose of this archive is to facilitate the design, validation, and comparison of fault-tolerant models and algorithms.

Virtual Laboratory for e-Science (2004-2010)

In the Dutch national project Virtual Lab for e-Science (VL-e), we focus on resource management, scheduling, and performance analysis in grids. In particular, we study the management and scheduling of jobs that require co-allocation, that is, the simultaneous allocation of resources (processors, data, etc.) in multiple subsystems making up a grid. For this purpose, we have designed and implemented the KOALA grid scheduler.

PhD students: Alexandru Iosup and Ozan Sonmez
Postdocs: Alexandru Iosup, Ozan Sonmez and Hashim Mohamed

CoreGRID (2004-2008)

CoreGRID is a Network of Excellence of the European Union in grid computing, with 42 participating universities and public research institutes in Europe. CoreGRID is divided into six work packages or so-called virtual institutes. One of these is the virtual institute on Resource Management and Scheduling, in which the PDS group participates.

Condor (1992-1996)

In this project on grid computing, we focused on resource management across multiple sites. In particular, we designed and implemented the flocking mechanism in Condor for load sharing and job migration across different Condor pools, in cooperation with the main designer of the Condor system, Miron Livny of the University of Wisconsin at Madison.

Peer-to-Peer Systems and Online Social Networks

P2P-Fusion (2006-2009)

P2P-Fusion is an EU project on peer-to-peer systems for creative reuse of multimedia content in virtual communities. The project has seven partners in Finland, Hungary, and the Netherlands.

PhD students: Michel Meulpolder and Rahim Delaviz

I-SHARE (2004-2010)

I-SHARE is a project on sharing technology at different levels in wired and wireless P2P systems. It is part of the BSIK programme Freeband. As a guiding example, we are defining an architecture for P2P-TV, a P2P system for the dissemination of both live and recorded programs of 10,000+ TV channels. Research issues are how to recommend TV programs to users, how to design the user interface, how to build application-level multicast trees for distributing live video, and in general, how to share the contents of individual video recordings on users' hard disks.

PhD student: Jan David Mol
Postdoc: Johan Pouwelse

Two-level peer-to-peer systems (TLP2PS, 2003-2008)

The research topic in this NWO-funded project is to exploit the heterogeneity of P2P systems, and in particular, to assess the performance impact of the presence of superpeers, which are peers that have more capabilities than other peers.

PhD student: Pawel Garbacki

On May 27, 2016, I held my inaugural lecture entitled Gedistribueerde systemen: van efficientie tot vertrouwen at Delft University of Technology. Here are all the materials (all are in Dutch):
- the video recording of the lecture
- the slides of the presentation (pdf, ppt)
- the text of the lecture (closely matches the slides)

Dynamic Resource Provisioning for Application Frameworks in Datacenters, presentation at Google, Mountain View, 10 March 2015.

Decentraliseer--en Beheers?, Inaugural lecture at Eindhoven University of Technology, 23 November 2012 (all in Dutch):
- the slides of the presentation (pdf, ppt)
- the text of the lecture

Twenty Years of Grid Scheduling Research and Beyond, Keynote at the 12th IEEE/ACM Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012), 16 May 2012.

Peer-to-Peer File Sharing: Past!-Present-Future? A Delft View, Keynote at the 11th IEEE Int'l Conference on Peer-to-Peer Computing (P2P'11), 31 August 2011.

Exploiting Heterogeneity in Parallel and Distributed Systems, Keynote at the 7th Workshop on Algorithms, Models, and Tools for Parallel Computing on Heterogeneous Platforms (HeteroPar'2009), 25 August 2009.

Main teaching interests

Distributed systems
Distributed algorithms
Cloud Computing

Master's courses

Distributed Algorithms (IN4150)

In this course, basic distributed algorithms are treated for such problems as synchronization, causal message ordering, deadlock, mutual exclusion, election, minimum-weight spanning trees, fault tolerance, consensus, and stabilization.

For TUD students, more information is available on Brightspace.

PhD course

Advanced Blockchain Engineering (ASCI course A27)

with Johan Pouwelse, Quinten Stokkink, Martijn de Vos (all TU Delft), and Marc Makkes (Vrije Universiteit Amsterdam)

Topics: distributed consensus, state machine replication, autonomous ledger-based micro-economies, blockchains for resource-constrained devices; lab assignment on the design and implementation of blockchain

Expected again in spring 2020.