Zaid Al-Ars

Zaid Al-Ars is an associate professor at the Computer Engineering Lab of the Delft University of Technology, where he leads the research and education activities of the big data architectures research theme of the lab. His work focuses on addressing the bottlenecks in big data application scalability on multicore architectures and proposing optimized solution alternatives for system performance, memory, power, reliability, etc. The research interests of Dr. Al-Ars include:

Analysis and development of multicore systems to accelerate big data applications such as bioinformatics
Methods for application domain analysis and mapping to appropriate multicore architectures (CPU, GPU, FPGA)
Design and optimization of interconnect solutions to improve system performance and relieve data transfer bottlenecks
Design, characterization and test process improvement for multicore systems using the abundant computational resources available in these systems

Dr. Al-Ars is also a co-founder of Bluebee, a high-tech startup active at the intersection of cloud computing, high-performance computing and genomics applications. Prior to joining TU Delft, Dr. Al-Ars spent a number of years in the Product Engineering Group of Infineon Technologies and Siemens Semiconductors in Munich, Germany, where he was responsible for constructing new test methodologies to reduce the overall cost of the memory production test flow.

Here is a list of my current PhD students:

Johan Peltenburg, High-Performance Processor Architectures for Big Data Applications, started 2016
Ernst Houtgast, Acceleration and Optimization of the Genomics Pipeline Through Hardware/Software Co-Design, in collaboration with Bluebee, started 2014
Ahmed Nauman, Efficient implementation of DNA sequence assembly on FPGA based systems, started 2014
Shanshan Ren, Optimizing Genomic Computational Pipelines on High-Performance Systems, started 2013
Valery Kritchallo, Facilitating Algorithmic Scalability on Multicore Platforms Using Data Communication Profiling, started 2013
Mahroo Zandrahimi, Heterogeneous Multicore Design Techniques Under Process Variations, in collaboration with ST-Microelectronics, started 2013

Here is a list of my graduated PhD students:

Hamid Mushtaq, Runtime Monitoring for Fault Tolerant Computing, graduated 2015
Cuong Pham Quoc, Hybrid Interconnect and Interprocessor Communication, graduated 2015
Sandra Irobi, Test Development for Parasitic Fails in Deep Sub-Micron Memory Devices, graduated 2012
Laiq Hasan, Hardware Acceleration of Bioinformatics Sequence Alignment Applications, graduated 2011

Here is a list of my current big data support staff:

Hamid Mushtaq (PhD Computer Engineering), Scalability of Big Data In-Memory Computing Frameworks
Tom Mokveld (MSc Bioinformatics), Big Data Solutions for Genomics Diagnostics Pipelines
Dorus Leliveld (MSc Computer Engineering), Machine Learning Algorithm Optimization and Big Data Analytics

Here is a list of my current master's students:

Ahmad Hesam, Brain Developmental Model Simulation on Scalable Compute Clusters, with CERN (CH), 2017
Paul Bakker, Eye Tracking Algorithms for Psychological Disease Diagnosis, with Erasmus UMC, 2017
Rene Miedema, Implementation of Brain Simulation Models on the Maxeler Dataflow Engine, with Erasmus UMC, 2017
Yun Lu, Medical Image Management Systems for High-Performance Image Processing Pipelines, 2017
Parag Bhosale, GPU-Based Acceleration of Medical Image Registration, with Leiden University, 2017
Sander Suursalu, Predictive Maintenance of Oil Refineries Using Deep Learning and Big Data Approaches, with Shell, 2017
Nikolas Bampetas, High-Performance DNA Analysis Pipelines Using Scalable Apache Spark Acceleration, 2017
Huang-Da Chi, Parallelizing Video Filtering Pipelines on Manycore Platforms, with Spin Digital (DE), 2017
Konstantinos Gkougkoulias, Analysis of Efficient FPGA-Based Overlay Architectures, 2017
Rujuta Kulkarni, Efficient FPGA-Based Overlay Architectures for Big Data Applications, 2017
Uttam Kumar Elango, Scalable Real-Time Medical Image Processing Architecture on FPGA, with Philips, 2017
Bianco Zandbergen, Application Specific Reconfigurable Processor Architecture on FPGAs, 2017
Tong Dong Qiu, Acceleration of De Novo Assembly Algorithms for Human DNA, 2017
Jan-Harm Betting, Machine Learning Algorithms for Realtime Whisker Tracking, with Erasmus UMC, 2017
Saevar Hilmarsson, Streaming Processor Framework for High Performance Medical Image Processing, 2017
Prashanth Guledal Lakshamana, Dynamically Reconfigurable Multicore Processor Architecture, 2017

Here is a selected list of my graduated master's students:

Yang Ma, Hardware Acceleration of Realtime Whisker Tracking Algorithms, with Erasmus UMC, 2017
Rolf Heij, Design and Implementation of FPGA-based Image Processing Computational Fabric, 2016
Spyros Foniadakis, Effective Compression Techniques for DNA Sequencing and Analysis Datasets, 2016
Tudor Voicu, SparkJNI: A Reference Design for a Heterogeneous Apache Spark Framework, 2016
Bas Metman, Software to Hardware: Reducing Design Time of Optimized FPGA Implementations in Medical Devices, in collaboration with Philips, 2016
Panagiotis Mitsis, High Performance OpenCL Implementation of Medical Image Processing Algorithms, in collaboration with Philips, 2016
Michiel Jaspers, Acceleration of Read Alignment with Coherent Attached FPGA Coprocessors, in collaboration with IBM (US) and Bluebee, 2015
Jagruth Prasanna Kumar, Robust Body-Coupled Communication for Wearable Devices, in collaboration with Philips Research, 2015
Alexis Kanter, Design Space Exploration for a Local Object Store, in collaboration with IBM (US), 2015
Sumedh Jambekar, Performance Improvement of Motion Control Applications Using GPPs and ASIPs on FPGA, in collaboration with ASML, 2015
Tom Hubregtsen, Evaluation of different storage systems in Hadoop and Spark, in collaboration with IBM, 2015
Casper Folkers, Scalable Machine Learning Algorithms on a Big Data Infrastructure, 2015
George Kathareios, Compression of Next-Generation DNA Sequencing Data, 2014
Johan Peltenburg, Hardware Acceleration of Short-Read Mapping with the Burrows-Wheeler Aligner, in collaboration with UMC Utrecht, 2014
Kostas Patsis, Evaluation of DNA Scaffolding Techniques Using PacBio Long Reads, in collaboration with Leiden University, 2014
Sriram Adiga, NoC Characterization Framework for Design Space Exploration, in collaboration with Recore Systems, 2014
Namitha Gopalakrishna, Execution Time Analysis of Audio Algorithms, in collaboration with Bosch, 2014
Phani Kiran Padmanabharao, Hardware Acceleration of BWA-MEM Genome Mapping Application, in collaboration with Bluebee, 2014
Ratnakar Madan, Performance Improvement of Optical Algorithms on Multicore Platforms, in collaboration with ASML, 2013
Du Nguyen Anh, Development of a Brain Neural Model Simulation on GPUs, in collaboration with Erasmus Medical Center, 2013
Robert Lodder, Use of High-Throughput DNA Data for Discovery of Unknown Genes in Zebrafish, in collaboration with Leiden University, 2013
Amora Amir, Implementation of Bio-Informatics Applications on Various GPU Platforms, 2013
Ronnie Klanderman, Flash Memory Device: Electrical Modeling and Simulation, 2012
Rakshith Amarnath, Techniques for Memory Mapping on Multi-Core Automotive Embedded Systems, in collaboration with Bosch (Germany), 2012
Reinier van Kampenhout, Deterministic Task Transfer in Network-on-Chip Based Multi-Core Processors, in collaboration with Fraunhofer Institute (Germany), 2011
Eric Vermij, Genetic Sequence Alignment on a Supercomputing Platform, 2011
Aijie Zhao, Reliable In-Vehicle FlexRay Network Scheduler Design, 2011
Marijn Kentie, Biological Sequence Alignment Using Graphics Processing Units, 2010
Patrick van Wijnen, Feasibility Analysis for Hardware Acceleration of Pattern Recognition Algorithms, 2010
Erick van Rijk, Development of a Workload Set for Multi-Core Architectures, in collaboration with Apple, 2009
Sander Kootkar, Reliable sensor networks, in collaboration with Logica, 2009

I am responsible for, or contribute to, teaching the following courses:

EE4C07: Advanced Computing Systems

Recent developments in computing systems have resulted in the emergence of a number of different computational platforms that provide various performance, cost and power advantages in different application domains. This course discusses the most widely used computational platforms (CPUs, GPUs, FPGAs and DSPs), while addressing the theoretical and practical trade-offs in computer system organization and the latest developments and trends in computer architecture. The course will help the students in quantifying architectural design decisions in terms of performance, cost and power. An accompanying lab aids the students in applying this knowledge to create powerful heterogeneous (CPU, GPU, FPGA and/or DSP) computational solutions in computationally intensive application domains, such as multimedia and scientific computing.

ET4310: Supercomputing for Big Data

Big data is one of the hottest IT terms today, used to describe large and complex data sets that are difficult to process using traditional data processing systems. In this course, we will introduce the student to the most important concepts of big data and the available tools and systems used to manage it. In a series of labs, the students will be working with a number of different big data problems, addressing such aspects as implementing big data algorithms using Hadoop on a high performance computational system, as well as using higher-level languages and system management tools to guide and monitor big data systems. The students will have access to a big data cluster to evaluate the effectiveness of their implemented solution in practice.

ET3432: Computer Architecture and Organisation

This course provides an overview of the architecture and organization of a computer hardware system and the important principles of computer organization. The course demonstrates the interrelation between hardware and software, and illustrates how computers operate and how they can be programmed, with the emphasis on processor design and implementation.

Topics discussed are computer system overview, measuring and comparing performance, ISA: instruction set architecture (MIPS, x86, 8051, JVM), computer arithmetic, processor implementation, fast processor implementation, memory hierarchy and caching, interfacing, etc.

ET4381: Advanced Multicore Systems

Over the past several years, it has become clear that continued scaling in transistor dimensions can no longer significantly increase processor performance. Factors like the power wall, memory wall and instruction-level parallelism (ILP) wall have shifted the effort to increase performance towards parallel multicore processing. This course discusses the emerging field of multicore processing and covers multicore architectures in detail. It provides an in-depth evaluation of the domain of multicore systems and presents different classifications to enable efficient presentation of available hardware architecture alternatives. Students learn about design goals with respect to processing elements, memory system and interconnect network. Different challenges and open issues in the area of multicore systems will be broadly discussed. Case studies of several prevalent multicore implementations will be presented to promote practical learning.

The study goals of this course are as follows. Identifying possible alternatives for different components of multicore architectures (processing elements, memory system, interconnect network). Determining the design goals from the specific requirements of the application. Exploring the design space of available hardware resources and determining an optimal system architecture for a specific application. Applying advanced research in the domain of multicore systems to optimize the target application.

EE2421: Object-Oriented Programming

This course introduces the students to the basic concepts of object-oriented programming using C++ as the language of choice. The course teaches how to use data types (operators, expressions, type conversions, declarations), basic programming commands (conditionals and loops), procedural abstraction (functions and parameters), classes and objects (specification and implementation), inheritance (friends, abstraction and polymorphism), algorithm development and the use of UML for object-oriented system development.

The objective of this course is to give the students the knowledge, insight and tools to develop their own well-designed object-oriented programs. The course is accompanied by a lab that follows the material of the lectures and allows students to experiment with the learned programming concepts.

ET3115: Embedded Systems

An embedded system is a data processing system that is part of a bigger system and that delivers a specific service to the bigger system. Embedded systems play an important role in our daily life and they appear in applications varying from mobile phones to washing machines. This course is a clear continuation and integration of the first and second year courses in computer systems, because an embedded system comprises a hardware and a software part. The course addresses a number of topics associated with embedded systems design: requirements, challenges, and design methodologies of embedded systems; combinational and sequential logic, and application-specific hardware optimizations; general-purpose and application-specific instruction-set processors; interrupts; peripherals; memories and their interfaces; buses; state machines; processes, process communication and synchronization, and process scheduling.

ET4076: VLSI Test Technology and Reliability

With the continuous scaling of transistor feature sizes, VLSI chip density is increasing exponentially. This makes today's and future VLSI technology highly complex; billions of transistors are now integrated on a single chip (as is the case for a System on Chip). To guarantee customer satisfaction, produced VLSI chips have to be reliable and fully tested. Verification and production testing represent 50 to 60% of a chip's total production cost, and are now the biggest cost of the technology. It has been known for a while that tackling problems associated with testing VLSI chips at earlier design stages significantly reduces the testing cost. It is therefore important for hardware designers to be exposed to concepts of VLSI testing, which can help them design better products at lower cost. To get a feeling for how important test technology is, consider that just (functionally) testing a 64-bit adder (no flip-flops) at 1 GHz would take 585 years! What about today's chips with millions of flip-flops? What are the practical and efficient ways to deal with the testing of VLSI chips?
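The 585-year figure can be reproduced with a short back-of-the-envelope calculation. The sketch below is only an illustration: it assumes 2^64 test patterns applied at one pattern per clock cycle and a 365-day year, none of which is stated explicitly in the course description, and the function name is made up for this example.

```python
# Assumption: 2**64 test patterns, one pattern applied per clock cycle at 1 GHz.
PATTERNS = 2 ** 64
CLOCK_HZ = 1_000_000_000
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year


def exhaustive_test_years(patterns: int, clock_hz: int) -> float:
    """Return the time in years needed to apply every pattern once."""
    seconds = patterns / clock_hz
    return seconds / SECONDS_PER_YEAR


years = exhaustive_test_years(PATTERNS, CLOCK_HZ)
print(f"{years:.0f} years")
```

Under these assumptions the result rounds to 585 years. Exhaustively covering two independent 64-bit operands (2^128 patterns) would take far longer still, which is why structured test methods are needed.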

This course is an introduction to the field of digital systems testing, which is an integral part of IC design and manufacturing. The topics discussed are: Importance of VLSI Testing, Test process and Automatic Test Equipment, Defects versus Fault Models, Fault Simulation, Logic Simulation, Combinational Circuit Testing, Sequential Circuit Testing, Memory Testing, Design-for-Testability, Scan Design, Boundary Scan, Built-in-Self Test, Delay Test, Current Testing, semiconductor and IC reliability, etc.

I am currently involved in the following projects:

ALMARVI: Algorithms, Design Methods, and Many-core Execution Platform for Low-Power Massive Data-Rate Video and Image Processing

Advanced image and video processing systems are becoming a crucial and resource-consuming part of embedded applications in many sectors. ALMARVI aims to facilitate the transition from a vertically structured market to a horizontally structured market. In particular, it focuses on reducing overall system design cost by 20-30% through modularity, flexible interfacing, adaptive architecture, an execution platform with well-developed tool chains, adaptability and run-time configurability.

I'm the technical coordinator of ALMARVI. My research in this project involves creating architectures and middleware to enable run-time configurable processors that ensure optimal multicore system utilization.

BENEFIC: Best ENergy EFficiency solutions for heterogeneous multI-core Communicating systems

BENEFIC addresses a new category of "nomad smart devices" that is entering our daily life: always-connected equipment that requires more cores working at higher frequencies within a tight power consumption budget. Despite considerable effort in silicon technologies and battery capacity, progress has not been sufficient to compensate for, let alone overtake, the energy appetite of the new features in such devices. Based on current projections, decreasing CMOS feature size will not be enough to reach future 3GPP bit rates, and following this trend a power gap estimated at more than 13x must be closed in the period until 2020.

I'm a work package leader in BENEFIC. My research in this project involves proposing adaptive voltage-frequency design processes to accommodate environmental and manufacturing process variation in multicore processors.

Completed projects I was involved with:

SMECY: Smart Multicore Embedded SYstems

SMECY envisions that emerging multi-core technologies will rapidly develop into massively parallel computing environments which, due to improved performance, energy and cost properties, will extensively penetrate the embedded system industry in a few years. This will affect and shape the whole business landscape: semiconductor vendors need to be capable of offering advanced multi-core platforms to diverse application sectors, IP providers need to re-target existing solutions and develop new ones to be compatible with evolving multi-core platforms, and embedded system houses need to adapt their product architectures and renew their system, architecture, software and hardware development processes.

My research in SMECY involved investigating reliable computing on multicore systems, as well as the automation of efficient interconnect design processes.

COMCAS: COmmunication-centric heterogeneous Multi-Core ArchitectureS

In the coming years, mobile equipment will require more cores (1.4x per year) working at higher frequencies (1.05x per year), which results in an increase in power consumption. However, the evolution of battery capacity (5% per year) will not be sufficient to guarantee the autonomy of embedded systems. The evolution is towards heterogeneous platforms, which include sophisticated on-chip communication infrastructures for higher efficiency and performance. Moreover, heterogeneous cores provide a sound balance between performance and power consumption. The COMCAS project will create a breakthrough in low-power design solutions for these new heterogeneous platforms and is crucial for fighting the complexity increase in mobile equipment in the years to come.

I was a work package leader in COMCAS. Our work in this project was related to the development of power management techniques for real-time applications that are mapped on multiple heterogeneous cores.

INDEXYS: INDustrial EXploitation of the genesYS cross-domain architecture

The objective of INDEXYS is to tangibly realize industrial implementations of cross-domain architectural concepts developed in the GENESYS project in three domains: automotive, aerospace and railway, thereby relating to ARTEMIS-JU Industrial Priority: "Reference designs and architectures". The GENESYS architectural style supports a composable, robust, and comprehensible component-based framework with strict separation of computation from message-based communication. INDEXYS expands the GENESYS approach by implementing and integrating architectural services into prevailing (real-world!) platform solutions. A key goal of INDEXYS is legacy integration, for platform providers - by integrating new architectural services into legacy platforms - and for platform users - by supporting legacy applications.
