Data Processing On Modern Hardware

Information

The lectures for this course are pre-recorded and available on the Moodle course web page.
The tutorials are held via video web conference in a BigBlueButton (BBB) room.

Content

This course highlights some of the implications that current hardware trends have on database processing. Advances such as deep cache hierarchies and hardware accelerators have had a major impact on how we design and implement data processing algorithms and data structures. This course will show how carefully laying out data in memory and good algorithm design can increase the effectiveness of hardware caches; how we can speed up database operations by parallelizing them on modern CPUs; how to synchronize data structures efficiently; and how to leverage specialized instructions (e.g., SIMD) and accelerators for data processing. We are also going to look at offloading computation to programmable hardware devices (FPGAs) and see how we can benefit from novel network and storage technologies (RDMA and NVRAM).

More specifically, we are going to cover the following topics:

Writing efficient code for the memory hierarchy.
Parallelizing data-intensive tasks on multi-core CPUs.
Synchronizing data structures efficiently.
Leveraging modern hardware features and technologies for compute (e.g., SIMD processing, accelerators), network (e.g., RDMA), and storage (e.g., NVRAM).

Organization

5 ECTS
Lectures are held in English
Lectures uploaded on Tuesday afternoons (Moodle)
Tutorial via web conference, Wednesdays 11am-12pm
GitLab
Mattermost
There will be an exam (E) and graded homework (H). The overall grade is computed as follows: min(E, E*0.6+H*0.4). In addition, you have to achieve at least 4.0 (i.e., 4.0 or better) in the exam to pass the course. The exam may be oral, depending on the number of participants.
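The grading rule can be illustrated with a small sketch. It assumes the German grading scale (1.0 is the best grade, 4.0 the lowest passing grade), so a smaller number is better and, because of the min, the homework can only improve the exam grade, never worsen it. The function names are hypothetical, and rounding to official grade steps is not modeled.

```python
def final_grade(exam: float, homework: float) -> float:
    """Hypothetical helper: overall grade = min(E, 0.6*E + 0.4*H).

    Assumes the German scale (1.0 best, 4.0 lowest pass); rounding to
    official grade steps (1.0, 1.3, 1.7, ...) is not modeled here.
    """
    weighted = 0.6 * exam + 0.4 * homework
    # min() picks the numerically smaller, i.e. the better, grade.
    return min(exam, weighted)


def passes(exam: float) -> bool:
    """Hypothetical helper: the exam alone must be 4.0 or better (<= 4.0)."""
    return exam <= 4.0


# Strong homework improves an exam grade of 2.0 to about 1.6.
print(round(final_grade(2.0, 1.0), 2))
# Weaker homework cannot pull an exam grade of 2.0 down.
print(final_grade(2.0, 3.0))
```

For example, an exam grade of 2.0 with a homework grade of 1.0 yields 0.6·2.0 + 0.4·1.0 = 1.6, while a homework grade of 3.0 would give 2.4, so the exam grade of 2.0 is kept.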

Prerequisites

The course is aimed at Master-level students who have solid systems programming experience in C/C++ and have already taken the following (or similar) courses:

Introduction to Databases
Introduction to Computer Architecture

Material

Slides

Slides will be regularly uploaded shortly before each lecture.

Lecture 1: Introduction and Hardware Trends (videos)
Lecture 2: Cache Awareness (videos)
Lecture 3: Processing Models (videos)
Lecture 4: In-memory Joins (videos)
Lecture 5: Instruction Execution (videos)
Lecture 6: Data-level Parallelism (SIMD) (videos)
Lecture 7: Multicore parallelism and synchronization (videos)
Lecture 8: Multicore NUMA, interference and isolation (videos)
Lecture 9: Offloading Compute-intensive Tasks to FPGAs by Prof. Zsolt Istvan (video)
Lecture 10: Rack-scale data processing (videos)

Homework

Assignment 1: Cache awareness
Assignment 2: Processing models
Assignment 3: Hash joins
Assignment 4: SIMD
Assignment 5: Synchronization
Assignment 6: Parallelization and NUMA-awareness

Literature

This is not a standard course (i.e., there is no single textbook). Most of the material is taken from research papers, which will be referenced in the slides. However, the following list can be useful either as background or as complementary reading.

"Computer Architecture: A Quantitative Approach" (6th edition) by Hennessy and Patterson
"Computer Systems: A Programmer's Perspective" (3rd edition) by Bryant and O'Hallaron
Intel's Software Developer Manuals
Intel's Top-down Microarchitecture Analysis Method (TMAM) and the Roofline model
Agner Fog's Software optimization resources
Ulrich Drepper's "What Every Programmer Should Know About Memory"

Technische Universität München


Lehrstuhl III / XXV
Datenbanksysteme

Prof. Neumann
Prof. Giceva
Fakultät für Informatik
TU München