Cloud-Based Data Processing

Information

The lectures for this course are pre-recorded and available on the course's Moodle page.
The tutorials will be held as video conferences in the course's BBB room.

Content

This course will introduce how modern data centers and clouds work. We will discuss the latest technologies and emerging trends driving the evolution of modern data centers, the fundamental design principles behind scalable systems, and how to manage data efficiently, cloud-natively, and at large scale.

The focus will be on cloud-based data processing, and in particular on how to design and build scalable services in light of the latest data center trends. These range from the infrastructure (e.g., resource disaggregation, heterogeneous hardware, high-performance networks) through system stack support (e.g., virtualization, containers, resource management and scheduling) to modern workload demands (e.g., data management, ML, streaming, security and privacy).

Organization

  • 5 ECTS
  • SWS 2V + 2Ü
  • Lectures are held in English
  • Lectures are uploaded to Moodle on Tuesday afternoons. The official lecture slot is Wednesdays, 10am-12pm.
  • The tutorial is held via BBB on Wednesdays, 12pm-1pm.
  • For the GitLab repository, see Moodle.
  • For the Mattermost channel, see Moodle.
  • The assessment is a virtual oral exam. A bonus will be awarded to students who complete the exercise assignments and project work.
  • Students without TUM credentials can still access the course material on our Moodle page; please send an email to per.fuchs@cs.tum.edu to obtain the password.

Prerequisites

This course is aimed at master-level students who have already taken some of the following (or similar) courses:

  • IN0008 Fundamentals of Databases
  • IN0009 Basic Principles: Operating Systems and Systems Software
  • IN0010 Introduction to Computer Networks and Distributed Systems

Material

Slides

The slides will be uploaded shortly before each lecture.

Assignments and Project work

We will apply many of the concepts covered in the lecture as part of hands-on homework assignments. The last 5-6 weeks of the semester will be reserved for project work.

Literature

This is not a standard textbook course (i.e., there is no single textbook). Most of the material is taken from research papers, which will be referenced in the slides. However, the following books can be useful as background or complementary reading.

  • "Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems" by Martin Kleppmann
  • "Distributed Systems" by Maarten van Steen and Andrew S. Tanenbaum
  • "Principles of Distributed Database Systems" by M. Tamer Özsu and Patrick Valduriez