Cloud-Based Data Processing
Information
The lectures for this course are pre-recorded and available on the Moodle course web page.
The tutorials are held by video conference in a BBB room.
This course introduces how modern data centers and clouds work. We will discuss the latest technologies and emerging trends driving the evolution of modern data centers, the fundamental design principles behind building scalable systems, and how to manage data efficiently in a cloud-native way and at large scale.
The special focus is on cloud-based data processing, and in particular on how to design and build scalable services in light of the latest trends in data centers. These range from the infrastructure side (e.g., resource disaggregation, heterogeneous hardware, high-performance networks) through system stack support (e.g., virtualization, containers, resource management and scheduling) to modern workload demands (e.g., data management, ML, streaming, security and privacy).
- 5 ECTS
- SWS 2V + 2Ü
- Lectures are held in English
- Lectures are uploaded to Moodle on Tuesday afternoons. The official lecture slot is on Wednesdays from 10am to 12pm.
- The tutorial is held via BBB on Wednesdays from 12pm to 1pm.
- For the GitLab repository, see Moodle.
- For the Mattermost channel, see Moodle.
- The assessment is a virtual oral exam. A bonus will be given to students who complete the exercise assignments and the project work.
- Students without TUM credentials can still access the material on our Moodle page; please send an email to firstname.lastname@example.org to obtain the password.
Prerequisites
This course is aimed at master-level students who have already taken some of the following (or similar) courses:
- IN0008 Fundamentals of Databases
- IN0009 Basic Principles: Operating Systems and Systems Software
- IN0010 Introduction to Computer Networks and Distributed Systems
The slides will be uploaded shortly before each lecture.
- Lecture 1: Introduction + Scalable Systems for the Cloud
- Lecture 2: Overview of Datacenters and Cloud Computing
- Lecture 3: Distributed Data Part I (Replication and Partitioning)
- Lecture 4: Distributed Data Part II (Fault-tolerance, Broadcast Protocols)
- Lecture 5: Consensus
- Lecture 6: Consistency
- Lecture 7: Cloud-native OLTP (DBaaS)
- Lecture 8: Cloud-native OLAP (data warehouses)
- Lecture 9: Cluster-level Scheduling and Resource Management
Assignments and Project work
We will apply many of the concepts covered in the lecture as part of hands-on homework assignments. The last 5-6 weeks of the semester will be reserved for project work.
- Assignment 1: High Level System Design
- Assignment 2: A System Design from Scratch
- Assignment 3: A Physical Design for a URL Shortener Service
- Assignment 4: Implementing a Highly Available URL Shortener Service in the Amazon Cloud
- Project: Sorting 1 TB in the Cloud
This is not a standard course (i.e., there is no real textbook). Most material is taken from research papers, which will be referenced in the slides. However, the following list can be useful as background or complementary reading.
- "Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems" by Martin Kleppmann
- "Distributed Systems" by Maarten van Steen and Andrew S. Tanenbaum
- "Principles of Distributed Database Systems" by M. Tamer Özsu and Patrick Valduriez