Datenbanksysteme und moderne CPU-Architekturen

Information

Content

  • This lecture covers the implementation of database systems, including how to leverage modern hardware architectures.
  • The lectures are held in English.
  • In the Exercises for this lecture, you will have the chance to build a tiny database system from scratch.

Prerequisites

  • Grundlagen der Informatik
  • Grundlagen Datenbanksysteme (GDB) (IN0008)

Organization

  • First Lecture: Tuesday, April 25th 2017
  • Lecture + Exercises: 2pm-5pm
  • Bonus: 75% required for the 0.3/0.4 grade bonus
  • Room: 02.09.014 (seminar-room)

Literature

  • Theo Härder, Erhard Rahm. Datenbanksysteme: Konzepte und Techniken der Implementierung. Springer, Berlin; 2nd ed.
  • Hector Garcia-Molina, Jeff Ullman, Jennifer Widom. Database Systems: The Complete Book
  • D. E. Knuth. The Art of Computer Programming Volume III
  • Joseph M. Hellerstein, Michael Stonebraker, James Hamilton. Architecture of a Database System

Lecture Slides



Project

  • Please read this page for some general notes on the programming assignments.
  • Please provide (minimal) documentation and comment your code.
  • You should work on the project in teams of two students. (The first task can be done alone.)
  • Send an email to kohna@in.tum.de with a zip/tar.gz solution file. Please use the prefix [dbimpl] in your submission email’s subject.

Task 1: External Sort (due: May 2, 10am)

  1. Write a function void externalSort(int fdInput, uint64_t size, int fdOutput, uint64_t memSize) that sorts size 64 bit unsigned integer values stored in the file referred to by the file descriptor fdInput using memSize bytes of main memory and stores the result in the file associated with the file descriptor fdOutput. Your function should implement the external merge sort algorithm and should perform a k-way merge during the merge phase, i.e. merge k runs together at once. To sort individual runs, you may use STL’s std::sort. To manage the k-way merge, the STL std::priority_queue (from std:queue) may be helpful. Useful system calls are open/cose, read/write, pread/pwrite, posix_fallocate.
  2. Write a test case that sorts 5GB of data and verifies the order of the output. The command-line interface must be sort inputFile outputFile memoryBufferInMB. Use the input file generator for testing. Your data format must adhere to the format specified in the program.