Datenbanksysteme und moderne CPU-Architekturen

Information

Content

  • This lecture covers the implementation of database systems, including how to leverage modern hardware architectures.
  • The lectures are held in English.
  • In the Exercises for this lecture, you will have the chance to build a tiny database system from scratch.

Prerequisites

  • Grundlagen der Informatik
  • Grundlagen Datenbanksysteme (GDB) (IN0008)

Organization

  • Exam Review: Thursday 28 June 3pm, room: 02.09.058

Literature

  • Theo Härder, Erhard Rahm. Datenbanksysteme: Konzepte und Techniken der Implementierung. Springer, Berlin; 2nd ed.
  • Hector Garcia-Molina, Jeff Ullman, Jennifer Widom. Database Systems: The Complete Book
  • D. E. Knuth. The Art of Computer Programming Volume III
  • Joseph M. Hellerstein, Michael Stonebraker, James Hamilton. Architecture of a Database System

Lecture Slides



Project

  • Please read this page for some general notes on the programming assignments.
  • Please provide (minimal) documentation and comment your code.
  • You should work on the project in teams of two students. (The first task can be done alone.)
  • Send an email to leis@in.tum.de with a zip/tar.gz solution file. Please use the prefix [dbimpl] in your submission email’s subject.

Task 1 (due: April 19, 10am)

  1. Write a function void externalSort(int fdInput, uint64_t size, int fdOutput, uint64_t memSize) that sorts size 64 bit unsigned integer values stored in the file referred to by the file descriptor fdInput using memSize bytes of main memory and stores the result in the file associated with the file descriptor fdOutput. Your function should implement the external merge sort algorithm and should perform a k-way merge during the merge phase, i.e. merge k runs together at once. To sort individual runs, you may use STL’s std::sort. To manage the k-way merge, the STL std::priority_queue (from std:queue) may be helpful. Useful system calls are open/cose, read/write, pread/pwrite, posix_fallocate.
  2. Write a test case that sorts 5GB of data and verifies the order of the output. The command-line interface must be sort inputFile outputFile memoryBufferInMB. Use the input file generator for testing. Your data format must adhere to the format specified in the program.

Task 2 (due: May 3, 10am)

Write a basic buffer manager that manages buffer frames and controls concurrent access to these frames. The buffer manager should offer the following functionality:
  • BufferManager::BufferManager(unsigned pageCount)
    Create a new instance that keeps up to pageCount frames in main memory.
  • BufferFrame& BufferManager::fixPage(uint64_t pageId, bool exclusive)
    A method to retrieve frames given a page ID and indicating whether the page will be held exclusively by this thread or not. The method can fail (by throwing an exception) if no free frame is available and no used frame can be freed. The pageID variable is split into a segment ID and the actual page ID. Each page is stored on disk in a file with the same name as its segment ID (e.g., "1").
  • void BufferManager::unfixPage(BufferFrame& frame, bool isDirty)
    Return a frame to the buffer manager indicating whether it is dirty or not. If dirty, the page manager must write it back to disk. It does not have to write it back immediately, but must not write it back before unfixPage is called.
  • void* BufferFrame::getData()
    A buffer frame should offer a method giving access to the buffered page. Except for the buffered page, BufferFrame objects can also store control information (page ID, dirtyness, ...).
  • BufferManager::~BufferManager()
    Destructor. Write all dirty frames to disk and free all resources.
Your buffer manager should have the following features:
  • High performance. Release locks as early as possible.
  • Concurrency: It should be able to handle concurrent method invocations efficiently (e.g. using pthread_rwlock_t). Requests to fixPage should block until the requested access (exclusive or shared) can be fulfilled.
  • Buffering: It should use a buffer of size frames to keep pages in memory as long as possible. If no free frames are available, old frames should be reclaimed using some reasonable strategy.
Your buffer manager does not need to flush of pages to disk asynchronously or prefetch pages that are likely to be accessed in the near future. Use this test program to validate your implementation.

Task 3 (due: May 17, 10am)

Create a metadata/schema segment for your database system. For each relation store its name, segment id, size in pages, and all its attributes and types. You should be able to serialize and deserialize the meta data segment to/from disk. Always store it in segment 0, e.g. in a file called 0. If you want to able to read SQL schemas, you may find the schema parser and the schema example file useful. Alternatively, you can simply create the schema programmatically.

Implement slotted pages for your database system.
  1. Define a segment type SPSegment that operates on slotted pages. A slotted page consists of three parts: A header, the slots and the (variable-length) records. Records are addressed by TIDs (tuple identifier), consisting of a page ID and a slot ID.
  2. Provide an interface to insert, remove, update and lookup "records". Records consist of a size and the data (you could also think of them as strings: they contain a length indicator len followed by a pointer to len characters).
    TID SPSegment::insert(const Record& r)
    Searches through the segment’s pages looking for a page with enough space to store r. Returns the TID identifying the location where r was stored. Note: This can be implemented much more efficiently with a free space bitmap as described in chapter 3, slide 3, but you are not required to do this.
    bool SPSegment::remove(TID tid)
    Deletes the record pointed to by tid and updates the page header accordingly.
    Record SPSegment::lookup(TID tid)
    Returns the read-only record (Record.hpp) associated with TID tid.
    bool SPSegment::update(TID tid, const Record& r)
    Updates the record pointed to by tid with the content of record r.

Test your implemention, for example with this slotted pages test skeleton

Task 4 (due: May 31, 10am)

Implement a B+-Tree index for your database system. Your tree should support different (opaque) key types. Parameterize the B+-Tree with a key type and a comparator. You can assume that keys are unique and all key types have fixed length. The B+-Trees must support the following reentrant operations
  • insert Inserts a new key/TID pair into the tree.
  • erase Deletes a specified key. You may simplify the logic by accepting under full pages.
  • lookup Returns a TID or indicates that the key was not found.
Use the concurrency control techniques from the slides "Concurrent Access (2)" and "Concurrent Access (3)". Test your implemention, for example with this B+Tree test skeleton.

Task 5 (due: June 14, 10am)

Create the following simplifed physical operators for your database system:
  • Print: Prints out all input tuples in a human-readable format.
  • Table Scan: Scans a relation and produces all tuples as output.
  • Projection: Projects to a subset of the input schema.
  • Select: Implements predicates of the form a = c where a is an attribute and c is a constant
  • Hash Join: Compute inner join by storing left input in main memory, then find matches for each tuple from the right side. The predicate is of the form left.a = right.b.
In general, all operators should offer (a superset of) the following interface:
  • void open(): Open the operator
  • bool next(): Produce the next tuple
  • vector getOutput(): Get all produced values
  • void close(): Close the operator
Begin by creating a Register class that can be used to store and retrieve values of any type through methods like int getInteger() or void setString(const string& s). In your implementation, you may restrict the database types to integer and a fixed-size character type. It also needs to be able to compare Register objects (operator and operator==) and compute a hash value (e.g. for Hash Join operators). The Table Scan operator is initialized (in its constructor) with a relation. Its next method reads the next tuple (if any) and its getOutput method returns the values of the current tuple. The Print operator is initialized with an input operator and an output stream to which its next method writes the next tuple (if any) in a human-readable format. The Projection operator is initialized with an input operator and a list of register IDs (indexes into the register vector) it should project to. The Selection operator is initialized with an input operator, a register ID and a constant. The Hash Join operator is initialized with two input operators, and two register IDs. One ID is from the left side and one is from the right side.

Task 6 (due: July 5, 10am)

Implement a parallel hash join algorithm using the following techniques:
  • Chaining with locks: Implement your parallel hash join using a hash table with fine-grained locking (one lock per chain). You can try out the different mutex variants provided by Intel TBB.
  • Chaining without locks: Avoid using locks in this implementation. You should make use of std::atomic::compare_exchange_weak.
  • Linear probing: Similar to Chaining, you should make use of std::atomic::compare_exchange_weak.
  • Please add your implementation to hashjoinskeleton.cpp. Compare your implementation against the provided STL implementation. You may use parallel_for provided by Intel TBB.

Task 7 (optional, due: July 12, 10am)

Based on the simple LLVM example program, create an LLVM-based “subscript”-compiler for simple binary arithmetic expressions (e.g., (v0 + v1) * (v2 − v3)).
Write a function similar to CreateFibFunction that takes an llvm::Module, an llvm::LLVMContext and the root of a binary arithmetic expression tree as parameters. The tree’s nodes contain the operators *,/,+,−, constants and variables (v0, ..., vn). The function generates a function f that can be called with n integer argments and returns the result of the expression, e.g. in the example above f(1,2,5,2) returns 9 (since (1 + 2) * (5 − 2) = 9). Your code may be completely independent from your database system’s code base.