Seminar: Techniques for implementing main memory database systems

Information

Content

This seminar covers techniques for implementing main memory database systems and related topics.

Prerequisites

  • Lecture Fundamentals of Databases (Grundlagen Datenbanken, GDB) or a similar course
  • Very good knowledge of databases; good programming skills in C++ (depending on the topic)

Dates

  • Weekly meeting: Tuesdays, 2:00 p.m. - 3:30 p.m., room 02.09.014
  • First meeting: October 17, 2017

Organization

  • First organisational meeting for the seminar: Thursday, July 13, 4:00 p.m., room MI 02.13.010
  • In addition to the seminar talk, you have to implement the key aspects of your topic as a component of a main memory database system in C++. For the data mining topics, SQL queries would be useful instead.
  • Contact us by email or in person to obtain literature recommendations. For most topics, you will find papers by our group on our website that serve as the primary source for your seminar.
  • The presentation can be given in English or in German. English is recommended only if you are proficient in written and spoken English.

Topics

All topics are based on the architecture of our main memory database system HyPer (hyper-db.de). The website also lists the corresponding literature references. Many topics are also covered in the corresponding chapter of the textbook "Datenbanksysteme: Eine Einführung" (though more briefly than we expect from your written report). We also recommend using the bibliography database dblp. Contact us in good time, once you have read up on and familiarized yourself with your topic, to discuss the structure of your report.

  • New topics:
    • Versioning for databases (Benedikt Kleiner)
    • Aggregation of temporal data on NVIDIA GPUs (supervisor: Andreas Kipf, NVIDIA graphics board required)
    • Improvements to Bloom filters (idea by: Andreas Kipf) (Matthias Bungeroth)
    • Database Cracking (David Werner)
    • Data mining with specific algorithms: is it possible using SQL?
      • Clustering algorithms such as DBSCAN
      • Classification: Decision Trees (Dominik Vinan)
      • Classification: Naive-Bayes
      • Linear Regression
      • Logistic Regression
      • Time series analysis: ARIMA model
      • Text analysis: TF-IDF (Thuy Tran)
      • Text analysis: Topic analysis
      • Hypothesis testing
    • MapReduce and SQL: Do we need MapReduce? (Michael Schwarz)
    • How database index structures can be used for Data Mining (Johannes Kirchmaier)
    • Topics about Graph Databases (supervisor: Jan Böttcher) (Mahammed Valiyev)
    • Latency Hiding in Tree Lookups (supervisor: Timo Kersten) (Lukas Karnowski)
  • Last year's topics (can be reused)
    • Multi-core machines / NUMA / multi-threaded parallelization
    • Column-Store / Row-Store / Hybrid Store
    • Snapshotting / shadow storage (Daniel Kutasi)
    • Compilation versus interpretation of query plans
    • Synchronization: lock-free versus 2PL
    • Compaction
    • Indexing: ART
    • Parallel hash joins: radix join versus global hash table
    • Parallelization of a Query Engine (Thomas Blum)
    • HTM versus Latching
    • Multi-Version Concurrency Control

Material

  • LaTeX Template for Thesis (UTF-8): link
  • LaTeX Template for Thesis (Windows): link
  • Notes on C++: link
  • Slides of the organisational meeting: link
  • Gitlab of our Chair: link