Code Generation for Data Processing

Contents

Note (2022-10-04): a more detailed schedule of the topics of this lecture can be found below.

Code generation is a key technique for efficient program execution and data processing. This lecture will cover the following topics from a practical perspective with accompanying hands-on exercises:

  • Execution models of programs (interpretation, bytecode, machine code generation, etc.)
  • Program representations (source code, intermediate representations (IRs), different forms of bytecode)
  • Classical techniques of code generation
    • SSA and optimization techniques, illustrated using LLVM-IR
    • Machine code generation: instruction selection and register allocation
  • Execution of programs in virtual machines (e.g., WebAssembly, BPF, JavaScript)
    • Sandboxing and optimizations for JIT compilation
  • Execution of database queries (e.g., SQL, data frame API)
    • Execution models and code representations
  • Execution of machine code/binary translation (e.g., RISC-V)
    • Specifics when translating machine code

Organization

  • Lecture with integrated exercises: Thu 10–14 (c.t., with break) in 02.11.018
  • Exercises will include hands-on programming tasks
  • Language: English
  • Module: CIT3230001, 6 ECTS, Bachelor/Master elective
    • Area "Databases and Information Systems" for Informatics, Wirtschaftsinformatik/Information Systems, Informatics: Games Engineering, Biomedical Engineering
    • B1.2 "Advanced Topics in Data Engineering" for Data Engineering and Analytics
  • Written exam (90 minutes); may change to an oral exam if the number of registrations is low
  • Zulip stream for this lecture; private contact via e-mail

Prerequisites

The course is aimed at bachelor/master students who have taken the following (or similar) courses:

  • IN0004 Introduction to Computer Architecture
  • IN0008 Fundamentals of Databases

Material

Material and exercises will be provided regularly throughout the semester.

Important: the following schedule is preliminary and subject to change.

Date | Lecture Topic | Homework
20.10. | Overview, Motivation, Interpretation Techniques [lec01.pdf]; Exercise: (no exercise session) | [hw01.txt]
27.10. | Compiler Front-end [lec02.pdf]; Exercise: discussion of hw01 | [hw02.txt] [hw02-sol.cc]
03.11. | IR Concepts, Control Flow Graph, SSA Construction [lec03.pdf] [prs03.pdf]; Exercise: in-class exercise on IR design [ex03.txt] | (none)
10.11. | LLVM-IR and IR Design Considerations [lec04.pdf]; Exercise: discussion of hw02 | [hw04.txt] [hw04-sol.cc]
17.11. | Analyses and Transformations [lec05.pdf] [prs05.pdf]; Exercise: writing an LLVM-IR pass [ex05.txt] | (none)
24.11. | Instruction Selection [lec06.pdf] [prs06.pdf]; Exercise: discussion of hw04 | [hw06.txt]
01.12. | (no lecture/exercise, Dies Academicus) |
08.12. | Register Allocation; Exercise: discussion of hw06 | (to be announced)
15.12. | Linker, Loader, Debuginfo; Exercise: (to be announced) | (to be announced)
22.12. | Sandboxing; Exercise: (to be announced) | (to be announced)
12.01. | JIT-compilation Techniques; Exercise: (to be announced) | (to be announced)
19.01. | Query Compilation I: Execution Models and IR Design; Exercise: (to be announced) | (to be announced)
26.01. | Query Compilation II: Machine Code Generation; Exercise: (to be announced) | (to be announced)
02.02. | ISA Emulation/Binary Translation; Exercise: (to be announced) | (to be announced)
09.02. | Wrap-up; Exercise: (to be announced) |