ACM SIGMOD 2015 Programming Contest
The ACID principle, mandating atomicity, consistency, isolation, and durability, is one of the cornerstones of relational database management systems. To achieve the third property, isolation, concurrent reads and writes must be handled carefully.
In practice, transaction processing is often done in an optimistic fashion: queries and statements are processed first, and only at the commit point does the system validate whether there was a conflict between readers and writers.
In this challenge, we simulate this in a slightly simplified way: The system processes already executed transactions and only needs to check efficiently whether concurrent queries conflict with them. So, in short: we issue a list of insert and delete statements and your program needs to figure out whether given predicates match the inserted or deleted data.
Each client has to process a number of requests, provided via standard input, and must provide answers via stdout. The protocol itself is a simple binary protocol, which is described in the C++ reference implementation. The reference implementation also contains the test driver and a basic test case to verify the functionality of your program locally. Additionally, there are a small and a medium-sized test case which can be used together with the test driver; the latter can be decompressed using tar xpvf data_medium.tar.xz. We also provide unofficial, untested stub implementations in Go, Java and Rust.
The workload is outlined in the following section. After the examples, there is a detailed description of the semantics of each request type.
During the contest we will also hand out larger test datasets, similar to those used during evaluation.
In the following we give a short example of the challenge task. Note that for this example we use a human-readable variant of our protocol; the actual implementation protocol is defined in the next section.
Each session starts by first defining a schema (in this case 2 relations with 3 and 4 columns, respectively), and loading some initial data (4 tuples in relation 0, 3 in relation 1) in transaction 0. The values in the first column are always unique in each relation (primary key constraint).
defineschema [3 4] transaction 0 [] [ 0 [1 1 2 2 1 2 3 4 5 7 7 7] 1 [1 0 0 0 3 0 0 1 4 1 1 1] ]
Further, consider the following three update transactions. Transaction 1 inserts the tuple [6 5 4] into relation 0, transaction 2 deletes a tuple identified by the first column value 4 from relation 1, and transaction 3 both deletes and inserts a tuple.
transaction 1 [] [0 [6 5 4]] transaction 2 [1 [4]] transaction 3 [0 [3]] [0 [3 5 6]]
Now consider three validation requests that must be checked against the given transaction ranges. Validations 0 and 1 check transactions 1-2, while validation 2 checks transactions 1-3.
validation 0 1 2 [0 c0=4] [1 c1>8] validation 1 1 2 [1 c2=1] validation 2 1 3 [0 c0=3 c1=2] [0 c2=4]
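To make the example concrete, here is a plain-Python sketch (not the binary protocol) that replays the three transactions above and evaluates the three validations. All names (`run_transaction`, `validate`, `history`) are illustrative, not part of the task.

```python
import operator

# The six comparison operations allowed in predicates.
OPS = {'=': operator.eq, '!=': operator.ne, '<': operator.lt,
       '<=': operator.le, '>': operator.gt, '>=': operator.ge}

relations = {  # current tuples from transaction 0, keyed by primary key
    0: {1: (1, 1, 2), 2: (2, 1, 2), 3: (3, 4, 5), 7: (7, 7, 7)},
    1: {1: (1, 0, 0, 0), 3: (3, 0, 0, 1), 4: (4, 1, 1, 1)},
}
history = {}  # transactionId -> list of (relationId, tuple) inserted or deleted

def run_transaction(tid, deletes, inserts):
    affected = []
    for rel, key in deletes:            # deletes are applied first
        row = relations[rel].pop(key, None)
        if row is not None:             # deleting a missing row is a no-op
            affected.append((rel, row))
    for rel, row in inserts:
        relations[rel][row[0]] = row
        affected.append((rel, row))
    history[tid] = affected

run_transaction(1, [], [(0, (6, 5, 4))])
run_transaction(2, [(1, 4)], [])
run_transaction(3, [(0, 3)], [(0, (3, 5, 6))])

def validate(frm, to, queries):
    for tid in range(frm, to + 1):
        for rel, row in history.get(tid, []):
            for qrel, preds in queries:
                if qrel == rel and all(OPS[op](row[c], v) for c, op, v in preds):
                    return 1            # conflict
    return 0

results = [
    validate(1, 2, [(0, [(0, '=', 4)]), (1, [(1, '>', 8)])]),              # validation 0
    validate(1, 2, [(1, [(2, '=', 1)])]),                                  # validation 1
    validate(1, 3, [(0, [(0, '=', 3), (1, '=', 2)]), (0, [(2, '=', 4)])]), # validation 2
]
print(results)  # [0, 1, 1]
```

Validation 1 conflicts because transaction 2 deleted the tuple (4, 1, 1, 1), which satisfies c2=1; validation 2 conflicts because transaction 1 inserted (6, 5, 4), which satisfies c2=4.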
Format | defineschema [<columnCounts>, ...] |
Defines the schema: one column count per relation. The first column of each relation is its primary key.
Format | transaction <transactionId> [<deleteOperations>, ...] [<insertOperations>, ...] |
A transaction consists of a number of delete operations followed by a number of insert operations. Transaction ids are monotonically increasing, and transactions are executed in id order. Deletions are modeled as
delete <relationId> [<keys>, ...]
identifying each tuple by its primary key (i.e., the first column). A key may refer to a row that has already been deleted. Insertions are modeled as
insert <relationId> [<values>, ...]
giving the new tuples as streams of values. Inserts will not cause primary key violations (the same transaction might have deleted the tuple before, though).
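One subtlety worth noting: deletes arrive as keys only, but validations later match predicates against whole tuples, so a client has to capture the deleted row's values at the moment the delete is applied. A minimal sketch with illustrative names (`table`, `log`):

```python
table = {5: (5, 9, 9)}          # primary key -> tuple
log = []                        # tuples touched by this transaction

def apply_delete(key):
    row = table.pop(key, None)  # deleting an absent key is legal and a no-op
    if row is not None:
        log.append(row)         # record full values for later validations

def apply_insert(row):
    table[row[0]] = row         # first column is the primary key
    log.append(row)

apply_delete(5)                 # removes (5, 9, 9) and logs it
apply_delete(5)                 # already gone: nothing logged
apply_insert((5, 1, 2))         # re-inserting a deleted key is allowed
print(log)  # [(5, 9, 9), (5, 1, 2)]
```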
Format | validation <validationId> <fromTransaction> <toTransaction> [<queries>, ...] |
A validation request consists of a list of conjunctive queries (see below). The validation fails (i.e., the request is conflicting) if any of the conjunctive queries is satisfied by any tuple that is inserted or deleted by a transaction from the transaction range (both from and to are inclusive). Validation ids will be dense and monotonically increasing.
The conjunctive queries are given as
<relationId> [(<column> <operation> <constant>), ...]
where operation is in {=,!=,<,<=,>,>=}.
Note that the distribution of operations is non-uniform. Tests for equality are much more common than other operations, and some columns will be tested more frequently than others.
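This skew suggests, for instance, indexing equality predicates specially. A hypothetical sketch of an inverted index keyed by (relation, column, value) — the names and the overall approach are illustrative, not prescribed by the task:

```python
from collections import defaultdict

eq_index = defaultdict(set)   # (relation, column, value) -> {transaction ids}

def index_tuple(rel, tid, row):
    # Index every column value of a tuple touched by transaction `tid`.
    for col, val in enumerate(row):
        eq_index[(rel, col, val)].add(tid)

index_tuple(0, 1, (6, 5, 4))
index_tuple(0, 3, (3, 5, 6))

# An '=' conjunct reduces to a hash lookup intersected with the validated
# transaction range; other operators fall back to scanning the candidates.
candidates = eq_index[(0, 2, 4)] & set(range(1, 4))  # c2 = 4 in txns 1..3
print(candidates)  # {1}
```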
Format | flush <validationId> |
None of the messages so far produce client output; the client is free to re-arrange and execute them as it sees fit. The flush request triggers the output of the answers to all validations up to this point (inclusive), in validationId order: the client must produce the character '0' if the validation succeeded (i.e., there is no conflict), and '1' if there was a conflict. As some programming languages buffer stdout and only flush after a newline, you might need to flush the output manually so that it is sent to the driver.
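A minimal sketch of this answer buffering; `on_validation` and `on_flush` are illustrative names, and the real messages arrive in the binary protocol rather than as function calls:

```python
import sys

pending = {}              # validationId -> '0' (no conflict) or '1' (conflict)

def on_validation(vid, conflict):
    pending[vid] = '1' if conflict else '0'

def on_flush(vid):
    # Emit all answers up to and including vid, in validationId order.
    for v in sorted(k for k in pending if k <= vid):
        sys.stdout.write(pending.pop(v))
    sys.stdout.flush()    # no newline is written, so flush explicitly

on_validation(0, False)
on_validation(1, True)
on_validation(2, False)
on_flush(1)               # writes "01"; validation 2 stays pending
```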
Format | forget <transactionId> |
At some point, old transactions are no longer relevant for incoming requests. The forget request allows the client to free all memory associated with transactions up to and including the given transactionId. Future validation requests are guaranteed not to ask for transaction ranges that include "forgotten" transactions.
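One way to implement forget, assuming the client keeps per-transaction state in a map from transaction id to affected tuples (an illustrative sketch, not the required data structure):

```python
# history: transactionId -> state retained for validations (placeholder values)
history = {1: ['t1 data'], 2: ['t2 data'], 5: ['t5 data']}

def forget(tid):
    # Drop every transaction with id <= tid; later validations are
    # guaranteed never to reference these ids again.
    for old in [t for t in history if t <= tid]:
        del history[old]

forget(2)
print(sorted(history))  # [5]
```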
Format | done |
This message is always sent as the last message of the data set and terminates the program. It is mainly useful for debugging, to check that the input stream was parsed correctly.
Processor | Intel Core i7-3770 @ 3.40Ghz |
Configuration | 4 Cores / 8 Hyperthreads |
Main Memory | 32 GB (the program can use at most 20 GB) |
Operating System | Ubuntu 14.10 |
Compilers | GCC 4.9.1, Oracle JVM 1.8, go 1.4.1 and gccgo, rust (nightly 2015-02-10, will be updated to alpha2 during the contest). You can install the rust nightly version with: curl https://static.rust-lang.org/rustup.sh | sed 's/^RUST_URL.*/RUST_URL="http:\/\/static.rust-lang.org\/dist\/2015-02-10\"/g' | sudo sh |
tar -zxvf submission.tar.gz