ACM SIGMOD 2015 Programming Contest
The ACID principle, mandating atomicity, consistency, isolation, and durability, is one of the cornerstones of relational database management systems. To achieve the third property, isolation, concurrent reads and writes must be handled carefully.
In practice, transaction processing is often done in an optimistic fashion: queries and statements are processed first, and only at the commit point does the system validate whether there was a conflict between readers and writers.
In this challenge, we simulate this in a slightly simplified way: The system processes already executed transactions and only needs to check efficiently whether concurrent queries conflict with them. So, in short: we issue a list of insert and delete statements and your program needs to figure out whether given predicates match the inserted or deleted data.
Each client has to process a number of requests, provided via standard input, and must provide answers via stdout. The protocol itself is a simple binary protocol, which is described in the C++ reference implementation. The reference implementation also contains the test driver and a basic test case to verify the functionality of your program locally. Additionally, there are a small and a medium-sized test case which can be used together with the test driver; the latter can be decompressed using tar xpvf data_medium.tar.xz. We also provide unofficial, untested stub implementations in Go, Java and Rust.
The workload is outlined in the following section. After the examples, there is a detailed description of the semantics of each request type.
During the contest we will also hand out larger test datasets, similar to those used during evaluation.
In the following we give a short example of the challenge task. Note that for this example we use a human-readable variant of our protocol; the actual implementation protocol is defined in the next section.
Each session starts by first defining a schema (in this case 2 relations with 3 and 4 columns, respectively), and loading some initial data (4 tuples in relation 0, 3 in relation 1) in transaction 0. The values in the first column are always unique in each relation (primary key constraint).
defineschema [3 4] transaction 0 [] [ 0 [1 1 2 2 1 2 3 4 5 7 7 7] 1 [1 0 0 0 3 0 0 1 4 1 1 1] ]
Further, consider the following three update transactions. Transaction 1 inserts the tuple [6 5 4] into relation 0, transaction 2 deletes a tuple identified by the first column value 4 from relation 1, and transaction 3 both deletes and inserts a tuple.
transaction 1 [] [0 [6 5 4]] transaction 2 [1 [4]] transaction 3 [0 [3]] [0 [3 5 6]]
Now consider three validation requests that must be checked against the given transaction ranges. Validations 0 and 1 check transactions 1-2, while validation 2 checks transactions 1-3.
validation 0 1 2 [0 c0=4] [1 c1>8] validation 1 1 2 [1 c2=1] validation 2 1 3 [0 c0=3 c1=2] [0 c2=4]
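To make the example concrete, here is a plain-Python sketch (not the binary protocol) that replays the three transactions above and evaluates the three validations. All names (`run_transaction`, `validate`, `history`) are illustrative, not part of the task.

```python
import operator

# The six comparison operations allowed in predicates.
OPS = {'=': operator.eq, '!=': operator.ne, '<': operator.lt,
       '<=': operator.le, '>': operator.gt, '>=': operator.ge}

relations = {  # current tuples from transaction 0, keyed by primary key
    0: {1: (1, 1, 2), 2: (2, 1, 2), 3: (3, 4, 5), 7: (7, 7, 7)},
    1: {1: (1, 0, 0, 0), 3: (3, 0, 0, 1), 4: (4, 1, 1, 1)},
}
history = {}  # transactionId -> list of (relationId, tuple) inserted or deleted

def run_transaction(tid, deletes, inserts):
    affected = []
    for rel, key in deletes:            # deletes are applied first
        row = relations[rel].pop(key, None)
        if row is not None:             # deleting a missing row is a no-op
            affected.append((rel, row))
    for rel, row in inserts:
        relations[rel][row[0]] = row
        affected.append((rel, row))
    history[tid] = affected

run_transaction(1, [], [(0, (6, 5, 4))])
run_transaction(2, [(1, 4)], [])
run_transaction(3, [(0, 3)], [(0, (3, 5, 6))])

def validate(frm, to, queries):
    for tid in range(frm, to + 1):
        for rel, row in history.get(tid, []):
            for qrel, preds in queries:
                if qrel == rel and all(OPS[op](row[c], v) for c, op, v in preds):
                    return 1            # conflict
    return 0

results = [
    validate(1, 2, [(0, [(0, '=', 4)]), (1, [(1, '>', 8)])]),              # validation 0
    validate(1, 2, [(1, [(2, '=', 1)])]),                                  # validation 1
    validate(1, 3, [(0, [(0, '=', 3), (1, '=', 2)]), (0, [(2, '=', 4)])]), # validation 2
]
print(results)  # [0, 1, 1]
```

Validation 1 conflicts because transaction 2 deleted the tuple (4, 1, 1, 1), which satisfies c2=1; validation 2 conflicts because transaction 1 inserted (6, 5, 4), which satisfies c2=4.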
Format | defineschema [<columnCounts>, ...] |
Defines the schema: one column count per relation. The first column of each relation is its primary key.
Format | transaction <transactionId> [<deleteOperations>, ...] [<insertOperations>, ...] |
A transaction consists of a number of delete operations followed by a number of insert operations. Transaction ids are monotonically increasing, and transactions are executed in id order. Deletions are modeled as
delete <relationId> [<keys>, ...]
identifying each tuple by its primary key (i.e., the first column). A key may refer to a row that has already been deleted. Insertions are modeled as
insert <relationId> [<values>, ...]
giving the new tuples as streams of values. Inserts will not cause primary key violations (the same transaction might have deleted the tuple before, though).
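One subtlety worth noting: deletes arrive as keys only, but validations later match predicates against whole tuples, so a client has to capture the deleted row's values at the moment the delete is applied. A minimal sketch with illustrative names (`table`, `log`):

```python
table = {5: (5, 9, 9)}          # primary key -> tuple
log = []                        # tuples touched by this transaction

def apply_delete(key):
    row = table.pop(key, None)  # deleting an absent key is legal and a no-op
    if row is not None:
        log.append(row)         # record full values for later validations

def apply_insert(row):
    table[row[0]] = row         # first column is the primary key
    log.append(row)

apply_delete(5)                 # removes (5, 9, 9) and logs it
apply_delete(5)                 # already gone: nothing logged
apply_insert((5, 1, 2))         # re-inserting a deleted key is allowed
print(log)  # [(5, 9, 9), (5, 1, 2)]
```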
Format | validation <validationId> <fromTransaction> <toTransaction> [<queries>, ...] |
A validation request consists of a list of conjunctive queries (see below). The validation fails (i.e., the request is conflicting) if any of the conjunctive queries is satisfied by any tuple that is inserted or deleted by a transaction from the transaction range (both from and to are inclusive). Validation ids will be dense and monotonically increasing.
The conjunctive queries are given as
<relationId> [(<column> <operation> <constant>), ...]
where operation is in {=,!=,<,<=,>,>=}.
Note that the distribution of operations is non-uniform. Tests for equality are much more common than other operations, and some columns will be tested more frequently than others.
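This skew suggests, for instance, indexing equality predicates specially. A hypothetical sketch of an inverted index keyed by (relation, column, value) — the names and the overall approach are illustrative, not prescribed by the task:

```python
from collections import defaultdict

eq_index = defaultdict(set)   # (relation, column, value) -> {transaction ids}

def index_tuple(rel, tid, row):
    # Index every column value of a tuple touched by transaction `tid`.
    for col, val in enumerate(row):
        eq_index[(rel, col, val)].add(tid)

index_tuple(0, 1, (6, 5, 4))
index_tuple(0, 3, (3, 5, 6))

# An '=' conjunct reduces to a hash lookup intersected with the validated
# transaction range; other operators fall back to scanning the candidates.
candidates = eq_index[(0, 2, 4)] & set(range(1, 4))  # c2 = 4 in txns 1..3
print(candidates)  # {1}
```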
Format | flush <validationId> |
None of the messages so far produce client output; the client is free to re-arrange and execute them as it sees fit. The flush request triggers the output of the answers to all validations up to this point (inclusive), in validationId order: the client must produce the character '0' if the validation succeeded (i.e., there is no conflict), and '1' if there was a conflict. As some programming languages buffer stdout and only flush after a newline, you might need to flush the output manually so that it is sent to the driver.
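A minimal sketch of this answer buffering; `on_validation` and `on_flush` are illustrative names, and the real messages arrive in the binary protocol rather than as function calls:

```python
import sys

pending = {}              # validationId -> '0' (no conflict) or '1' (conflict)

def on_validation(vid, conflict):
    pending[vid] = '1' if conflict else '0'

def on_flush(vid):
    # Emit all answers up to and including vid, in validationId order.
    for v in sorted(k for k in pending if k <= vid):
        sys.stdout.write(pending.pop(v))
    sys.stdout.flush()    # no newline is written, so flush explicitly

on_validation(0, False)
on_validation(1, True)
on_validation(2, False)
on_flush(1)               # writes "01"; validation 2 stays pending
```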
Format | forget <transactionId> |
At some point, old transactions are no longer relevant for incoming requests. The forget request allows the client to free all memory associated with transactions up to and including the given transactionId. Future validation requests are guaranteed not to ask for transaction ranges that include "forgotten" transactions.
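One way to implement forget, assuming the client keeps per-transaction state in a map from transaction id to affected tuples (an illustrative sketch, not the required data structure):

```python
# history: transactionId -> state retained for validations (placeholder values)
history = {1: ['t1 data'], 2: ['t2 data'], 5: ['t5 data']}

def forget(tid):
    # Drop every transaction with id <= tid; later validations are
    # guaranteed never to reference these ids again.
    for old in [t for t in history if t <= tid]:
        del history[old]

forget(2)
print(sorted(history))  # [5]
```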
Format | done |
This message is always sent as the last message of the data set and terminates the program. It is mainly useful for debugging, to check that the input stream was parsed correctly.
Processor | Intel Core i7-3770 @ 3.40Ghz |
Configuration | 4 Cores / 8 Hyperthreads |
Main Memory | 32 GB (the program can use at most 20 GB) |
Operating System | Ubuntu 14.10 |
Compilers | GCC 4.9.1, Oracle JVM 1.8, go 1.4.1 and gccgo, rust (nightly 2015-02-10, will be updated to alpha2 during the contest). You can install the rust nightly version with: curl https://static.rust-lang.org/rustup.sh | sed 's/^RUST_URL.*/RUST_URL="http:\/\/static.rust-lang.org\/dist\/2015-02-10\"/g' | sudo sh |
tar -zxvf submission.tar.gz