CH-benCHmark

Summary

While standardized and widely used benchmarks address either operational or real-time Business Intelligence (BI) workloads, the lack of a hybrid benchmark led us to the definition of a new, complex, mixed workload benchmark, called mixed workload CH-benCHmark.

This benchmark bridges the gap between the established single-workload suites of TPC-C for OLTP and TPC-H for OLAP, and executes a complex mixed workload: a transactional workload based on the order entry processing of TPC-C and a corresponding TPC-H-equivalent OLAP query suite run in parallel on the same tables in a single database system. As it is derived from these two most widely used TPC benchmarks, the CH-benCHmark produces results highly relevant to both hybrid and classic single-workload systems.

People

  • External partners
    • Richard Cole (ParAccel)
    • Leo Giakoumakis (Microsoft)
    • Wey Guy (Microsoft)
    • Harumi Kuno (HP Labs)
    • Raghunath Nambiar (Cisco)
    • Meikel Poess (Oracle)
    • Kai-Uwe Sattler (TU Ilmenau)
    • Eric Simon (SAP)
    • Florian Waas (Greenplum)

Publications

  • Iraklis Psaroudakis, Florian Wolf, Norman May, Thomas Neumann, Alexander Böhm, Anastasia Ailamaki and Kai-Uwe Sattler Scaling up Mixed Workloads: a Battle of Data Freshness, Flexibility, and Scheduling Proceedings of the 6th TPC Technology Conference on Performance Evaluation and Benchmarking (TPC TC) September 2014Source Code
  • Richard Cole, Florian Funke, Leo Giakoumakis, Wey Guy, Alfons Kemper, Stefan Krompass, Harumi Kuno, Raghunath Nambiar, Thomas Neumann, Meikel Poess, Kai-Uwe Sattler, Michael Seibold, Eric Simon, Florian Waas The Mixed Workload CH-benCHmark. Proceedings of the 4th International Workshop on Testing Database Systems (DBTest) June 2011
  • Florian Funke, Alfons Kemper, Stefan Krompass, Harumi Kuno, Thomas Neumann, Anisoara Nica, Meikel Poess, Michael Seibold Metrics for Measuring the Performance of the Mixed Workload CH-benCHmark Proceedings of the 3rd TPC Technology Conference on Performance Evaluation and Benchmarking (TPC TC) August 2011

The Benchmark: Analytical Queries

Query 1

This query reports the total amount and quantity of all shipped orderlines given by a specific time period. Additionally it informs about the average amount and quantity plus the total count of all these orderlines ordered by the individual orderline number.

select   ol_number,
	 sum(ol_quantity) as sum_qty,
	 sum(ol_amount) as sum_amount,
	 avg(ol_quantity) as avg_qty,
	 avg(ol_amount) as avg_amount,
	 count(*) as count_order
from	 orderline
where	 ol_delivery_d > '2007-01-02 00:00:00.000000'
group by ol_number order by ol_number

Query 2

Query for listing suppliers and their distributed items having the lowest stock level for a certain item and certain region.

select 	 su_suppkey, su_name, n_name, i_id, i_name, su_address, su_phone, su_comment
from	 item, supplier, stock, nation, region,
	 (select s_i_id as m_i_id,
		 min(s_quantity) as m_s_quantity
	 from	 stock, supplier, nation, region
	 where	 mod((s_w_id*s_i_id),10000)=su_suppkey
	 	 and su_nationkey=n_nationkey
	 	 and n_regionkey=r_regionkey
	 	 and r_name like 'Europ%'
	 group by s_i_id) m
where 	 i_id = s_i_id
	 and mod((s_w_id * s_i_id), 10000) = su_suppkey
	 and su_nationkey = n_nationkey
	 and n_regionkey = r_regionkey
	 and i_data like '%b'
	 and r_name like 'Europ%'
	 and i_id=m_i_id
	 and s_quantity = m_s_quantity
order by n_name, su_name, i_id

Query 3

Unshipped orders with the highest price amount for a customer will be listed within a given state and with orders newer than a specific timestamp. This list will be sorted by the descending amount.

select   ol_o_id, ol_w_id, ol_d_id,
	 sum(ol_amount) as revenue, o_entry_d
from 	 customer, neworder, orders, orderline
where 	 c_state like 'A%'
	 and c_id = o_c_id
	 and c_w_id = o_w_id
	 and c_d_id = o_d_id
	 and no_w_id = o_w_id
	 and no_d_id = o_d_id
	 and no_o_id = o_id
	 and ol_w_id = o_w_id
	 and ol_d_id = o_d_id
	 and ol_o_id = o_id
	 and o_entry_d > '2007-01-02 00:00:00.000000'
group by ol_o_id, ol_w_id, ol_d_id, o_entry_d
order by revenue desc, o_entry_d

Query 4

This query is listing all orders with orderlines or just parts of them shipped after the entry date of their booking.

select	o_ol_cnt, count(*) as order_count
from	orders
where	o_entry_d >= '2007-01-02 00:00:00.000000'
	and o_entry_d < '2012-01-02 00:00:00.000000'
	and exists (select *
		    from orderline
		    where o_id = ol_o_id
		    and o_w_id = ol_w_id
		    and o_d_id = ol_d_id
		    and ol_delivery_d >= o_entry_d)
group	by o_ol_cnt
order	by o_ol_cnt

Query 5

Query result for getting information about achieved revenues of nations within a given region. All nations are sorted by the total amount of revenue gained since the given date.

select	 n_name,
	 sum(ol_amount) as revenue
from	 customer, orders, orderline, stock, supplier, nation, region
where	 c_id = o_c_id
	 and c_w_id = o_w_id
	 and c_d_id = o_d_id
	 and ol_o_id = o_id
	 and ol_w_id = o_w_id
	 and ol_d_id=o_d_id
	 and ol_w_id = s_w_id
	 and ol_i_id = s_i_id
	 and mod((s_w_id * s_i_id),10000) = su_suppkey
	 and ascii(substr(c_state,1,1)) = su_nationkey
	 and su_nationkey = n_nationkey
	 and n_regionkey = r_regionkey
	 and r_name = 'Europe'
	 and o_entry_d >= '2007-01-02 00:00:00.000000'
group by n_name
order by revenue desc

Query 6

Query lists the total amount of archived revenue from orderlines which were delivered in a specific period and a certain quantity.

select	sum(ol_amount) as revenue
from	orderline
where	ol_delivery_d >= '1999-01-01 00:00:00.000000'
	and ol_delivery_d < '2020-01-01 00:00:00.000000'
	and ol_quantity between 1 and 100000

Query 7

Query for showing the bi-directional trade volume between two given nations sorted by their names and the considered years.

select	 su_nationkey as supp_nation,
	 substr(c_state,1,1) as cust_nation,
	 extract(year from o_entry_d) as l_year,
	 sum(ol_amount) as revenue
from	 supplier, stock, orderline, orders, customer, nation n1, nation n2
where	 ol_supply_w_id = s_w_id
	 and ol_i_id = s_i_id
	 and mod((s_w_id * s_i_id), 10000) = su_suppkey
	 and ol_w_id = o_w_id
	 and ol_d_id = o_d_id
	 and ol_o_id = o_id
	 and c_id = o_c_id
	 and c_w_id = o_w_id
	 and c_d_id = o_d_id
	 and su_nationkey = n1.n_nationkey
	 and ascii(substr(c_state,1,1)) = n2.n_nationkey
	 and (
		(n1.n_name = 'Germany' and n2.n_name = 'Cambodia')
	     or
		(n1.n_name = 'Cambodia' and n2.n_name = 'Germany')
	     )
	 and ol_delivery_d between '2007-01-02 00:00:00.000000' and '2012-01-02 00:00:00.000000'
group by su_nationkey, substr(c_state,1,1), extract(year from o_entry_d)
order by su_nationkey, cust_nation, l_year

Query 8

This query lists the market share of a given nation for customers from a certain region in which kinds of items are "produced".

select	 extract(year from o_entry_d) as l_year,
	 sum(case when n2.n_name = 'Germany' then ol_amount else 0 end) / sum(ol_amount) as mkt_share
from	 item, supplier, stock, orderline, orders, customer, nation n1, nation n2, region
where	 i_id = s_i_id
	 and ol_i_id = s_i_id
	 and ol_supply_w_id = s_w_id
	 and mod((s_w_id * s_i_id),10000) = su_suppkey
	 and ol_w_id = o_w_id
	 and ol_d_id = o_d_id
	 and ol_o_id = o_id
	 and c_id = o_c_id
	 and c_w_id = o_w_id
	 and c_d_id = o_d_id
	 and n1.n_nationkey = ascii(substr(c_state,1,1))
	 and n1.n_regionkey = r_regionkey
	 and ol_i_id < 1000
	 and r_name = 'Europe'
	 and su_nationkey = n2.n_nationkey
	 and o_entry_d between '2007-01-02 00:00:00.000000' and '2012-01-02 00:00:00.000000'
	 and i_data like '%b'
	 and i_id = ol_i_id
group by extract(year from o_entry_d)
order by l_year

Query 9

This query describes how much profit has been made on a selection of items for each nation and each year. The result list will be sorted by the name of the nation and the financial year.

select	 n_name, extract(year from o_entry_d) as l_year, sum(ol_amount) as sum_profit
from	 item, stock, supplier, orderline, orders, nation
where	 ol_i_id = s_i_id
	 and ol_supply_w_id = s_w_id
	 and mod((s_w_id * s_i_id), 10000) = su_suppkey
	 and ol_w_id = o_w_id
	 and ol_d_id = o_d_id
	 and ol_o_id = o_id
	 and ol_i_id = i_id
	 and su_nationkey = n_nationkey
	 and i_data like '%BB'
group by n_name, extract(year from o_entry_d)
order by n_name, l_year desc

Query 10

Query for analyzing the expenses of all customers listing their living country, some detail of them and the amount of money which they have used to take their orders since a specific date. The whole list is sorted by the amount of the customers’ orders.

select	 c_id, c_last, sum(ol_amount) as revenue, c_city, c_phone, n_name
from	 customer, orders, orderline, nation
where	 c_id = o_c_id
	 and c_w_id = o_w_id
	 and c_d_id = o_d_id
	 and ol_w_id = o_w_id
	 and ol_d_id = o_d_id
	 and ol_o_id = o_id
	 and o_entry_d >= '2007-01-02 00:00:00.000000'
	 and o_entry_d <= ol_delivery_d
	 and n_nationkey = ascii(substr(c_state,1,1))
group by c_id, c_last, c_city, c_phone, n_name
order by revenue desc

Query 11

Most important items (items which are often involved in orders and therefore often bought by customers) supplied by supplier of a given nation.

select	 s_i_id, sum(s_order_cnt) as ordercount
from	 stock, supplier, nation
where	 mod((s_w_id * s_i_id),10000) = su_suppkey
	 and su_nationkey = n_nationkey
	 and n_name = 'Germany'
group by s_i_id
having   sum(s_order_cnt) >
		(select sum(s_order_cnt) * .005
		from stock, supplier, nation
		where mod((s_w_id * s_i_id),10000) = su_suppkey
		and su_nationkey = n_nationkey
		and n_name = 'Germany')
order by ordercount desc

Query 12

This query counts the amount of orders grouped by the number of orderlines in each order attending the number of orders which are shipped with a higher or lower order priority.

select	 o_ol_cnt,
	 sum(case when o_carrier_id = 1 or o_carrier_id = 2 then 1 else 0 end) as high_line_count,
	 sum(case when o_carrier_id <> 1 and o_carrier_id <> 2 then 1 else 0 end) as low_line_count
from	 orders, orderline
where	 ol_w_id = o_w_id
	 and ol_d_id = o_d_id
	 and ol_o_id = o_id
	 and o_entry_d <= ol_delivery_d
	 and ol_delivery_d < '2020-01-01 00:00:00.000000'
group by o_ol_cnt
order by o_ol_cnt

Query 13

The query lists the number of customers grouped and sorted by the size of orders they made. The result set of the relation between customers and the size of their orders is sorted by the size of orders and counts how many customers have dealt the same way.

select	 c_count, count(*) as custdist
from	 (select c_id, count(o_id)
	 from customer left outer join orders on (
		c_w_id = o_w_id
		and c_d_id = o_d_id
		and c_id = o_c_id
		and o_carrier_id > 8)
	 group by c_id) as c_orders (c_id, c_count)
group by c_count
order by custdist desc, c_count desc

Query 14

The query result represents the percentage of the revenue in a period of time which has been realized from promotional campaigns.

select	100.00 * sum(case when i_data like 'PR%' then ol_amount else 0 end) / (1+sum(ol_amount)) as promo_revenue
from	orderline, item
where	ol_i_id = i_id and ol_delivery_d >= '2007-01-02 00:00:00.000000'
	and ol_delivery_d < '2020-01-02 00:00:00.000000'

Query 15

This query finds the top supplier or suppliers who contributed the most to the overall revenue for items shipped during a given period of time.

with	 revenue (supplier_no, total_revenue) as (
	 select	mod((s_w_id * s_i_id),10000) as supplier_no,
		sum(ol_amount) as total_revenue
	 from	orderline, stock
		where ol_i_id = s_i_id and ol_supply_w_id = s_w_id
		and ol_delivery_d >= '2007-01-02 00:00:00.000000'
	 group by mod((s_w_id * s_i_id),10000))
select	 su_suppkey, su_name, su_address, su_phone, total_revenue
from	 supplier, revenue
where	 su_suppkey = supplier_no
	 and total_revenue = (select max(total_revenue) from revenue)
order by su_suppkey

Query 16

This query finds out how many suppliers are able to supply items with given attributes sorted in descending order of them. The result is grouped by the identifier of the item.

select	 i_name,
	 substr(i_data, 1, 3) as brand,
	 i_price,
	 count(distinct (mod((s_w_id * s_i_id),10000))) as supplier_cnt
from	 stock, item
where	 i_id = s_i_id
	 and i_data not like 'zz%'
	 and (mod((s_w_id * s_i_id),10000) not in
		(select su_suppkey
		 from supplier
		 where su_comment like '%bad%'))
group by i_name, substr(i_data, 1, 3), i_price
order by supplier_cnt desc

Query 17

The query determines the yearly loss in revenue if orders just with a quantity of more than the average quantity of all orders in the system would be taken and shipped to customers.

select	sum(ol_amount) / 2.0 as avg_yearly
from	orderline, (select   i_id, avg(ol_quantity) as a
		    from     item, orderline
		    where    i_data like '%b'
			     and ol_i_id = i_id
		    group by i_id) t
where	ol_i_id = t.i_id
	and ol_quantity < t.a

Query 18

Query 18 is ranking all customers who have ordered for more than a specific amount of money.

select	 c_last, c_id o_id, o_entry_d, o_ol_cnt, sum(ol_amount)
from	 customer, orders, orderline
where	 c_id = o_c_id
	 and c_w_id = o_w_id
	 and c_d_id = o_d_id
	 and ol_w_id = o_w_id
	 and ol_d_id = o_d_id
	 and ol_o_id = o_id
group by o_id, o_w_id, o_d_id, c_id, c_last, o_entry_d, o_ol_cnt
having	 sum(ol_amount) > 200
order by sum(ol_amount) desc, o_entry_d

Query 19

The query is for reporting the revenue achieved by some specific attributes, as the price, the detailed information of the item and the quantity of the ordered amount of them.

select	sum(ol_amount) as revenue
from	orderline, item
where	(
	  ol_i_id = i_id
          and i_data like '%a'
          and ol_quantity >= 1
          and ol_quantity <= 10
          and i_price between 1 and 400000
          and ol_w_id in (1,2,3)
	) or (
	  ol_i_id = i_id
	  and i_data like '%b'
	  and ol_quantity >= 1
	  and ol_quantity <= 10
	  and i_price between 1 and 400000
	  and ol_w_id in (1,2,4)
	) or (
	  ol_i_id = i_id
	  and i_data like '%c'
	  and ol_quantity >= 1
	  and ol_quantity <= 10
	  and i_price between 1 and 400000
	  and ol_w_id in (1,5,3)
	)

Query 20

Suppliers in a particular nation having selected parts that may be candidates for a promotional offer if the quantity of these items is more than 50 percent of the total quantity which has been ordered since a certain date.

select	 su_name, su_address
from	 supplier, nation
where	 su_suppkey in
		(select  mod(s_i_id * s_w_id, 10000)
		from     stock, orderline
		where    s_i_id in
				(select i_id
				 from item
				 where i_data like 'co%')
			 and ol_i_id=s_i_id
			 and ol_delivery_d > '2010-05-23 12:00:00'
		group by s_i_id, s_w_id, s_quantity
		having   2*s_quantity > sum(ol_quantity))
	 and su_nationkey = n_nationkey
	 and n_name = 'Germany'
order by su_name

Query 21

Query 21 determines the suppliers which have shipped some required items of an order not in a timely manner for a given nation.

select	 su_name, count(*) as numwait
from	 supplier, orderline l1, orders, stock, nation
where	 ol_o_id = o_id
	 and ol_w_id = o_w_id
	 and ol_d_id = o_d_id
	 and ol_w_id = s_w_id
	 and ol_i_id = s_i_id
	 and mod((s_w_id * s_i_id),10000) = su_suppkey
	 and l1.ol_delivery_d > o_entry_d
	 and not exists (select *
			 from	orderline l2
			 where  l2.ol_o_id = l1.ol_o_id
				and l2.ol_w_id = l1.ol_w_id
				and l2.ol_d_id = l1.ol_d_id
				and l2.ol_delivery_d > l1.ol_delivery_d)
	 and su_nationkey = n_nationkey
	 and n_name = 'Germany'
group by su_name
order by numwait desc, su_name

Query 22

This query lists how many customers within a specific range of country codes have not bought anything for the whole period of time and who have a greater than average balance on their account. The county code is represented by the first two characters of the phone number.

select	 substr(c_state,1,1) as country,
	 count(*) as numcust,
	 sum(c_balance) as totacctbal
from	 customer
where	 substr(c_phone,1,1) in ('1','2','3','4','5','6','7')
	 and c_balance > (select avg(c_BALANCE)
			  from 	 customer
			  where  c_balance > 0.00
			 	 and substr(c_phone,1,1) in ('1','2','3','4','5','6','7'))
	 and not exists (select *
			 from	orders
			 where	o_c_id = c_id
			     	and o_w_id = c_w_id
			    	and o_d_id = c_d_id)
group by substr(c_state,1,1)
order by substr(c_state,1,1)