Hadoop-DS Benchmark Report 
for 
IBM System x3650 M4 
using 
IBM BigInsights Big SQL 3.0 
and 
Red Hat Enterprise Linux Server Release 6.4 
FIRST EDITION 
Published on 
Oct 24, 2014
Published – Oct 24, 2014 
The information contained in this document is distributed on an AS IS basis without any warranty either expressed or implied. The use of this information or the implementation of any of these techniques is the customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. While each item has been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. Customers attempting to adapt these techniques to their own environment do so at their own risk. 
Performance data contained in this document were determined in various controlled laboratory environments and are for reference purposes only. Customers should not adapt these performance numbers to their own environments as system performance standards. The results that may be obtained in other operating environments may vary significantly. Users of this document should verify the applicable data for their specific environment. 
In this document, any references made to an IBM licensed program are not intended to state or imply that only IBM's licensed program may be used; any functionally equivalent program may be used. 
This publication was produced in the United States. IBM may not offer the products, services, or features discussed in this document in other countries, and the information is subject to change without notice. Consult your local IBM representative for information on products and services available in your area. 
© Copyright International Business Machines Corporation 2014. All rights reserved. 
Permission is hereby granted to reproduce this document in whole or in part, provided the copyright notice as printed above is set forth in full text on the title page of each item reproduced. 
U.S. Government Users - Documentation related to restricted rights: Use, duplication, or disclosure is subject to restrictions set forth in GSA ADP Schedule Contract with IBM Corp. 
Trademarks 
IBM, the IBM logo, System x and System Storage are trademarks or registered trademarks of International Business Machines Corporation. 
The following terms used in this publication are trademarks of other companies as follows: TPC Benchmark and TPC-DS are trademarks of Transaction Processing Performance Council; Intel and Xeon are trademarks or registered trademarks of Intel Corporation. Other company, product, or service names, which may be denoted by two asterisks (**), may be trademarks or service marks of others. 
Notes 
1 GHz and MHz only measure microprocessor internal clock speed, not application performance. Many factors affect application performance. 
2 When referring to hard disk capacity, GB, or gigabyte, means one thousand million bytes. Total user-accessible capacity may be less.
Authors 
Simon Harris: Simon is the Big SQL performance lead working in the IBM BigInsights development team. He has 20 years of experience working in information management including MPP RDBMS, federated database technology, tooling and big data. Simon now specializes in SQL over Hadoop technologies. 
Abhayan Sundararajan: Abhayan is a Performance Analyst on IBM BigInsights with a focus on Big SQL. He has held a variety of roles within the IBM DB2 team, including functional verification test and a brief foray into development before joining the performance team to work on DB2 BLU. 
John Poelman: John joined the BigInsights performance team in 2011. While at IBM, John has worked as a developer or a performance engineer on a variety of database, business intelligence, and now big data software products. 
Matthew Emmerton: Matt Emmerton is a Senior Software Developer in Information Management at the IBM Toronto Software Lab. He has over 10 years of experience in database performance analysis and scalability testing. He has participated in many successful world-record TPC and SAP benchmarks. His interests include exploring and exploiting key hardware and operating system technologies in DB2, and developing extensible test suites for standard benchmark workloads. 
Special thanks 
Special thanks to the following people for their contribution to the benchmark and content: 
Berni Schiefer – Distinguished Engineer, Information Management Performance and Benchmarks, DB2 LUW, Big Data, MDM, Optim Data Studio Performance Tools 
Adriana Zubiri – Program Director, Big Data Development 
Mike Ahern – BigInsights Performance 
Mi Shum – Senior Performance Manager, Big Data 
Cindy Saracco - Solution Architect, IM technologies - Big Data 
Avrilia Floratou – IBM Research 
Fatma Özcan – IBM Research 
Glen Sheffield – Big Data Competitive Analyst 
Gord Sissons – BigInsights Product Marketing Manager 
Stewart Tate – Senior Technical Staff Member, Information Management Performance Benchmarks and Solutions 
Jo A Ramos - Executive Solutions Architect - Big Data and Analytics
Table of Contents 
Authors 
Introduction 
Motivation 
Benchmark Methodology 
Design and Implementation 
Results 
Benchmark Audit 
Summary 
Appendix A: Cluster Topology and Hardware Configuration 
Appendix B: Create and Load Tables 
  Create Flat files 
  Create and Load Tables 
  Collect Statistics 
Appendix C: Tuning 
Appendix D: Scaling and Database Population 
Appendix E: Queries 
  Query Template Modifications 
  Query Execution Order 
Appendix F: Attestation Letter
Introduction 
Performance benchmarks are an integral part of software and systems development as they can evaluate systems performance in an objective way. They have also become highly visible components of the exciting world of marketing SQL over Hadoop solutions. 
IBM has constructed and used the Hadoop Decision Support (Hadoop-DS) benchmark, which was modeled on the industry-standard Transaction Processing Performance Council Benchmark DS (TPC-DS)¹ and validated by a TPC-certified auditor. While adapting the workload to the nature of a Hadoop system, IBM worked to ensure that the essential attributes of both typical customer requirements and the benchmark were preserved. TPC-DS was released in January 2012 and most recently revised (Revision 1.2.0) in September 2014². 
The Hadoop-DS Benchmark is a decision support benchmark. It consists of a suite of business-oriented ad hoc queries. The queries and the data populating the database have been chosen to have broad industry wide relevance while maintaining a sufficient degree of ease of implementation. This benchmark illustrates decision support systems that: 
• Examine large volumes of data; 
• Execute queries with a high degree of complexity; 
• Give answers to critical business questions. 
Benchmark results are highly dependent upon workload, specific application requirements, and systems design and implementation. Relative system performance will vary as a result of these and other factors. Therefore, Hadoop-DS should not be used as a substitute for specific customer application benchmarking when critical capacity planning and/or product evaluation decisions are contemplated. 
1 TPC Benchmark and TPC-DS are trademarks of the Transaction Processing Performance Council (TPC). 
2 The latest revision of the TPC-DS specification can be found at http://www.tpc.org/tpcds/default.asp
Motivation 
Good benchmarks reflect, in a practical way, an abstraction of the essential elements of real customer workloads. Consequently, the aim of this project was to create a benchmark for SQL over Hadoop products that reflects a scenario common to many organizations adopting the technology today. The most common scenario we see involves moving subsets of workloads from the traditional relational data warehouse to SQL over Hadoop solutions (a process commonly referred to as Warehouse Augmentation). For this reason, our Hadoop-DS workload was modeled on the existing relational TPC-DS benchmark. 
The TPC-DS benchmark uses relational database management systems (RDBMSs) to model a decision support system that examines large volumes of data and gives answers to real-world business questions by executing queries of various complexity (such as ad-hoc, reporting, OLAP and data mining type queries). It is therefore an ideal fit to mimic the experience of an organization porting parts of its workload from a traditional warehouse housed on an RDBMS to a SQL over Hadoop technology. As highlighted in the IBM Research paper “Benchmarking SQL-on-Hadoop Systems: TPC or not TPC?”³, SQL over Hadoop solutions are in the “wild west” of benchmarking. Many vendors use the data generators and queries of existing TPC benchmarks but cherry-pick the parts of the benchmark most likely to highlight their own strengths, making comparisons between results impossible and meaningless. 
To reflect real-world situations, Hadoop-DS does not cherry-pick the parts of the TPC-DS workload that highlight Big SQL's strengths. Instead, we included all parts of the TPC-DS workload that are appropriate for SQL over Hadoop solutions: data loading, single-user performance, and multi-user performance. Since TPC-DS is a benchmark designed for relational database engines, some aspects of the benchmark are not applicable to SQL over Hadoop solutions. Broadly speaking, those are the “Data Maintenance” and “Data Persistence” sections of the benchmark. Consequently, these sections were omitted from our Hadoop-DS workload. 
The TPC-DS benchmark also defines restrictions related to real-life situations, such as preventing the vendor from changing the queries to include additional predicates based on a customized partitioning schema, employing query-specific tuning mechanisms (such as optimizer hints), or making configuration changes between the single-user and multi-user tests. We endeavored to stay within the bounds of these restrictions for the Hadoop-DS workload and conducted the comparison with candor and due diligence. To validate our candor, we retained the services of Infosizing⁴, an established and respected benchmark auditing firm with multiple TPC-certified auditors, including one with TPC-DS certification, to review and audit the benchmarking results. 
It is important to note that this is not an official TPC-DS benchmark result, since aspects of the standard benchmark that do not apply to SQL over Hadoop solutions were not implemented. However, the independent review of the environment and results by an official auditor shows IBM's commitment to openness and fair play in this arena. All deviations from the TPC-DS standard benchmark are noted in the auditor's attestation letter in Appendix F. In addition, all the information required to reproduce the environment and the Hadoop-DS workload is published in the various appendices of this document, thus allowing any vendor or third party to independently execute the benchmark and verify the results. 
3 “Benchmarking SQL-on-Hadoop Systems: TPC or not TPC?” http://researcher.ibm.com/researcher/files/us-aflorat/BenchmarkingSQL-on-Hadoop.pdf 
4 Infosizing: http://www.infosizing.com
Benchmark Methodology 
In this section we provide a high-level overview of the Hadoop-DS benchmark process. 
There are three key stages in the Hadoop-DS benchmark (which reflect similar stages in the TPC-DS benchmark). They are: 
1. Data load 
2. Query generation and validation 
3. Performance test 
The flow diagram in Figure 1 outlines these three key stages when conducting the Hadoop-DS benchmark: 
These stages are outlined below. For a detailed description of each phase refer to the “Design and Implementation” section of this document. 
1. Data Load 
The load phase of the benchmark includes all operations required to bring the System Under Test (SUT) to a state where the Performance Test phase can begin. This includes all hardware provisioning and configuration, storage setup, software installation (including the OS), verifying the cluster operation, generating the raw data and copying it to HDFS, cluster tuning, and all steps required to create and populate the database in order to bring the system into a state ready to accept queries (the Performance Test phase). All desired tuning of the SUT must be completed before the end of the LOAD phase. 
Once the tables are created and loaded with the raw data, relationships between tables such as primary-foreign key relationships and corresponding referential integrity constraints can be defined. Finally, statistics are collected. These statistics help the Big SQL query optimizer generate efficient access plans during the performance test. The SUT is now ready for the Performance Test phase. 
2. Query Generation and Validation 
Before the Performance Test phase can begin, the queries must be generated and validated by executing each query against a qualification database and comparing the result with a pre-defined answer set. 
There are 99 queries in the Hadoop-DS benchmark. Queries are automatically generated from query templates which contain substitution parameters. Specific parameter values depend on both the context in which the query is run (scale factor, single- or multi-stream) and the seed for the random number generator. The seed used was the end time of the timed LOAD phase of the benchmark. 
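To make the role of the seed concrete, here is a toy Python sketch; it is not the TPC-DS query generator (dsqgen). A made-up template with two substitution parameters is filled from a random number generator seeded by the load end time, so the same seed and stream always produce the same query text. The MMDDhhmmss seed encoding, the template, the parameter range, and the load end time are all illustrative assumptions.

```python
import random
from datetime import datetime

# Hypothetical template; real TPC-DS templates use dsqgen's own substitution syntax.
TEMPLATE = ("select count(*) from store_sales "
            "where ss_quantity between {qmin} and {qmax}")

def seed_from_load_end(load_end: datetime) -> int:
    """Assumed encoding: the load end timestamp's digits as MMDDhhmmss."""
    return int(load_end.strftime("%m%d%H%M%S"))

def instantiate(template: str, seed: int, stream: int) -> str:
    """Fill the template deterministically for a given seed and stream number."""
    rng = random.Random(seed * 100 + stream)  # reproducible, and distinct per stream
    qmin = rng.randint(1, 80)
    return template.format(qmin=qmin, qmax=qmin + 20)

seed = seed_from_load_end(datetime(2014, 1, 2, 3, 4, 5))  # hypothetical load end time
print(seed)  # 102030405
print(instantiate(TEMPLATE, seed, stream=0))
```

Because the seed is fixed only once the load finishes, a sponsor cannot know the parameter values in advance and tune for them.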
Figure 1: High Level Procedure 
Figure 2: Load Phase
3. Performance Test 
This is the timed phase of the Hadoop-DS benchmark. This phase consists of a single-stream performance test followed immediately by a multi-stream performance test. 
In the single-stream performance test, a single query stream is executed against the database, and the total elapsed time for all 99 queries is measured. 
In the multi-stream performance test, multiple query streams are executed against the database, and the total elapsed time from the start of the first query to the completion of the last query is measured. 
The multi-stream performance test is started immediately following the completion of the single-stream test. No modifications may be made to the system under test, and no components may be restarted, between these performance tests. 
The following steps are used to implement the performance test: 
Single-Stream Test 
1. Stream 0 Execution 
Multi-Stream (all steps conducted in parallel) 
1. Stream 1 Execution 
2. Stream 2 Execution 
3. Stream 3 Execution 
4. Stream 4 Execution 
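The timing rules above (single stream measured end to end; parallel streams measured from the earliest start to the latest finish) can be sketched as a small Python driver. This is only an illustration: `run_query` is a hypothetical callback standing in for a real database client, and stream start is used as a proxy for the start of the first query.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

def run_stream(queries: Sequence[str], run_query: Callable[[str], None]) -> Tuple[float, float]:
    """Run one stream's queries sequentially; return its (start, end) monotonic times."""
    start = time.monotonic()
    for q in queries:
        run_query(q)
    return start, time.monotonic()

def elapsed_seconds(streams: List[Sequence[str]], run_query: Callable[[str], None]) -> float:
    """Seconds from the earliest stream start to the latest stream finish."""
    with ThreadPoolExecutor(max_workers=len(streams)) as pool:
        spans = list(pool.map(lambda qs: run_stream(qs, run_query), streams))
    return max(end for _, end in spans) - min(start for start, _ in spans)

def fake_query(sql: str) -> None:
    """Stand-in for a real database call: every "query" takes 20 ms."""
    time.sleep(0.02)

single = elapsed_seconds([["q1", "q2", "q3"]], fake_query)      # stream 0
multi = elapsed_seconds([["q1", "q2", "q3"]] * 4, fake_query)   # streams 1 to 4
print(f"single={single:.2f}s multi={multi:.2f}s")
```

With sleeping stand-ins the four streams overlap almost perfectly; on a real cluster they compete for CPU, disk, and network, which is exactly what the multi-stream test measures.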
Hadoop-DS uses the “Hadoop-DS Qph” metric to report query performance. The Hadoop-DS Qph metric is the effective query throughput, measured as the number of queries executed over a period of time. A primary factor in the Hadoop-DS metric is the scale factor (SF), the size of the data set, which is used to scale the actual performance numbers. This means that results have a metric scaled to the database size, which helps guard against the fact that cluster hardware doesn't always scale linearly and helps differentiate large clusters from small clusters (since performance is typically a function of cluster size). 
A Hadoop-DS Qph metric is calculated for each of the single and multi-user runs using the following formula: 
Hadoop-DS Qph @ SF = ( (SF/100) * Q * S ) / T 
Where: 
• SF is the scale factor used in GB (30,000 in our benchmark). SF is divided by 100 in order to normalize the results using 100GB as the baseline. 
• Q is the total number of queries successfully executed 
• S is the number of streams (1 for the single user run) 
• T is the duration of the run measured in hours (with a resolution up to one second) 
Hadoop-DS Qph metrics are reported at a specific scale factor. For example, “Hadoop-DS Qph@30TB” represents the effective throughput of the SQL over Hadoop solution against a 30TB database.
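As a check on the formula, the short Python sketch below recomputes both published metrics from the elapsed times reported in Figure 4 (29:00:45 and 52:13:13), with SF = 30,000, Q = 99 and S = 1 or 4. Truncating to a whole number reproduces the published 1,023 and 2,274; rounding would give 1,024 and 2,275, so we assume truncation was the reporting convention.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' elapsed time (hours may exceed 24) to seconds."""
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s

def hadoop_ds_qph(sf_gb: int, queries: int, streams: int, elapsed_s: int) -> float:
    """Hadoop-DS Qph @ SF = ((SF / 100) * Q * S) / T, with T in hours."""
    hours = elapsed_s / 3600
    return (sf_gb / 100) * queries * streams / hours

single = hadoop_ds_qph(30_000, 99, 1, hms_to_seconds("29:00:45"))
multi = hadoop_ds_qph(30_000, 99, 4, hms_to_seconds("52:13:13"))
print(int(single), int(multi))  # 1023 2274
```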
Design and Implementation 
This section provides a more detailed description of the configuration used in this implementation of the benchmark. 
Hardware 
The benchmark was conducted on a 17-node cluster, with each node being an IBM x3650BD server. A complete specification of the hardware used can be found in Appendix A: Cluster Topology and Hardware Configuration. 
Physical Database Design 
In line with Big SQL best practices, a single ext4 filesystem was created on each disk used to store data on all nodes in the cluster (including the master). Once mounted, three directories were created on each filesystem: one each for the HDFS data directory, the Map-Reduce cache, and the Big SQL data directory. This configuration simplifies the disk layout and spreads the I/O for all components evenly across all available disks, and consequently provides good performance. For detailed information, refer to the “Installation options” and “OS storage” sections of Appendix C. 
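The "one filesystem per disk, three directories per filesystem" layout amounts to giving each component one directory on every data disk. The Python sketch below builds those lists. The mount points and directory names are hypothetical (the report does not give them); the count of 9 data disks per node comes from Appendix A. Comma-separated lists are the form Hadoop accepts for properties such as `dfs.datanode.data.dir`; whether the Big SQL data directory is configured the same way is an assumption.

```python
# Hypothetical mount points: one ext4 filesystem per data disk, 9 data disks per node.
MOUNTS = [f"/data{i}" for i in range(1, 10)]

# Hypothetical names for the three directories created on each filesystem.
SUBDIRS = {
    "hdfs": "hadoop/hdfs/data",       # HDFS data directory
    "mapred": "hadoop/mapred/local",  # Map-Reduce cache
    "bigsql": "bigsql/data",          # Big SQL data directory
}

def spread_dirs(mounts: list[str], subdirs: dict[str, str]) -> dict[str, str]:
    """For each component, a comma-separated list with one directory per disk."""
    return {comp: ",".join(f"{m}/{sub}" for m in mounts) for comp, sub in subdirs.items()}

layout = spread_dirs(MOUNTS, SUBDIRS)
print(layout["hdfs"].split(",")[:2])  # ['/data1/hadoop/hdfs/data', '/data2/hadoop/hdfs/data']
```

Because every component sees every disk, no single component can leave spindles idle while another saturates a subset of them.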
The default HDFS replication factor of 3 was used to replicate HDFS blocks between nodes in the cluster. No other replication (at the filesystem or database level) was used. 
Parquet was chosen as the storage format for the Big SQL tables. Parquet is the optimal format for Big SQL 3.0, both in terms of performance and disk space consumption. The Parquet storage format has Snappy compression enabled by default in Big SQL. 
Logical Database Design 
Big SQL's Parquet storage format does not support the DATE or TIME data types, so VARCHAR(10) and VARCHAR(16) columns were used in their place, respectively. As a result, a small number of queries required these columns to be CAST to the appropriate DATE or TIME types in order for date arithmetic to be performed upon them. 
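Why only date arithmetic needs the CAST can be shown with a small Python analogy, assuming the flat files render dates as zero-padded 'YYYY-MM-DD' strings (which fit VARCHAR(10)). Such strings sort in chronological order, so comparisons between two stored values behave like date comparisons, but adding days requires converting to a real date first. This is an illustration of the data format, not of Big SQL internals.

```python
from datetime import date, timedelta

stored = ["2000-03-11", "1999-12-31", "2000-01-02"]  # toy VARCHAR(10) values

# Zero-padded ISO strings sort chronologically, so ordering needs no conversion.
as_strings = sorted(stored)
as_dates = [d.isoformat() for d in sorted(date.fromisoformat(s) for s in stored)]
print(as_strings == as_dates)  # True

def add_days(varchar_date: str, days: int) -> str:
    """Arithmetic needs a real date value: the role the CASTs play in the queries."""
    return (date.fromisoformat(varchar_date) + timedelta(days=days)).isoformat()

print(add_days("2000-03-11", 30))  # 2000-04-10
```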
Other than the natural scatter partitioning provided by HDFS, no other explicit horizontal or vertical partitioning was implemented. 
For detailed information on the DDL used to create the tables, see Appendix B. 
Data Generation 
Although data generation is not a timed operation, the generation and copying of the raw data to HDFS were parallelized across the data nodes in the cluster to improve efficiency. After generation, a directory exists on HDFS for each table in the schema. This directory is used as the source for the LOAD command. 
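One way to parallelize generation across data nodes is to split each large table into independent chunks with dsdgen's parallel/child options (children numbered 1 to N) and hand the children to nodes round-robin. The Python sketch below plans such an assignment; the degree of parallelism (64) and the host names are hypothetical, and this is not necessarily how the report's own scripts divide the work.

```python
def assign_chunks(parallel: int, nodes: list[str]) -> dict[str, list[int]]:
    """Round-robin dsdgen child numbers 1..parallel across the given data nodes."""
    plan: dict[str, list[int]] = {n: [] for n in nodes}
    for child in range(1, parallel + 1):
        plan[nodes[(child - 1) % len(nodes)]].append(child)
    return plan

nodes = [f"datanode{i:02d}" for i in range(1, 17)]  # hypothetical host names, 16 data nodes
plan = assign_chunks(64, nodes)
print(plan["datanode01"])  # [1, 17, 33, 49]
```

Each node then generates only its own children, so generation and the copy into HDFS proceed on all 16 nodes at once.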
Database Population 
Tables were populated using the data generated and stored on HDFS during the data generation phase. The tables were loaded sequentially, one after the other, using the Big SQL LOAD command, which uses Map-Reduce jobs to read the source data and populate the target files in the specified storage format (Parquet in our case). The number of Map-Reduce tasks used to load each of the large fact tables was individually tuned via the num.map.tasks property in order to improve load performance. 
Details of the LOAD command used can be found in Appendix B. 
Referential Integrity / Informational Constraints 
In a traditional data warehouse, referential integrity (“RI”) constraints are often employed to ensure that relationships between tables are maintained over time, as additional data is loaded and/or existing data is refreshed.
While today's “big data” products support RI constraints, they do not have the maturity required to enforce these constraints, as ingest and lookup capabilities are optimized for large parallel scans, not singleton lookups. As a result, care should be taken to enforce RI at the source or during the ETL process. A good example is moving data from an existing data warehouse to a big data platform: if the RI constraints existed in the source RDBMS, then it is safe to create equivalent informational constraints on the big data platform, as the constraints were already enforced by the RDBMS. 
The presence of these constraints in the schema provides valuable information to the query compiler / optimizer, and thus these RI constraints are still created, but are unenforced. Unenforced RI constraints are often called informational constraints (“IC”). 
Big SQL supports the use of Informational Constraints, and ICs were created for every PK and FK relationship in the TPC-DS schema. Appendix B provides full details on all informational constraints created. 
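Since informational constraints are not enforced, the safety of declaring one rests on checking RI upstream. A minimal ETL-side check, sketched in Python on toy data, looks for foreign-key values with no matching primary key. The column names follow the TPC-DS schema (store_sales.ss_customer_sk referencing customer.c_customer_sk); the function itself is a hypothetical illustration, and NULL foreign keys are treated as permitted.

```python
from typing import Iterable, Optional, Set

def orphan_keys(fact_fks: Iterable[Optional[int]], dim_pks: Iterable[int]) -> Set[int]:
    """Return FK values with no matching dimension PK; NULL (None) FKs are not violations."""
    pks = set(dim_pks)
    return {fk for fk in fact_fks if fk is not None and fk not in pks}

c_customer_sk = [1, 2, 3]             # customer primary keys (toy data)
ss_customer_sk = [1, 3, None, 7, 3]   # store_sales foreign keys (toy data)
print(orphan_keys(ss_customer_sk, c_customer_sk))  # {7}
```

An empty result is what makes an informational constraint safe to declare; a non-empty one means the optimizer would be reasoning from a relationship the data does not actually satisfy.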
Statistics 
As Big SQL uses a cost-based query optimizer, the presence of accurate statistics is essential. Statistics can be gathered in various forms, ranging from simple cardinality statistics on tables, to distribution statistics for single columns and groups of columns within a table, to “statistical views” that provide cardinality and distribution statistics across join products. 
For this test, a combination of all of these methods was employed. 
a. cardinality statistics collected on all tables 
b. distribution statistics collected on all columns 
c. group distribution statistics collected for composite PKs in all 7 fact tables 
d. statistical views created on a combination of join predicates 
See Appendix B for full details of the statistics collected for the Hadoop-DS benchmark. 
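Group distribution statistics matter most for joins on composite keys. The Python sketch below uses toy (ticket_number, item_sk) pairs loosely modeled on store_sales and store_returns, and applies the textbook equi-join estimate |S|·|R| / max(NDV) twice: once assuming the two key columns are independent, and once using the distinct count of the column group. It illustrates the general principle only and makes no claim about Big SQL's optimizer internals.

```python
def column_stats(rows):
    """Cardinality, per-column distinct counts, and the distinct count of the column group."""
    per_col = [len({r[i] for r in rows}) for i in range(len(rows[0]))]
    return len(rows), per_col, len(set(rows))

# Toy sales keyed by (ticket, item): 100 tickets with 2 distinct items each, 50 items overall.
sales = [(t, (7 * t + k) % 50) for t in range(100) for k in (0, 1)]  # 200 rows
returns = sales[::10]  # 20 returned line items, each matching exactly one sale

card_s, ndv_s, group_s = column_stats(sales)
card_r, ndv_r, group_r = column_stats(returns)

# Independence assumption: multiply per-column selectivities.
est_independent = card_s * card_r / (max(ndv_s[0], ndv_r[0]) * max(ndv_s[1], ndv_r[1]))
# Column-group statistics: use the distinct count of the key as a whole.
est_group = card_s * card_r / max(group_s, group_r)
actual = sum(1 for r in returns for s in sales if r == s)
print(round(est_independent, 2), est_group, actual)  # 0.8 20.0 20
```

The independence estimate is off by a factor of 25 here, while the group statistic matches the true join size, which is the kind of error that leads a cost-based optimizer to a poor join order.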
Query Generation and Validation 
Since there are many variations of SQL dialects, the specification allows the sponsor to make pre-defined minor modifications to the queries so they can be successfully compiled and executed. In Big SQL, 87 of the 99 queries worked directly from the generated query templates. The other 12 queries required only simple, minor modifications (mainly type casts), which took less than one hour to complete. Chart 1 shows the query breakdown. 
Full query text for all 99 queries used during the single-stream run can be found in Appendix E. 
Performance Test 
Once the data is loaded, statistics gathered, and queries generated, the performance phase of the benchmark can commence. During the performance phase, a single-stream run is executed, followed immediately by a multi-stream run. For the multi-stream run, four query streams were used. 
Chart 1: Big SQL 3.0 query breakdown at 30TB
Results 
Figures 3 and 4 summarize the results for executing the Hadoop-DS benchmark against Big SQL using a scale factor of 30TB. 
Figure 3: Big SQL Results for Hadoop-DS @ 30TB 
IBM System x3650BD with 
IBM BigInsights Big SQL v3.0 
Hadoop-DS (*) 
October 24, 2014 
Single-Stream Performance: 1,023 Hadoop-DS Qph @ 30TB 
Multi-Stream Performance: 2,274 Hadoop-DS Qph @ 30TB 
Database Size: 30 TB 
Query Engine: IBM BigInsights Big SQL v3.0 
Operating System: Red Hat Enterprise Linux Server Release 6.4 
System Components (per cluster node) 
Component | Quantity | Description 
Processors/Cores/Threads | 2/20/40 | Intel Xeon E5-2680 v2, 2.80GHz, 25MB L3 Cache 
Memory | 8 | 16GB ECC DDR3 1866MHz LRDIMM 
Disk Controllers | 1 | IBM ServeRAID-M5210 SAS/SATA Controller 
Disk Drives | 10 | 2TB SATA 3.5” HDD (HDFS) 
Disk Drives | 1 | 128GB SATA 2.5” SSD (Swap) 
Network | 1 | Onboard dual-port GigE Adapter 
This implementation of the Hadoop-DS benchmark was audited by Francois Raab of Infosizing (www.sizing.com). 
(*) The Hadoop-DS benchmark is derived from TPC Benchmark DS (TPC-DS) and is not comparable to published TPC-DS results. 
TPC Benchmark is a trademark of the Transaction Processing Performance Council. 
[Cluster topology diagram: Master Node and 16 Data Nodes connected by a 10GigE network]
IBM System x3650BD with 
IBM BigInsights Big SQL v3.0 
Hadoop-DS (*) 
November 14, 2014 
Start and End Times 
Test | Start Date | Start Time | End Date | End Time | Elapsed Time 
Single Stream | 10/15/14 | 16:01:45 | 10/16/14 | 21:02:30 | 29:00:45 
Multi-Stream | 10/16/14 | 21:41:50 | 10/19/14 | 01:55:03 | 52:13:13 
Number of Query Streams for Multi-Stream Test: 4 
Figure 4: Big SQL Elapsed Times for Hadoop-DS@30TB 
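The elapsed times in Figure 4 follow directly from the start and end stamps. The Python sketch below recomputes them, treating the stamps as naive local times in MM/DD/YY hh:mm:ss form (no daylight-saving change falls inside either run).

```python
from datetime import datetime

FMT = "%m/%d/%y %H:%M:%S"

def elapsed_hms(start: str, end: str) -> str:
    """Elapsed time between two 'MM/DD/YY hh:mm:ss' stamps, formatted as H:MM:SS."""
    secs = int((datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds())
    return f"{secs // 3600}:{secs % 3600 // 60:02d}:{secs % 60:02d}"

print(elapsed_hms("10/15/14 16:01:45", "10/16/14 21:02:30"))  # 29:00:45
print(elapsed_hms("10/16/14 21:41:50", "10/19/14 01:55:03"))  # 52:13:13
```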
Of particular note is the fact that 4 concurrent query streams (and therefore 4 times as many queries) take only 1.8x longer than a single query stream. Chart 2 highlights Big SQL's impressive multi-user scalability. 
Chart 2: Big SQL Multi-user Scalability using 4 Query Streams @30TB
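The 1.8x figure, and the throughput gain it implies, can be checked from the two elapsed times with a few lines of Python. Four times the work in 1.8 times the wall-clock time is roughly 2.22 times the queries per hour, consistent with the ratio of the two Qph results in Figure 3 (2,274 / 1,023).

```python
single_s = 29 * 3600 + 0 * 60 + 45   # 29:00:45, one stream of 99 queries
multi_s = 52 * 3600 + 13 * 60 + 13   # 52:13:13, four concurrent streams of 99 queries

time_ratio = multi_s / single_s      # how much longer the 4-stream run took
throughput_gain = 4 / time_ratio     # queries per hour, multi-stream vs single-stream
print(f"{time_ratio:.2f}x longer, {throughput_gain:.2f}x the queries per hour")
# 1.80x longer, 2.22x the queries per hour
```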
Benchmark Audit 
Auditor 
This implementation of the IBM Hadoop-DS Benchmark was audited by Francois Raab of Infosizing. 
Further information regarding the audit process may be obtained from: 
InfoSizing 
531 Crystal Hills Boulevard 
Crystal Springs, CO 80829 
Telephone: (719) 473-7555 
Attestation Letter 
The auditor's attestation letter can be found in Appendix F.
Summary 
The Hadoop-DS benchmark takes a data warehousing workload from an RDBMS and ports it to IBM's SQL over Hadoop solution, namely Big SQL 3.0. Porting workloads from existing warehouses to SQL over Hadoop is a common scenario for organizations seeking to reduce the cost of their existing data warehousing platforms. For this reason, the Hadoop-DS workload was modeled on the Transaction Processing Performance Council Benchmark DS (TPC-DS). The services of a TPC-approved auditor were secured to review the benchmark process and results. 
The results of this benchmark demonstrate the ease with which existing data warehouses can be augmented with IBM Big SQL 3.0. They highlight how Big SQL is able to implement rich SQL with outstanding performance on a large data set with multiple concurrent users. 
These findings will be compelling to organizations augmenting data warehouse environments with Hadoop-based technologies. Strict SQL compliance can translate into significant cost savings by allowing customers to leverage existing investments in databases, applications and skills, and to take advantage of SQL over Hadoop with minimal disruption to existing environments. Enterprise customers cannot afford to have different dialects of SQL across different data management platforms. This benchmark shows that IBM's Big SQL demonstrates a high degree of SQL language compatibility with existing RDBMS workloads. 
Not only is IBM Big SQL compatible with existing RDBMSs, it also demonstrates very good performance and scalability for a SQL over Hadoop solution. This means that customers can realize business results faster, ask more complex questions, and realize greater efficiencies per unit of investment in infrastructure. All of these factors help provide a competitive advantage. 
The performance and SQL language richness shown throughout this paper demonstrate that IBM Big SQL is the industry-leading SQL over Hadoop solution available today.
Appendix A: Cluster Topology and Hardware Configuration 
Figure 5: IBM System x3650BD M4 
The measured configuration was a cluster of 17 identical IBM XSeries x3650BD M4 servers with 1 master node and 16 data nodes. Each contained: 
• CPU: Intel Xeon E5-2680 v2 @ 2.8GHz, 2 sockets with 10 cores each, Hyper-Threading enabled (40 logical CPUs in total) 
• Memory: 128 GB RAM at 1866 MHz 
• Storage: 10 x 2TB 3.5” Serial SATA, 7200RPM (one disk for the OS, 9 for data) 
• Storage: 4 x 128GB SSD (a single SSD was used for OS swap; the other SSDs were not used) 
• Network: Dual-port 10 Gb Ethernet 
• OS: Red Hat Enterprise Linux Server release 6.4 (Santiago)
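The per-node figures above translate into cluster totals with simple arithmetic, sketched here in Python. Two assumptions are ours: only the 16 data nodes are counted toward HDFS capacity, and capacity is the raw decimal TB of the 9 data disks per node divided by the HDFS replication factor of 3, ignoring filesystem overhead and the space shared with the Map-Reduce and Big SQL directories.

```python
SOCKETS, CORES_PER_SOCKET, THREADS_PER_CORE = 2, 10, 2
DATA_DISKS, DISK_TB = 9, 2          # per node: 9 of the 10 x 2TB drives hold data
DATA_NODES, REPLICATION = 16, 3     # assumption: master excluded from HDFS capacity

logical_cpus = SOCKETS * CORES_PER_SOCKET * THREADS_PER_CORE  # per node
raw_tb = DATA_NODES * DATA_DISKS * DISK_TB                    # across the data nodes
unique_tb = raw_tb / REPLICATION                              # after 3-way replication
print(logical_cpus, raw_tb, unique_tb)  # 40 288 96.0
```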
Appendix B: Create and Load Tables 
Create Flat files 
Scripts: 
001.gen-data-v3-tpcds.sh 
002.gen-data-v3-tpcds-forParquet.sh 
These scripts are essentially wrappers for dsdgen to generate the TPC-DS flat files. They provide support for parallel data generation, as well as the generation of the data directly in HDFS (through the use of named pipes) rather than first staging the flat files on a local disk. 
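The idea of streaming generated data straight into HDFS, rather than staging it on local disk, can be sketched as a Python helper that builds one shell pipeline per dsdgen chunk. This sketch swaps in an ordinary shell pipe and `hdfs dfs -put -` (which reads from standard input) for the named pipes the scripts use. The dsdgen flag spellings follow common usage of the TPC-DS kit and should be checked against the kit's own help output; the HDFS root and file naming are hypothetical.

```python
import shlex

def gen_command(table: str, scale: int, parallel: int, child: int, hdfs_root: str) -> str:
    """Shell pipeline that streams one dsdgen chunk directly into an HDFS file."""
    dsdgen = ["dsdgen", "-table", table, "-scale", str(scale),
              "-parallel", str(parallel), "-child", str(child),
              "-filter", "Y"]  # write rows to stdout instead of a local .dat file
    target = f"{hdfs_root}/{table}/{table}_{child}_{parallel}.dat"  # hypothetical naming
    put = ["hdfs", "dfs", "-put", "-", target]  # "-" makes -put read from stdin
    return f"{shlex.join(dsdgen)} | {shlex.join(put)}"

print(gen_command("store_sales", 30000, 64, 1, "/hadoopds"))
```

Running one such pipeline per child on each data node writes every table's chunks into that table's HDFS directory, which then serves as the LOAD source.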
Create and Load Tables 
040.create-tables-parquet.jsq: 
The following DDL was used to create the tables in Big SQL: 
set schema $schema; 
create hadoop table call_center 
( 
cc_call_center_sk bigint not null, 
cc_call_center_id varchar(16) not null, 
cc_rec_start_date varchar(10) , 
cc_rec_end_date varchar(10) , 
cc_closed_date_sk bigint , 
cc_open_date_sk bigint , 
cc_name varchar(50) , 
cc_class varchar(50) , 
cc_employees bigint , 
cc_sq_ft bigint , 
cc_hours varchar(20) , 
cc_manager varchar(40) , 
cc_mkt_id bigint , 
cc_mkt_class varchar(50) , 
cc_mkt_desc varchar(100) , 
cc_market_manager varchar(40) , 
cc_division bigint , 
cc_division_name varchar(50) , 
cc_company bigint , 
cc_company_name varchar(50) , 
cc_street_number varchar(10) , 
cc_street_name varchar(60) , 
cc_street_type varchar(15) , 
cc_suite_number varchar(10) , 
cc_city varchar(60) , 
cc_county varchar(30) , 
cc_state varchar(2) , 
cc_zip varchar(10) , 
cc_country varchar(20) , 
cc_gmt_offset double , 
cc_tax_percentage double 
) 
STORED AS PARQUETFILE; 
create hadoop table catalog_page 
( 
cp_catalog_page_sk bigint not null, 
cp_catalog_page_id varchar(16) not null, 
cp_start_date_sk bigint , 
cp_end_date_sk bigint , 
cp_department varchar(50) , 
cp_catalog_number bigint , 
cp_catalog_page_number bigint , 
cp_description varchar(100) , 
cp_type varchar(100) 
)
STORED AS PARQUETFILE; 
create hadoop table catalog_returns 
( 
cr_returned_date_sk bigint , 
cr_returned_time_sk bigint , 
cr_item_sk bigint not null, 
cr_refunded_customer_sk bigint , 
cr_refunded_cdemo_sk bigint , 
cr_refunded_hdemo_sk bigint , 
cr_refunded_addr_sk bigint , 
cr_returning_customer_sk bigint , 
cr_returning_cdemo_sk bigint , 
cr_returning_hdemo_sk bigint , 
cr_returning_addr_sk bigint , 
cr_call_center_sk bigint , 
cr_catalog_page_sk bigint , 
cr_ship_mode_sk bigint , 
cr_warehouse_sk bigint , 
cr_reason_sk bigint , 
cr_order_number bigint not null, 
cr_return_quantity bigint , 
cr_return_amount double , 
cr_return_tax double , 
cr_return_amt_inc_tax double , 
cr_fee double , 
cr_return_ship_cost double , 
cr_refunded_cash double , 
cr_reversed_charge double , 
cr_store_credit double , 
cr_net_loss double 
) 
STORED AS PARQUETFILE; 
create hadoop table catalog_sales 
( 
cs_sold_date_sk bigint , 
cs_sold_time_sk bigint , 
cs_ship_date_sk bigint , 
cs_bill_customer_sk bigint , 
cs_bill_cdemo_sk bigint , 
cs_bill_hdemo_sk bigint , 
cs_bill_addr_sk bigint , 
cs_ship_customer_sk bigint , 
cs_ship_cdemo_sk bigint , 
cs_ship_hdemo_sk bigint , 
cs_ship_addr_sk bigint , 
cs_call_center_sk bigint , 
cs_catalog_page_sk bigint , 
cs_ship_mode_sk bigint , 
cs_warehouse_sk bigint , 
cs_item_sk bigint not null, 
cs_promo_sk bigint , 
cs_order_number bigint not null, 
cs_quantity bigint , 
cs_wholesale_cost double , 
cs_list_price double , 
cs_sales_price double , 
cs_ext_discount_amt double , 
cs_ext_sales_price double , 
cs_ext_wholesale_cost double , 
cs_ext_list_price double , 
cs_ext_tax double , 
cs_coupon_amt double , 
cs_ext_ship_cost double , 
cs_net_paid double , 
cs_net_paid_inc_tax double , 
cs_net_paid_inc_ship double , 
cs_net_paid_inc_ship_tax double , 
cs_net_profit double 
) 
STORED AS PARQUETFILE; 
create hadoop table customer 
( 
c_customer_sk bigint not null,
c_customer_id varchar(16) not null, 
c_current_cdemo_sk bigint , 
c_current_hdemo_sk bigint , 
c_current_addr_sk bigint , 
c_first_shipto_date_sk bigint , 
c_first_sales_date_sk bigint , 
c_salutation varchar(10) , 
c_first_name varchar(20) , 
c_last_name varchar(30) , 
c_preferred_cust_flag varchar(1) , 
c_birth_day bigint , 
c_birth_month bigint , 
c_birth_year bigint , 
c_birth_country varchar(20) , 
c_login varchar(13) , 
c_email_address varchar(50) , 
c_last_review_date bigint 
) 
STORED AS PARQUETFILE; 
create hadoop table customer_address 
( 
ca_address_sk bigint not null, 
ca_address_id varchar(16) not null, 
ca_street_number varchar(10) , 
ca_street_name varchar(60) , 
ca_street_type varchar(15) , 
ca_suite_number varchar(10) , 
ca_city varchar(60) , 
ca_county varchar(30) , 
ca_state varchar(2) , 
ca_zip varchar(10) , 
ca_country varchar(20) , 
ca_gmt_offset double , 
ca_location_type varchar(20) 
) 
STORED AS PARQUETFILE; 
create hadoop table customer_demographics 
( 
cd_demo_sk bigint not null, 
cd_gender varchar(1) , 
cd_marital_status varchar(1) , 
cd_education_status varchar(20) , 
cd_purchase_estimate bigint , 
cd_credit_rating varchar(10) , 
cd_dep_count bigint , 
cd_dep_employed_count bigint , 
cd_dep_college_count bigint 
) 
STORED AS PARQUETFILE; 
create hadoop table date_dim 
( 
d_date_sk bigint not null, 
d_date_id varchar(16) not null, 
d_date varchar(10) , 
d_month_seq bigint , 
d_week_seq bigint , 
d_quarter_seq bigint , 
d_year bigint , 
d_dow bigint , 
d_moy bigint , 
d_dom bigint , 
d_qoy bigint , 
d_fy_year bigint , 
d_fy_quarter_seq bigint , 
d_fy_week_seq bigint , 
d_day_name varchar(9) , 
d_quarter_name varchar(6) , 
d_holiday varchar(1) , 
d_weekend varchar(1) , 
d_following_holiday varchar(1) , 
d_first_dom bigint , 
d_last_dom bigint , 
d_same_day_ly bigint ,
d_same_day_lq bigint , 
d_current_day varchar(1) , 
d_current_week varchar(1) , 
d_current_month varchar(1) , 
d_current_quarter varchar(1) , 
d_current_year varchar(1) 
) 
STORED AS PARQUETFILE; 
create hadoop table household_demographics 
( 
hd_demo_sk bigint not null, 
hd_income_band_sk bigint , 
hd_buy_potential varchar(15) , 
hd_dep_count bigint , 
hd_vehicle_count bigint 
) 
STORED AS PARQUETFILE; 
create hadoop table income_band 
( 
ib_income_band_sk bigint not null, 
ib_lower_bound bigint , 
ib_upper_bound bigint 
) 
STORED AS PARQUETFILE; 
create hadoop table inventory 
( 
inv_date_sk bigint not null, 
inv_item_sk bigint not null, 
inv_warehouse_sk bigint not null, 
inv_quantity_on_hand bigint 
) 
STORED AS PARQUETFILE; 
create hadoop table item 
( 
i_item_sk bigint not null, 
i_item_id varchar(16) not null, 
i_rec_start_date varchar(10) , 
i_rec_end_date varchar(10) , 
i_item_desc varchar(200) , 
i_current_price double , 
i_wholesale_cost double , 
i_brand_id bigint , 
i_brand varchar(50) , 
i_class_id bigint , 
i_class varchar(50) , 
i_category_id bigint , 
i_category varchar(50) , 
i_manufact_id bigint , 
i_manufact varchar(50) , 
i_size varchar(20) , 
i_formulation varchar(20) , 
i_color varchar(20) , 
i_units varchar(10) , 
i_container varchar(10) , 
i_manager_id bigint , 
i_product_name varchar(50) 
) 
STORED AS PARQUETFILE; 
create hadoop table promotion 
( 
p_promo_sk bigint not null, 
p_promo_id varchar(16) not null, 
p_start_date_sk bigint , 
p_end_date_sk bigint , 
p_item_sk bigint , 
p_cost double , 
p_response_target bigint , 
p_promo_name varchar(50) , 
p_channel_dmail varchar(1) , 
p_channel_email varchar(1) , 
p_channel_catalog varchar(1) ,
p_channel_tv varchar(1) , 
p_channel_radio varchar(1) , 
p_channel_press varchar(1) , 
p_channel_event varchar(1) , 
p_channel_demo varchar(1) , 
p_channel_details varchar(100) , 
p_purpose varchar(15) , 
p_discount_active varchar(1) 
) 
STORED AS PARQUETFILE; 
create hadoop table reason 
( 
r_reason_sk bigint not null, 
r_reason_id varchar(16) not null, 
r_reason_desc varchar(100) 
) 
STORED AS PARQUETFILE; 
create hadoop table ship_mode 
( 
sm_ship_mode_sk bigint not null, 
sm_ship_mode_id varchar(16) not null, 
sm_type varchar(30) , 
sm_code varchar(10) , 
sm_carrier varchar(20) , 
sm_contract varchar(20) 
) 
STORED AS PARQUETFILE; 
create hadoop table store 
( 
s_store_sk bigint not null, 
s_store_id varchar(16) not null, 
s_rec_start_date varchar(10) , 
s_rec_end_date varchar(10) , 
s_closed_date_sk bigint , 
s_store_name varchar(50) , 
s_number_employees bigint , 
s_floor_space bigint , 
s_hours varchar(20) , 
s_manager varchar(40) , 
s_market_id bigint , 
s_geography_class varchar(100) , 
s_market_desc varchar(100) , 
s_market_manager varchar(40) , 
s_division_id bigint , 
s_division_name varchar(50) , 
s_company_id bigint , 
s_company_name varchar(50) , 
s_street_number varchar(10) , 
s_street_name varchar(60) , 
s_street_type varchar(15) , 
s_suite_number varchar(10) , 
s_city varchar(60) , 
s_county varchar(30) , 
s_state varchar(2) , 
s_zip varchar(10) , 
s_country varchar(20) , 
s_gmt_offset double , 
s_tax_precentage double 
) 
STORED AS PARQUETFILE; 
create hadoop table store_returns 
( 
sr_returned_date_sk bigint , 
sr_return_time_sk bigint , 
sr_item_sk bigint not null, 
sr_customer_sk bigint , 
sr_cdemo_sk bigint , 
sr_hdemo_sk bigint , 
sr_addr_sk bigint , 
sr_store_sk bigint , 
sr_reason_sk bigint , 
sr_ticket_number bigint not null,
sr_return_quantity bigint , 
sr_return_amt double , 
sr_return_tax double , 
sr_return_amt_inc_tax double , 
sr_fee double , 
sr_return_ship_cost double , 
sr_refunded_cash double , 
sr_reversed_charge double , 
sr_store_credit double , 
sr_net_loss double 
) 
STORED AS PARQUETFILE; 
create hadoop table store_sales 
( 
ss_sold_date_sk bigint , 
ss_sold_time_sk bigint , 
ss_item_sk bigint not null, 
ss_customer_sk bigint , 
ss_cdemo_sk bigint , 
ss_hdemo_sk bigint , 
ss_addr_sk bigint , 
ss_store_sk bigint , 
ss_promo_sk bigint , 
ss_ticket_number bigint not null, 
ss_quantity bigint , 
ss_wholesale_cost double , 
ss_list_price double , 
ss_sales_price double , 
ss_ext_discount_amt double , 
ss_ext_sales_price double , 
ss_ext_wholesale_cost double , 
ss_ext_list_price double , 
ss_ext_tax double , 
ss_coupon_amt double , 
ss_net_paid double , 
ss_net_paid_inc_tax double , 
ss_net_profit double 
) 
STORED AS PARQUETFILE; 
create hadoop table time_dim 
( 
t_time_sk bigint not null, 
t_time_id varchar(16) not null, 
t_time bigint , 
t_hour bigint , 
t_minute bigint , 
t_second bigint , 
t_am_pm varchar(2) , 
t_shift varchar(20) , 
t_sub_shift varchar(20) , 
t_meal_time varchar(20) 
) 
STORED AS PARQUETFILE; 
create hadoop table warehouse 
( 
w_warehouse_sk bigint not null, 
w_warehouse_id varchar(16) not null, 
w_warehouse_name varchar(20) , 
w_warehouse_sq_ft bigint , 
w_street_number varchar(10) , 
w_street_name varchar(60) , 
w_street_type varchar(15) , 
w_suite_number varchar(10) , 
w_city varchar(60) , 
w_county varchar(30) , 
w_state varchar(2) , 
w_zip varchar(10) , 
w_country varchar(20) , 
w_gmt_offset double 
) 
STORED AS PARQUETFILE; 
create hadoop table web_page
( 
wp_web_page_sk bigint not null, 
wp_web_page_id varchar(16) not null, 
wp_rec_start_date varchar(10) , 
wp_rec_end_date varchar(10) , 
wp_creation_date_sk bigint , 
wp_access_date_sk bigint , 
wp_autogen_flag varchar(1) , 
wp_customer_sk bigint , 
wp_url varchar(100) , 
wp_type varchar(50) , 
wp_char_count bigint , 
wp_link_count bigint , 
wp_image_count bigint , 
wp_max_ad_count bigint 
) 
STORED AS PARQUETFILE; 
create hadoop table web_returns 
( 
wr_returned_date_sk bigint , 
wr_returned_time_sk bigint , 
wr_item_sk bigint not null, 
wr_refunded_customer_sk bigint , 
wr_refunded_cdemo_sk bigint , 
wr_refunded_hdemo_sk bigint , 
wr_refunded_addr_sk bigint , 
wr_returning_customer_sk bigint , 
wr_returning_cdemo_sk bigint , 
wr_returning_hdemo_sk bigint , 
wr_returning_addr_sk bigint , 
wr_web_page_sk bigint , 
wr_reason_sk bigint , 
wr_order_number bigint not null, 
wr_return_quantity bigint , 
wr_return_amt double , 
wr_return_tax double , 
wr_return_amt_inc_tax double , 
wr_fee double , 
wr_return_ship_cost double , 
wr_refunded_cash double , 
wr_reversed_charge double , 
wr_account_credit double , 
wr_net_loss double 
) 
STORED AS PARQUETFILE; 
create hadoop table web_sales 
( 
ws_sold_date_sk bigint , 
ws_sold_time_sk bigint , 
ws_ship_date_sk bigint , 
ws_item_sk bigint not null, 
ws_bill_customer_sk bigint , 
ws_bill_cdemo_sk bigint , 
ws_bill_hdemo_sk bigint , 
ws_bill_addr_sk bigint , 
ws_ship_customer_sk bigint , 
ws_ship_cdemo_sk bigint , 
ws_ship_hdemo_sk bigint , 
ws_ship_addr_sk bigint , 
ws_web_page_sk bigint , 
ws_web_site_sk bigint , 
ws_ship_mode_sk bigint , 
ws_warehouse_sk bigint , 
ws_promo_sk bigint , 
ws_order_number bigint not null, 
ws_quantity bigint , 
ws_wholesale_cost double , 
ws_list_price double , 
ws_sales_price double , 
ws_ext_discount_amt double , 
ws_ext_sales_price double , 
ws_ext_wholesale_cost double , 
ws_ext_list_price double , 
ws_ext_tax double ,
ws_coupon_amt double , 
ws_ext_ship_cost double , 
ws_net_paid double , 
ws_net_paid_inc_tax double , 
ws_net_paid_inc_ship double , 
ws_net_paid_inc_ship_tax double , 
ws_net_profit double 
) 
STORED AS PARQUETFILE; 
create hadoop table web_site 
( 
web_site_sk bigint not null, 
web_site_id varchar(16) not null, 
web_rec_start_date varchar(10) , 
web_rec_end_date varchar(10) , 
web_name varchar(50) , 
web_open_date_sk bigint , 
web_close_date_sk bigint , 
web_class varchar(50) , 
web_manager varchar(40) , 
web_mkt_id bigint , 
web_mkt_class varchar(50) , 
web_mkt_desc varchar(100) , 
web_market_manager varchar(40) , 
web_company_id bigint , 
web_company_name varchar(50) , 
web_street_number varchar(10) , 
web_street_name varchar(60) , 
web_street_type varchar(15) , 
web_suite_number varchar(10) , 
web_city varchar(60) , 
web_county varchar(30) , 
web_state varchar(2) , 
web_zip varchar(10) , 
web_country varchar(20) , 
web_gmt_offset double , 
web_tax_percentage double 
) 
STORED AS PARQUETFILE; 
commit; 
045.load-tables.jsq: 
The following script was used to load the flat files into Big SQL in Parquet format: 
set schema $schema; 
load hadoop using file url '/HADOOPDS30000G_PARQ/call_center' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table call_center overwrite WITH LOAD PROPERTIES ('num.map.tasks'='1'); 
load hadoop using file url '/HADOOPDS30000G_PARQ/catalog_page' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table catalog_page overwrite WITH LOAD PROPERTIES ('num.map.tasks'='1'); 
load hadoop using file url '/HADOOPDS30000G_PARQ/catalog_returns' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table catalog_returns overwrite WITH LOAD PROPERTIES ('num.map.tasks'='425'); 
load hadoop using file url '/HADOOPDS30000G_PARQ/catalog_sales' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table catalog_sales overwrite WITH LOAD PROPERTIES ('num.map.tasks'='4250'); 
load hadoop using file url '/HADOOPDS30000G_PARQ/customer' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table customer overwrite WITH LOAD PROPERTIES ('num.map.tasks'='1'); 
load hadoop using file url '/HADOOPDS30000G_PARQ/customer_address' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table customer_address overwrite WITH LOAD PROPERTIES ('num.map.tasks'='1');
load hadoop using file url '/HADOOPDS30000G_PARQ/customer_demographics' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table customer_demographics overwrite WITH LOAD PROPERTIES ('num.map.tasks'='1'); 
load hadoop using file url '/HADOOPDS30000G_PARQ/date_dim' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table date_dim overwrite WITH LOAD PROPERTIES ('num.map.tasks'='1'); 
load hadoop using file url '/HADOOPDS30000G_PARQ/household_demographics' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table household_demographics overwrite WITH LOAD PROPERTIES ('num.map.tasks'='1'); 
load hadoop using file url '/HADOOPDS30000G_PARQ/income_band' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table income_band overwrite WITH LOAD PROPERTIES ('num.map.tasks'='1'); 
load hadoop using file url '/HADOOPDS30000G_PARQ/inventory' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table inventory overwrite WITH LOAD PROPERTIES ('num.map.tasks'='160'); 
load hadoop using file url '/HADOOPDS30000G_PARQ/item' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table item overwrite WITH LOAD PROPERTIES ('num.map.tasks'='1'); 
load hadoop using file url '/HADOOPDS30000G_PARQ/promotion' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table promotion overwrite WITH LOAD PROPERTIES ('num.map.tasks'='1'); 
load hadoop using file url '/HADOOPDS30000G_PARQ/reason' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table reason overwrite WITH LOAD PROPERTIES ('num.map.tasks'='1'); 
load hadoop using file url '/HADOOPDS30000G_PARQ/ship_mode' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table ship_mode overwrite WITH LOAD PROPERTIES ('num.map.tasks'='1'); 
load hadoop using file url '/HADOOPDS30000G_PARQ/store' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table store overwrite WITH LOAD PROPERTIES ('num.map.tasks'='1'); 
load hadoop using file url '/HADOOPDS30000G_PARQ/store_returns' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table store_returns overwrite WITH LOAD PROPERTIES ('num.map.tasks'='700'); 
load hadoop using file url '/HADOOPDS30000G_PARQ/store_sales' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table store_sales overwrite WITH LOAD PROPERTIES ('num.map.tasks'='5500'); 
load hadoop using file url '/HADOOPDS30000G_PARQ/time_dim' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table time_dim overwrite WITH LOAD PROPERTIES ('num.map.tasks'='1'); 
load hadoop using file url '/HADOOPDS30000G_PARQ/warehouse/' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table warehouse overwrite WITH LOAD PROPERTIES ('num.map.tasks'='1'); 
load hadoop using file url '/HADOOPDS30000G_PARQ/web_page' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table web_page overwrite WITH LOAD PROPERTIES ('num.map.tasks'='1'); 
load hadoop using file url '/HADOOPDS30000G_PARQ/web_returns' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table web_returns overwrite WITH LOAD PROPERTIES ('num.map.tasks'='200'); 
load hadoop using file url '/HADOOPDS30000G_PARQ/web_sales' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table web_sales overwrite WITH LOAD PROPERTIES ('num.map.tasks'='2000'); 
load hadoop using file url '/HADOOPDS30000G_PARQ/web_site' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table web_site overwrite WITH LOAD PROPERTIES ('num.map.tasks'='1'); 
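The 24 statements above differ only in the table name and the num.map.tasks value. The following is a minimal sketch of generating them from a compact table:map-task list; the helper names (load_stmt, gen_load_script) are hypothetical, and the map-task counts are copied from the script above.

```shell
#!/bin/sh
# Sketch: regenerate the LOAD HADOOP statements of 045.load-tables.jsq.
# SRC_ROOT is the HDFS directory holding one subdirectory of flat files per table.
SRC_ROOT=/HADOOPDS30000G_PARQ

load_stmt() {
  # $1 = table name, $2 = number of map tasks for the load job
  printf "load hadoop using file url '%s/%s' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table %s overwrite WITH LOAD PROPERTIES ('num.map.tasks'='%s');\n" \
    "$SRC_ROOT" "$1" "$1" "$2"
}

# table:num.map.tasks pairs, as used in the benchmark load script
TABLES="call_center:1 catalog_page:1 catalog_returns:425 catalog_sales:4250
customer:1 customer_address:1 customer_demographics:1 date_dim:1
household_demographics:1 income_band:1 inventory:160 item:1 promotion:1
reason:1 ship_mode:1 store:1 store_returns:700 store_sales:5500 time_dim:1
warehouse:1 web_page:1 web_returns:200 web_sales:2000 web_site:1"

gen_load_script() {
  # $schema is left literal, as in the original script.
  echo 'set schema $schema;'
  for entry in $TABLES; do
    load_stmt "${entry%%:*}" "${entry##*:}"
  done
}
```

Running `gen_load_script > 045.load-tables.jsq` produces the same statements as above, except that the warehouse URL has no trailing slash.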
046.load-files-individually.sh: 
Because the benchmark uses a 1 GB block size for the Parquet files, any table that is smaller than 16 GB but still significant in size (at least 1 GB) will not be spread across all data nodes in the cluster when it is loaded with the script above. For this reason, the flat files for customer, customer_address and inventory were generated in several pieces (one file per data node). Each file was then loaded individually to spread the files and blocks across as many data nodes as possible. This allows Big SQL to fully parallelize a table scan against these tables across all data nodes. 
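The block-count reasoning above can be made concrete with a small sketch. It assumes table sizes in MB, a 1024 MB (1 GB) block size, and 16 data nodes; the node count is not stated in this section and is only inferred from the 16 GB threshold. A scan reads each block with one task on one node, so the number of nodes that can scan a table in parallel is bounded by the number of blocks.

```shell
#!/bin/sh
# Sketch of the block-distribution argument.
# Assumptions: sizes in MB, 1 GB blocks, 16 data nodes (inferred, hypothetical).
BLOCK_MB=1024
DATA_NODES=16

blocks_for() {
  # Ceiling division: number of blocks occupied by a file of $1 MB.
  echo $(( ($1 + BLOCK_MB - 1) / BLOCK_MB ))
}

max_nodes_scanning() {
  # Upper bound on nodes that can scan the table in parallel:
  # min(number of blocks, number of data nodes).
  local b
  b=$(blocks_for "$1")
  if [ "$b" -lt "$DATA_NODES" ]; then echo "$b"; else echo "$DATA_NODES"; fi
}
```

For example, `max_nodes_scanning 5120` (a 5 GB table) returns 5, so at most 5 of the 16 assumed data nodes can take part in the scan, and fewer if several blocks land on the same node.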
The following script was used to achieve this distribution for the 3 tables mentioned: 
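#!/bin/bash
# Usage: 046.load-files-individually.sh <hdfs-flat-file-dir> <table>
# Writes a .jsq script containing one LOAD HADOOP statement per flat file in
# <hdfs-flat-file-dir>: the first statement overwrites the table and the
# remaining ones append to it, so each file (and its blocks) is loaded separately.
FLATDIR=$1
TABLE=$2
FILE="046.load-files-individually-${TABLE}.jsq"
i=0
schema=HADOOPDS30000G_PARQ
rm -f "${FILE}"
echo "set schema $schema;" >> "${FILE}"
echo >> "${FILE}"
# List the directory, skip the "Found N items" header, keep the path column.
hadoop fs -ls "${FLATDIR}" | grep -v Found | awk '{print $8}' |
while read -r f
do
  if [[ $i == 0 ]] ; then
    # First file replaces any existing table contents.
    echo "load hadoop using file url '$f' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table ${TABLE} overwrite ;" >> "${FILE}"
    i=1
  else
    # Every further file is appended to the table.
    echo "load hadoop using file url '$f' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table ${TABLE} append ;" >> "${FILE}"
  fi
done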
055.ri.jsq: 
Big SQL cannot enforce primary key (PK) and foreign key (FK) constraints, but informational (“not enforced”) constraints can give the optimizer additional information when it evaluates access plans. 
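Every constraint statement in 055.ri.jsq follows one of two templates: a primary key on a table, or a named foreign key referencing a parent table, each followed by a commit. A minimal sketch of emitting them from compact definitions is shown below; the helper names pk_stmt and fk_stmt are hypothetical, and only the statement text is taken from the script.

```shell
#!/bin/sh
# Sketch: emit informational-constraint DDL in the style of 055.ri.jsq.

pk_stmt() {
  # $1 = table, $2 = comma-separated primary key columns
  printf 'alter table %s\nadd primary key (%s)\nnot enforced enable query optimization;\ncommit work;\n' \
    "$1" "$2"
}

fk_stmt() {
  # $1 = table, $2 = constraint name, $3 = FK columns,
  # $4 = referenced table, $5 = referenced columns
  printf 'alter table %s\nadd constraint %s foreign key (%s)\nreferences %s (%s) NOT ENFORCED ENABLE QUERY OPTIMIZATION;\ncommit work;\n' \
    "$1" "$2" "$3" "$4" "$5"
}
```

For example, `pk_stmt store_sales "ss_item_sk, ss_ticket_number"` and `fk_stmt store_returns fk3b "sr_item_sk, sr_ticket_number" store_sales "ss_item_sk, ss_ticket_number"` reproduce the corresponding statements in the script.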
The following informational constraints were used in the environment (all PK and FK relationships defined in the TPC-DS specification): 
set schema $schema; 
------------------------------------------------------------ 
-- primary key definitions 
------------------------------------------------------------ 
alter table call_center 
add primary key (cc_call_center_sk) 
not enforced enable query optimization; 
commit work; 
alter table catalog_page 
add primary key (cp_catalog_page_sk) 
not enforced enable query optimization; 
commit work; 
alter table catalog_returns 
add primary key (cr_item_sk, cr_order_number) 
not enforced enable query optimization; 
commit work; 
alter table catalog_sales 
add primary key (cs_item_sk, cs_order_number) 
not enforced enable query optimization; 
commit work; 
alter table customer 
add primary key (c_customer_sk) 
not enforced enable query optimization; 
commit work; 
alter table customer_address 
add primary key (ca_address_sk) 
not enforced enable query optimization; 
commit work; 
alter table customer_demographics 
add primary key (cd_demo_sk) 
not enforced enable query optimization; 
commit work;
alter table date_dim 
add primary key (d_date_sk) 
not enforced enable query optimization; 
commit work; 
alter table household_demographics 
add primary key (hd_demo_sk) 
not enforced enable query optimization; 
commit work; 
alter table income_band 
add primary key (ib_income_band_sk) 
not enforced enable query optimization; 
commit work; 
alter table inventory 
add primary key (inv_date_sk, inv_item_sk, inv_warehouse_sk) 
not enforced enable query optimization; 
commit work; 
alter table item 
add primary key (i_item_sk) 
not enforced enable query optimization; 
commit work; 
alter table promotion 
add primary key (p_promo_sk) 
not enforced enable query optimization; 
commit work; 
alter table reason 
add primary key (r_reason_sk) 
not enforced enable query optimization; 
commit work; 
alter table ship_mode 
add primary key (sm_ship_mode_sk) 
not enforced enable query optimization; 
commit work; 
alter table store 
add primary key (s_store_sk) 
not enforced enable query optimization; 
commit work; 
alter table store_returns 
add primary key (sr_item_sk, sr_ticket_number) 
not enforced enable query optimization; 
commit work; 
alter table store_sales 
add primary key (ss_item_sk, ss_ticket_number) 
not enforced enable query optimization; 
commit work; 
alter table time_dim 
add primary key (t_time_sk) 
not enforced enable query optimization; 
commit work; 
alter table warehouse 
add primary key (w_warehouse_sk) 
not enforced enable query optimization; 
commit work; 
alter table web_page 
add primary key (wp_web_page_sk) 
not enforced enable query optimization; 
commit work; 
alter table web_returns 
add primary key (wr_item_sk, wr_order_number) 
not enforced enable query optimization; 
commit work; 
alter table web_sales
add primary key (ws_item_sk, ws_order_number) 
not enforced enable query optimization; 
commit work; 
alter table web_site 
add primary key (web_site_sk) 
not enforced enable query optimization; 
commit work; 
------------------------------------------------------------ 
-- foreign key definitions 
------------------------------------------------------------ 
-- tables with no FKs 
-- customer_address 
-- customer_demographics 
-- item 
-- date_dim 
-- warehouse 
-- ship_mode 
-- time_dim 
-- reason 
-- income_band 
alter table promotion 
add constraint fk1 foreign key (p_start_date_sk) 
references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table promotion 
add constraint fk2 foreign key (p_end_date_sk) 
references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table promotion 
add constraint fk3 foreign key (p_item_sk) 
references item (i_item_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table store 
add constraint fk foreign key (s_closed_date_sk) 
references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table call_center 
add constraint fk1 foreign key (cc_closed_date_sk) 
references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table call_center 
add constraint fk2 foreign key (cc_open_date_sk) 
references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table customer 
add constraint fk1 foreign key (c_current_cdemo_sk) 
references customer_demographics (cd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table customer 
add constraint fk2 foreign key (c_current_hdemo_sk) 
references household_demographics (hd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table customer 
add constraint fk3 foreign key (c_current_addr_sk) 
references customer_address (ca_address_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table customer 
add constraint fk4 foreign key (c_first_shipto_date_sk) 
references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table customer
add constraint fk5 foreign key (c_first_sales_date_sk) 
references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table web_site 
add constraint fk1 foreign key (web_open_date_sk) 
references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table web_site 
add constraint fk2 foreign key (web_close_date_sk) 
references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table catalog_page 
add constraint fk1 foreign key (cp_start_date_sk) 
references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table catalog_page 
add constraint fk2 foreign key (cp_end_date_sk) 
references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table household_demographics 
add constraint fk foreign key (hd_income_band_sk) 
references income_band (ib_income_band_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table web_page 
add constraint fk1 foreign key (wp_creation_date_sk) 
references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table web_page 
add constraint fk2 foreign key (wp_access_date_sk) 
references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table web_page 
add constraint fk3 foreign key (wp_customer_sk) 
references customer (c_customer_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table store_sales 
add constraint fk1 foreign key (ss_sold_date_sk) 
references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table store_sales 
add constraint fk2 foreign key (ss_sold_time_sk) 
references time_dim (t_time_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table store_sales 
add constraint fk3a foreign key (ss_item_sk) 
references item (i_item_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table store_sales 
add constraint fk4 foreign key (ss_customer_sk) 
references customer (c_customer_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table store_sales 
add constraint fk5 foreign key (ss_cdemo_sk) 
references customer_demographics (cd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table store_sales 
add constraint fk6 foreign key (ss_hdemo_sk) 
references household_demographics (hd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table store_sales 
add constraint fk7 foreign key (ss_addr_sk)
references customer_address (ca_address_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table store_sales 
add constraint fk8 foreign key (ss_store_sk) 
references store (s_store_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table store_sales 
add constraint fk9 foreign key (ss_promo_sk) 
references promotion (p_promo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table store_returns 
add constraint fk1 foreign key (sr_returned_date_sk) 
references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table store_returns 
add constraint fk2 foreign key (sr_return_time_sk) 
references time_dim (t_time_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table store_returns 
add constraint fk3a foreign key (sr_item_sk) 
references item (i_item_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table store_returns 
add constraint fk3b foreign key (sr_item_sk, sr_ticket_number) 
references store_sales (ss_item_sk, ss_ticket_number) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table store_returns 
add constraint fk4 foreign key (sr_customer_sk) 
references customer (c_customer_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table store_returns 
add constraint fk5 foreign key (sr_cdemo_sk) 
references customer_demographics (cd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table store_returns 
add constraint fk6 foreign key (sr_hdemo_sk) 
references household_demographics (hd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table store_returns 
add constraint fk7 foreign key (sr_addr_sk) 
references customer_address (ca_address_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table store_returns 
add constraint fk8 foreign key (sr_store_sk) 
references store (s_store_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table store_returns 
add constraint fk9 foreign key (sr_reason_sk) 
references reason (r_reason_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table catalog_sales 
add constraint fk1 foreign key (cs_sold_date_sk) 
references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table catalog_sales 
add constraint fk2 foreign key (cs_sold_time_sk) 
references time_dim (t_time_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table catalog_sales 
add constraint fk3 foreign key (cs_ship_date_sk) 
references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION;
commit work; 
alter table catalog_sales 
add constraint fk4 foreign key (cs_bill_customer_sk) 
references customer (c_customer_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table catalog_sales 
add constraint fk5 foreign key (cs_bill_cdemo_sk) 
references customer_demographics (cd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table catalog_sales 
add constraint fk6 foreign key (cs_bill_hdemo_sk) 
references household_demographics (hd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table catalog_sales 
add constraint fk7 foreign key (cs_bill_addr_sk) 
references customer_address (ca_address_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table catalog_sales 
add constraint fk8 foreign key (cs_ship_customer_sk) 
references customer (c_customer_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table catalog_sales 
add constraint fk9 foreign key (cs_ship_cdemo_sk) 
references customer_demographics (cd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table catalog_sales 
add constraint fk10 foreign key (cs_ship_hdemo_sk) 
references household_demographics (hd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table catalog_sales 
add constraint fk11 foreign key (cs_ship_addr_sk) 
references customer_address (ca_address_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table catalog_sales 
add constraint fk12 foreign key (cs_call_center_sk) 
references call_center (cc_call_center_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table catalog_sales 
add constraint fk13 foreign key (cs_catalog_page_sk) 
references catalog_page (cp_catalog_page_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table catalog_sales 
add constraint fk14 foreign key (cs_ship_mode_sk) 
references ship_mode (sm_ship_mode_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table catalog_sales 
add constraint fk15 foreign key (cs_warehouse_sk) 
references warehouse (w_warehouse_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table catalog_sales 
add constraint fk16a foreign key (cs_item_sk) 
references item (i_item_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table catalog_sales 
add constraint fk17 foreign key (cs_promo_sk) 
references promotion (p_promo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table catalog_returns 
add constraint fk1 foreign key (cr_returned_date_sk) 
references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work;
alter table catalog_returns 
add constraint fk2 foreign key (cr_returned_time_sk) 
references time_dim (t_time_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table catalog_returns 
add constraint fk3 foreign key (cr_item_sk, cr_order_number) 
references catalog_sales (cs_item_sk, cs_order_number) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table catalog_returns 
add constraint fk4 foreign key (cr_item_sk) 
references item (i_item_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table catalog_returns 
add constraint fk5 foreign key (cr_refunded_customer_sk) 
references customer (c_customer_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table catalog_returns 
add constraint fk6 foreign key (cr_refunded_cdemo_sk) 
references customer_demographics (cd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table catalog_returns 
add constraint fk7 foreign key (cr_refunded_hdemo_sk) 
references household_demographics (hd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table catalog_returns 
add constraint fk8 foreign key (cr_refunded_addr_sk) 
references customer_address (ca_address_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table catalog_returns 
add constraint fk9 foreign key (cr_returning_customer_sk) 
references customer (c_customer_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table catalog_returns 
add constraint fk10 foreign key (cr_returning_cdemo_sk) 
references customer_demographics (cd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table catalog_returns 
add constraint fk11 foreign key (cr_returning_hdemo_sk) 
references household_demographics (hd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table catalog_returns 
add constraint fk12 foreign key (cr_returning_addr_sk) 
references customer_address (ca_address_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table catalog_returns 
add constraint fk13 foreign key (cr_call_center_sk) 
references call_center (cc_call_center_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table catalog_returns 
add constraint fk14 foreign key (cr_catalog_page_sk) 
references catalog_page (cp_catalog_page_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table catalog_returns 
add constraint fk15 foreign key (cr_ship_mode_sk) 
references ship_mode (sm_ship_mode_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table catalog_returns 
add constraint fk16 foreign key (cr_warehouse_sk) 
references warehouse (w_warehouse_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work;
alter table catalog_returns 
add constraint fk17 foreign key (cr_reason_sk) 
references reason (r_reason_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table web_sales 
add constraint fk1 foreign key (ws_sold_date_sk) 
references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table web_sales 
add constraint fk2 foreign key (ws_sold_time_sk) 
references time_dim (t_time_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table web_sales 
add constraint fk3 foreign key (ws_ship_date_sk) 
references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table web_sales 
add constraint fk4a foreign key (ws_item_sk) 
references item (i_item_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table web_sales 
add constraint fk5 foreign key (ws_bill_customer_sk) 
references customer (c_customer_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table web_sales 
add constraint fk6 foreign key (ws_bill_cdemo_sk) 
references customer_demographics (cd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table web_sales 
add constraint fk7 foreign key (ws_bill_hdemo_sk) 
references household_demographics (hd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table web_sales 
add constraint fk8 foreign key (ws_bill_addr_sk) 
references customer_address (ca_address_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table web_sales 
add constraint fk9 foreign key (ws_ship_customer_sk) 
references customer (c_customer_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table web_sales 
add constraint fk10 foreign key (ws_ship_cdemo_sk) 
references customer_demographics (cd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table web_sales 
add constraint fk11 foreign key (ws_ship_hdemo_sk) 
references household_demographics (hd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table web_sales 
add constraint fk12 foreign key (ws_ship_addr_sk) 
references customer_address (ca_address_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table web_sales 
add constraint fk13 foreign key (ws_web_page_sk) 
references web_page (wp_web_page_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table web_sales 
add constraint fk14 foreign key (ws_web_site_sk) 
references web_site (web_site_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table web_sales
add constraint fk15 foreign key (ws_ship_mode_sk) 
references ship_mode (sm_ship_mode_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table web_sales 
add constraint fk16 foreign key (ws_warehouse_sk) 
references warehouse (w_warehouse_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table web_sales 
add constraint fk17 foreign key (ws_promo_sk) 
references promotion (p_promo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table web_returns 
add constraint fk1 foreign key (wr_returned_date_sk) 
references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table web_returns 
add constraint fk2 foreign key (wr_returned_time_sk) 
references time_dim (t_time_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table web_returns 
add constraint fk3a foreign key (wr_item_sk) 
references item (i_item_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table web_returns 
add constraint fk3b foreign key (wr_item_sk, wr_order_number) 
references web_sales (ws_item_sk, ws_order_number) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table web_returns 
add constraint fk4 foreign key (wr_refunded_customer_sk) 
references customer (c_customer_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table web_returns 
add constraint fk5 foreign key (wr_refunded_cdemo_sk) 
references customer_demographics (cd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table web_returns 
add constraint fk6 foreign key (wr_refunded_hdemo_sk) 
references household_demographics (hd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table web_returns 
add constraint fk7 foreign key (wr_refunded_addr_sk) 
references customer_address (ca_address_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table web_returns 
add constraint fk8 foreign key (wr_returning_customer_sk) 
references customer (c_customer_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table web_returns 
add constraint fk9 foreign key (wr_returning_cdemo_sk) 
references customer_demographics (cd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table web_returns 
add constraint fk10 foreign key (wr_returning_hdemo_sk) 
references household_demographics (hd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table web_returns 
add constraint fk11 foreign key (wr_returning_addr_sk) 
references customer_address (ca_address_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table web_returns 
add constraint fk12 foreign key (wr_web_page_sk)
references web_page (wp_web_page_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table web_returns 
add constraint fk13 foreign key (wr_reason_sk) 
references reason (r_reason_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table inventory 
add constraint fk1 foreign key (inv_date_sk) 
references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table inventory 
add constraint fk2 foreign key (inv_item_sk) 
references item (i_item_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
alter table inventory 
add constraint fk3 foreign key (inv_warehouse_sk) 
references warehouse (w_warehouse_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; 
commit work; 
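Every statement in the script above follows one pattern. NOT ENFORCED tells the database not to check the relationship when data is loaded, and ENABLE QUERY OPTIMIZATION lets the optimizer rely on it when planning joins. Because the script is so repetitive, it can be generated. The bash function below is a minimal sketch: the name `fk_ddl` and its argument order are hypothetical, not part of the original benchmark kit, and it only prints DDL text in the same form as above.

```bash
#!/bin/bash
# Hypothetical helper (not part of the original benchmark kit): print one
# informational foreign-key constraint in the same form as the DDL above.
#   $1 = child table   $2 = constraint name   $3 = FK column(s)
#   $4 = parent table  $5 = parent key column(s)
fk_ddl() {
  printf 'alter table %s\n' "$1"
  printf 'add constraint %s foreign key (%s)\n' "$2" "$3"
  printf 'references %s (%s) NOT ENFORCED ENABLE QUERY OPTIMIZATION;\n' "$4" "$5"
  printf 'commit work;\n'
}

# Regenerate the three inventory constraints.
fk_ddl inventory fk1 inv_date_sk date_dim d_date_sk
fk_ddl inventory fk2 inv_item_sk item i_item_sk
fk_ddl inventory fk3 inv_warehouse_sk warehouse w_warehouse_sk
```

Composite keys are passed as one quoted argument, e.g. `fk_ddl web_returns fk3b 'wr_item_sk, wr_order_number' web_sales 'ws_item_sk, ws_order_number'`.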
Collect Statistics 
060.analyze-withCGS.jsq: 
The following script was used to collect statistics for the database. 
Distribution statistics were collected for every column in the database, and column-group statistics were collected for the composite primary keys of the 7 fact tables: 
set schema $schema; 
ANALYZE TABLE call_center COMPUTE STATISTICS FOR COLUMNS cc_call_center_sk, cc_call_center_id, cc_rec_start_date, cc_rec_end_date, cc_closed_date_sk, cc_open_date_sk, cc_name, cc_class, cc_employees, cc_sq_ft, cc_hours, cc_manager, cc_mkt_id, cc_mkt_class, cc_mkt_desc, cc_market_manager, cc_division, cc_division_name, cc_company, cc_company_name, cc_street_number, cc_street_name, cc_street_type, cc_suite_number, cc_city, cc_county, cc_state, cc_zip, cc_country, cc_gmt_offset, cc_tax_percentage; 
ANALYZE TABLE catalog_page COMPUTE STATISTICS FOR COLUMNS cp_catalog_page_sk, cp_catalog_page_id, cp_start_date_sk, cp_end_date_sk, cp_department, cp_catalog_number, cp_catalog_page_number, cp_description, cp_type; 
ANALYZE TABLE catalog_returns COMPUTE STATISTICS FOR COLUMNS cr_returned_date_sk, cr_returned_time_sk, cr_item_sk, cr_refunded_customer_sk, cr_refunded_cdemo_sk, cr_refunded_hdemo_sk, cr_refunded_addr_sk, cr_returning_customer_sk, cr_returning_cdemo_sk, cr_returning_hdemo_sk, cr_returning_addr_sk, cr_call_center_sk, cr_catalog_page_sk, cr_ship_mode_sk, cr_warehouse_sk, cr_reason_sk, cr_order_number, cr_return_quantity, cr_return_amount, cr_return_tax, cr_return_amt_inc_tax, cr_fee, cr_return_ship_cost, cr_refunded_cash, cr_reversed_charge, cr_store_credit, cr_net_loss, (cr_item_sk, cr_order_number); 
ANALYZE TABLE catalog_sales COMPUTE STATISTICS FOR COLUMNS cs_sold_date_sk, cs_sold_time_sk, cs_ship_date_sk, cs_bill_customer_sk, cs_bill_cdemo_sk, cs_bill_hdemo_sk, cs_bill_addr_sk, cs_ship_customer_sk, cs_ship_cdemo_sk, cs_ship_hdemo_sk, cs_ship_addr_sk, cs_call_center_sk, cs_catalog_page_sk, cs_ship_mode_sk, cs_warehouse_sk, cs_item_sk, cs_promo_sk, cs_order_number, cs_quantity, cs_wholesale_cost, cs_list_price, cs_sales_price, cs_ext_discount_amt, cs_ext_sales_price, cs_ext_wholesale_cost, cs_ext_list_price, cs_ext_tax, cs_coupon_amt, cs_ext_ship_cost, cs_net_paid, cs_net_paid_inc_tax, cs_net_paid_inc_ship, cs_net_paid_inc_ship_tax, cs_net_profit, (cs_item_sk, cs_order_number); 
ANALYZE TABLE customer COMPUTE STATISTICS FOR COLUMNS c_customer_sk, c_customer_id, c_current_cdemo_sk, c_current_hdemo_sk, c_current_addr_sk, c_first_shipto_date_sk, c_first_sales_date_sk, c_salutation, c_first_name, c_last_name, c_preferred_cust_flag, c_birth_day, c_birth_month, c_birth_year, c_birth_country, c_login, c_email_address, c_last_review_date; 
ANALYZE TABLE customer_address COMPUTE STATISTICS FOR COLUMNS ca_address_sk, ca_address_id, ca_street_number, ca_street_name, ca_street_type, ca_suite_number, ca_city, ca_county, ca_state, ca_zip, ca_country, ca_gmt_offset, ca_location_type; 
ANALYZE TABLE customer_demographics COMPUTE STATISTICS FOR COLUMNS cd_demo_sk, cd_gender, cd_marital_status, cd_education_status, cd_purchase_estimate, cd_credit_rating, cd_dep_count, cd_dep_employed_count, cd_dep_college_count;
ANALYZE TABLE date_dim COMPUTE STATISTICS FOR COLUMNS d_date_sk, d_date_id, d_date, d_month_seq, d_week_seq, d_quarter_seq, d_year, d_dow, d_moy, d_dom, d_qoy, d_fy_year, d_fy_quarter_seq, d_fy_week_seq, d_day_name, d_quarter_name, d_holiday, d_weekend, d_following_holiday, d_first_dom, d_last_dom, d_same_day_ly, d_same_day_lq, d_current_day, d_current_week, d_current_month, d_current_quarter, d_current_year; 
ANALYZE TABLE household_demographics COMPUTE STATISTICS FOR COLUMNS hd_demo_sk, hd_income_band_sk, hd_buy_potential, hd_dep_count, hd_vehicle_count; 
ANALYZE TABLE income_band COMPUTE STATISTICS FOR COLUMNS ib_income_band_sk, ib_lower_bound, ib_upper_bound; 
ANALYZE TABLE inventory COMPUTE STATISTICS FOR COLUMNS inv_date_sk, inv_item_sk, inv_warehouse_sk, inv_quantity_on_hand, (inv_date_sk, inv_item_sk, inv_warehouse_sk); 
ANALYZE TABLE item COMPUTE STATISTICS FOR COLUMNS i_item_sk, i_item_id, i_rec_start_date, i_rec_end_date, i_item_desc, i_current_price, i_wholesale_cost, i_brand_id, i_brand, i_class_id, i_class, i_category_id, i_category, i_manufact_id, i_manufact, i_size, i_formulation, i_color, i_units, i_container, i_manager_id, i_product_name; 
ANALYZE TABLE promotion COMPUTE STATISTICS FOR COLUMNS p_promo_sk, p_promo_id, p_start_date_sk, p_end_date_sk, p_item_sk, p_cost, p_response_target, p_promo_name, p_channel_dmail, p_channel_email, p_channel_catalog, p_channel_tv, p_channel_radio, p_channel_press, p_channel_event, p_channel_demo, p_channel_details, p_purpose, p_discount_active; 
ANALYZE TABLE reason COMPUTE STATISTICS FOR COLUMNS r_reason_sk, r_reason_id, r_reason_desc; 
ANALYZE TABLE ship_mode COMPUTE STATISTICS FOR COLUMNS sm_ship_mode_sk, sm_ship_mode_id, sm_type, sm_code, sm_carrier, sm_contract; 
ANALYZE TABLE store COMPUTE STATISTICS FOR COLUMNS s_store_sk, s_store_id, s_rec_start_date, s_rec_end_date, s_closed_date_sk, s_store_name, s_number_employees, s_floor_space, s_hours, s_manager, s_market_id, s_geography_class, s_market_desc, s_market_manager, s_division_id, s_division_name, s_company_id, s_company_name, s_street_number, s_street_name, s_street_type, s_suite_number, s_city, s_county, s_state, s_zip, s_country, s_gmt_offset, s_tax_precentage; 
ANALYZE TABLE store_returns COMPUTE STATISTICS FOR COLUMNS sr_returned_date_sk, sr_return_time_sk, sr_item_sk, sr_customer_sk, sr_cdemo_sk, sr_hdemo_sk, sr_addr_sk, sr_store_sk, sr_reason_sk, sr_ticket_number, sr_return_quantity, sr_return_amt, sr_return_tax, sr_return_amt_inc_tax, sr_fee, sr_return_ship_cost, sr_refunded_cash, sr_reversed_charge, sr_store_credit, sr_net_loss, (sr_item_sk, sr_ticket_number); 
ANALYZE TABLE store_sales COMPUTE STATISTICS FOR COLUMNS ss_sold_date_sk, ss_sold_time_sk, ss_item_sk, ss_customer_sk, ss_cdemo_sk, ss_hdemo_sk, ss_addr_sk, ss_store_sk, ss_promo_sk, ss_ticket_number, ss_quantity, ss_wholesale_cost, ss_list_price, ss_sales_price, ss_ext_discount_amt, ss_ext_sales_price, ss_ext_wholesale_cost, ss_ext_list_price, ss_ext_tax, ss_coupon_amt, ss_net_paid, ss_net_paid_inc_tax, ss_net_profit, (ss_item_sk, ss_ticket_number); 
ANALYZE TABLE time_dim COMPUTE STATISTICS FOR COLUMNS t_time_sk, t_time_id, t_time, t_hour, t_minute, t_second, t_am_pm, t_shift, t_sub_shift, t_meal_time; 
ANALYZE TABLE warehouse COMPUTE STATISTICS FOR COLUMNS w_warehouse_sk, w_warehouse_id, w_warehouse_name, w_warehouse_sq_ft, w_street_number, w_street_name, w_street_type, w_suite_number, w_city, w_county, w_state, w_zip, w_country, w_gmt_offset; 
ANALYZE TABLE web_page COMPUTE STATISTICS FOR COLUMNS wp_web_page_sk, wp_web_page_id, wp_rec_start_date, wp_rec_end_date, wp_creation_date_sk, wp_access_date_sk, wp_autogen_flag, wp_customer_sk, wp_url, wp_type, wp_char_count, wp_link_count, wp_image_count, wp_max_ad_count; 
ANALYZE TABLE web_returns COMPUTE STATISTICS FOR COLUMNS wr_returned_date_sk, wr_returned_time_sk, wr_item_sk, wr_refunded_customer_sk, wr_refunded_cdemo_sk, wr_refunded_hdemo_sk, wr_refunded_addr_sk, wr_returning_customer_sk, wr_returning_cdemo_sk, wr_returning_hdemo_sk, wr_returning_addr_sk, wr_web_page_sk, wr_reason_sk, wr_order_number, wr_return_quantity, wr_return_amt, wr_return_tax, wr_return_amt_inc_tax, wr_fee, wr_return_ship_cost, wr_refunded_cash, wr_reversed_charge, wr_account_credit, wr_net_loss, (wr_item_sk, wr_order_number); 
ANALYZE TABLE web_sales COMPUTE STATISTICS FOR COLUMNS ws_sold_date_sk, ws_sold_time_sk, ws_ship_date_sk, ws_item_sk, ws_bill_customer_sk, ws_bill_cdemo_sk, ws_bill_hdemo_sk, ws_bill_addr_sk, ws_ship_customer_sk, ws_ship_cdemo_sk, ws_ship_hdemo_sk, ws_ship_addr_sk, ws_web_page_sk, ws_web_site_sk, ws_ship_mode_sk, ws_warehouse_sk, ws_promo_sk, ws_order_number, ws_quantity, ws_wholesale_cost, ws_list_price, ws_sales_price, ws_ext_discount_amt, ws_ext_sales_price, ws_ext_wholesale_cost, ws_ext_list_price, ws_ext_tax, ws_coupon_amt, ws_ext_ship_cost, ws_net_paid, ws_net_paid_inc_tax, ws_net_paid_inc_ship, ws_net_paid_inc_ship_tax, ws_net_profit, (ws_item_sk, ws_order_number); 
ANALYZE TABLE web_site COMPUTE STATISTICS FOR COLUMNS web_site_sk, web_site_id, web_rec_start_date, web_rec_end_date, web_name, web_open_date_sk, web_close_date_sk, web_class, web_manager, web_mkt_id, web_mkt_class, web_mkt_desc, web_market_manager, web_company_id, web_company_name, web_street_number,
web_street_name, web_street_type, web_suite_number, web_city, web_county, web_state, web_zip, web_country, web_gmt_offset, web_tax_percentage; 
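The parenthesized entry at the end of each fact-table statement, for example (inv_date_sk, inv_item_sk, inv_warehouse_sk), requests statistics on the column group as a whole. For a composite key this gives the optimizer the number of distinct key combinations, instead of leaving it to combine per-column statistics as if the columns were independent. The bash sketch below uses hypothetical helpers (`join_cols`, `analyze_stmt`) to assemble a statement of this shape from a column list and an optional group; it only prints text and does not run ANALYZE.

```bash
#!/bin/bash
# Hypothetical helper: join the arguments with ", ".
join_cols() {
  local out="" c
  for c in "$@"; do
    out+="${out:+, }$c"
  done
  printf '%s' "$out"
}

# Hypothetical helper: print an ANALYZE statement with per-column statistics
# and, optionally, one column group (e.g. a composite primary key).
#   $1 = table   $2 = space-separated columns   $3 = optional space-separated group
analyze_stmt() {
  local table=$1 list
  local -a cols group
  read -r -a cols <<< "$2"
  read -r -a group <<< "${3:-}"
  list=$(join_cols "${cols[@]}")
  if [ "${#group[@]}" -gt 0 ]; then
    list+=", ($(join_cols "${group[@]}"))"
  fi
  printf 'ANALYZE TABLE %s COMPUTE STATISTICS FOR COLUMNS %s;\n' "$table" "$list"
}

# Reproduces the inventory statement above.
analyze_stmt inventory \
  "inv_date_sk inv_item_sk inv_warehouse_sk inv_quantity_on_hand" \
  "inv_date_sk inv_item_sk inv_warehouse_sk"
```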
064.statviews.sh: 
This script was used to create “statviews” and collect statistics about them. 
The statviews give the Big SQL optimizer more information about joins on PK-FK columns. Only a subset of the joins is modeled. 
DBNAME=$1 
schema=$2 
db2 connect to ${DBNAME} 
db2 -v set schema ${schema} 
# workaround for bug with statviews 
# need to select from any random table at the beginning of the connection 
# or we'll get a -901 during runstats on CS_GVIEW or CR_GVIEW 
db2 -v "select count(*) from date_dim" 
db2 -v "drop view cr_gview" 
db2 -v "drop view cs_gview" 
db2 -v "drop view sr_gview" 
db2 -v "drop view ss_gview" 
db2 -v "drop view wr_gview" 
db2 -v "drop view ws_gview" 
db2 -v "drop view c_gview" 
db2 -v "drop view inv_gview" 
db2 -v "drop view sv_date_dim" 
db2 -v "create view CR_GVIEW (c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, c21, c22, c23, c24, c25, c26, c27, c28, c29, c30, c31, c32, c33, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56, c57, c58, c59, c60, c61, c62, c63, c64, c65, c66, c67, c68, c69, c70, c71, c72, c73, c74, c75, c76, c77, c78, c79, c80, c81, c82, c83,c84, c85, c86, c87, c88, c89, c90, c91, c92, c93, c94, c95, c96, c97, c98, c99, d_d_date) as 
( 
select T2.*, T3.*, T4.*, T5.*, T6.*, T7.*, DATE(T5.D_DATE) as D_D_DATE 
from CATALOG_RETURNS as T1, 
CATALOG_PAGE as T2, CUSTOMER_ADDRESS as T3, CUSTOMER as T4, 
DATE_DIM as T5, CUSTOMER_ADDRESS as T6, CUSTOMER as T7 
where T1.CR_CATALOG_PAGE_SK = T2.CP_CATALOG_PAGE_SK and 
T1.CR_REFUNDED_ADDR_SK = T3.CA_ADDRESS_SK and 
T1.CR_REFUNDED_CUSTOMER_SK = T4.C_CUSTOMER_SK and 
T1.CR_RETURNED_DATE_SK = T5.D_DATE_SK and 
T1.CR_RETURNING_ADDR_SK = T6.CA_ADDRESS_SK and 
T1.CR_RETURNING_CUSTOMER_SK = T7.C_CUSTOMER_SK 
)" 
db2 -v "create view CS_GVIEW (c1, c2, c3, c4, c5,c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, c21, c22, c23, c24, c25, c26, c27, c28, c29, c30, c31, c32, c33, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56, c57, c58, c59, c60, c61, c62, c63, c64, c65, c66, c67, c68, c69, c70, c71, c72, c73, c74, c75, c76, c77, c78, c79, c80, c81, c82, c83,c84, c85, c86, c87, c88, c89, c90, c91, c92, c93, c94, c95, c96, c97, c98, c99, c100, c101, d_d_date1, d_d_date2) as 
( 
select T2.*, T3.*, T4.*, T5.*, T6.*, DATE(T4.D_DATE) as D_D_DATE1, DATE(T6.D_DATE) as D_D_DATE2 
from CATALOG_SALES as T1, 
CUSTOMER as T2, CATALOG_PAGE as T3, DATE_DIM as T4, 
CUSTOMER as T5, DATE_DIM as T6 
where T1.CS_BILL_CUSTOMER_SK = T2.C_CUSTOMER_SK and 
T1.CS_CATALOG_PAGE_SK = T3.CP_CATALOG_PAGE_SK and 
T1.CS_SHIP_DATE_SK = T4.D_DATE_SK and 
T1.CS_SHIP_CUSTOMER_SK = T5.C_CUSTOMER_SK and 
T1.CS_SOLD_DATE_SK = T6.D_DATE_SK 
)" 
db2 -v "create view SR_GVIEW as 
( 
select T2.*, T3.*, T4.*, T5.*, DATE(T3.D_DATE) as D_D_DATE 
from STORE_RETURNS as T1, 
CUSTOMER as T2, DATE_DIM as T3, TIME_DIM as T4, STORE as T5
where T1.SR_CUSTOMER_SK = T2.C_CUSTOMER_SK and 
T1.SR_RETURNED_DATE_SK = T3.D_DATE_SK and 
T1.SR_RETURN_TIME_SK = T4.T_TIME_SK and 
T1.SR_STORE_SK = T5.S_STORE_SK 
)" 
db2 -v "create view SS_GVIEW as 
( 
select T2.*, T3.*, T4.*, DATE(T2.D_DATE) as D_D_DATE 
from STORE_SALES as T1, 
DATE_DIM as T2, TIME_DIM as T3, STORE as T4 
where T1.SS_SOLD_DATE_SK = T2.D_DATE_SK and 
T1.SS_SOLD_TIME_SK = T3.T_TIME_SK and 
T1.SS_STORE_SK = T4.S_STORE_SK 
)" 
db2 -v "create view WR_GVIEW (c1, c2, c3, c4, c5,c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, c21, c22, c23, c24, c25, c26, c27, c28, c29, c30, c31, c32, c33, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56, c57, c58, c59, c60, c61, c62, c63, c64, c65, c66, c67, c68, c69, c70, c71, c72, c73, c74, c75, c76, c77, c78, c79, c80, c81, c82, c83,c84, c85, c86, c87, c88, c89, c90, c91, c92, c93, c94, c95, c96, c97, c98, c99, c100, c101, c102, c103, c104, c105, c106, c107, c108, D_D_DATE) as 
( 
select T2.*, T3.*, T4.*, T5.*, T6.*, T7.*, T8.*, DATE(T5.D_DATE) as D_D_DATE 
from WEB_RETURNS as T1, 
CUSTOMER_ADDRESS as T2, CUSTOMER_DEMOGRAPHICS as T3, CUSTOMER as T4, DATE_DIM as T5, 
CUSTOMER_ADDRESS as T6, CUSTOMER_DEMOGRAPHICS as T7, CUSTOMER as T8 
where T1.WR_REFUNDED_ADDR_SK = T2.CA_ADDRESS_SK and 
T1.WR_REFUNDED_CDEMO_SK = T3.CD_DEMO_SK and 
T1.WR_REFUNDED_CUSTOMER_SK = T4.C_CUSTOMER_SK and 
T1.WR_RETURNED_DATE_SK = T5.D_DATE_SK and 
T1.WR_RETURNING_ADDR_SK = T6.CA_ADDRESS_SK and 
T1.WR_RETURNING_CDEMO_SK = T7.CD_DEMO_SK and 
T1.WR_RETURNING_CUSTOMER_SK = T8.C_CUSTOMER_SK 
)" 
db2 -v "create view WS_GVIEW (c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, c21, c22, c23, c24, c25, c26, c27, c28, c29, c30, c31, c32, c33, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56, c57, c58, c59, c60, c61, c62, c63, c64, c65, c66, c67, c68, c69, c70, c71, c72, c73, c74, c75, c76, c77, c78, c79, c80, c81, c82, c83,c84, c85, c86, c87, c88, c89, c90, c91, c92, D_D_DATE, E_D_DATE) as 
( 
select T2.*, T3.*, T4.*, T5.*, DATE(T3.D_DATE) as D_D_DATE, DATE(T5.D_DATE) as E_D_DATE 
from WEB_SALES as T1, 
CUSTOMER as T2, DATE_DIM as T3, CUSTOMER as T4, DATE_DIM as T5 
where T1.WS_BILL_CUSTOMER_SK = T2.C_CUSTOMER_SK and 
T1.WS_SHIP_CUSTOMER_SK = T4.C_CUSTOMER_SK and 
T1.WS_SHIP_DATE_SK = T3.D_DATE_SK and 
T1.WS_SOLD_DATE_SK = T5.D_DATE_SK 
)" 
db2 -v "create view C_GVIEW (c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, c21, c22, c23, c24, c25, c26, c27, c28, c29, c30, c31, c32, c33, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56, c57, c58, c59, c60, c61, c62, c63, c64, c65, c66, c67, c68, c69, c70, c71, c72, c73, c74, c75, c76, c77, c78, D_D_DATE, E_D_DATE) as 
( 
select T2.*, T3.*, T4.*, T5.*, DATE(T4.D_DATE) as D_D_DATE, DATE(T5.D_DATE) as E_D_DATE 
from CUSTOMER as T1, 
CUSTOMER_ADDRESS as T2, CUSTOMER_DEMOGRAPHICS as T3, DATE_DIM as T4, DATE_DIM as T5 
where T1.C_CURRENT_ADDR_SK = T2.CA_ADDRESS_SK and 
T1.C_CURRENT_CDEMO_SK = T3.CD_DEMO_SK and 
T1.C_FIRST_SALES_DATE_SK = T4.D_DATE_SK and 
T1.C_FIRST_SHIPTO_DATE_SK = T5.D_DATE_SK 
)" 
db2 -v "create view INV_GVIEW as (select T2.*, DATE(T2.D_DATE) as D_D_DATE from INVENTORY as T1, DATE_DIM as T2 where T1.INV_DATE_SK=T2.D_DATE_SK)" 
db2 -v "create view SV_DATE_DIM as (select date(d_date) as d_d_date from DATE_DIM)" 
db2 -v "alter view CR_GVIEW enable query optimization"
db2 -v "alter view CS_GVIEW enable query optimization" 
db2 -v "alter view SR_GVIEW enable query optimization" 
db2 -v "alter view SS_GVIEW enable query optimization" 
db2 -v "alter view WR_GVIEW enable query optimization" 
db2 -v "alter view WS_GVIEW enable query optimization" 
db2 -v "alter view C_GVIEW enable query optimization" 
db2 -v "alter view INV_GVIEW enable query optimization" 
db2 -v "alter view SV_DATE_DIM enable query optimization" 
# workaround for bug with statviews 
# need to run the first runstats twice or we don't actually get any stats 
time db2 -v "runstats on table SV_DATE_DIM with distribution" 
time db2 -v "runstats on table SV_DATE_DIM with distribution" 
time db2 -v "runstats on table CR_GVIEW with distribution tablesample BERNOULLI(1)" 
time db2 -v "runstats on table CS_GVIEW with distribution tablesample BERNOULLI(1)" 
time db2 -v "runstats on table SR_GVIEW with distribution tablesample BERNOULLI(1)" 
time db2 -v "runstats on table SS_GVIEW with distribution tablesample BERNOULLI(1)" 
time db2 -v "runstats on table WR_GVIEW with distribution tablesample BERNOULLI(1)" 
time db2 -v "runstats on table WS_GVIEW with distribution tablesample BERNOULLI(1)" 
time db2 -v "runstats on table C_GVIEW with distribution tablesample BERNOULLI(1)" 
time db2 -v "runstats on table INV_GVIEW with distribution tablesample BERNOULLI(1)" 
db2 commit 
db2 terminate
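Each explicit column list in the script above has exactly as many names as the dimension columns selected through T2.*, T3.* and so on, followed by the DATE() expressions. Using the per-table column counts from the ANALYZE script (CATALOG_PAGE 9, CUSTOMER_ADDRESS 13, CUSTOMER 18, DATE_DIM 28), CR_GVIEW needs 9 + 13 + 18 + 28 + 13 + 18 = 99 names before d_d_date. The lists are presumably required because those views join the same table twice (CUSTOMER as T4 and T7 in CR_GVIEW, for example), which would otherwise give the view duplicate column names; SR_GVIEW, SS_GVIEW and INV_GVIEW join each table only once and omit the list. The bash sketch below, with the hypothetical helper `gen_cols`, generates such a list from the column counts instead of typing it out.

```bash
#!/bin/bash
# Hypothetical helper: print "c1, c2, ..., cN", where N is the sum of the
# arguments (the column counts of the dimension tables joined in the view).
gen_cols() {
  local n=0 k i out=""
  for k in "$@"; do
    n=$((n + k))
  done
  for ((i = 1; i <= n; i++)); do
    out+="${out:+, }c$i"
  done
  printf '%s\n' "$out"
}

# CR_GVIEW joins CATALOG_PAGE, CUSTOMER_ADDRESS, CUSTOMER, DATE_DIM,
# CUSTOMER_ADDRESS and CUSTOMER (T2 .. T7).
cr_cols=$(gen_cols 9 13 18 28 13 18)
echo "last CR_GVIEW column: ${cr_cols##*, }"   # c99
```

With the counts for CS_GVIEW (18, 9, 28, 18, 28), WR_GVIEW (13, 9, 18, 28, 13, 9, 18), WS_GVIEW (18, 28, 18, 28) and C_GVIEW (13, 9, 28, 28), the last generated names are c101, c108, c92 and c78, matching the definitions above.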
Appendix C: Tuning 
Installation options: 
During installation, the following Big SQL properties were set. Node resources percentage was set to 90% to give Big SQL as much of the cluster's resources as possible: 
Big SQL administrator user: bigsql 
Big SQL FCM start port: 62000 
Big SQL 1 server port: 7052 
Scheduler service port: 7053 
Scheduler administration port: 7054 
Big SQL server port: 51000 
Node resources percentage: 90% 
The following disk layout follows current BigInsights and Big SQL 3.0 best practices, which recommend distributing all I/O for the Hadoop cluster across all disks: 
BigSQL2 data directory: /data1/db2/bigsql,/data2/db2/bigsql,/data3/db2/bigsql,/data4/db2/bigsql,/data5/db2/bigsql,/data6/db2/bigsql,/data7/db2/bigsql,/data8/db2/bigsql,/data9/db2/bigsql 
Cache directory: /data1/hadoop/mapred/local,/data2/hadoop/mapred/local,/data3/hadoop/mapred/local,/data4/hadoop/mapred/local,/data5/hadoop/mapred/local,/data6/hadoop/mapred/local,/data7/hadoop/mapred/local,/data8/hadoop/mapred/local,/data9/hadoop/mapred/local 
DataNode data directory: /data1/hadoop/hdfs/data,/data2/hadoop/hdfs/data,/data3/hadoop/hdfs/data,/data4/hadoop/hdfs/data,/data5/hadoop/hdfs/data,/data6/hadoop/hdfs/data,/data7/hadoop/hdfs/data,/data8/hadoop/hdfs/data,/data9/hadoop/hdfs/data 
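Each of the three directory lists places one directory on each of the nine data-disk mount points, /data1 through /data9, which is how I/O is spread across all disks. The bash sketch below (`dir_list` is a hypothetical helper) produces such a list with commas only between entries, since a stray space inside a comma-separated path list is easy to introduce by hand.

```bash
#!/bin/bash
# Hypothetical helper: list /data1/<suffix> .. /data<n>/<suffix>, separated
# by commas only, one directory per data-disk mount point.
dir_list() {
  local n=$1 suffix=$2 i out=""
  for ((i = 1; i <= n; i++)); do
    out+="${out:+,}/data$i/$suffix"
  done
  printf '%s\n' "$out"
}

dir_list 9 db2/bigsql            # BigSQL2 data directory
dir_list 9 hadoop/mapred/local   # Cache directory
dir_list 9 hadoop/hdfs/data      # DataNode data directory
```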
Big SQL tuning options: 
## Configured for 128 GB of memory per node 
## 30 GB bufferpool 
## 3.125 GB sortheap / 50 GB sheapthres_shr 
## reader memory: 20% of total memory by default (user can raise it to 30%) 
## 
## other useful conf changes: 
## mapred-site.xml 
## mapred.tasktracker.map.tasks.maximum=20 
## mapred.tasktracker.reduce.tasks.maximum=6 
## mapreduce.map.java.opts="-Xmx3000m ..." 
## mapreduce.reduce.java.opts="-Xmx3000m ..." 
## 
## bigsql-conf.xml 
## dfsio.num_scanner_threads=12 
## dfsio.read_size=4194304 
## dfsio.num_threads_per_disk=2 
## scheduler.client.request.timeout=600000 
DBNAME=$1
db2 connect to ${DBNAME} 
db2 -v "call syshadoop.big_sql_service_mode('on')" 
db2 -v "alter bufferpool IBMDEFAULTBP size 891520 " 
db2 -v "alter tablespace TEMPSPACE1 no file system caching" 
db2 -v "update db cfg for ${DBNAME} using sortheap 819200 sheapthres_shr 13107200" 
db2 -v "update db cfg for ${DBNAME} using dft_degree 8" 
db2 -v "update dbm cfg using max_querydegree ANY" 
db2 -v "update dbm cfg using aslheapsz 15" 
db2 -v "update dbm cfg using cpuspeed 1.377671e-07" 
db2 -v "update dbm cfg using INSTANCE_MEMORY 85" 
db2 -v "update dbm cfg using CONN_ELAPSE 18" 
## Disable auto maintenance 
db2 -v "update db cfg for ${DBNAME} using AUTO_MAINT OFF AUTO_TBL_MAINT OFF AUTO_RUNSTATS OFF AUTO_STMT_STATS OFF" 
db2 terminate 
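The header comments give the memory targets in gigabytes, whereas the script sets sortheap and sheapthres_shr in Db2's unit for these parameters, 4 KB pages. The bash/awk sketch below (`pages4k_to_gib` is a hypothetical helper) converts the two values and confirms they match the comments: 819200 pages is 3.125 GiB and 13107200 pages is 50 GiB, so the shared sort threshold is 16 times the per-sort heap.

```bash
#!/bin/bash
# sortheap and sheapthres_shr are specified in 4 KB pages; convert to GiB.
pages4k_to_gib() {
  awk -v p="$1" 'BEGIN { printf "%g\n", p * 4096 / (1024 * 1024 * 1024) }'
}

echo "sortheap:       $(pages4k_to_gib 819200) GiB"     # 3.125
echo "sheapthres_shr: $(pages4k_to_gib 13107200) GiB"   # 50
echo "ratio:          $((13107200 / 819200))"           # 16
```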
BigInsights mapred-site.xml tuning: 
The following changes were made to the Hadoop mapred-site.xml file to tune the number of MapReduce slots and the maximum memory allocated to these slots. In Big SQL, MapReduce is used only for the LOAD and ANALYZE commands, not for query execution, so these properties were tuned to get the best possible performance from those commands. 
<property> 
<!-- The maximum number of map tasks that will be run simultaneously by a 
task tracker. Default: 2. Recommendations: set relevant to number of 
CPUs and amount of memory on each data node. --> 
<name>mapred.tasktracker.map.tasks.maximum</name> 
<!--value><%= Math.max(2, Math.ceil(0.66 * Math.min(numOfDisks, numOfCores, totalMem/1000) * 1.75) - 2) %></value--> 
<value>20</value> 
</property> 
<property> 
<!-- The maximum number of reduce tasks that will be run simultaneously by 
a task tracker. Default: 2. Recommendations: set relevant to number of 
CPUs and amount of memory on each data node, note that reduces usually 
take more memory and do more I/O than maps. --> 
<name>mapred.tasktracker.reduce.tasks.maximum</name> 
<!--value><%= Math.max(2, Math.ceil(0.33 * Math.min(numOfDisks, numOfCores, totalMem/1000) * 1.75) - 2)%></value--> 
<value>6</value> 
</property> 
<property>
<!-- Max heap of child JVM spawned by tasktracker. Ideally as large as the 
task machine can afford. The default -Xmx200m is usually too small. --> 
<name>mapreduce.map.java.opts</name> 
<value>-Xmx3000m -Xms1000m -Xmn100m -Xtune:virtualized -Xshareclasses:name=mrscc_%g,groupAccess,cacheDir=/var/ibm/biginsights/hadoop/tmp,nonFatal -Xscmx20m -Xdump:java:file=/var/ibm/biginsights/hadoop/tmp/javacore.%Y%m%d.%H%M%S.%pid.%seq.txt -Xdump:heap:file=/var/ibm/biginsights/hadoop/tmp/heapdump.%Y%m%d.%H%M%S.%pid.%seq.phd</value> 
</property> 
<property> 
<!-- Max heap of child JVM spawned by tasktracker. Ideally as large as the 
task machine can afford. The default -Xmx200m is usually too small. --> 
<name>mapreduce.reduce.java.opts</name> 
<value>-Xmx3000m -Xms1000m -Xmn100m -Xtune:virtualized -Xshareclasses:name=mrscc_%g,groupAccess,cacheDir=/var/ibm/biginsights/hadoop/tmp,nonFatal -Xscmx20m -Xdump:java:file=/var/ibm/biginsights/hadoop/tmp/javacore.%Y%m%d.%H%M%S.%pid.%seq.txt -Xdump:heap:file=/var/ibm/biginsights/hadoop/tmp/heapdump.%Y%m%d.%H%M%S.%pid.%seq.phd</value> 
</property> 
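The commented-out value lines in the first two properties keep the template expressions that would otherwise size the slots. Evaluating those expressions shows how far the tuned values depart from them. The bash/awk sketch below reproduces the expression; numOfDisks, numOfCores and totalMem are the template's variables, and their units and the way BigInsights populates them are not given here, so the example inputs (9 data disks, 24 cores, totalMem = 128000) are assumptions for illustration. With 9 disks and any core count and memory large enough not to be the minimum, the expression yields 9 map slots and 4 reduce slots, well below the 20 and 6 chosen here.

```bash
#!/bin/bash
# Evaluate the commented-out template expression for task slots:
#   Math.max(2, Math.ceil(f * Math.min(numOfDisks, numOfCores, totalMem/1000) * 1.75) - 2)
# with f = 0.66 for map slots and f = 0.33 for reduce slots.
#   $1 = f   $2 = numOfDisks   $3 = numOfCores   $4 = totalMem
template_slots() {
  awk -v f="$1" -v disks="$2" -v cores="$3" -v mem="$4" 'BEGIN {
    lim = disks + 0
    if (cores + 0 < lim) lim = cores + 0
    if ((mem + 0) / 1000 < lim) lim = (mem + 0) / 1000
    x = f * lim * 1.75
    up = int(x)
    if (up < x) up++            # Math.ceil for a positive value
    n = up - 2
    if (n < 2) n = 2            # Math.max(2, ...)
    print n
  }'
}

# Assumed inputs: 9 data disks, 24 cores, totalMem = 128000.
echo "map slots:    $(template_slots 0.66 9 24 128000)"   # tuned file: 20
echo "reduce slots: $(template_slots 0.33 9 24 128000)"   # tuned file: 6
```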
Big SQL dfs reader options: 
The following properties were changed in the Big SQL bigsql-conf.xml file to tune the dfs readers: 
<property> 
<!-- Number of threads reading from each disk. 
Set this to 0 to use default values. --> 
<name>dfsio.num_threads_per_disk</name> 
<value>2</value> 
<!--value>0</value--> 
</property> 
<property> 
<!-- Read Size (in bytes) - Size of the reads sent to Hdfs (i.e., also the max I/O read buffer size). 
Default is 8*1024*1024 = 8388608 bytes --> 
<name>dfsio.read_size</name> 
<value>4194304</value> 
<!--value>8388608</value--> 
</property> 
….. 
<property> 
<!-- (Advanced) Cap on the number of scanner threads that will be created. 
If set to 0, the system decides. --> 
<name>dfsio.num_scanner_threads</name> 
<value>12</value> 
</property> 
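The read-size values in these properties are given in bytes. The short bash sketch below expresses them in MiB, showing that the tuning halves the read size from the documented 8 MiB default to 4 MiB. The second part is an assumption for illustration only: if the dfs readers use all nine data disks on a node, dfsio.num_threads_per_disk=2 corresponds to 18 disk reader threads per node.

```bash
#!/bin/bash
# dfsio.read_size is given in bytes; express the tuned and default values in MiB.
tuned_read_size=4194304
default_read_size=$((8 * 1024 * 1024))   # default stated in the property comment
echo "read_size: $((tuned_read_size / 1048576)) MiB (default $((default_read_size / 1048576)) MiB)"

# Assumption for illustration: the readers use all nine data disks per node.
data_disks=9
threads_per_disk=2                       # dfsio.num_threads_per_disk
echo "disk reader threads per node: $((data_disks * threads_per_disk))"
```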
Big SQL dfs logging: 
The minloglevel property was changed in the Big SQL glog-dfsio.properties file to reduce the amount of logging by the dfs readers: 
glog_enabled=true 
log_dir=/var/ibm/biginsights/bigsql/logs 
log_filename=bigsql-ndfsio.log 
# 0 - INFO 
# 1 - WARN 
# 2 - ERROR 
# 3 - FATAL 
minloglevel=3 
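glog writes a message only if its severity is at or above minloglevel, so the value 3 leaves only FATAL messages in bigsql-ndfsio.log. The bash sketch below (`logged_levels` is a hypothetical helper) spells out that threshold rule using the severity numbers listed in the properties file.

```bash
#!/bin/bash
# Severity numbers 0..3 as listed in glog-dfsio.properties.
levels=(INFO WARN ERROR FATAL)

# glog writes a message only if its severity is >= minloglevel.
logged_levels() {
  local min=$1 i out=""
  for ((i = min; i < ${#levels[@]}; i++)); do
    out+="${out:+ }${levels[i]}"
  done
  printf '%s\n' "$out"
}

logged_levels 3   # the benchmark setting: FATAL only
```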
OS Storage: 
The following script was used to create ext4 file systems on all disks (to be used to store data) on every node in the cluster, including the master, in line with Big SQL best practices. 
Note: a single SSD was used for swap during the test; the other SSDs were unused: 
#!/bin/bash 
# READ / WRITE Performance tests for EXT4 file systems 
# Author - Stewart Tate, tates@us.ibm.com 
# Copyright (C) 2013, IBM Corp. All rights reserved.: 
################################################################# 
# the following is server specific and MUST be adjusted! # 
################################################################# 
drives=(b g h i j k l m n) 
SSDdrives=(c d e f) 
echo "Create EXT4 file systems, version 130213b" 
echo " " 
pause() 
{ 
sleep 2 
} 
# make ext4 file systems on HDDs 
echo "Create EXT4 file systems on HDDs" 
for dev_range in ${drives[@]} 
do 
echo "y" | mkfs.ext4 -b 4096 -O dir_index,extent /dev/sd$dev_range 
done 
for dev_range in ${drives[@]} 
do 
parted /dev/sd$dev_range print 
done 
pause 
# make ext4 file systems on SSDs 
echo "Create EXT4 file systems on SSDs" 
for dev_range in ${SSDdrives[@]} 
do 
echo "y" | mkfs.ext4 -b 4096 -O dir_index,extent /dev/sd$dev_range 
done 
for dev_range in ${SSDdrives[@]} 
do 
parted /dev/sd$dev_range print
echo "Partition aligned (important for performance) if the following returns 0:" 
blockdev --getalignoff /dev/sd$dev_range 
done 
exit 
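`blockdev --getalignoff` reports the device's alignment offset in bytes, so 0 means aligned. As a simplified model, assuming the underlying disk itself reports a zero offset, a partition is aligned when its first byte falls on a physical-sector boundary; the hypothetical helper below computes that offset from the partition's start sector.

```bash
#!/bin/bash
# Simplified model (assumption: the disk itself reports a zero alignment
# offset). Prints the byte offset of a partition's first byte from the
# nearest physical-sector boundary; 0 means the partition is aligned.
#   $1 = partition start sector (in logical sectors)
#   $2 = logical sector size in bytes
#   $3 = physical sector size in bytes
align_offset() {
  local start=$1 logical=$2 physical=$3
  echo $(( (start * logical) % physical ))
}
```

The start sector can be read with `parted /dev/sdX unit s print`. With 512-byte logical and 4096-byte physical sectors, the legacy DOS start sector 63 gives an offset of 3584 bytes, while sector 2048 gives 0. The script above formats whole devices rather than partitions, which corresponds to a start sector of 0.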
The filesystems are then mounted using the following script: 
#!/bin/bash 
# READ / WRITE Performance tests for EXT4 file systems 
# Author - Stewart Tate, tates@us.ibm.com 
# Copyright (C) 2013, IBM Corp. All rights reserved.: 
################################################################# 
# the following is server-unique and MUST be adjusted! # 
################################################################# 
drives=(b g h i j k l m n) 
SSDdrives=(c d e f) 
echo "Mount EXT4 file systems, version 130213b" 
echo " " 
pause() 
{ 
sleep 2 
} 
j=0 
echo "Create EXT4 mount points for HDDs" 
for i in ${drives[@]} 
do 
let j++ 
mkdir /data$j 
mount -vs -t ext4 -o nobarrier,noatime,nodiratime,nobh,nouser_xattr,data=writeback,commit=100 /dev/sd$i /data$j 
done 
j=0 
echo "Create EXT4 mount points for SSDs" 
for i in ${SSDdrives[@]} 
do 
let j++ 
mkdir /datassd$j 
mount -vs -t ext4 -o nobarrier,noatime,nodiratime,discard,nobh,nouser_xattr,data=writeback,commit=100 /dev/sd$i /datassd$j 
done 
echo "Done." 
exit 
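Mounts made this way do not survive a reboot. A sketch of generating equivalent /etc/fstab entries, assuming the same drive letters and mount options as the script above; `fstab_lines` is a hypothetical helper, and the final `0 0` fields (no dump, no boot-time fsck) are a choice made here rather than something stated in this report.

```bash
#!/bin/bash
# Hypothetical helper: print /etc/fstab lines equivalent to the mounts made
# by the script above. Options are copied from that script; drive letters
# are passed in and, like the script's arrays, must be adjusted per server.
HDD_OPTS="nobarrier,noatime,nodiratime,nobh,nouser_xattr,data=writeback,commit=100"
SSD_OPTS="nobarrier,noatime,nodiratime,discard,nobh,nouser_xattr,data=writeback,commit=100"

fstab_lines() {
  # $1 = mount-point prefix, $2 = mount options, remaining args = drive letters
  local prefix=$1 opts=$2 j=0 letter
  shift 2
  for letter in "$@"; do
    j=$((j + 1))
    # Fields: device, mount point, fs type, options, dump, fsck pass.
    printf '/dev/sd%s %s%d ext4 %s 0 0\n' "$letter" "$prefix" "$j" "$opts"
  done
}
```

For example, `fstab_lines /data "$HDD_OPTS" b g h i j k l m n` prints one line per HDD, starting with `/dev/sdb /data1 ext4 ...`, and `fstab_lines /datassd "$SSD_OPTS" c d e f` does the same for the SSDs. Review the output before appending it to /etc/fstab; because /dev/sdX names can change between boots, `UUID=` identifiers (as reported by `blkid`) are usually safer.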
OS kernel changes: 
echo 0 > /proc/sys/vm/swappiness 
echo "net.ipv6.conf.all.disable_ipv6 = 1" >> /etc/sysctl.conf 
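Writing to /proc/sys/vm/swappiness takes effect immediately but is lost at the next reboot, while the line appended to /etc/sysctl.conf is only applied at boot or by `sysctl -p`, and appending with `>>` adds a duplicate line each time it is rerun. A sketch of an idempotent way to record both settings, assuming the standard `key = value` format of sysctl.conf; `set_sysctl_conf` is a hypothetical helper and `sed -i` assumes GNU sed.

```bash
#!/bin/bash
# Hypothetical helper: set "key = value" in a sysctl.conf-style file,
# replacing an existing entry for the key instead of appending a duplicate.
# Requires GNU sed for "sed -i".
set_sysctl_conf() {
  local file=$1 key=$2 value=$3 re_key
  # Escape dots so the key matches literally in the regular expressions.
  re_key=$(printf '%s' "$key" | sed 's/\./\\./g')
  if grep -q "^[[:space:]]*${re_key}[[:space:]]*=" "$file"; then
    sed -i "s|^[[:space:]]*${re_key}[[:space:]]*=.*|${key} = ${value}|" "$file"
  else
    printf '%s = %s\n' "$key" "$value" >> "$file"
  fi
}
```

For example, `set_sysctl_conf /etc/sysctl.conf vm.swappiness 0` followed by `sysctl -p` both persists and applies the swappiness change; `sysctl -w vm.swappiness=0` alone applies it to the running kernel only.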
Active Hadoop components: 
To free resources on the cluster, only the following BigInsights components were started during the single- and multi-stream runs: bigsql, Hadoop, hive, catalog, zookeeper and console.
Appendix D: Scaling and Database Population 
The following table lists the cardinality (number of rows) of each table and its size on disk when stored in the Parquet format. 
Table                   Cardinality  Size on disk (bytes)
call_center                      60                 13373
catalog_page                  46000               2843783
catalog_returns          4319733388          408957106444
catalog_sales           43198299410         4094116501072
customer                   80000000            4512791675
customer_address           40000000             986555270
customer_demographics       1920800               7871742
date_dim                      73049               1832116
household_demographics         7200                 30851
income_band                      20                   692
inventory                1627857000            8646662210
item                         462000              46060058
promotion                      2300                116753
reason                           72                  1890
ship_mode                        20                  1497
store                          1704                154852
store_returns            8639847757          656324500809
store_sales             86400432613         5623520800788
time_dim                      86400               1134623
warehouse                        27                  3782
web_page                       4602                 86318
web_returns              2160007345          205828528963
web_sales               21600036511         2017966660709
web_site                         84                 15977
TOTAL                                      13020920276247
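The TOTAL row is the sum of the per-table sizes, about 13.0 TB (decimal). A sketch for re-checking it with awk, with the sizes copied from the table; the function names are hypothetical.

```bash
#!/bin/bash
# Sum the second column of "table bytes" lines read from stdin. awk uses
# double-precision arithmetic, which is exact for integers below 2^53
# (about 9.0e15), so a total of this size is computed exactly.
sum_sizes() {
  awk '{ total += $2 } END { printf "%.0f\n", total }'
}

# Per-table sizes on disk (bytes), copied from the table above.
table_sizes() {
  cat <<'EOF'
call_center 13373
catalog_page 2843783
catalog_returns 408957106444
catalog_sales 4094116501072
customer 4512791675
customer_address 986555270
customer_demographics 7871742
date_dim 1832116
household_demographics 30851
income_band 692
inventory 8646662210
item 46060058
promotion 116753
reason 1890
ship_mode 1497
store 154852
store_returns 656324500809
store_sales 5623520800788
time_dim 1134623
warehouse 3782
web_page 86318
web_returns 205828528963
web_sales 2017966660709
web_site 15977
EOF
}
```

`table_sizes | sum_sizes` prints 13020920276247, matching the TOTAL row.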
Appendix E: Queries 
Query Generation 
The queries used in this workload were generated using the TPC dsqgen tool, which applies random parameter substitutions to a set of query templates. Each query “stream” consisted of the 99 queries in a different order and each had different random parameter substitutions applied. 
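Since every stream must contain each of the 99 templates exactly once, only in a different order, a stream's sequence can be checked mechanically. A hedged sketch (`is_full_stream` is a hypothetical helper, not part of the TPC toolkit), reading one template number per line from standard input:

```bash
#!/bin/bash
# Hypothetical helper (not part of the TPC toolkit): succeed only if the
# template numbers read from stdin, one per line, are exactly 1..N
# (default N=99), with no duplicates and none missing.
is_full_stream() {
  local n=${1:-99}
  sort -n | diff - <(seq 1 "$n") > /dev/null
}
```

For example, `printf '%s\n' 96 7 75 | is_full_stream` fails because only three templates are listed, while the same call on a complete 99-entry stream succeeds.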
Query Template Modifications 
We modified 12 (out of 99) query templates so the queries would execute on Big SQL and match the TPC-DS supplied result sets against the 1GB qualification database. 
The following query modifications were made: 
Minor Query Modification (MQM), and the queries using it: 

Output formatting functions: Scalar functions whose sole purpose is to affect output formatting or intermediate arithmetic result precision (such as CASTs) may be applied to items in the outermost SELECT list of the query. 
    Queries: query84, query97 

Explicit Casting: For queries that divide values of two integer columns and compare the results with a decimal value, explicit casting into decimal of the integer columns is permissible for the purpose of matching the qualification output. 
    Explicit integer division: query21, query34, query78, query83 
    Implicit integer division (e.g. avg()): query7, query22, query26, query27, query39 

Date expressions: For queries that include an expression involving manipulation of dates (e.g., adding/subtracting days/months/years, or extracting years from dates), vendor-specific syntax may be used instead of the specified syntax. Replacement syntax must have equivalent semantic behavior. Examples of acceptable implementations include "YEAR(<column>)" to extract the year from a date column or "DATE(<date>) + 3 MONTHS" to add 3 months to a date. 
    Queries: query72 
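The casting and date-expression modifications address two arithmetic details. The sketch below illustrates them with shell tools rather than Big SQL: integer operands truncate on division (as in shell `$(( ))`), so a ratio or average of integer columns loses its fraction unless one side is cast, which is why, for example, the query7 text uses `avg(cast(ss_quantity as double))`; and adding days to a date, written `cast('1999-08-15' as date) + 30 days` in the query80 text, is a calendar computation. The helper names are hypothetical, and `add_days` assumes GNU date.

```bash
#!/bin/bash
# Shell analogies (not Big SQL itself) for the arithmetic behind the MQMs.

# Integer division: both operands are integers, so the fraction is dropped.
int_div() { echo $(( $1 / $2 )); }

# Decimal division, printed with two decimal places.
dec_div() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'; }

# Calendar arithmetic like SQL "cast('<date>' as date) + N days".
# Requires GNU date; -u avoids daylight-saving effects.
add_days() { date -u -d "$1 + $2 days" +%F; }
```

For example, `int_div 7 2` prints 3 whereas `dec_div 7 2` prints 3.50, and `add_days 1999-08-15 30` prints 1999-09-14, the upper bound of the d_date range in the query80 text.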
Query Execution Order 
The execution order of queries within all streams is determined by the toolkit and is based upon a random number whose seed is the timestamp of the end of the load. 
In this instance, the queries were executed in the order shown below. Each entry is a TPC-DS query template number; Stream 0 was used for the single-stream run and Streams 1 to 4 for the multi-stream run. 

Order  Stream 0  Stream 1  Stream 2  Stream 3  Stream 4
    1        96        83        56        89        79
    2         7        32        98         5        39
    3        75        30        59        52        93
    4        44        92        24        62        41
    5        39        66        88        53        29
    6        80        84         2         7        32
    7        32        98         5        39        66
    8        19        58         6        80        84
    9        25        16        27        63         8
   10        78        77        87        72        71
   11        86        40        90        18        45
   12         1        96        83        56        89
   13        91        13        91        13        91
   14        21        36        28        69        14
   15        43        95        68        23        46
   16        27        63         8        19        58
   17        94        99        76        74        48
   18        45         3        75        30        59
   19        58         6        80        84         2
   20        64        12         1        96        83
   21        36        28        69        14        21
   22        33        85        26        10        78
   23        46        51        11        86        40
   24        62        41        17        94        99
   25        16        27        63         8        19
   26        10        78        77        87        72
   27        63         8        19        58         6
   28        69        14        21        36        28
   29        60        50        31        37        81
   30        59        52        93         4        44
   31        37        81        54        38        97
   32        98         5        39        66        88
   33        85        26        10        78        77
   34        70        57        15        43        95
   35        67        82        55        22        33
   36        28        69        14        21        36
   37        81        54        38        97        61
   38        97        61        42        47        35
   39        66        88        53        29        60
   40        90        18        45         3        75
   41        17        94        99        76        74
   42        47        35        67        82        55
   43        95        68        23        46        51
   44        92        24        62        41        17
   45         3        75        30        59        52
   46        51        11        86        40        90
   47        35        67        82        55        22
   48        49         9        25        16        27
   49         9        25        16        27        63
   50        31        37        81        54        38
   51        11        86        40        90        18
   52        93         4        44        92        24
   53        29        60        50        31        37
   54        38        97        61        42        47
   55        22        33        85        26        10
   56        89        79        73        34        70
   57        15        43        95        68        23
   58         6        80        84         2         7
   59        52        93         4        44        92
   60        50        31        37        81        54
   61        42        47        35        67        82
   62        41        17        94        99        76
   63         8        19        58         6        80
   64        12         1        96        83        56
   65        20        64        12         1        96
   66        88        53        29        60        50
   67        82        55        22        33        85
   68        23        46        51        11        86
   69        14        21        36        28        69
   70        57        15        43        95        68
   71        65        20        64        12         1
   72        71        65        20        64        12
   73        34        70        57        15        43
   74        48        49         9        25        16
   75        30        59        52        93         4
   76        74        48        49         9        25
   77        87        72        71        65        20
   78        77        87        72        71        65
   79        73        34        70        57        15
   80        84         2         7        32        98
   81        54        38        97        61        42
   82        55        22        33        85        26
   83        56        89        79        73        34
   84         2         7        32        98         5
   85        26        10        78        77        87
   86        40        90        18        45         3
   87        72        71        65        20        64
   88        53        29        60        50        31
   89        79        73        34        70        57
   90        18        45         3        75        30
   91        13        91        13        91        13
   92        24        62        41        17        94
   93         4        44        92        24        62
   94        99        76        74        48        49
   95        68        23        46        51        11
   96        83        56        89        79        73
   97        61        42        47        35        67
   98         5        39        66        88        53
   99        76        74        48        49         9
Query Text: 
The following is the full query text for all 99 queries executed in the single-stream run. 
-- start query 1 in stream 0 using template query96.tpl and seed 1798621055 
select count(*) 
from store_sales 
,household_demographics 
,time_dim, store 
where ss_sold_time_sk = time_dim.t_time_sk 
and ss_hdemo_sk = household_demographics.hd_demo_sk 
and ss_store_sk = s_store_sk 
and time_dim.t_hour = 8 
and time_dim.t_minute >= 30 
and household_demographics.hd_dep_count = 6 
and store.s_store_name = 'ese' 
order by count(*) 
fetch first 100 rows only; 
-- end query 1 in stream 0 using template query96.tpl 
-- start query 2 in stream 0 using template query7.tpl and seed 335100942 
select i_item_id, 
avg(cast(ss_quantity as double)) agg1, 
avg(ss_list_price) agg2,
avg(ss_coupon_amt) agg3, 
avg(ss_sales_price) agg4 
from store_sales, customer_demographics, date_dim, item, promotion 
where ss_sold_date_sk = d_date_sk and 
ss_item_sk = i_item_sk and 
ss_cdemo_sk = cd_demo_sk and 
ss_promo_sk = p_promo_sk and 
cd_gender = 'M' and 
cd_marital_status = 'W' and 
cd_education_status = 'College' and 
(p_channel_email = 'N' or p_channel_event = 'N') and 
d_year = 2001 
group by i_item_id 
order by i_item_id 
fetch first 100 rows only; 
-- end query 2 in stream 0 using template query7.tpl 
-- start query 3 in stream 0 using template query75.tpl and seed 1536248466 
WITH all_sales AS ( 
SELECT d_year 
,i_brand_id 
,i_class_id 
,i_category_id 
,i_manufact_id 
,SUM(sales_cnt) AS sales_cnt 
,SUM(sales_amt) AS sales_amt 
FROM (SELECT d_year 
,i_brand_id 
,i_class_id 
,i_category_id 
,i_manufact_id 
,cs_quantity - COALESCE(cr_return_quantity,0) AS sales_cnt 
,cs_ext_sales_price - COALESCE(cr_return_amount,0.0) AS sales_amt 
FROM catalog_sales JOIN item ON i_item_sk=cs_item_sk 
JOIN date_dim ON d_date_sk=cs_sold_date_sk 
LEFT JOIN catalog_returns ON (cs_order_number=cr_order_number 
AND cs_item_sk=cr_item_sk) 
WHERE i_category='Home' 
UNION 
SELECT d_year 
,i_brand_id 
,i_class_id 
,i_category_id 
,i_manufact_id 
,ss_quantity - COALESCE(sr_return_quantity,0) AS sales_cnt 
,ss_ext_sales_price - COALESCE(sr_return_amt,0.0) AS sales_amt 
FROM store_sales JOIN item ON i_item_sk=ss_item_sk 
JOIN date_dim ON d_date_sk=ss_sold_date_sk 
LEFT JOIN store_returns ON (ss_ticket_number=sr_ticket_number 
AND ss_item_sk=sr_item_sk) 
WHERE i_category='Home' 
UNION 
SELECT d_year 
,i_brand_id 
,i_class_id 
,i_category_id 
,i_manufact_id 
,ws_quantity - COALESCE(wr_return_quantity,0) AS sales_cnt 
,ws_ext_sales_price - COALESCE(wr_return_amt,0.0) AS sales_amt 
FROM web_sales JOIN item ON i_item_sk=ws_item_sk 
JOIN date_dim ON d_date_sk=ws_sold_date_sk 
LEFT JOIN web_returns ON (ws_order_number=wr_order_number 
AND ws_item_sk=wr_item_sk) 
WHERE i_category='Home') sales_detail 
GROUP BY d_year, i_brand_id, i_class_id, i_category_id, i_manufact_id) 
SELECT prev_yr.d_year AS prev_year 
,curr_yr.d_year AS year 
,curr_yr.i_brand_id 
,curr_yr.i_class_id 
,curr_yr.i_category_id 
,curr_yr.i_manufact_id 
,prev_yr.sales_cnt AS prev_yr_cnt 
,curr_yr.sales_cnt AS curr_yr_cnt 
,curr_yr.sales_cnt- prev_yr.sales_cnt AS sales_cnt_diff 
,curr_yr.sales_amt- prev_yr.sales_amt AS sales_amt_diff 
FROM all_sales curr_yr, all_sales prev_yr
WHERE curr_yr.i_brand_id=prev_yr.i_brand_id 
AND curr_yr.i_class_id=prev_yr.i_class_id 
AND curr_yr.i_category_id=prev_yr.i_category_id 
AND curr_yr.i_manufact_id=prev_yr.i_manufact_id 
AND curr_yr.d_year=1999 
AND prev_yr.d_year=1999-1 
AND CAST(curr_yr.sales_cnt AS DECIMAL(17,2))/CAST(prev_yr.sales_cnt AS DECIMAL(17,2))<0.9 
ORDER BY sales_cnt_diff 
fetch first 100 rows only; 
-- end query 3 in stream 0 using template query75.tpl 
-- start query 4 in stream 0 using template query44.tpl and seed 549695959 
select asceding.rnk, i1.i_product_name best_performing, i2.i_product_name worst_performing 
from(select * 
from (select item_sk,rank() over (order by rank_col asc) rnk 
from (select ss_item_sk item_sk,avg(ss_net_profit) rank_col 
from store_sales ss1 
where ss_store_sk = 218 
group by ss_item_sk 
having avg(ss_net_profit) > 0.9*(select avg(ss_net_profit) rank_col 
from store_sales 
where ss_store_sk = 218 
and ss_promo_sk is null 
group by ss_store_sk))V1)V11 
where rnk < 11) asceding, 
(select * 
from (select item_sk,rank() over (order by rank_col desc) rnk 
from (select ss_item_sk item_sk,avg(ss_net_profit) rank_col 
from store_sales ss1 
where ss_store_sk = 218 
group by ss_item_sk 
having avg(ss_net_profit) > 0.9*(select avg(ss_net_profit) rank_col 
from store_sales 
where ss_store_sk = 218 
and ss_promo_sk is null 
group by ss_store_sk))V2)V21 
where rnk < 11) descending, 
item i1, 
item i2 
where asceding.rnk = descending.rnk 
and i1.i_item_sk=asceding.item_sk 
and i2.i_item_sk=descending.item_sk 
order by asceding.rnk 
fetch first 100 rows only; 
-- end query 4 in stream 0 using template query44.tpl 
-- start query 5 in stream 0 using template query39.tpl and seed 547624773 
with inv as 
(select w_warehouse_name,w_warehouse_sk,i_item_sk,d_moy 
,stdev,mean, case mean when 0 then null else stdev/mean end cov 
from(select w_warehouse_name,w_warehouse_sk,i_item_sk,d_moy 
,stddev_samp(inv_quantity_on_hand) stdev,avg(cast(inv_quantity_on_hand as double)) mean 
from inventory 
,item 
,warehouse 
,date_dim 
where inv_item_sk = i_item_sk 
and inv_warehouse_sk = w_warehouse_sk 
and inv_date_sk = d_date_sk 
and d_year =1998 
group by w_warehouse_name,w_warehouse_sk,i_item_sk,d_moy) foo 
where case mean when 0 then 0 else stdev/mean end > 1) 
select inv1.w_warehouse_sk,inv1.i_item_sk,inv1.d_moy,inv1.mean, inv1.cov 
,inv2.w_warehouse_sk,inv2.i_item_sk,inv2.d_moy,inv2.mean, inv2.cov 
from inv inv1,inv inv2 
where inv1.i_item_sk = inv2.i_item_sk 
and inv1.w_warehouse_sk = inv2.w_warehouse_sk 
and inv1.d_moy=2 
and inv2.d_moy=2+1 
order by inv1.w_warehouse_sk,inv1.i_item_sk,inv1.d_moy,inv1.mean,inv1.cov 
,inv2.d_moy,inv2.mean, inv2.cov 
;
with inv as 
(select w_warehouse_name,w_warehouse_sk,i_item_sk,d_moy 
,stdev,mean, case mean when 0 then null else stdev/mean end cov 
from(select w_warehouse_name,w_warehouse_sk,i_item_sk,d_moy 
,stddev_samp(inv_quantity_on_hand) stdev,avg(cast(inv_quantity_on_hand as double)) mean 
from inventory 
,item 
,warehouse 
,date_dim 
where inv_item_sk = i_item_sk 
and inv_warehouse_sk = w_warehouse_sk 
and inv_date_sk = d_date_sk 
and d_year =1998 
group by w_warehouse_name,w_warehouse_sk,i_item_sk,d_moy) foo 
where case mean when 0 then 0 else stdev/mean end > 1) 
select inv1.w_warehouse_sk,inv1.i_item_sk,inv1.d_moy,inv1.mean, inv1.cov 
,inv2.w_warehouse_sk,inv2.i_item_sk,inv2.d_moy,inv2.mean, inv2.cov 
from inv inv1,inv inv2 
where inv1.i_item_sk = inv2.i_item_sk 
and inv1.w_warehouse_sk = inv2.w_warehouse_sk 
and inv1.d_moy=2 
and inv2.d_moy=2+1 
and inv1.cov > 1.5 
order by inv1.w_warehouse_sk,inv1.i_item_sk,inv1.d_moy,inv1.mean,inv1.cov 
,inv2.d_moy,inv2.mean, inv2.cov 
; 
-- end query 5 in stream 0 using template query39.tpl 
-- start query 6 in stream 0 using template query80.tpl and seed 800632380 
with ssr as 
(select s_store_id as store_id, 
sum(ss_ext_sales_price) as sales, 
sum(coalesce(sr_return_amt, 0)) as returns, 
sum(ss_net_profit - coalesce(sr_net_loss, 0)) as profit 
from store_sales left outer join store_returns on 
(ss_item_sk = sr_item_sk and ss_ticket_number = sr_ticket_number), 
date_dim, 
store, 
item, 
promotion 
where ss_sold_date_sk = d_date_sk 
and d_date between cast('1999-08-15' as date) 
and (cast('1999-08-15' as date) + 30 days) 
and ss_store_sk = s_store_sk 
and ss_item_sk = i_item_sk 
and i_current_price > 50 
and ss_promo_sk = p_promo_sk 
and p_channel_tv = 'N' 
group by s_store_id) 
, 
csr as 
(select cp_catalog_page_id as catalog_page_id, 
sum(cs_ext_sales_price) as sales, 
sum(coalesce(cr_return_amount, 0)) as returns, 
sum(cs_net_profit - coalesce(cr_net_loss, 0)) as profit 
from catalog_sales left outer join catalog_returns on 
(cs_item_sk = cr_item_sk and cs_order_number = cr_order_number), 
date_dim, 
catalog_page, 
item, 
promotion 
where cs_sold_date_sk = d_date_sk 
and d_date between cast('1999-08-15' as date) 
and (cast('1999-08-15' as date) + 30 days) 
and cs_catalog_page_sk = cp_catalog_page_sk 
and cs_item_sk = i_item_sk 
and i_current_price > 50 
and cs_promo_sk = p_promo_sk 
and p_channel_tv = 'N' 
group by cp_catalog_page_id) 
, 
wsr as 
(select web_site_id, 
sum(ws_ext_sales_price) as sales, 
sum(coalesce(wr_return_amt, 0)) as returns,
sum(ws_net_profit - coalesce(wr_net_loss, 0)) as profit 
from web_sales left outer join web_returns on 
(ws_item_sk = wr_item_sk and ws_order_number = wr_order_number), 
date_dim, 
web_site, 
item, 
promotion 
where ws_sold_date_sk = d_date_sk 
and d_date between cast('1999-08-15' as date) 
and (cast('1999-08-15' as date) + 30 days) 
and ws_web_site_sk = web_site_sk 
and ws_item_sk = i_item_sk 
and i_current_price > 50 
and ws_promo_sk = p_promo_sk 
and p_channel_tv = 'N' 
group by web_site_id) 
select channel 
, id 
, sum(sales) as sales 
, sum(returns) as returns 
, sum(profit) as profit 
from 
(select 'store channel' as channel 
, 'store' || store_id as id 
, sales 
, returns 
, profit 
from ssr 
union all 
select 'catalog channel' as channel 
, 'catalog_page' || catalog_page_id as id 
, sales 
, returns 
, profit 
from csr 
union all 
select 'web channel' as channel 
, 'web_site' || web_site_id as id 
, sales 
, returns 
, profit 
from wsr 
) x 
group by rollup (channel, id) 
order by channel 
,id 
fetch first 100 rows only; 
-- end query 6 in stream 0 using template query80.tpl 
-- start query 7 in stream 0 using template query32.tpl and seed 965672451 
select sum(cs_ext_discount_amt) as "excess discount amount" 
from 
catalog_sales 
,item 
,date_dim 
where 
i_manufact_id = 452 
and i_item_sk = cs_item_sk 
and d_date between '2001-03-31' and 
(cast('2001-03-31' as date) + 90 days) 
and d_date_sk = cs_sold_date_sk 
and cs_ext_discount_amt 
> ( 
select 
1.3 * avg(cs_ext_discount_amt) 
from 
catalog_sales 
,date_dim 
where 
cs_item_sk = i_item_sk 
and d_date between '2001-03-31' and 
(cast('2001-03-31' as date) + 90 days) 
and d_date_sk = cs_sold_date_sk 
) 
fetch first 100 rows only; 
-- end query 7 in stream 0 using template query32.tpl 
-- start query 8 in stream 0 using template query19.tpl and seed 98764099 
select i_brand_id brand_id, i_brand brand, i_manufact_id, i_manufact, 
sum(ss_ext_sales_price) ext_price 
from date_dim, store_sales, item,customer,customer_address,store 
where d_date_sk = ss_sold_date_sk 
and ss_item_sk = i_item_sk 
and i_manager_id=46 
and d_moy=11
and d_year=2002 
and ss_customer_sk = c_customer_sk 
and c_current_addr_sk = ca_address_sk 
and substr(ca_zip,1,5) <> substr(s_zip,1,5) 
and ss_store_sk = s_store_sk 
group by i_brand 
,i_brand_id 
,i_manufact_id 
,i_manufact 
order by ext_price desc 
,i_brand 
,i_brand_id 
,i_manufact_id 
,i_manufact 
fetch first 100 rows only ; 
-- end query 8 in stream 0 using template query19.tpl 
-- start query 9 in stream 0 using template query25.tpl and seed 280059134 
select 
i_item_id 
,i_item_desc 
,s_store_id 
,s_store_name 
,stddev_samp(ss_net_profit) as store_sales_profit 
,stddev_samp(sr_net_loss) as store_returns_loss 
,stddev_samp(cs_net_profit) as catalog_sales_profit 
from 
store_sales 
,store_returns 
,catalog_sales 
,date_dim d1 
,date_dim d2 
,date_dim d3 
,store 
,item 
where 
d1.d_moy = 4 
and d1.d_year = 2002 
and d1.d_date_sk = ss_sold_date_sk 
and i_item_sk = ss_item_sk 
and s_store_sk = ss_store_sk 
and ss_customer_sk = sr_customer_sk 
and ss_item_sk = sr_item_sk 
and ss_ticket_number = sr_ticket_number 
and sr_returned_date_sk = d2.d_date_sk 
and d2.d_moy between 4 and 10 
and d2.d_year = 2002 
and sr_customer_sk = cs_bill_customer_sk 
and sr_item_sk = cs_item_sk 
and cs_sold_date_sk = d3.d_date_sk 
and d3.d_moy between 4 and 10 
and d3.d_year = 2002 
group by 
i_item_id 
,i_item_desc 
,s_store_id 
,s_store_name 
order by 
i_item_id 
,i_item_desc 
,s_store_id 
,s_store_name 
fetch first 100 rows only; 
-- end query 9 in stream 0 using template query25.tpl 
-- start query 10 in stream 0 using template query78.tpl and seed 76559093 
with ws as 
(select d_year AS ws_sold_year, ws_item_sk, 
ws_bill_customer_sk ws_customer_sk, 
sum(ws_quantity) ws_qty, 
sum(ws_wholesale_cost) ws_wc, 
sum(ws_sales_price) ws_sp 
from web_sales 
left join web_returns on wr_order_number=ws_order_number and ws_item_sk=wr_item_sk 
join date_dim on ws_sold_date_sk = d_date_sk 
where wr_order_number is null 
group by d_year, ws_item_sk, ws_bill_customer_sk 
), 
cs as 
(select d_year AS cs_sold_year, cs_item_sk, 
cs_bill_customer_sk cs_customer_sk, 
sum(cs_quantity) cs_qty, 
sum(cs_wholesale_cost) cs_wc, 
sum(cs_sales_price) cs_sp 
from catalog_sales 
left join catalog_returns on cr_order_number=cs_order_number and cs_item_sk=cr_item_sk
join date_dim on cs_sold_date_sk = d_date_sk 
where cr_order_number is null 
group by d_year, cs_item_sk, cs_bill_customer_sk 
), 
ss as 
(select d_year AS ss_sold_year, ss_item_sk, 
ss_customer_sk, 
sum(ss_quantity) ss_qty, 
sum(ss_wholesale_cost) ss_wc, 
sum(ss_sales_price) ss_sp 
from store_sales 
left join store_returns on sr_ticket_number=ss_ticket_number and ss_item_sk=sr_item_sk 
join date_dim on ss_sold_date_sk = d_date_sk 
where sr_ticket_number is null 
group by d_year, ss_item_sk, ss_customer_sk 
) 
select 
ss_item_sk, 
round(cast(ss_qty as double)/cast(coalesce(ws_qty+cs_qty,1) as double),2) ratio, 
ss_qty store_qty, ss_wc store_wholesale_cost, ss_sp store_sales_price, 
coalesce(ws_qty,0)+coalesce(cs_qty,0) other_chan_qty, 
coalesce(ws_wc,0)+coalesce(cs_wc,0) other_chan_wholesale_cost, 
coalesce(ws_sp,0)+coalesce(cs_sp,0) other_chan_sales_price 
from ss 
left join ws on (ws_sold_year=ss_sold_year and ws_item_sk=ss_item_sk and ws_customer_sk=ss_customer_sk) 
left join cs on (cs_sold_year=ss_sold_year and cs_item_sk=cs_item_sk and cs_customer_sk=ss_customer_sk) 
where coalesce(ws_qty,0)>0 and coalesce(cs_qty, 0)>0 and ss_sold_year=2001 
order by 
ss_item_sk, 
ss_qty desc, ss_wc desc, ss_sp desc, 
other_chan_qty, 
other_chan_wholesale_cost, 
other_chan_sales_price, 
round(ss_qty/(coalesce(ws_qty+cs_qty,1)),2) 
fetch first 100 rows only; 
-- end query 10 in stream 0 using template query78.tpl 
-- start query 11 in stream 0 using template query86.tpl and seed 1622352946 
select 
sum(ws_net_paid) as total_sum 
,i_category 
,i_class 
,grouping(i_category)+grouping(i_class) as lochierarchy 
,rank() over ( 
partition by grouping(i_category)+grouping(i_class), 
case when grouping(i_class) = 0 then i_category end 
order by sum(ws_net_paid) desc) as rank_within_parent 
from 
web_sales 
,date_dim d1 
,item 
where 
d1.d_month_seq between 1193 and 1193+11 
and d1.d_date_sk = ws_sold_date_sk 
and i_item_sk = ws_item_sk 
group by rollup(i_category,i_class) 
order by 
lochierarchy desc, 
case when lochierarchy = 0 then i_category end, 
rank_within_parent 
fetch first 100 rows only; 
-- end query 11 in stream 0 using template query86.tpl 
-- start query 12 in stream 0 using template query1.tpl and seed 1023042618 
with customer_total_return as 
(select sr_customer_sk as ctr_customer_sk 
,sr_store_sk as ctr_store_sk 
,sum(SR_FEE) as ctr_total_return 
from store_returns 
,date_dim 
where sr_returned_date_sk = d_date_sk 
and d_year =2001 
group by sr_customer_sk 
,sr_store_sk) 
select c_customer_id 
from customer_total_return ctr1 
,store 
,customer
where ctr1.ctr_total_return > (select avg(ctr_total_return)*1.2 
from customer_total_return ctr2 
where ctr1.ctr_store_sk = ctr2.ctr_store_sk) 
and s_store_sk = ctr1.ctr_store_sk 
and s_state = 'CA' 
and ctr1.ctr_customer_sk = c_customer_sk 
order by c_customer_id 
fetch first 100 rows only; 
-- end query 12 in stream 0 using template query1.tpl 
-- start query 13 in stream 0 using template query91.tpl and seed 1522589387 
select 
cc_call_center_id Call_Center, 
cc_name Call_Center_Name, 
cc_manager Manager, 
sum(cr_net_loss) Returns_Loss 
from 
call_center, 
catalog_returns, 
date_dim, 
customer, 
customer_address, 
customer_demographics, 
household_demographics 
where 
cr_call_center_sk = cc_call_center_sk 
and cr_returned_date_sk = d_date_sk 
and cr_returning_customer_sk= c_customer_sk 
and cd_demo_sk = c_current_cdemo_sk 
and hd_demo_sk = c_current_hdemo_sk 
and ca_address_sk = c_current_addr_sk 
and d_year = 2000 
and d_moy = 11 
and ( (cd_marital_status = 'M' and cd_education_status = 'Unknown') 
or(cd_marital_status = 'W' and cd_education_status = 'Advanced Degree')) 
and hd_buy_potential like '5001-10000%' 
and ca_gmt_offset = -6 
group by cc_call_center_id,cc_name,cc_manager,cd_marital_status,cd_education_status 
order by sum(cr_net_loss) desc; 
-- end query 13 in stream 0 using template query91.tpl 
-- start query 14 in stream 0 using template query21.tpl and seed 464370483 
select * 
from(select w_warehouse_name 
,i_item_id 
,sum(case when (cast(d_date as date) < cast ('2000-06-11' as date)) 
then inv_quantity_on_hand 
else 0 end) as inv_before 
,sum(case when (cast(d_date as date) >= cast ('2000-06-11' as date)) 
then inv_quantity_on_hand 
else 0 end) as inv_after 
from inventory 
,warehouse 
,item 
,date_dim 
where i_current_price between 0.99 and 1.49 
and i_item_sk = inv_item_sk 
and inv_warehouse_sk = w_warehouse_sk 
and inv_date_sk = d_date_sk 
and d_date between (cast ('2000-06-11' as date) - 30 days) 
and (cast ('2000-06-11' as date) + 30 days) 
group by w_warehouse_name, i_item_id) x 
where (case when inv_before > 0 
then cast(inv_after as double) / cast(inv_before as double) 
else null 
end) between 2.0/3.0 and 3.0/2.0 
order by w_warehouse_name 
,i_item_id 
fetch first 100 rows only; 
-- end query 14 in stream 0 using template query21.tpl 
-- start query 15 in stream 0 using template query43.tpl and seed 456971165 
select s_store_name, s_store_id, 
sum(case when (d_day_name='Sunday') then ss_sales_price else null end) sun_sales, 
sum(case when (d_day_name='Monday') then ss_sales_price else null end) mon_sales, 
sum(case when (d_day_name='Tuesday') then ss_sales_price else null end) tue_sales, 
sum(case when (d_day_name='Wednesday') then ss_sales_price else null end) wed_sales, 
sum(case when (d_day_name='Thursday') then ss_sales_price else null end) thu_sales,
sum(case when (d_day_name='Friday') then ss_sales_price else null end) fri_sales, 
sum(case when (d_day_name='Saturday') then ss_sales_price else null end) sat_sales 
from date_dim, store_sales, store 
where d_date_sk = ss_sold_date_sk and 
s_store_sk = ss_store_sk and 
s_gmt_offset = -6 and 
d_year = 2001 
group by s_store_name, s_store_id 
order by s_store_name, s_store_id,sun_sales,mon_sales,tue_sales,wed_sales,thu_sales,fri_sales,sat_sales 
fetch first 100 rows only; 
-- end query 15 in stream 0 using template query43.tpl 
-- start query 16 in stream 0 using template query27.tpl and seed 1310184181 
select i_item_id, 
s_state, grouping(s_state) g_state, 
avg(cast(ss_quantity as double)) agg1, 
avg(ss_list_price) agg2, 
avg(ss_coupon_amt) agg3, 
avg(ss_sales_price) agg4 
from store_sales, customer_demographics, date_dim, store, item 
where ss_sold_date_sk = d_date_sk and 
ss_item_sk = i_item_sk and 
ss_store_sk = s_store_sk and 
ss_cdemo_sk = cd_demo_sk and 
cd_gender = 'M' and 
cd_marital_status = 'U' and 
cd_education_status = '2 yr Degree' and 
d_year = 1999 and 
s_state in ('KS','MI', 'TX', 'SC', 'MN', 'WV') 
group by rollup (i_item_id, s_state) 
order by i_item_id 
,s_state 
fetch first 100 rows only; 
-- end query 16 in stream 0 using template query27.tpl 
-- start query 17 in stream 0 using template query94.tpl and seed 471030633 
select 
count(distinct ws_order_number) as "order count" 
,sum(ws_ext_ship_cost) as "total shipping cost" 
,sum(ws_net_profit) as "total net profit" 
from 
web_sales ws1 
,date_dim 
,customer_address 
,web_site 
where 
d_date between '2000-4-01' and 
(cast('2000-4-01' as date) + 60 days) 
and ws1.ws_ship_date_sk = d_date_sk 
and ws1.ws_ship_addr_sk = ca_address_sk 
and ca_state = 'OR' 
and ws1.ws_web_site_sk = web_site_sk 
and web_company_name = 'pri' 
and exists (select * 
from web_sales ws2 
where ws1.ws_order_number = ws2.ws_order_number 
and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk) 
and not exists(select * 
from web_returns wr1 
where ws1.ws_order_number = wr1.wr_order_number) 
order by count(distinct ws_order_number) 
fetch first 100 rows only; 
-- end query 17 in stream 0 using template query94.tpl 
-- start query 18 in stream 0 using template query45.tpl and seed 454612518 
select ca_zip, ca_city, sum(ws_sales_price) 
from web_sales, customer, customer_address, date_dim, item 
where ws_bill_customer_sk = c_customer_sk 
and c_current_addr_sk = ca_address_sk 
and ws_item_sk = i_item_sk 
and ( substr(ca_zip,1,5) in ('85669', '86197','88274','83405','86475', '85392', '85460', '80348', '81792') 
or 
i_item_id in (select i_item_id 
from item 
where i_item_sk in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29) 
) 
) 
and ws_sold_date_sk = d_date_sk 
and d_qoy = 2 and d_year = 2001 
group by ca_zip, ca_city 
order by ca_zip, ca_city
fetch first 100 rows only; 
-- end query 18 in stream 0 using template query45.tpl 
-- start query 19 in stream 0 using template query58.tpl and seed 171616907 
with ss_items as 
(select i_item_id item_id 
,sum(ss_ext_sales_price) ss_item_rev 
from store_sales 
,item 
,date_dim 
where ss_item_sk = i_item_sk 
and d_date in (select d_date 
from date_dim 
where d_week_seq = (select d_week_seq 
from date_dim 
where d_date = '2000-07-12')) 
and ss_sold_date_sk = d_date_sk 
group by i_item_id), 
cs_items as 
(select i_item_id item_id 
,sum(cs_ext_sales_price) cs_item_rev 
from catalog_sales 
,item 
,date_dim 
where cs_item_sk = i_item_sk 
and d_date in (select d_date 
from date_dim 
where d_week_seq = (select d_week_seq 
from date_dim 
where d_date = '2000-07-12')) 
and cs_sold_date_sk = d_date_sk 
group by i_item_id), 
ws_items as 
(select i_item_id item_id 
,sum(ws_ext_sales_price) ws_item_rev 
from web_sales 
,item 
,date_dim 
where ws_item_sk = i_item_sk 
and d_date in (select d_date 
from date_dim 
where d_week_seq =(select d_week_seq 
from date_dim 
where d_date = '2000-07-12')) 
and ws_sold_date_sk = d_date_sk 
group by i_item_id) 
select ss_items.item_id 
,ss_item_rev 
,ss_item_rev/(ss_item_rev+cs_item_rev+ws_item_rev)/3 * 100 ss_dev 
,cs_item_rev 
,cs_item_rev/(ss_item_rev+cs_item_rev+ws_item_rev)/3 * 100 cs_dev 
,ws_item_rev 
,ws_item_rev/(ss_item_rev+cs_item_rev+ws_item_rev)/3 * 100 ws_dev 
,(ss_item_rev+cs_item_rev+ws_item_rev)/3 average 
from ss_items,cs_items,ws_items 
where ss_items.item_id=cs_items.item_id 
and ss_items.item_id=ws_items.item_id 
and ss_item_rev between 0.9 * cs_item_rev and 1.1 * cs_item_rev 
and ss_item_rev between 0.9 * ws_item_rev and 1.1 * ws_item_rev 
and cs_item_rev between 0.9 * ss_item_rev and 1.1 * ss_item_rev 
and cs_item_rev between 0.9 * ws_item_rev and 1.1 * ws_item_rev 
and ws_item_rev between 0.9 * ss_item_rev and 1.1 * ss_item_rev 
and ws_item_rev between 0.9 * cs_item_rev and 1.1 * cs_item_rev 
order by item_id 
,ss_item_rev 
fetch first 100 rows only; 
-- end query 19 in stream 0 using template query58.tpl 
-- start query 20 in stream 0 using template query64.tpl and seed 479369059 
with cs_ui as 
(select cs_item_sk 
,sum(cs_ext_list_price) as sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund 
from catalog_sales 
,catalog_returns 
where cs_item_sk = cr_item_sk 
and cs_order_number = cr_order_number 
group by cs_item_sk
having sum(cs_ext_list_price)>2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit)), 
cross_sales as 
(select i_product_name product_name 
,i_item_sk item_sk 
,s_store_name store_name 
,s_zip store_zip 
,ad1.ca_street_number b_street_number 
,ad1.ca_street_name b_streen_name 
,ad1.ca_city b_city 
,ad1.ca_zip b_zip 
,ad2.ca_street_number c_street_number 
,ad2.ca_street_name c_street_name 
,ad2.ca_city c_city 
,ad2.ca_zip c_zip 
,d1.d_year as syear 
,d2.d_year as fsyear 
,d3.d_year s2year 
,count(*) cnt 
,sum(ss_wholesale_cost) s1 
,sum(ss_list_price) s2 
,sum(ss_coupon_amt) s3 
FROM store_sales 
,store_returns 
,cs_ui 
,date_dim d1 
,date_dim d2 
,date_dim d3 
,store 
,customer 
,customer_demographics cd1 
,customer_demographics cd2 
,promotion 
,household_demographics hd1 
,household_demographics hd2 
,customer_address ad1 
,customer_address ad2 
,income_band ib1 
,income_band ib2 
,item 
WHERE ss_store_sk = s_store_sk AND 
ss_sold_date_sk = d1.d_date_sk AND 
ss_customer_sk = c_customer_sk AND 
ss_cdemo_sk= cd1.cd_demo_sk AND 
ss_hdemo_sk = hd1.hd_demo_sk AND 
ss_addr_sk = ad1.ca_address_sk and 
ss_item_sk = i_item_sk and 
ss_item_sk = sr_item_sk and 
ss_ticket_number = sr_ticket_number and 
ss_item_sk = cs_ui.cs_item_sk and 
c_current_cdemo_sk = cd2.cd_demo_sk AND 
c_current_hdemo_sk = hd2.hd_demo_sk AND 
c_current_addr_sk = ad2.ca_address_sk and 
c_first_sales_date_sk = d2.d_date_sk and 
c_first_shipto_date_sk = d3.d_date_sk and 
ss_promo_sk = p_promo_sk and 
hd1.hd_income_band_sk = ib1.ib_income_band_sk and 
hd2.hd_income_band_sk = ib2.ib_income_band_sk and 
cd1.cd_marital_status <> cd2.cd_marital_status and 
i_color in ('pink','turquoise','peach','powder','floral','bisque') and 
i_current_price between 51 and 51 + 10 and 
i_current_price between 51 + 1 and 51 + 15 
group by i_product_name 
,i_item_sk 
,s_store_name 
,s_zip 
,ad1.ca_street_number 
,ad1.ca_street_name 
,ad1.ca_city 
,ad1.ca_zip 
,ad2.ca_street_number 
,ad2.ca_street_name 
,ad2.ca_city 
,ad2.ca_zip 
,d1.d_year 
,d2.d_year 
,d3.d_year 
) 
select cs1.product_name 
,cs1.store_name 
,cs1.store_zip 
,cs1.b_street_number 
,cs1.b_streen_name 
,cs1.b_city 
,cs1.b_zip 
,cs1.c_street_number 
,cs1.c_street_name 
,cs1.c_city
,cs1.c_zip 
,cs1.syear 
,cs1.cnt 
,cs1.s1 
,cs1.s2 
,cs1.s3 
,cs2.s1 
,cs2.s2 
,cs2.s3 
,cs2.syear 
,cs2.cnt 
from cross_sales cs1,cross_sales cs2 
where cs1.item_sk=cs2.item_sk and 
cs1.syear = 1999 and 
cs2.syear = 1999 + 1 and 
cs2.cnt <= cs1.cnt and 
cs1.store_name = cs2.store_name and 
cs1.store_zip = cs2.store_zip 
order by cs1.product_name 
,cs1.store_name 
,cs2.cnt; 
-- end query 20 in stream 0 using template query64.tpl 
-- start query 21 in stream 0 using template query36.tpl and seed 636400895 
select 
sum(ss_net_profit)/sum(ss_ext_sales_price) as gross_margin 
,i_category 
,i_class 
,grouping(i_category)+grouping(i_class) as lochierarchy 
,rank() over ( 
partition by grouping(i_category)+grouping(i_class), 
case when grouping(i_class) = 0 then i_category end 
order by sum(ss_net_profit)/sum(ss_ext_sales_price) asc) as rank_within_parent 
from 
store_sales 
,date_dim d1 
,item 
,store 
where 
d1.d_year = 1998 
and d1.d_date_sk = ss_sold_date_sk 
and i_item_sk = ss_item_sk 
and s_store_sk = ss_store_sk 
and s_state in ('GA','LA','MI','MO', 
'FL','OK','KS','WV') 
group by rollup(i_category,i_class) 
order by 
lochierarchy desc 
,case when lochierarchy = 0 then i_category end 
,rank_within_parent 
fetch first 100 rows only; 
-- end query 21 in stream 0 using template query36.tpl 
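Query 21 (template query36) relies on GROUP BY ROLLUP together with grouping() to build lochierarchy and a per-parent rank. The sketch below is a plain-Python illustration of those semantics, not how Big SQL evaluates the query. It assumes the input rows carry no NULL category or class values, because it uses None to mark a rolled-up column. The function name `rollup_margins` is hypothetical.

```python
from collections import defaultdict


def rollup_margins(rows):
    """Illustrative sketch of GROUP BY ROLLUP(i_category, i_class) with the
    lochierarchy and rank_within_parent columns of query 36.

    rows: iterable of (category, cls, net_profit, ext_sales_price).
    Assumption: no NULL category/class in the input, since None marks a
    rolled-up column here (SQL's grouping() has no such ambiguity).
    """
    sums = defaultdict(lambda: [0.0, 0.0])
    for cat, cls, profit, sales in rows:
        # ROLLUP contributes every row to three grouping levels.
        for key in ((cat, cls), (cat, None), (None, None)):
            sums[key][0] += profit
            sums[key][1] += sales

    groups = []
    for (cat, cls), (profit, sales) in sums.items():
        # grouping(col) is 1 when col is rolled up, so the sum is 0, 1 or 2.
        lochierarchy = int(cat is None) + int(cls is None)
        groups.append((profit / sales, cat, cls, lochierarchy))

    def parent(cat, lochierarchy):
        # PARTITION BY lochierarchy, CASE WHEN grouping(i_class) = 0 THEN i_category END
        return (lochierarchy, cat if lochierarchy == 0 else None)

    result = []
    for margin, cat, cls, lh in groups:
        peers = [m for m, c2, _, lh2 in groups if parent(c2, lh2) == parent(cat, lh)]
        # rank() with ORDER BY margin ASC: 1 + number of peers with a smaller margin.
        rank = 1 + sum(m < margin for m in peers)
        result.append((margin, cat, cls, lh, rank))
    return result
```

Leaf rows get lochierarchy 0 and are ranked within their category, category subtotals get 1 and are ranked against each other, and the grand total gets 2.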
-- start query 22 in stream 0 using template query33.tpl and seed 961967872 
with ss as ( 
select 
i_manufact_id,sum(ss_ext_sales_price) total_sales 
from 
store_sales, 
date_dim, 
customer_address, 
item 
where 
i_manufact_id in (select 
i_manufact_id 
from 
item 
where i_category in ('Books')) 
and ss_item_sk = i_item_sk 
and ss_sold_date_sk = d_date_sk 
and d_year = 2000 
and d_moy = 4 
and ss_addr_sk = ca_address_sk 
and ca_gmt_offset = -6 
group by i_manufact_id), 
cs as ( 
select 
i_manufact_id,sum(cs_ext_sales_price) total_sales 
from 
catalog_sales, 
date_dim, 
customer_address, 
item 
where 
i_manufact_id in (select
i_manufact_id 
from 
item 
where i_category in ('Books')) 
and cs_item_sk = i_item_sk 
and cs_sold_date_sk = d_date_sk 
and d_year = 2000 
and d_moy = 4 
and cs_bill_addr_sk = ca_address_sk 
and ca_gmt_offset = -6 
group by i_manufact_id), 
ws as ( 
select 
i_manufact_id,sum(ws_ext_sales_price) total_sales 
from 
web_sales, 
date_dim, 
customer_address, 
item 
where 
i_manufact_id in (select 
i_manufact_id 
from 
item 
where i_category in ('Books')) 
and ws_item_sk = i_item_sk 
and ws_sold_date_sk = d_date_sk 
and d_year = 2000 
and d_moy = 4 
and ws_bill_addr_sk = ca_address_sk 
and ca_gmt_offset = -6 
group by i_manufact_id) 
select i_manufact_id ,sum(total_sales) total_sales 
from (select * from ss 
union all 
select * from cs 
union all 
select * from ws) tmp1 
group by i_manufact_id 
order by total_sales 
fetch first 100 rows only; 
-- end query 22 in stream 0 using template query33.tpl 
-- start query 23 in stream 0 using template query46.tpl and seed 473844672 
select c_last_name 
,c_first_name 
,ca_city 
,bought_city 
,ss_ticket_number 
,amt,profit 
from 
(select ss_ticket_number 
,ss_customer_sk 
,ca_city bought_city 
,sum(ss_coupon_amt) amt 
,sum(ss_net_profit) profit 
from store_sales,date_dim,store,household_demographics, customer_address 
where store_sales.ss_sold_date_sk = date_dim.d_date_sk 
and store_sales.ss_store_sk = store.s_store_sk 
and store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk 
and store_sales.ss_addr_sk = customer_address.ca_address_sk 
and (household_demographics.hd_dep_count = 2 or 
household_demographics.hd_vehicle_count= 0) 
and date_dim.d_dow in (6,0) 
and date_dim.d_year in (1999,1999+1,1999+2) 
and store.s_city in ('Needmore','Henderson','Washington','Monroe','Hamilton') 
group by ss_ticket_number,ss_customer_sk,ss_addr_sk,ca_city) dn,customer,customer_address current_addr 
where ss_customer_sk = c_customer_sk 
and customer.c_current_addr_sk = current_addr.ca_address_sk 
and current_addr.ca_city <> bought_city 
order by c_last_name 
,c_first_name 
,ca_city 
,bought_city 
,ss_ticket_number 
fetch first 100 rows only; 
-- end query 23 in stream 0 using template query46.tpl 
-- start query 24 in stream 0 using template query62.tpl and seed 168156768 
select 
substr(w_warehouse_name,1,20) 
,sm_type
,web_name 
,sum(case when (ws_ship_date_sk - ws_sold_date_sk <= 30 ) then 1 else 0 end) as "30 days" 
,sum(case when (ws_ship_date_sk - ws_sold_date_sk > 30) and 
(ws_ship_date_sk - ws_sold_date_sk <= 60) then 1 else 0 end ) as "31-60 days" 
,sum(case when (ws_ship_date_sk - ws_sold_date_sk > 60) and 
(ws_ship_date_sk - ws_sold_date_sk <= 90) then 1 else 0 end) as "61-90 days"
,sum(case when (ws_ship_date_sk - ws_sold_date_sk > 90) and 
(ws_ship_date_sk - ws_sold_date_sk <= 120) then 1 else 0 end) as "91-120 days" 
,sum(case when (ws_ship_date_sk - ws_sold_date_sk > 120) then 1 else 0 end) as ">120 days" 
from 
web_sales 
,warehouse 
,ship_mode 
,web_site 
,date_dim 
where 
d_month_seq between 1178 and 1178 + 11 
and ws_ship_date_sk = d_date_sk 
and ws_warehouse_sk = w_warehouse_sk 
and ws_ship_mode_sk = sm_ship_mode_sk 
and ws_web_site_sk = web_site_sk 
group by 
substr(w_warehouse_name,1,20) 
,sm_type 
,web_name 
order by substr(w_warehouse_name,1,20) 
,sm_type 
,web_name 
fetch first 100 rows only; 
-- end query 24 in stream 0 using template query62.tpl 
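Query 24 (template query62) treats the difference between the ship-date and sold-date surrogate keys as a number of days and counts orders in five CASE buckets. A minimal Python sketch of that bucketing, assuming the keys differ by exactly one per calendar day; the names `ship_delay_bucket` and `bucket_counts` are hypothetical.

```python
from collections import Counter

BUCKETS = ("30 days", "31-60 days", "61-90 days", "91-120 days", ">120 days")


def ship_delay_bucket(days: int) -> str:
    # Mirrors the CASE boundaries: <=30, (30,60], (60,90], (90,120], >120.
    if days <= 30:
        return "30 days"
    if days <= 60:
        return "31-60 days"
    if days <= 90:
        return "61-90 days"
    if days <= 120:
        return "91-120 days"
    return ">120 days"


def bucket_counts(pairs):
    """pairs: iterable of (ws_ship_date_sk, ws_sold_date_sk)."""
    counts = Counter({b: 0 for b in BUCKETS})  # keep empty buckets at 0, like sum(... else 0)
    for ship_sk, sold_sk in pairs:
        counts[ship_delay_bucket(ship_sk - sold_sk)] += 1
    return dict(counts)
```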
-- start query 25 in stream 0 using template query16.tpl and seed 357537407 
select 
count(distinct cs_order_number) as "order count" 
,sum(cs_ext_ship_cost) as "total shipping cost" 
,sum(cs_net_profit) as "total net profit" 
from 
catalog_sales cs1 
,date_dim 
,customer_address 
,call_center 
where 
d_date between '2002-4-01' and 
(cast('2002-4-01' as date) + 60 days) 
and cs1.cs_ship_date_sk = d_date_sk 
and cs1.cs_ship_addr_sk = ca_address_sk 
and ca_state = 'TX' 
and cs1.cs_call_center_sk = cc_call_center_sk 
and cc_county in ('Franklin Parish','Levy County','Sierra County','Gage County', 
'Gogebic County' 
) 
and exists (select * 
from catalog_sales cs2 
where cs1.cs_order_number = cs2.cs_order_number 
and cs1.cs_warehouse_sk <> cs2.cs_warehouse_sk) 
and not exists(select * 
from catalog_returns cr1 
where cs1.cs_order_number = cr1.cr_order_number) 
order by count(distinct cs_order_number) 
fetch first 100 rows only; 
-- end query 25 in stream 0 using template query16.tpl 
-- start query 26 in stream 0 using template query10.tpl and seed 772495384 
select 
cd_gender, 
cd_marital_status, 
cd_education_status, 
count(*) cnt1, 
cd_purchase_estimate, 
count(*) cnt2, 
cd_credit_rating, 
count(*) cnt3, 
cd_dep_count, 
count(*) cnt4, 
cd_dep_employed_count, 
count(*) cnt5, 
cd_dep_college_count, 
count(*) cnt6 
from
customer c,customer_address ca,customer_demographics 
where 
c.c_current_addr_sk = ca.ca_address_sk and 
ca_county in ('Knox County','Twin Falls County','Chautauqua County','Trousdale County','Lawrence County') and 
cd_demo_sk = c.c_current_cdemo_sk and 
exists (select * 
from store_sales,date_dim 
where c.c_customer_sk = ss_customer_sk and 
ss_sold_date_sk = d_date_sk and 
d_year = 2002 and 
d_moy between 3 and 3+3) and 
(exists (select * 
from web_sales,date_dim 
where c.c_customer_sk = ws_bill_customer_sk and 
ws_sold_date_sk = d_date_sk and 
d_year = 2002 and 
d_moy between 3 ANd 3+3) or 
exists (select * 
from catalog_sales,date_dim 
where c.c_customer_sk = cs_ship_customer_sk and 
cs_sold_date_sk = d_date_sk and 
d_year = 2002 and 
d_moy between 3 and 3+3)) 
group by cd_gender, 
cd_marital_status, 
cd_education_status, 
cd_purchase_estimate, 
cd_credit_rating, 
cd_dep_count, 
cd_dep_employed_count, 
cd_dep_college_count 
order by cd_gender, 
cd_marital_status, 
cd_education_status, 
cd_purchase_estimate, 
cd_credit_rating, 
cd_dep_count, 
cd_dep_employed_count, 
cd_dep_college_count 
fetch first 100 rows only; 
-- end query 26 in stream 0 using template query10.tpl 
-- start query 27 in stream 0 using template query63.tpl and seed 348248518 
select * 
from (select i_manager_id 
,sum(ss_sales_price) sum_sales 
,avg(sum(ss_sales_price)) over (partition by i_manager_id) avg_monthly_sales 
from item 
,store_sales 
,date_dim 
,store 
where ss_item_sk = i_item_sk 
and ss_sold_date_sk = d_date_sk 
and ss_store_sk = s_store_sk 
and d_month_seq in (1186,1186+1,1186+2,1186+3,1186+4,1186+5,1186+6,1186+7,1186+8,1186+9,1186+10,1186+11) 
and (( i_category in ('Books','Children','Electronics') 
and i_class in ('personal','portable','refernece','self-help') 
and i_brand in ('scholaramalgamalg #14','scholaramalgamalg #7', 
'exportiunivamalg #9','scholaramalgamalg #9')) 
or( i_category in ('Women','Music','Men') 
and i_class in ('accessories','classical','fragrances','pants') 
and i_brand in ('amalgimporto #1','edu packscholar #1','exportiimporto #1', 
'importoamalg #1'))) 
group by i_manager_id, d_moy) tmp1 
where case when avg_monthly_sales > 0 then abs (sum_sales - avg_monthly_sales) / avg_monthly_sales else null end > 0.1 
order by i_manager_id 
,avg_monthly_sales 
,sum_sales 
fetch first 100 rows only; 
-- end query 27 in stream 0 using template query63.tpl 
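Query 27 (template query63) compares each manager's monthly sales total with that manager's average monthly total, computed with avg(sum(...)) over (partition by i_manager_id), and keeps months that deviate by more than 10%. The Python sketch below illustrates only that filter on already-aggregated monthly totals; the input shape and the name `flag_deviating_months` are assumptions.

```python
def flag_deviating_months(monthly_sales):
    """monthly_sales: dict of manager_id -> list of monthly sales totals
    (one entry per (i_manager_id, d_moy) group)."""
    flagged = []
    for manager, sales in monthly_sales.items():
        avg = sum(sales) / len(sales)  # avg(sum(ss_sales_price)) over (partition by manager)
        for s in sales:
            # CASE WHEN avg > 0 THEN abs(sum - avg) / avg ELSE NULL END > 0.1
            # (a NULL comparison is never true, so avg <= 0 drops the row)
            if avg > 0 and abs(s - avg) / avg > 0.1:
                flagged.append((manager, avg, s))
    # ORDER BY i_manager_id, avg_monthly_sales, sum_sales
    return sorted(flagged)
```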
-- start query 28 in stream 0 using template query69.tpl and seed 1875439018 
select 
cd_gender, 
cd_marital_status, 
cd_education_status, 
count(*) cnt1, 
cd_purchase_estimate, 
count(*) cnt2,
cd_credit_rating, 
count(*) cnt3 
from 
customer c,customer_address ca,customer_demographics 
where 
c.c_current_addr_sk = ca.ca_address_sk and 
ca_state in ('VA','PA','LA') and 
cd_demo_sk = c.c_current_cdemo_sk and 
exists (select * 
from store_sales,date_dim 
where c.c_customer_sk = ss_customer_sk and 
ss_sold_date_sk = d_date_sk and 
d_year = 2004 and 
d_moy between 3 and 3+2) and 
(not exists (select * 
from web_sales,date_dim 
where c.c_customer_sk = ws_bill_customer_sk and 
ws_sold_date_sk = d_date_sk and 
d_year = 2004 and 
d_moy between 3 and 3+2) and 
not exists (select * 
from catalog_sales,date_dim 
where c.c_customer_sk = cs_ship_customer_sk and 
cs_sold_date_sk = d_date_sk and 
d_year = 2004 and 
d_moy between 3 and 3+2)) 
group by cd_gender, 
cd_marital_status, 
cd_education_status, 
cd_purchase_estimate, 
cd_credit_rating 
order by cd_gender, 
cd_marital_status, 
cd_education_status, 
cd_purchase_estimate, 
cd_credit_rating 
fetch first 100 rows only; 
-- end query 28 in stream 0 using template query69.tpl 
-- start query 29 in stream 0 using template query60.tpl and seed 43130453 
with ss as ( 
select 
i_item_id,sum(ss_ext_sales_price) total_sales 
from 
store_sales, 
date_dim, 
customer_address, 
item 
where 
i_item_id in (select 
i_item_id 
from 
item 
where i_category in ('Music')) 
and ss_item_sk = i_item_sk 
and ss_sold_date_sk = d_date_sk 
and d_year = 2001 
and d_moy = 10 
and ss_addr_sk = ca_address_sk 
and ca_gmt_offset = -6 
group by i_item_id), 
cs as ( 
select 
i_item_id,sum(cs_ext_sales_price) total_sales 
from 
catalog_sales, 
date_dim, 
customer_address, 
item 
where 
i_item_id in (select 
i_item_id 
from 
item 
where i_category in ('Music')) 
and cs_item_sk = i_item_sk 
and cs_sold_date_sk = d_date_sk 
and d_year = 2001 
and d_moy = 10 
and cs_bill_addr_sk = ca_address_sk 
and ca_gmt_offset = -6 
group by i_item_id), 
ws as ( 
select 
i_item_id,sum(ws_ext_sales_price) total_sales 
from
web_sales, 
date_dim, 
customer_address, 
item 
where 
i_item_id in (select 
i_item_id 
from 
item 
where i_category in ('Music')) 
and ws_item_sk = i_item_sk 
and ws_sold_date_sk = d_date_sk 
and d_year = 2001 
and d_moy = 10 
and ws_bill_addr_sk = ca_address_sk 
and ca_gmt_offset = -6 
group by i_item_id) 
select 
i_item_id 
,sum(total_sales) total_sales 
from (select * from ss 
union all 
select * from cs 
union all 
select * from ws) tmp1 
group by i_item_id 
order by i_item_id 
,total_sales 
fetch first 100 rows only; 
-- end query 29 in stream 0 using template query60.tpl 
-- start query 30 in stream 0 using template query59.tpl and seed 835871049 
with wss as 
(select d_week_seq, 
ss_store_sk, 
sum(case when (d_day_name='Sunday') then ss_sales_price else null end) sun_sales, 
sum(case when (d_day_name='Monday') then ss_sales_price else null end) mon_sales, 
sum(case when (d_day_name='Tuesday') then ss_sales_price else null end) tue_sales, 
sum(case when (d_day_name='Wednesday') then ss_sales_price else null end) wed_sales, 
sum(case when (d_day_name='Thursday') then ss_sales_price else null end) thu_sales, 
sum(case when (d_day_name='Friday') then ss_sales_price else null end) fri_sales, 
sum(case when (d_day_name='Saturday') then ss_sales_price else null end) sat_sales 
from store_sales,date_dim 
where d_date_sk = ss_sold_date_sk 
group by d_week_seq,ss_store_sk 
) 
select s_store_name1,s_store_id1,d_week_seq1 
,sun_sales1/sun_sales2,mon_sales1/mon_sales2 
,tue_sales1/tue_sales1,wed_sales1/wed_sales2,thu_sales1/thu_sales2 
,fri_sales1/fri_sales2,sat_sales1/sat_sales2 
from 
(select s_store_name s_store_name1,wss.d_week_seq d_week_seq1 
,s_store_id s_store_id1,sun_sales sun_sales1 
,mon_sales mon_sales1,tue_sales tue_sales1 
,wed_sales wed_sales1,thu_sales thu_sales1 
,fri_sales fri_sales1,sat_sales sat_sales1 
from wss,store,date_dim d 
where d.d_week_seq = wss.d_week_seq and 
ss_store_sk = s_store_sk and 
d_month_seq between 1176 and 1176 + 11) y, 
(select s_store_name s_store_name2,wss.d_week_seq d_week_seq2 
,s_store_id s_store_id2,sun_sales sun_sales2 
,mon_sales mon_sales2,tue_sales tue_sales2 
,wed_sales wed_sales2,thu_sales thu_sales2 
,fri_sales fri_sales2,sat_sales sat_sales2 
from wss,store,date_dim d 
where d.d_week_seq = wss.d_week_seq and 
ss_store_sk = s_store_sk and 
d_month_seq between 1176+ 12 and 1176 + 23) x 
where s_store_id1=s_store_id2 
and d_week_seq1=d_week_seq2-52 
order by s_store_name1,s_store_id1,d_week_seq1 
fetch first 100 rows only; 
-- end query 30 in stream 0 using template query59.tpl 
-- start query 31 in stream 0 using template query37.tpl and seed 606286915 
select i_item_id 
,i_item_desc 
,i_current_price 
from item, inventory, date_dim, catalog_sales 
where i_current_price between 29 and 29 + 30
and inv_item_sk = i_item_sk 
and d_date_sk=inv_date_sk 
and d_date between cast('1999-02-22' as date) and (cast('1999-02-22' as date) + 60 days) 
and i_manufact_id in (701,681,796,684) 
and inv_quantity_on_hand between 100 and 500 
and cs_item_sk = i_item_sk 
group by i_item_id,i_item_desc,i_current_price 
order by i_item_id 
fetch first 100 rows only; 
-- end query 31 in stream 0 using template query37.tpl 
-- start query 32 in stream 0 using template query98.tpl and seed 1764313172 
select i_item_desc 
,i_category 
,i_class 
,i_current_price 
,sum(ss_ext_sales_price) as itemrevenue 
,sum(ss_ext_sales_price)*100/sum(sum(ss_ext_sales_price)) over
(partition by i_class) as revenueratio 
from 
store_sales 
,item 
,date_dim 
where 
ss_item_sk = i_item_sk 
and i_category in ('Women', 'Shoes', 'Jewelry') 
and ss_sold_date_sk = d_date_sk 
and d_date between cast('2001-02-15' as date) 
and (cast('2001-02-15' as date) + 30 days)
group by 
i_item_id 
,i_item_desc 
,i_category 
,i_class 
,i_current_price 
order by 
i_category 
,i_class 
,i_item_id 
,i_item_desc 
,revenueratio; 
-- end query 32 in stream 0 using template query98.tpl 
-- start query 33 in stream 0 using template query85.tpl and seed 1218061953 
select substr(r_reason_desc,1,20) 
,avg(ws_quantity) 
,avg(wr_refunded_cash) 
,avg(wr_fee) 
from web_sales, web_returns, web_page, customer_demographics cd1, 
customer_demographics cd2, customer_address, date_dim, reason 
where ws_web_page_sk = wp_web_page_sk 
and ws_item_sk = wr_item_sk 
and ws_order_number = wr_order_number 
and ws_sold_date_sk = d_date_sk and d_year = 2001 
and cd1.cd_demo_sk = wr_refunded_cdemo_sk 
and cd2.cd_demo_sk = wr_returning_cdemo_sk 
and ca_address_sk = wr_refunded_addr_sk 
and r_reason_sk = wr_reason_sk 
and 
( 
( 
cd1.cd_marital_status = 'U' 
and 
cd1.cd_marital_status = cd2.cd_marital_status 
and 
cd1.cd_education_status = '2 yr Degree' 
and 
cd1.cd_education_status = cd2.cd_education_status 
and 
ws_sales_price between 100.00 and 150.00 
) 
or 
( 
cd1.cd_marital_status = 'W' 
and 
cd1.cd_marital_status = cd2.cd_marital_status 
and 
cd1.cd_education_status = 'College' 
and 
cd1.cd_education_status = cd2.cd_education_status 
and 
ws_sales_price between 50.00 and 100.00 
) 
or
( 
cd1.cd_marital_status = 'S' 
and 
cd1.cd_marital_status = cd2.cd_marital_status 
and 
cd1.cd_education_status = 'Secondary' 
and 
cd1.cd_education_status = cd2.cd_education_status 
and 
ws_sales_price between 150.00 and 200.00 
) 
) 
and 
( 
( 
ca_country = 'United States' 
and 
ca_state in ('VA', 'KS', 'ND') 
and ws_net_profit between 100 and 200 
) 
or 
( 
ca_country = 'United States' 
and 
ca_state in ('SD', 'KY', 'GA') 
and ws_net_profit between 150 and 300 
) 
or 
( 
ca_country = 'United States' 
and 
ca_state in ('TX', 'AL', 'WA') 
and ws_net_profit between 50 and 250 
) 
) 
group by r_reason_desc 
order by substr(r_reason_desc,1,20) 
,avg(ws_quantity) 
,avg(wr_refunded_cash) 
,avg(wr_fee) 
fetch first 100 rows only; 
-- end query 33 in stream 0 using template query85.tpl 
-- start query 34 in stream 0 using template query70.tpl and seed 255476072 
select 
sum(ss_net_profit) as total_sum 
,s_state 
,s_county 
,grouping(s_state)+grouping(s_county) as lochierarchy 
,rank() over ( 
partition by grouping(s_state)+grouping(s_county), 
case when grouping(s_county) = 0 then s_state end 
order by sum(ss_net_profit) desc) as rank_within_parent 
from 
store_sales 
,date_dim d1 
,store 
where 
d1.d_month_seq between 1191 and 1191+11 
and d1.d_date_sk = ss_sold_date_sk 
and s_store_sk = ss_store_sk 
and s_state in 
( select s_state 
from (select s_state as s_state, 
rank() over ( partition by s_state order by sum(ss_net_profit) desc) as ranking 
from store_sales, store, date_dim 
where d_month_seq between 1191 and 1191+11 
and d_date_sk = ss_sold_date_sk 
and s_store_sk = ss_store_sk 
group by s_state 
) tmp1 
where ranking <= 5 
) 
group by rollup(s_state,s_county) 
order by 
lochierarchy desc 
,case when lochierarchy = 0 then s_state end 
,rank_within_parent 
fetch first 100 rows only; 
-- end query 34 in stream 0 using template query70.tpl 
-- start query 35 in stream 0 using template query67.tpl and seed 932833149 
select *
from (select i_category 
,i_class 
,i_brand 
,i_product_name 
,d_year 
,d_qoy 
,d_moy 
,s_store_id 
,sumsales 
,rank() over (partition by i_category order by sumsales desc) rk 
from (select i_category 
,i_class 
,i_brand 
,i_product_name 
,d_year 
,d_qoy 
,d_moy 
,s_store_id 
,sum(coalesce(ss_sales_price*ss_quantity,0)) sumsales
from store_sales 
,date_dim 
,store 
,item 
where ss_sold_date_sk=d_date_sk 
and ss_item_sk=i_item_sk 
and ss_store_sk = s_store_sk 
and d_month_seq between 1214 and 1214+11 
group by rollup(i_category, i_class, i_brand, i_product_name, d_year, d_qoy, d_moy,s_store_id))dw1) dw2 
where rk <= 100 
order by i_category 
,i_class 
,i_brand 
,i_product_name 
,d_year 
,d_qoy 
,d_moy 
,s_store_id 
,sumsales 
,rk 
fetch first 100 rows only; 
-- end query 35 in stream 0 using template query67.tpl 
-- start query 36 in stream 0 using template query28.tpl and seed 1281706826 
select * 
from (select avg(ss_list_price) B1_LP 
,count(ss_list_price) B1_CNT 
,count(distinct ss_list_price) B1_CNTD 
from store_sales 
where ss_quantity between 0 and 5 
and (ss_list_price between 57 and 57+10 
or ss_coupon_amt between 11751 and 11751+1000 
or ss_wholesale_cost between 4 and 4+20)) B1, 
(select avg(ss_list_price) B2_LP 
,count(ss_list_price) B2_CNT 
,count(distinct ss_list_price) B2_CNTD 
from store_sales 
where ss_quantity between 6 and 10 
and (ss_list_price between 166 and 166+10 
or ss_coupon_amt between 7402 and 7402+1000 
or ss_wholesale_cost between 38 and 38+20)) B2, 
(select avg(ss_list_price) B3_LP 
,count(ss_list_price) B3_CNT 
,count(distinct ss_list_price) B3_CNTD 
from store_sales 
where ss_quantity between 11 and 15 
and (ss_list_price between 8 and 8+10 
or ss_coupon_amt between 5854 and 5854+1000 
or ss_wholesale_cost between 67 and 67+20)) B3, 
(select avg(ss_list_price) B4_LP 
,count(ss_list_price) B4_CNT 
,count(distinct ss_list_price) B4_CNTD 
from store_sales 
where ss_quantity between 16 and 20 
and (ss_list_price between 103 and 103+10 
or ss_coupon_amt between 5165 and 5165+1000 
or ss_wholesale_cost between 14 and 14+20)) B4, 
(select avg(ss_list_price) B5_LP 
,count(ss_list_price) B5_CNT 
,count(distinct ss_list_price) B5_CNTD 
from store_sales 
where ss_quantity between 21 and 25 
and (ss_list_price between 137 and 137+10 
or ss_coupon_amt between 12978 and 12978+1000
or ss_wholesale_cost between 30 and 30+20)) B5, 
(select avg(ss_list_price) B6_LP 
,count(ss_list_price) B6_CNT 
,count(distinct ss_list_price) B6_CNTD 
from store_sales 
where ss_quantity between 26 and 30 
and (ss_list_price between 10 and 10+10 
or ss_coupon_amt between 5270 and 5270+1000 
or ss_wholesale_cost between 79 and 79+20)) B6 
fetch first 100 rows only; 
-- end query 36 in stream 0 using template query28.tpl 
-- start query 37 in stream 0 using template query81.tpl and seed 2018716314 
with customer_total_return as 
(select cr_returning_customer_sk as ctr_customer_sk 
,ca_state as ctr_state, 
sum(cr_return_amt_inc_tax) as ctr_total_return 
from catalog_returns 
,date_dim 
,customer_address 
where cr_returned_date_sk = d_date_sk 
and d_year =1998 
and cr_returning_addr_sk = ca_address_sk 
group by cr_returning_customer_sk 
,ca_state ) 
select c_customer_id,c_salutation,c_first_name,c_last_name,ca_street_number,ca_street_name 
,ca_street_type,ca_suite_number,ca_city,ca_county,ca_state,ca_zip,ca_country,ca_gmt_offset 
,ca_location_type,ctr_total_return 
from customer_total_return ctr1 
,customer_address 
,customer 
where ctr1.ctr_total_return > (select avg(ctr_total_return)*1.2 
from customer_total_return ctr2 
where ctr1.ctr_state = ctr2.ctr_state) 
and ca_address_sk = c_current_addr_sk 
and ca_state = 'GA' 
and ctr1.ctr_customer_sk = c_customer_sk 
order by c_customer_id,c_salutation,c_first_name,c_last_name,ca_street_number,ca_street_name 
,ca_street_type,ca_suite_number,ca_city,ca_county,ca_state,ca_zip,ca_country,ca_gmt_offset 
,ca_location_type,ctr_total_return 
fetch first 100 rows only; 
-- end query 37 in stream 0 using template query81.tpl 
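Query 37 (template query81) keeps customers whose total catalog returns exceed 1.2 times the average total for their returning-address state, using a correlated subquery on ctr_state. The Python sketch below reproduces only that correlated comparison on rows shaped like the customer_total_return CTE; it omits the later join to customer and the ca_state = 'GA' filter. The name `above_state_average` is hypothetical.

```python
from collections import defaultdict


def above_state_average(totals, factor=1.2):
    """totals: list of (ctr_customer_sk, ctr_state, ctr_total_return)."""
    by_state = defaultdict(list)
    for _, state, amount in totals:
        by_state[state].append(amount)
    # One average per state, as the correlated subquery computes for each outer row.
    state_avg = {s: sum(v) / len(v) for s, v in by_state.items()}
    return [(c, s, amt) for c, s, amt in totals if amt > state_avg[s] * factor]
```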
-- start query 38 in stream 0 using template query97.tpl and seed 1786889920 
with ssci as ( 
select ss_customer_sk customer_sk 
,ss_item_sk item_sk 
from store_sales,date_dim 
where ss_sold_date_sk = d_date_sk 
and d_month_seq between 1204 and 1204 + 11 
group by ss_customer_sk 
,ss_item_sk), 
csci as( 
select cs_bill_customer_sk customer_sk 
,cs_item_sk item_sk 
from catalog_sales,date_dim 
where cs_sold_date_sk = d_date_sk 
and d_month_seq between 1204 and 1204 + 11 
group by cs_bill_customer_sk 
,cs_item_sk) 
select sum(case when ssci.customer_sk is not null and csci.customer_sk is null then cast(1 as bigint) else cast(0 as bigint) end) store_only 
,sum(case when ssci.customer_sk is null and csci.customer_sk is not null then cast(1 as bigint) else cast(0 as bigint) end) catalog_only 
,sum(case when ssci.customer_sk is not null and csci.customer_sk is not null then cast(1 as bigint) else cast(0 as bigint) end) store_and_catalog 
from ssci full outer join csci on (ssci.customer_sk=csci.customer_sk 
and ssci.item_sk = csci.item_sk) 
fetch first 100 rows only; 
-- end query 38 in stream 0 using template query97.tpl 
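Query 38 (template query97) full-outer-joins distinct (customer, item) pairs from store and catalog sales and classifies each joined row by which side is non-null. For pairs with non-NULL keys this amounts to set difference and intersection, as in the Python sketch below. It assumes no NULL customer keys: in SQL a row with a NULL customer_sk never matches and falls into none of the three sums, whereas a Python set would treat such pairs as equal. The name `channel_overlap` is hypothetical.

```python
def channel_overlap(store_pairs, catalog_pairs):
    """store_pairs / catalog_pairs: iterables of (customer_sk, item_sk).
    Sets mirror the GROUP BY that makes each pair distinct per channel."""
    ss, cs = set(store_pairs), set(catalog_pairs)
    return {
        "store_only": len(ss - cs),          # matched only on the store side
        "catalog_only": len(cs - ss),        # matched only on the catalog side
        "store_and_catalog": len(ss & cs),   # joined on both customer_sk and item_sk
    }
```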
-- start query 39 in stream 0 using template query66.tpl and seed 486724873 
select 
w_warehouse_name 
,w_warehouse_sq_ft
,w_city 
,w_county 
,w_state 
,w_country 
,ship_carriers 
,year 
,sum(jan_sales) as jan_sales 
,sum(feb_sales) as feb_sales 
,sum(mar_sales) as mar_sales 
,sum(apr_sales) as apr_sales 
,sum(may_sales) as may_sales 
,sum(jun_sales) as jun_sales 
,sum(jul_sales) as jul_sales 
,sum(aug_sales) as aug_sales 
,sum(sep_sales) as sep_sales 
,sum(oct_sales) as oct_sales 
,sum(nov_sales) as nov_sales 
,sum(dec_sales) as dec_sales 
,sum(jan_sales/w_warehouse_sq_ft) as jan_sales_per_sq_foot 
,sum(feb_sales/w_warehouse_sq_ft) as feb_sales_per_sq_foot 
,sum(mar_sales/w_warehouse_sq_ft) as mar_sales_per_sq_foot 
,sum(apr_sales/w_warehouse_sq_ft) as apr_sales_per_sq_foot 
,sum(may_sales/w_warehouse_sq_ft) as may_sales_per_sq_foot 
,sum(jun_sales/w_warehouse_sq_ft) as jun_sales_per_sq_foot 
,sum(jul_sales/w_warehouse_sq_ft) as jul_sales_per_sq_foot 
,sum(aug_sales/w_warehouse_sq_ft) as aug_sales_per_sq_foot 
,sum(sep_sales/w_warehouse_sq_ft) as sep_sales_per_sq_foot 
,sum(oct_sales/w_warehouse_sq_ft) as oct_sales_per_sq_foot 
,sum(nov_sales/w_warehouse_sq_ft) as nov_sales_per_sq_foot 
,sum(dec_sales/w_warehouse_sq_ft) as dec_sales_per_sq_foot 
,sum(jan_net) as jan_net 
,sum(feb_net) as feb_net 
,sum(mar_net) as mar_net 
,sum(apr_net) as apr_net 
,sum(may_net) as may_net 
,sum(jun_net) as jun_net 
,sum(jul_net) as jul_net 
,sum(aug_net) as aug_net 
,sum(sep_net) as sep_net 
,sum(oct_net) as oct_net 
,sum(nov_net) as nov_net 
,sum(dec_net) as dec_net 
from ( 
(select 
w_warehouse_name 
,w_warehouse_sq_ft 
,w_city 
,w_county 
,w_state 
,w_country 
,'LATVIAN' || ',' || 'BARIAN' as ship_carriers 
,d_year as year 
,sum(case when d_moy = 1 
then ws_ext_sales_price* ws_quantity else 0 end) as jan_sales 
,sum(case when d_moy = 2 
then ws_ext_sales_price* ws_quantity else 0 end) as feb_sales 
,sum(case when d_moy = 3 
then ws_ext_sales_price* ws_quantity else 0 end) as mar_sales 
,sum(case when d_moy = 4 
then ws_ext_sales_price* ws_quantity else 0 end) as apr_sales 
,sum(case when d_moy = 5 
then ws_ext_sales_price* ws_quantity else 0 end) as may_sales 
,sum(case when d_moy = 6 
then ws_ext_sales_price* ws_quantity else 0 end) as jun_sales 
,sum(case when d_moy = 7 
then ws_ext_sales_price* ws_quantity else 0 end) as jul_sales 
,sum(case when d_moy = 8 
then ws_ext_sales_price* ws_quantity else 0 end) as aug_sales 
,sum(case when d_moy = 9 
then ws_ext_sales_price* ws_quantity else 0 end) as sep_sales 
,sum(case when d_moy = 10 
then ws_ext_sales_price* ws_quantity else 0 end) as oct_sales 
,sum(case when d_moy = 11 
then ws_ext_sales_price* ws_quantity else 0 end) as nov_sales 
,sum(case when d_moy = 12 
then ws_ext_sales_price* ws_quantity else 0 end) as dec_sales 
,sum(case when d_moy = 1
then ws_net_paid_inc_tax * ws_quantity else 0 end) as jan_net 
,sum(case when d_moy = 2 
then ws_net_paid_inc_tax * ws_quantity else 0 end) as feb_net 
,sum(case when d_moy = 3 
then ws_net_paid_inc_tax * ws_quantity else 0 end) as mar_net 
,sum(case when d_moy = 4 
then ws_net_paid_inc_tax * ws_quantity else 0 end) as apr_net 
,sum(case when d_moy = 5 
then ws_net_paid_inc_tax * ws_quantity else 0 end) as may_net 
,sum(case when d_moy = 6 
then ws_net_paid_inc_tax * ws_quantity else 0 end) as jun_net 
,sum(case when d_moy = 7 
then ws_net_paid_inc_tax * ws_quantity else 0 end) as jul_net 
,sum(case when d_moy = 8 
then ws_net_paid_inc_tax * ws_quantity else 0 end) as aug_net 
,sum(case when d_moy = 9 
then ws_net_paid_inc_tax * ws_quantity else 0 end) as sep_net 
,sum(case when d_moy = 10 
then ws_net_paid_inc_tax * ws_quantity else 0 end) as oct_net 
,sum(case when d_moy = 11 
then ws_net_paid_inc_tax * ws_quantity else 0 end) as nov_net 
,sum(case when d_moy = 12 
then ws_net_paid_inc_tax * ws_quantity else 0 end) as dec_net 
from 
web_sales 
,warehouse 
,date_dim 
,time_dim 
,ship_mode 
where 
ws_warehouse_sk = w_warehouse_sk 
and ws_sold_date_sk = d_date_sk 
and ws_sold_time_sk = t_time_sk 
and ws_ship_mode_sk = sm_ship_mode_sk 
and d_year = 2001 
and t_time between 46669 and 46669+28800 
and sm_carrier in ('LATVIAN','BARIAN') 
group by 
w_warehouse_name 
,w_warehouse_sq_ft 
,w_city 
,w_county 
,w_state 
,w_country 
,d_year 
) 
union all 
(select 
w_warehouse_name 
,w_warehouse_sq_ft 
,w_city 
,w_county 
,w_state 
,w_country 
,'LATVIAN' || ',' || 'BARIAN' as ship_carriers 
,d_year as year 
,sum(case when d_moy = 1 
then cs_sales_price* cs_quantity else 0 end) as jan_sales 
,sum(case when d_moy = 2 
then cs_sales_price* cs_quantity else 0 end) as feb_sales 
,sum(case when d_moy = 3 
then cs_sales_price* cs_quantity else 0 end) as mar_sales 
,sum(case when d_moy = 4 
then cs_sales_price* cs_quantity else 0 end) as apr_sales 
,sum(case when d_moy = 5 
then cs_sales_price* cs_quantity else 0 end) as may_sales 
,sum(case when d_moy = 6 
then cs_sales_price* cs_quantity else 0 end) as jun_sales 
,sum(case when d_moy = 7 
then cs_sales_price* cs_quantity else 0 end) as jul_sales 
,sum(case when d_moy = 8 
then cs_sales_price* cs_quantity else 0 end) as aug_sales 
,sum(case when d_moy = 9 
then cs_sales_price* cs_quantity else 0 end) as sep_sales 
,sum(case when d_moy = 10 
then cs_sales_price* cs_quantity else 0 end) as oct_sales 
,sum(case when d_moy = 11 
then cs_sales_price* cs_quantity else 0 end) as nov_sales
,sum(case when d_moy = 12 
then cs_sales_price* cs_quantity else 0 end) as dec_sales 
,sum(case when d_moy = 1 
then cs_net_profit * cs_quantity else 0 end) as jan_net 
,sum(case when d_moy = 2 
then cs_net_profit * cs_quantity else 0 end) as feb_net 
,sum(case when d_moy = 3 
then cs_net_profit * cs_quantity else 0 end) as mar_net 
,sum(case when d_moy = 4 
then cs_net_profit * cs_quantity else 0 end) as apr_net 
,sum(case when d_moy = 5 
then cs_net_profit * cs_quantity else 0 end) as may_net 
,sum(case when d_moy = 6 
then cs_net_profit * cs_quantity else 0 end) as jun_net 
,sum(case when d_moy = 7 
then cs_net_profit * cs_quantity else 0 end) as jul_net 
,sum(case when d_moy = 8 
then cs_net_profit * cs_quantity else 0 end) as aug_net 
,sum(case when d_moy = 9 
then cs_net_profit * cs_quantity else 0 end) as sep_net 
,sum(case when d_moy = 10 
then cs_net_profit * cs_quantity else 0 end) as oct_net 
,sum(case when d_moy = 11 
then cs_net_profit * cs_quantity else 0 end) as nov_net 
,sum(case when d_moy = 12 
then cs_net_profit * cs_quantity else 0 end) as dec_net 
from 
catalog_sales 
,warehouse 
,date_dim 
,time_dim 
,ship_mode 
where 
cs_warehouse_sk = w_warehouse_sk 
and cs_sold_date_sk = d_date_sk 
and cs_sold_time_sk = t_time_sk 
and cs_ship_mode_sk = sm_ship_mode_sk 
and d_year = 2001 
and t_time between 46669 AND 46669+28800 
and sm_carrier in ('LATVIAN','BARIAN') 
group by 
w_warehouse_name 
,w_warehouse_sq_ft 
,w_city 
,w_county 
,w_state 
,w_country 
,d_year 
) 
) x 
group by 
w_warehouse_name 
,w_warehouse_sq_ft 
,w_city 
,w_county 
,w_state 
,w_country 
,ship_carriers 
,year 
order by w_warehouse_name 
fetch first 100 rows only; 
-- end query 39 in stream 0 using template query66.tpl 
-- start query 40 in stream 0 using template query90.tpl and seed 1038311841 
select cast(amc as decimal(15,4))/cast(pmc as decimal(15,4)) am_pm_ratio 
from ( select count(*) amc 
from web_sales, household_demographics , time_dim, web_page 
where ws_sold_time_sk = time_dim.t_time_sk 
and ws_ship_hdemo_sk = household_demographics.hd_demo_sk 
and ws_web_page_sk = web_page.wp_web_page_sk 
and time_dim.t_hour between 7 and 7+1 
and household_demographics.hd_dep_count = 1 
and web_page.wp_char_count between 5000 and 5200) at, 
( select count(*) pmc 
from web_sales, household_demographics , time_dim, web_page 
where ws_sold_time_sk = time_dim.t_time_sk 
and ws_ship_hdemo_sk = household_demographics.hd_demo_sk 
and ws_web_page_sk = web_page.wp_web_page_sk
and time_dim.t_hour between 18 and 18+1 
and household_demographics.hd_dep_count = 1 
and web_page.wp_char_count between 5000 and 5200) pt 
order by am_pm_ratio 
fetch first 100 rows only; 
-- end query 40 in stream 0 using template query90.tpl 
-- start query 41 in stream 0 using template query17.tpl and seed 2078761835 
select i_item_id 
,i_item_desc 
,s_state 
,count(ss_quantity) as store_sales_quantitycount 
,avg(ss_quantity) as store_sales_quantityave 
,stddev_samp(ss_quantity) as store_sales_quantitystdev 
,stddev_samp(ss_quantity)/avg(ss_quantity) as store_sales_quantitycov 
,count(sr_return_quantity) as_store_returns_quantitycount 
,avg(sr_return_quantity) as_store_returns_quantityave 
,stddev_samp(sr_return_quantity) as_store_returns_quantitystdev 
,stddev_samp(sr_return_quantity)/avg(sr_return_quantity) as store_returns_quantitycov 
,count(cs_quantity) as catalog_sales_quantitycount ,avg(cs_quantity) as catalog_sales_quantityave 
,stddev_samp(cs_quantity)/avg(cs_quantity) as catalog_sales_quantitystdev 
,stddev_samp(cs_quantity)/avg(cs_quantity) as catalog_sales_quantitycov 
from store_sales 
,store_returns 
,catalog_sales 
,date_dim d1 
,date_dim d2 
,date_dim d3 
,store 
,item 
where d1.d_quarter_name = '1998Q1' 
and d1.d_date_sk = ss_sold_date_sk 
and i_item_sk = ss_item_sk 
and s_store_sk = ss_store_sk 
and ss_customer_sk = sr_customer_sk 
and ss_item_sk = sr_item_sk 
and ss_ticket_number = sr_ticket_number 
and sr_returned_date_sk = d2.d_date_sk 
and d2.d_quarter_name in ('1998Q1','1998Q2','1998Q3') 
and sr_customer_sk = cs_bill_customer_sk 
and sr_item_sk = cs_item_sk 
and cs_sold_date_sk = d3.d_date_sk 
and d3.d_quarter_name in ('1998Q1','1998Q2','1998Q3') 
group by i_item_id 
,i_item_desc 
,s_state 
order by i_item_id 
,i_item_desc 
,s_state 
fetch first 100 rows only; 
-- end query 41 in stream 0 using template query17.tpl 
-- start query 42 in stream 0 using template query47.tpl and seed 218857573 
with v1 as( 
select i_category, i_brand, 
s_store_name, s_company_name, 
d_year, d_moy, 
sum(ss_sales_price) sum_sales, 
avg(sum(ss_sales_price)) over 
(partition by i_category, i_brand, 
s_store_name, s_company_name, d_year) 
avg_monthly_sales, 
rank() over 
(partition by i_category, i_brand, 
s_store_name, s_company_name 
order by d_year, d_moy) rn 
from item, store_sales, date_dim, store 
where ss_item_sk = i_item_sk and 
ss_sold_date_sk = d_date_sk and 
ss_store_sk = s_store_sk and 
( 
d_year = 2000 or 
( d_year = 2000-1 and d_moy =12) or 
( d_year = 2000+1 and d_moy =1) 
) 
group by i_category, i_brand, 
s_store_name, s_company_name, 
d_year, d_moy), 
v2 as( 
select v1.s_store_name
,v1.d_year 
,v1.avg_monthly_sales 
,v1.sum_sales, v1_lag.sum_sales psum, v1_lead.sum_sales nsum 
from v1, v1 v1_lag, v1 v1_lead 
where v1.i_category = v1_lag.i_category and 
v1.i_category = v1_lead.i_category and 
v1.i_brand = v1_lag.i_brand and 
v1.i_brand = v1_lead.i_brand and 
v1.s_store_name = v1_lag.s_store_name and 
v1.s_store_name = v1_lead.s_store_name and 
v1.s_company_name = v1_lag.s_company_name and 
v1.s_company_name = v1_lead.s_company_name and 
v1.rn = v1_lag.rn + 1 and 
v1.rn = v1_lead.rn - 1) 
select * 
from v2 
where d_year = 2000 and 
avg_monthly_sales > 0 and 
case when avg_monthly_sales > 0 then abs(sum_sales - avg_monthly_sales) / avg_monthly_sales else null end > 0.1 
order by sum_sales - avg_monthly_sales, 3 
fetch first 100 rows only; 
-- end query 42 in stream 0 using template query47.tpl 
-- start query 43 in stream 0 using template query95.tpl and seed 2064779767 
with ws_wh as 
(select ws1.ws_order_number,ws1.ws_warehouse_sk wh1,ws2.ws_warehouse_sk wh2 
from web_sales ws1,web_sales ws2 
where ws1.ws_order_number = ws2.ws_order_number 
and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk) 
select 
count(distinct ws_order_number) as "order count" 
,sum(ws_ext_ship_cost) as "total shipping cost" 
,sum(ws_net_profit) as "total net profit" 
from 
web_sales ws1 
,date_dim 
,customer_address 
,web_site 
where 
d_date between '2002-4-01' and 
(cast('2002-4-01' as date) + 60 days) 
and ws1.ws_ship_date_sk = d_date_sk 
and ws1.ws_ship_addr_sk = ca_address_sk 
and ca_state = 'SC' 
and ws1.ws_web_site_sk = web_site_sk 
and web_company_name = 'pri' 
and ws1.ws_order_number in (select ws_order_number 
from ws_wh) 
and ws1.ws_order_number in (select wr_order_number 
from web_returns,ws_wh 
where wr_order_number = ws_wh.ws_order_number) 
order by count(distinct ws_order_number) 
fetch first 100 rows only; 
-- end query 43 in stream 0 using template query95.tpl 
-- start query 44 in stream 0 using template query92.tpl and seed 227248084 
select 
sum(ws_ext_discount_amt) as "Excess Discount Amount" 
from 
web_sales 
,item 
,date_dim 
where 
i_manufact_id = 85 
and i_item_sk = ws_item_sk 
and d_date between '1999-01-05' and 
(cast('1999-01-05' as date) + 90 days) 
and d_date_sk = ws_sold_date_sk 
and ws_ext_discount_amt 
> ( 
SELECT 
1.3 * avg(ws_ext_discount_amt) 
FROM 
web_sales 
,date_dim 
WHERE 
ws_item_sk = i_item_sk 
and d_date between '1999-01-05' and 
(cast('1999-01-05' as date) + 90 days) 
and d_date_sk = ws_sold_date_sk 
) 
order by sum(ws_ext_discount_amt) 
fetch first 100 rows only;
-- end query 44 in stream 0 using template query92.tpl 
-- start query 45 in stream 0 using template query3.tpl and seed 1565567065 
select dt.d_year 
,item.i_brand_id brand_id 
,item.i_brand brand 
,sum(ss_sales_price) sum_agg 
from date_dim dt 
,store_sales 
,item 
where dt.d_date_sk = store_sales.ss_sold_date_sk 
and store_sales.ss_item_sk = item.i_item_sk 
and item.i_manufact_id = 423 
and dt.d_moy=11 
group by dt.d_year 
,item.i_brand 
,item.i_brand_id 
order by dt.d_year 
,sum_agg desc 
,brand_id 
fetch first 100 rows only; 
-- end query 45 in stream 0 using template query3.tpl 
-- start query 46 in stream 0 using template query51.tpl and seed 1975445669 
WITH web_v1 as ( 
select 
ws_item_sk item_sk, d_date, 
sum(sum(ws_sales_price)) 
over (partition by ws_item_sk order by d_date rows between unbounded preceding and current row) cume_sales 
from web_sales 
,date_dim 
where ws_sold_date_sk=d_date_sk 
and d_month_seq between 1188 and 1188+11 
and ws_item_sk is not NULL 
group by ws_item_sk, d_date), 
store_v1 as ( 
select 
ss_item_sk item_sk, d_date, 
sum(sum(ss_sales_price)) 
over (partition by ss_item_sk order by d_date rows between unbounded preceding and current row) cume_sales 
from store_sales 
,date_dim 
where ss_sold_date_sk=d_date_sk 
and d_month_seq between 1188 and 1188+11 
and ss_item_sk is not NULL 
group by ss_item_sk, d_date) 
select * 
from (select item_sk 
,d_date 
,web_sales 
,store_sales 
,max(web_sales) 
over (partition by item_sk order by d_date rows between unbounded preceding and current row) web_cumulative 
,max(store_sales) 
over (partition by item_sk order by d_date rows between unbounded preceding and current row) store_cumulative 
from (select case when web.item_sk is not null then web.item_sk else store.item_sk end item_sk 
,case when web.d_date is not null then web.d_date else store.d_date end d_date 
,web.cume_sales web_sales 
,store.cume_sales store_sales 
from web_v1 web full outer join store_v1 store on (web.item_sk = store.item_sk 
and web.d_date = store.d_date) 
)x )y 
where web_cumulative > store_cumulative 
order by item_sk 
,d_date 
fetch first 100 rows only; 
-- end query 46 in stream 0 using template query51.tpl 
-- start query 47 in stream 0 using template query35.tpl and seed 1491334050 
select 
ca_state, 
cd_gender, 
cd_marital_status, 
count(*) cnt1, 
sum(cd_dep_count), 
sum(cd_dep_count), 
avg(cd_dep_count), 
cd_dep_employed_count, 
count(*) cnt2, 
sum(cd_dep_employed_count), 
sum(cd_dep_employed_count), 
avg(cd_dep_employed_count),
cd_dep_college_count, 
count(*) cnt3, 
sum(cd_dep_college_count), 
sum(cd_dep_college_count), 
avg(cd_dep_college_count) 
from 
customer c,customer_address ca,customer_demographics 
where 
c.c_current_addr_sk = ca.ca_address_sk and 
cd_demo_sk = c.c_current_cdemo_sk and 
exists (select * 
from store_sales,date_dim 
where c.c_customer_sk = ss_customer_sk and 
ss_sold_date_sk = d_date_sk and 
d_year = 2001 and 
d_qoy < 4) and 
(exists (select * 
from web_sales,date_dim 
where c.c_customer_sk = ws_bill_customer_sk and 
ws_sold_date_sk = d_date_sk and 
d_year = 2001 and 
d_qoy < 4) or 
exists (select * 
from catalog_sales,date_dim 
where c.c_customer_sk = cs_ship_customer_sk and 
cs_sold_date_sk = d_date_sk and 
d_year = 2001 and 
d_qoy < 4)) 
group by ca_state, 
cd_gender, 
cd_marital_status, 
cd_dep_count, 
cd_dep_employed_count, 
cd_dep_college_count 
order by ca_state, 
cd_gender, 
cd_marital_status, 
cd_dep_count, 
cd_dep_employed_count, 
cd_dep_college_count 
fetch first 100 rows only; 
-- end query 47 in stream 0 using template query35.tpl 
-- start query 48 in stream 0 using template query49.tpl and seed 686314496 
select 
'web' as channel 
,web.item 
,web.return_ratio 
,web.return_rank 
,web.currency_rank 
from ( 
select 
item 
,return_ratio 
,currency_ratio 
,rank() over (order by return_ratio) as return_rank 
,rank() over (order by currency_ratio) as currency_rank 
from 
( select ws.ws_item_sk as item 
,(cast(sum(coalesce(wr.wr_return_quantity,0)) as dec(15,4))/ 
cast(sum(coalesce(ws.ws_quantity,0)) as dec(15,4) )) as return_ratio 
,(cast(sum(coalesce(wr.wr_return_amt,0)) as dec(15,4))/ 
cast(sum(coalesce(ws.ws_net_paid,0)) as dec(15,4) )) as currency_ratio 
from 
web_sales ws left outer join web_returns wr 
on (ws.ws_order_number = wr.wr_order_number and 
ws.ws_item_sk = wr.wr_item_sk) 
,date_dim 
where 
wr.wr_return_amt > 10000 
and ws.ws_net_profit > 1 
and ws.ws_net_paid > 0 
and ws.ws_quantity > 0 
and ws_sold_date_sk = d_date_sk 
and d_year = 1999 
and d_moy = 12 
group by ws.ws_item_sk 
) in_web 
) web 
where 
(
web.return_rank <= 10 
or 
web.currency_rank <= 10 
) 
union 
select 
'catalog' as channel 
,catalog.item 
,catalog.return_ratio 
,catalog.return_rank 
,catalog.currency_rank 
from ( 
select 
item 
,return_ratio 
,currency_ratio 
,rank() over (order by return_ratio) as return_rank 
,rank() over (order by currency_ratio) as currency_rank 
from 
( select 
cs.cs_item_sk as item 
,(cast(sum(coalesce(cr.cr_return_quantity,0)) as dec(15,4))/ 
cast(sum(coalesce(cs.cs_quantity,0)) as dec(15,4) )) as return_ratio 
,(cast(sum(coalesce(cr.cr_return_amount,0)) as dec(15,4))/ 
cast(sum(coalesce(cs.cs_net_paid,0)) as dec(15,4) )) as currency_ratio 
from 
catalog_sales cs left outer join catalog_returns cr 
on (cs.cs_order_number = cr.cr_order_number and 
cs.cs_item_sk = cr.cr_item_sk) 
,date_dim 
where 
cr.cr_return_amount > 10000 
and cs.cs_net_profit > 1 
and cs.cs_net_paid > 0 
and cs.cs_quantity > 0 
and cs_sold_date_sk = d_date_sk 
and d_year = 1999 
and d_moy = 12 
group by cs.cs_item_sk 
) in_cat 
) catalog 
where 
( 
catalog.return_rank <= 10 
or 
catalog.currency_rank <=10 
) 
union 
select 
'store' as channel 
,store.item 
,store.return_ratio 
,store.return_rank 
,store.currency_rank 
from ( 
select 
item 
,return_ratio 
,currency_ratio 
,rank() over (order by return_ratio) as return_rank 
,rank() over (order by currency_ratio) as currency_rank 
from 
( select sts.ss_item_sk as item 
,(cast(sum(coalesce(sr.sr_return_quantity,0)) as dec(15,4))/cast(sum(coalesce(sts.ss_quantity,0)) as dec(15,4) )) as return_ratio 
,(cast(sum(coalesce(sr.sr_return_amt,0)) as dec(15,4))/cast(sum(coalesce(sts.ss_net_paid,0)) as dec(15,4) )) as currency_ratio 
from 
store_sales sts left outer join store_returns sr 
on (sts.ss_ticket_number = sr.sr_ticket_number and sts.ss_item_sk = sr.sr_item_sk) 
,date_dim 
where 
sr.sr_return_amt > 10000 
and sts.ss_net_profit > 1 
and sts.ss_net_paid > 0 
and sts.ss_quantity > 0 
and ss_sold_date_sk = d_date_sk 
and d_year = 1999 
and d_moy = 12
group by sts.ss_item_sk 
) in_store 
) store 
where ( 
store.return_rank <= 10 
or 
store.currency_rank <= 10 
) 
order by 1,4,5 
fetch first 100 rows only; 
-- end query 48 in stream 0 using template query49.tpl 
-- start query 49 in stream 0 using template query9.tpl and seed 1544661402 
select case when (select count(*) 
from store_sales 
where ss_quantity between 1 and 20) > 864784291 
then (select avg(ss_ext_list_price) 
from store_sales 
where ss_quantity between 1 and 20) 
else (select avg(ss_net_profit) 
from store_sales 
where ss_quantity between 1 and 20) end bucket1 , 
case when (select count(*) 
from store_sales 
where ss_quantity between 21 and 40) > 719640698 
then (select avg(ss_ext_list_price) 
from store_sales 
where ss_quantity between 21 and 40) 
else (select avg(ss_net_profit) 
from store_sales 
where ss_quantity between 21 and 40) end bucket2, 
case when (select count(*) 
from store_sales 
where ss_quantity between 41 and 60) > 842161476 
then (select avg(ss_ext_list_price) 
from store_sales 
where ss_quantity between 41 and 60) 
else (select avg(ss_net_profit) 
from store_sales 
where ss_quantity between 41 and 60) end bucket3, 
case when (select count(*) 
from store_sales 
where ss_quantity between 61 and 80) > 918194147 
then (select avg(ss_ext_list_price) 
from store_sales 
where ss_quantity between 61 and 80) 
else (select avg(ss_net_profit) 
from store_sales 
where ss_quantity between 61 and 80) end bucket4, 
case when (select count(*) 
from store_sales 
where ss_quantity between 81 and 100) > 694910283 
then (select avg(ss_ext_list_price) 
from store_sales 
where ss_quantity between 81 and 100) 
else (select avg(ss_net_profit) 
from store_sales 
where ss_quantity between 81 and 100) end bucket5 
from reason 
where r_reason_sk = 1 
; 
-- end query 49 in stream 0 using template query9.tpl 
-- start query 50 in stream 0 using template query31.tpl and seed 535157530 
with ss as 
(select ca_county,d_qoy, d_year,sum(ss_ext_sales_price) as store_sales 
from store_sales,date_dim,customer_address 
where ss_sold_date_sk = d_date_sk 
and ss_addr_sk=ca_address_sk 
group by ca_county,d_qoy, d_year), 
ws as 
(select ca_county,d_qoy, d_year,sum(ws_ext_sales_price) as web_sales 
from web_sales,date_dim,customer_address 
where ws_sold_date_sk = d_date_sk 
and ws_bill_addr_sk=ca_address_sk 
group by ca_county,d_qoy, d_year) 
select /* tt */ 
ss1.ca_county 
,ss1.d_year 
,ws2.web_sales/ws1.web_sales web_q1_q2_increase
,ss2.store_sales/ss1.store_sales store_q1_q2_increase 
,ws3.web_sales/ws2.web_sales web_q2_q3_increase 
,ss3.store_sales/ss2.store_sales store_q2_q3_increase 
from 
ss ss1 
,ss ss2 
,ss ss3 
,ws ws1 
,ws ws2 
,ws ws3 
where 
ss1.d_qoy = 1 
and ss1.d_year = 1998 
and ss1.ca_county = ss2.ca_county 
and ss2.d_qoy = 2 
and ss2.d_year = 1998 
and ss2.ca_county = ss3.ca_county 
and ss3.d_qoy = 3 
and ss3.d_year = 1998 
and ss1.ca_county = ws1.ca_county 
and ws1.d_qoy = 1 
and ws1.d_year = 1998 
and ws1.ca_county = ws2.ca_county 
and ws2.d_qoy = 2 
and ws2.d_year = 1998 
and ws1.ca_county = ws3.ca_county 
and ws3.d_qoy = 3 
and ws3.d_year =1998 
and case when ws1.web_sales > 0 then ws2.web_sales/ws1.web_sales else null end 
> case when ss1.store_sales > 0 then ss2.store_sales/ss1.store_sales else null end 
and case when ws2.web_sales > 0 then ws3.web_sales/ws2.web_sales else null end 
> case when ss2.store_sales > 0 then ss3.store_sales/ss2.store_sales else null end 
order by ss1.d_year; 
-- end query 50 in stream 0 using template query31.tpl 
-- start query 51 in stream 0 using template query11.tpl and seed 1727350231 
with year_total as ( 
select c_customer_id customer_id 
,c_first_name customer_first_name 
,c_last_name customer_last_name 
,c_preferred_cust_flag customer_preferred_cust_flag 
,c_birth_country customer_birth_country 
,c_login customer_login 
,c_email_address customer_email_address 
,d_year dyear 
,sum(ss_ext_list_price-ss_ext_discount_amt) year_total 
,'s' sale_type 
from customer 
,store_sales 
,date_dim 
where c_customer_sk = ss_customer_sk 
and ss_sold_date_sk = d_date_sk 
group by c_customer_id 
,c_first_name 
,c_last_name 
,c_preferred_cust_flag 
,c_birth_country 
,c_login 
,c_email_address 
,d_year 
union all 
select c_customer_id customer_id 
,c_first_name customer_first_name 
,c_last_name customer_last_name 
,c_preferred_cust_flag customer_preferred_cust_flag 
,c_birth_country customer_birth_country 
,c_login customer_login 
,c_email_address customer_email_address 
,d_year dyear 
,sum(ws_ext_list_price-ws_ext_discount_amt) year_total 
,'w' sale_type 
from customer 
,web_sales 
,date_dim 
where c_customer_sk = ws_bill_customer_sk 
and ws_sold_date_sk = d_date_sk 
group by c_customer_id 
,c_first_name 
,c_last_name 
,c_preferred_cust_flag 
,c_birth_country 
,c_login 
,c_email_address 
,d_year
) 
select t_s_secyear.customer_last_name 
from year_total t_s_firstyear 
,year_total t_s_secyear 
,year_total t_w_firstyear 
,year_total t_w_secyear 
where t_s_secyear.customer_id = t_s_firstyear.customer_id 
and t_s_firstyear.customer_id = t_w_secyear.customer_id 
and t_s_firstyear.customer_id = t_w_firstyear.customer_id 
and t_s_firstyear.sale_type = 's' 
and t_w_firstyear.sale_type = 'w' 
and t_s_secyear.sale_type = 's' 
and t_w_secyear.sale_type = 'w' 
and t_s_firstyear.dyear = 2001 
and t_s_secyear.dyear = 2001+1 
and t_w_firstyear.dyear = 2001 
and t_w_secyear.dyear = 2001+1 
and t_s_firstyear.year_total > 0 
and t_w_firstyear.year_total > 0 
and case when t_w_firstyear.year_total > 0 then t_w_secyear.year_total / t_w_firstyear.year_total else null end 
> case when t_s_firstyear.year_total > 0 then t_s_secyear.year_total / t_s_firstyear.year_total else null end 
order by t_s_secyear.customer_last_name 
fetch first 100 rows only; 
-- end query 51 in stream 0 using template query11.tpl 
-- start query 52 in stream 0 using template query93.tpl and seed 1891392271 
select ss_customer_sk 
,sum(act_sales) sumsales 
from (select ss_item_sk 
,ss_ticket_number 
,ss_customer_sk 
,case when sr_return_quantity is not null then (ss_quantity- sr_return_quantity)*ss_sales_price 
else (ss_quantity*ss_sales_price) end act_sales 
from store_sales left outer join store_returns on (sr_item_sk = ss_item_sk 
and sr_ticket_number = ss_ticket_number) 
,reason 
where sr_reason_sk = r_reason_sk 
and r_reason_desc = 'reason 47') t 
group by ss_customer_sk 
order by sumsales, ss_customer_sk 
fetch first 100 rows only; 
-- end query 52 in stream 0 using template query93.tpl 
-- start query 53 in stream 0 using template query29.tpl and seed 700403081 
select 
i_item_id 
,i_item_desc 
,s_store_id 
,s_store_name 
,avg(ss_quantity) as store_sales_quantity 
,avg(sr_return_quantity) as store_returns_quantity 
,avg(cs_quantity) as catalog_sales_quantity 
from 
store_sales 
,store_returns 
,catalog_sales 
,date_dim d1 
,date_dim d2 
,date_dim d3 
,store 
,item 
where 
d1.d_moy = 4 
and d1.d_year = 2000 
and d1.d_date_sk = ss_sold_date_sk 
and i_item_sk = ss_item_sk 
and s_store_sk = ss_store_sk 
and ss_customer_sk = sr_customer_sk 
and ss_item_sk = sr_item_sk 
and ss_ticket_number = sr_ticket_number 
and sr_returned_date_sk = d2.d_date_sk 
and d2.d_moy between 4 and 4 + 3 
and d2.d_year = 2000 
and sr_customer_sk = cs_bill_customer_sk 
and sr_item_sk = cs_item_sk 
and cs_sold_date_sk = d3.d_date_sk 
and d3.d_year in (2000,2000+1,2000+2) 
group by 
i_item_id 
,i_item_desc 
,s_store_id
,s_store_name 
order by 
i_item_id 
,i_item_desc 
,s_store_id 
,s_store_name 
fetch first 100 rows only; 
-- end query 53 in stream 0 using template query29.tpl 
-- start query 54 in stream 0 using template query38.tpl and seed 179097785 
select count(*) from ( 
select distinct c_last_name, c_first_name, d_date 
from store_sales, date_dim, customer 
where store_sales.ss_sold_date_sk = date_dim.d_date_sk 
and store_sales.ss_customer_sk = customer.c_customer_sk 
and d_month_seq between 1183 and 1183 + 11 
intersect 
select distinct c_last_name, c_first_name, d_date 
from catalog_sales, date_dim, customer 
where catalog_sales.cs_sold_date_sk = date_dim.d_date_sk 
and catalog_sales.cs_bill_customer_sk = customer.c_customer_sk 
and d_month_seq between 1183 and 1183 + 11 
intersect 
select distinct c_last_name, c_first_name, d_date 
from web_sales, date_dim, customer 
where web_sales.ws_sold_date_sk = date_dim.d_date_sk 
and web_sales.ws_bill_customer_sk = customer.c_customer_sk 
and d_month_seq between 1183 and 1183 + 11 
) hot_cust 
fetch first 100 rows only; 
-- end query 54 in stream 0 using template query38.tpl 
-- start query 55 in stream 0 using template query22.tpl and seed 1074257943 
select i_product_name 
,i_brand 
,i_class 
,i_category 
,avg(cast(inv_quantity_on_hand as double)) qoh 
from inventory 
,date_dim 
,item 
,warehouse 
where inv_date_sk=d_date_sk 
and inv_item_sk=i_item_sk 
and inv_warehouse_sk = w_warehouse_sk 
and d_month_seq between 1203 and 1203 + 11 
group by rollup(i_product_name 
,i_brand 
,i_class 
,i_category) 
order by qoh, i_product_name, i_brand, i_class, i_category 
fetch first 100 rows only; 
-- end query 55 in stream 0 using template query22.tpl 
-- start query 56 in stream 0 using template query89.tpl and seed 1684776629 
select * 
from( 
select i_category, i_class, i_brand, 
s_store_name, s_company_name, 
d_moy, 
sum(ss_sales_price) sum_sales, 
avg(sum(ss_sales_price)) over 
(partition by i_category, i_brand, s_store_name, s_company_name) 
avg_monthly_sales 
from item, store_sales, date_dim, store 
where ss_item_sk = i_item_sk and 
ss_sold_date_sk = d_date_sk and 
ss_store_sk = s_store_sk and 
d_year in (2002) and 
((i_category in ('Shoes','Music','Children') and 
i_class in ('mens','classical','toddlers') 
) 
or (i_category in ('Home','Electronics','Sports') and 
i_class in ('lighting','portable','athletic shoes') 
)) 
group by i_category, i_class, i_brand, 
s_store_name, s_company_name, d_moy) tmp1
where case when (avg_monthly_sales <> 0) then (abs(sum_sales - avg_monthly_sales) / avg_monthly_sales) else null end > 0.1 
order by sum_sales - avg_monthly_sales, s_store_name 
fetch first 100 rows only; 
-- end query 56 in stream 0 using template query89.tpl 
-- start query 57 in stream 0 using template query15.tpl and seed 631273844 
select ca_zip 
,sum(cs_sales_price) 
from catalog_sales 
,customer 
,customer_address 
,date_dim 
where cs_bill_customer_sk = c_customer_sk 
and c_current_addr_sk = ca_address_sk 
and ( substr(ca_zip,1,5) in ('85669', '86197','88274','83405','86475', 
'85392', '85460', '80348', '81792') 
or ca_state in ('CA','WA','GA') 
or cs_sales_price > 500) 
and cs_sold_date_sk = d_date_sk 
and d_qoy = 1 and d_year = 2002 
group by ca_zip 
order by ca_zip 
fetch first 100 rows only; 
-- end query 57 in stream 0 using template query15.tpl 
-- start query 58 in stream 0 using template query6.tpl and seed 327264001 
select a.ca_state state, count(*) cnt 
from customer_address a 
,customer c 
,store_sales s 
,date_dim d 
,item i 
where a.ca_address_sk = c.c_current_addr_sk 
and c.c_customer_sk = s.ss_customer_sk 
and s.ss_sold_date_sk = d.d_date_sk 
and s.ss_item_sk = i.i_item_sk 
and d.d_month_seq = 
(select distinct (d_month_seq) 
from date_dim 
where d_year = 1999 
and d_moy = 3 ) 
and i.i_current_price > 1.2 * 
(select avg(j.i_current_price) 
from item j 
where j.i_category = i.i_category) 
group by a.ca_state 
having count(*) >= 10 
order by cnt 
fetch first 100 rows only; 
-- end query 58 in stream 0 using template query6.tpl 
-- start query 59 in stream 0 using template query52.tpl and seed 1783319695 
select dt.d_year 
,item.i_brand_id brand_id 
,item.i_brand brand 
,sum(ss_ext_sales_price) ext_price 
from date_dim dt 
,store_sales 
,item 
where dt.d_date_sk = store_sales.ss_sold_date_sk 
and store_sales.ss_item_sk = item.i_item_sk 
and item.i_manager_id = 1 
and dt.d_moy=12 
and dt.d_year=1998 
group by dt.d_year 
,item.i_brand 
,item.i_brand_id 
order by dt.d_year 
,ext_price desc 
,brand_id 
fetch first 100 rows only ; 
-- end query 59 in stream 0 using template query52.tpl 
-- start query 60 in stream 0 using template query50.tpl and seed 499173639 
select 
s_store_name 
,s_company_id 
,s_street_number 
,s_street_name 
,s_street_type 
,s_suite_number 
,s_city 
,s_county 
,s_state 
,s_zip
,sum(case when (sr_returned_date_sk - ss_sold_date_sk <= 30 ) then 1 else 0 end) as "30 days" 
,sum(case when (sr_returned_date_sk - ss_sold_date_sk > 30) and 
(sr_returned_date_sk - ss_sold_date_sk <= 60) then 1 else 0 end ) as "31-60 days" 
,sum(case when (sr_returned_date_sk - ss_sold_date_sk > 60) and 
(sr_returned_date_sk - ss_sold_date_sk <= 90) then 1 else 0 end) as "61-90 days" 
,sum(case when (sr_returned_date_sk - ss_sold_date_sk > 90) and 
(sr_returned_date_sk - ss_sold_date_sk <= 120) then 1 else 0 end) as "91-120 days" 
,sum(case when (sr_returned_date_sk - ss_sold_date_sk > 120) then 1 else 0 end) as ">120 days" 
from 
store_sales 
,store_returns 
,store 
,date_dim d1 
,date_dim d2 
where 
d2.d_year = 2002 
and d2.d_moy = 9 
and ss_ticket_number = sr_ticket_number 
and ss_item_sk = sr_item_sk 
and ss_sold_date_sk = d1.d_date_sk 
and sr_returned_date_sk = d2.d_date_sk 
and ss_customer_sk = sr_customer_sk 
and ss_store_sk = s_store_sk 
group by 
s_store_name 
,s_company_id 
,s_street_number 
,s_street_name 
,s_street_type 
,s_suite_number 
,s_city 
,s_county 
,s_state 
,s_zip 
order by s_store_name 
,s_company_id 
,s_street_number 
,s_street_name 
,s_street_type 
,s_suite_number 
,s_city 
,s_county 
,s_state 
,s_zip 
fetch first 100 rows only; 
-- end query 60 in stream 0 using template query50.tpl 
-- start query 61 in stream 0 using template query42.tpl and seed 801946299 
select dt.d_year 
,item.i_category_id 
,item.i_category 
,sum(ss_ext_sales_price) 
from date_dim dt 
,store_sales 
,item 
where dt.d_date_sk = store_sales.ss_sold_date_sk 
and store_sales.ss_item_sk = item.i_item_sk 
and item.i_manager_id = 1 
and dt.d_moy=12 
and dt.d_year=1999 
group by dt.d_year 
,item.i_category_id 
,item.i_category 
order by sum(ss_ext_sales_price) desc,dt.d_year 
,item.i_category_id 
,item.i_category 
fetch first 100 rows only ; 
-- end query 61 in stream 0 using template query42.tpl 
-- start query 62 in stream 0 using template query41.tpl and seed 1556130879 
select distinct(i_product_name) 
from item i1 
where i_manufact_id between 708 and 708+40 
and (select count(*) as item_cnt 
from item 
where (i_manufact = i1.i_manufact and 
((i_category = 'Women' and 
(i_color = 'blanched' or i_color = 'sandy') and 
(i_units = 'Gram' or i_units = 'Gross') and 
(i_size = 'small' or i_size = 'large')
) or 
(i_category = 'Women' and 
(i_color = 'peru' or i_color = 'firebrick') and 
(i_units = 'Lb' or i_units = 'Pound') and 
(i_size = 'petite' or i_size = 'economy') 
) or 
(i_category = 'Men' and 
(i_color = 'coral' or i_color = 'ivory') and 
(i_units = 'Oz' or i_units = 'Dram') and 
(i_size = 'medium' or i_size = 'extra large') 
) or 
(i_category = 'Men' and 
(i_color = 'cornflower' or i_color = 'yellow') and 
(i_units = 'Box' or i_units = 'Dozen') and 
(i_size = 'small' or i_size = 'large') 
))) or 
(i_manufact = i1.i_manufact and 
((i_category = 'Women' and 
(i_color = 'orchid' or i_color = 'navy') and 
(i_units = 'Pallet' or i_units = 'Ton') and 
(i_size = 'small' or i_size = 'large') 
) or 
(i_category = 'Women' and 
(i_color = 'smoke' or i_color = 'chartreuse') and 
(i_units = 'Tsp' or i_units = 'Cup') and 
(i_size = 'petite' or i_size = 'economy') 
) or 
(i_category = 'Men' and 
(i_color = 'turquoise' or i_color = 'almond') and 
(i_units = 'N/A' or i_units = 'Carton') and 
(i_size = 'medium' or i_size = 'extra large') 
) or 
(i_category = 'Men' and 
(i_color = 'dim' or i_color = 'mint') and 
(i_units = 'Unknown' or i_units = 'Tbl') and 
(i_size = 'small' or i_size = 'large') 
)))) > 0 
order by i_product_name 
fetch first 100 rows only; 
-- end query 62 in stream 0 using template query41.tpl 
-- start query 63 in stream 0 using template query8.tpl and seed 1332175075 
select s_store_name 
,sum(ss_net_profit) 
from store_sales 
,date_dim 
,store, 
(select ca_zip 
from ( 
(SELECT substr(ca_zip,1,5) ca_zip 
FROM customer_address 
WHERE substr(ca_zip,1,5) IN ( 
'17520','56461','11390','87479','50201','64392', 
'77741','76113','54207','15320','44569', 
'35851','46871','50295','14109','70069', 
'42274','72697','49813','18583','63339', 
'60505','99432','79884','33972','42525', 
'35092','25778','22629','64234','29226', 
'75520','98109','16929','55589','40349', 
'19272','40489','28727','21155','14808', 
'49719','90782','96126','56778','31988', 
'59430','94944','40599','83996','13656', 
'56186','43140','61896','41823','37763', 
'88569','63139','49977','68798','61598', 
'12149','11627','83980','49908','32429', 
'12310','95102','68778','28297','44532', 
'78974','23090','44128','59881','17124', 
'70629','98394','50450','55883','33325', 
'85623','38485','49236','16046','86766', 
'52396','36647','74681','92467','76826', 
'28698','76613','49428','60613','78399', 
'32006','56656','50099','62541','24195',
'59554','47479','27633','86644','77196', 
'11416','25315','69480','55282','21296', 
'74115','39036','57192','26772','41446', 
'85594','26170','32014','33686','17417', 
'38479','39798','26984','15384','19701', 
'31840','75749','10821','19540','74993', 
'14695','36295','77284','30705','60499', 
'88870','22740','93118','24062','89801', 
'64498','24353','63764','48640','20763', 
'81686','86801','14510','97250','63328', 
'14274','84750','37540','13141','43656', 
'42594','62162','30856','58781','97307', 
'30425','47381','29354','10208','53823', 
'34767','45240','92270','20139','32558', 
'85961','25518','18478','68301','28043', 
'95864','19684','23565','71884','28618', 
'83171','68892','91727','77558','42337', 
'13172','53387','18098','27450','54631', 
'87025','25044','61857','17079','37192', 
'56817','24721','86299','21755','11584', 
'29803','55705','52938','80270','15967', 
'68622','53938','45389','25599','29162', 
'31836','26662','13248','21731','12125', 
'83522','25188','99246','81384','26365', 
'10509','52595','27999','35400','64795', 
'80604','11425','62684','40040','63709', 
'31391','39805','23995','56156','30740', 
'12236','49117','30327','93204','36708', 
'99477','15829','19538','30944','41725', 
'13717','92499','62369','38977','22461', 
'78617','56289','77625','75530','89141', 
'82011','12189','86300','13413','49611', 
'98875','20610','94062','19966','44440', 
'53620','18960','24320','72802','64177', 
'96013','44107','48600','76115','59687', 
'91038','47763','10294','33300','68127', 
'23375','91206','63518','98214','14559', 
'12392','35386','10880','47707','63489', 
'82126','49457','45829','18615','15584', 
'22508','12710','50721','75233','61958', 
'16475','55756','69792','35722','27456', 
'51312','60703','34980','18416','16237', 
'12628','38465','52536','31266','48258', 
'86017','97136','50287','23293','90970', 
'37154','65745','34215','91448','84360', 
'79510','19040','64301','17062','20644', 
'54703','29273','13740','56946','20456', 
'68468','84097','84416','12869','26949', 
'52033','96951','31672','81587','95676', 
'37610','93894','39271','12437','56669', 
'18231','76214','11829','16427','68398', 
'33984','63223','71905','88964','44052', 
'99132','16234','11429','77186','63739', 
'78627','10605','66822','59934','78689', 
'15739','48704','66247','88173','75206', 
'35437','89093','65467','80334','53903', 
'67968','24348','21275','45914','34383', 
'30991','71679','15697','30316','30118',
'15177','66847','20352','93605','69947', 
'73326','86839','62858','89802')) 
intersect 
(select ca_zip 
from (SELECT substr(ca_zip,1,5) ca_zip,count(*) cnt 
FROM customer_address, customer 
WHERE ca_address_sk = c_current_addr_sk and 
c_preferred_cust_flag='Y' 
group by ca_zip 
having count(*) > 10)A1))A2) V1 
where ss_store_sk = s_store_sk 
and ss_sold_date_sk = d_date_sk 
and d_qoy = 2 and d_year = 1998 
and (substr(s_zip,1,2) = substr(V1.ca_zip,1,2)) 
group by s_store_name 
order by s_store_name 
fetch first 100 rows only; 
-- end query 63 in stream 0 using template query8.tpl 
-- start query 64 in stream 0 using template query12.tpl and seed 893042407 
select i_item_desc 
,i_category 
,i_class 
,i_current_price 
,sum(ws_ext_sales_price) as itemrevenue 
,sum(ws_ext_sales_price)*100/sum(sum(ws_ext_sales_price)) over 
(partition by i_class) as revenueratio 
from 
web_sales 
,item 
,date_dim 
where 
ws_item_sk = i_item_sk 
and i_category in ('Jewelry', 'Men', 'Music') 
and ws_sold_date_sk = d_date_sk 
and d_date between cast('2002-05-04' as date) 
and (cast('2002-05-04' as date) + 30 days) 
group by 
i_item_id 
,i_item_desc 
,i_category 
,i_class 
,i_current_price 
order by 
i_category 
,i_class 
,i_item_id 
,i_item_desc 
,revenueratio 
fetch first 100 rows only; 
-- end query 64 in stream 0 using template query12.tpl 
-- start query 65 in stream 0 using template query20.tpl and seed 315194146 
select i_item_desc 
,i_category 
,i_class 
,i_current_price 
,sum(cs_ext_sales_price) as itemrevenue 
,sum(cs_ext_sales_price)*100/sum(sum(cs_ext_sales_price)) over 
(partition by i_class) as revenueratio 
from catalog_sales 
,item 
,date_dim 
where cs_item_sk = i_item_sk 
and i_category in ('Home', 'Electronics', 'Children') 
and cs_sold_date_sk = d_date_sk 
and d_date between cast('2002-03-06' as date) 
and (cast('2002-03-06' as date) + 30 days) 
group by i_item_id 
,i_item_desc 
,i_category 
,i_class 
,i_current_price 
order by i_category 
,i_class 
,i_item_id 
,i_item_desc 
,revenueratio 
fetch first 100 rows only; 
-- end query 65 in stream 0 using template query20.tpl 
-- start query 66 in stream 0 using template query88.tpl and seed 1356399435
select * 
from 
(select count(*) h8_30_to_9 
from store_sales, household_demographics , time_dim, store 
where ss_sold_time_sk = time_dim.t_time_sk 
and ss_hdemo_sk = household_demographics.hd_demo_sk 
and ss_store_sk = s_store_sk 
and time_dim.t_hour = 8 
and time_dim.t_minute >= 30 
and ((household_demographics.hd_dep_count = 4 and household_demographics.hd_vehicle_count<=4+2) or 
(household_demographics.hd_dep_count = 3 and household_demographics.hd_vehicle_count<=3+2) or 
(household_demographics.hd_dep_count = 0 and household_demographics.hd_vehicle_count<=0+2)) 
and store.s_store_name = 'ese') s1, 
(select count(*) h9_to_9_30 
from store_sales, household_demographics , time_dim, store 
where ss_sold_time_sk = time_dim.t_time_sk 
and ss_hdemo_sk = household_demographics.hd_demo_sk 
and ss_store_sk = s_store_sk 
and time_dim.t_hour = 9 
and time_dim.t_minute < 30 
and ((household_demographics.hd_dep_count = 4 and household_demographics.hd_vehicle_count<=4+2) or 
(household_demographics.hd_dep_count = 3 and household_demographics.hd_vehicle_count<=3+2) or 
(household_demographics.hd_dep_count = 0 and household_demographics.hd_vehicle_count<=0+2)) 
and store.s_store_name = 'ese') s2, 
(select count(*) h9_30_to_10 
from store_sales, household_demographics , time_dim, store 
where ss_sold_time_sk = time_dim.t_time_sk 
and ss_hdemo_sk = household_demographics.hd_demo_sk 
and ss_store_sk = s_store_sk 
and time_dim.t_hour = 9 
and time_dim.t_minute >= 30 
and ((household_demographics.hd_dep_count = 4 and household_demographics.hd_vehicle_count<=4+2) or 
(household_demographics.hd_dep_count = 3 and household_demographics.hd_vehicle_count<=3+2) or 
(household_demographics.hd_dep_count = 0 and household_demographics.hd_vehicle_count<=0+2)) 
and store.s_store_name = 'ese') s3, 
(select count(*) h10_to_10_30 
from store_sales, household_demographics , time_dim, store 
where ss_sold_time_sk = time_dim.t_time_sk 
and ss_hdemo_sk = household_demographics.hd_demo_sk 
and ss_store_sk = s_store_sk 
and time_dim.t_hour = 10 
and time_dim.t_minute < 30 
and ((household_demographics.hd_dep_count = 4 and household_demographics.hd_vehicle_count<=4+2) or 
(household_demographics.hd_dep_count = 3 and household_demographics.hd_vehicle_count<=3+2) or 
(household_demographics.hd_dep_count = 0 and household_demographics.hd_vehicle_count<=0+2)) 
and store.s_store_name = 'ese') s4, 
(select count(*) h10_30_to_11 
from store_sales, household_demographics , time_dim, store 
where ss_sold_time_sk = time_dim.t_time_sk 
and ss_hdemo_sk = household_demographics.hd_demo_sk 
and ss_store_sk = s_store_sk 
and time_dim.t_hour = 10 
and time_dim.t_minute >= 30 
and ((household_demographics.hd_dep_count = 4 and household_demographics.hd_vehicle_count<=4+2) or 
(household_demographics.hd_dep_count = 3 and household_demographics.hd_vehicle_count<=3+2) or 
(household_demographics.hd_dep_count = 0 and household_demographics.hd_vehicle_count<=0+2)) 
and store.s_store_name = 'ese') s5, 
(select count(*) h11_to_11_30 
from store_sales, household_demographics , time_dim, store 
where ss_sold_time_sk = time_dim.t_time_sk 
and ss_hdemo_sk = household_demographics.hd_demo_sk 
and ss_store_sk = s_store_sk 
and time_dim.t_hour = 11 
and time_dim.t_minute < 30 
and ((household_demographics.hd_dep_count = 4 and household_demographics.hd_vehicle_count<=4+2) or 
(household_demographics.hd_dep_count = 3 and household_demographics.hd_vehicle_count<=3+2) or 
(household_demographics.hd_dep_count = 0 and household_demographics.hd_vehicle_count<=0+2)) 
and store.s_store_name = 'ese') s6,
(select count(*) h11_30_to_12 
from store_sales, household_demographics , time_dim, store 
where ss_sold_time_sk = time_dim.t_time_sk 
and ss_hdemo_sk = household_demographics.hd_demo_sk 
and ss_store_sk = s_store_sk 
and time_dim.t_hour = 11 
and time_dim.t_minute >= 30 
and ((household_demographics.hd_dep_count = 4 and household_demographics.hd_vehicle_count<=4+2) or 
(household_demographics.hd_dep_count = 3 and household_demographics.hd_vehicle_count<=3+2) or 
(household_demographics.hd_dep_count = 0 and household_demographics.hd_vehicle_count<=0+2)) 
and store.s_store_name = 'ese') s7, 
(select count(*) h12_to_12_30 
from store_sales, household_demographics , time_dim, store 
where ss_sold_time_sk = time_dim.t_time_sk 
and ss_hdemo_sk = household_demographics.hd_demo_sk 
and ss_store_sk = s_store_sk 
and time_dim.t_hour = 12 
and time_dim.t_minute < 30 
and ((household_demographics.hd_dep_count = 4 and household_demographics.hd_vehicle_count<=4+2) or 
(household_demographics.hd_dep_count = 3 and household_demographics.hd_vehicle_count<=3+2) or 
(household_demographics.hd_dep_count = 0 and household_demographics.hd_vehicle_count<=0+2)) 
and store.s_store_name = 'ese') s8 
; 
-- end query 66 in stream 0 using template query88.tpl 
-- start query 67 in stream 0 using template query82.tpl and seed 1485865852 
select i_item_id 
,i_item_desc 
,i_current_price 
from item, inventory, date_dim, store_sales 
where i_current_price between 16 and 16+30 
and inv_item_sk = i_item_sk 
and d_date_sk=inv_date_sk 
and d_date between cast('2000-02-07' as date) and (cast('2000-02-07' as date) + 60 days) 
and i_manufact_id in (763,359,233,597) 
and inv_quantity_on_hand between 100 and 500 
and ss_item_sk = i_item_sk 
group by i_item_id,i_item_desc,i_current_price 
order by i_item_id 
fetch first 100 rows only; 
-- end query 67 in stream 0 using template query82.tpl 
-- start query 68 in stream 0 using template query23.tpl and seed 1801862381 
with frequent_ss_items as 
(select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date solddate,count(*) cnt 
from store_sales 
,date_dim 
,item 
where ss_sold_date_sk = d_date_sk 
and ss_item_sk = i_item_sk 
and d_year in (2000,2000+1,2000+2,2000+3) 
group by substr(i_item_desc,1,30),i_item_sk,d_date 
having count(*) >4), 
max_store_sales as 
(select max(csales) tpcds_cmax 
from (select c_customer_sk,sum(ss_quantity*ss_sales_price) csales 
from store_sales 
,customer 
,date_dim 
where ss_customer_sk = c_customer_sk 
and ss_sold_date_sk = d_date_sk 
and d_year in (2000,2000+1,2000+2,2000+3) 
group by c_customer_sk) x), 
best_ss_customer as 
(select c_customer_sk,sum(ss_quantity*ss_sales_price) ssales 
from store_sales 
,customer 
where ss_customer_sk = c_customer_sk 
group by c_customer_sk 
having sum(ss_quantity*ss_sales_price) > (95/100.0) * (select 
* 
from 
max_store_sales)) 
select sum(sales) 
from ((select cs_quantity*cs_list_price sales 
from catalog_sales
,date_dim 
where d_year = 2000 
and d_moy = 5 
and cs_sold_date_sk = d_date_sk 
and cs_item_sk in (select item_sk from frequent_ss_items) 
and cs_bill_customer_sk in (select c_customer_sk from best_ss_customer)) 
union all 
(select ws_quantity*ws_list_price sales 
from web_sales 
,date_dim 
where d_year = 2000 
and d_moy = 5 
and ws_sold_date_sk = d_date_sk 
and ws_item_sk in (select item_sk from frequent_ss_items) 
and ws_bill_customer_sk in (select c_customer_sk from best_ss_customer))) y 
fetch first 100 rows only; 
with frequent_ss_items as 
(select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date solddate,count(*) cnt 
from store_sales 
,date_dim 
,item 
where ss_sold_date_sk = d_date_sk 
and ss_item_sk = i_item_sk 
and d_year in (2000,2000 + 1,2000 + 2,2000 + 3) 
group by substr(i_item_desc,1,30),i_item_sk,d_date 
having count(*) >4), 
max_store_sales as 
(select max(csales) tpcds_cmax 
from (select c_customer_sk,sum(ss_quantity*ss_sales_price) csales 
from store_sales 
,customer 
,date_dim 
where ss_customer_sk = c_customer_sk 
and ss_sold_date_sk = d_date_sk 
and d_year in (2000,2000+1,2000+2,2000+3) 
group by c_customer_sk) x), 
best_ss_customer as 
(select c_customer_sk,sum(ss_quantity*ss_sales_price) ssales 
from store_sales 
,customer 
where ss_customer_sk = c_customer_sk 
group by c_customer_sk 
having sum(ss_quantity*ss_sales_price) > (95/100.0) * (select 
* 
from max_store_sales)) 
select c_last_name,c_first_name,sales 
from ((select c_last_name,c_first_name,sum(cs_quantity*cs_list_price) sales 
from catalog_sales 
,customer 
,date_dim 
where d_year = 2000 
and d_moy = 5 
and cs_sold_date_sk = d_date_sk 
and cs_item_sk in (select item_sk from frequent_ss_items) 
and cs_bill_customer_sk in (select c_customer_sk from best_ss_customer) 
and cs_bill_customer_sk = c_customer_sk 
group by c_last_name,c_first_name) 
union all 
(select c_last_name,c_first_name,sum(ws_quantity*ws_list_price) sales 
from web_sales 
,customer 
,date_dim 
where d_year = 2000 
and d_moy = 5 
and ws_sold_date_sk = d_date_sk 
and ws_item_sk in (select item_sk from frequent_ss_items) 
and ws_bill_customer_sk in (select c_customer_sk from best_ss_customer) 
and ws_bill_customer_sk = c_customer_sk 
group by c_last_name,c_first_name)) y 
order by c_last_name,c_first_name,sales 
fetch first 100 rows only; 
-- end query 68 in stream 0 using template query23.tpl 
-- start query 69 in stream 0 using template query14.tpl and seed 290166045 
with cross_items as 
(select i_item_sk ss_item_sk 
from item, 
(select iss.i_brand_id brand_id 
,iss.i_class_id class_id 
,iss.i_category_id category_id
from store_sales 
,item iss 
,date_dim d1 
where ss_item_sk = iss.i_item_sk 
and ss_sold_date_sk = d1.d_date_sk 
and d1.d_year between 1999 AND 1999 + 2 
intersect 
select ics.i_brand_id 
,ics.i_class_id 
,ics.i_category_id 
from catalog_sales 
,item ics 
,date_dim d2 
where cs_item_sk = ics.i_item_sk 
and cs_sold_date_sk = d2.d_date_sk 
and d2.d_year between 1999 AND 1999 + 2 
intersect 
select iws.i_brand_id 
,iws.i_class_id 
,iws.i_category_id 
from web_sales 
,item iws 
,date_dim d3 
where ws_item_sk = iws.i_item_sk 
and ws_sold_date_sk = d3.d_date_sk 
and d3.d_year between 1999 AND 1999 + 2) x 
where i_brand_id = brand_id 
and i_class_id = class_id 
and i_category_id = category_id 
), 
avg_sales as 
(select avg(quantity*list_price) average_sales 
from (select ss_quantity quantity 
,ss_list_price list_price 
from store_sales 
,date_dim 
where ss_sold_date_sk = d_date_sk 
and d_year between 1999 and 2001 
union all 
select cs_quantity quantity 
,cs_list_price list_price 
from catalog_sales 
,date_dim 
where cs_sold_date_sk = d_date_sk 
and d_year between 1998 and 1998 + 2 
union all 
select ws_quantity quantity 
,ws_list_price list_price 
from web_sales 
,date_dim 
where ws_sold_date_sk = d_date_sk 
and d_year between 1998 and 1998 + 2) x) 
select channel, i_brand_id,i_class_id,i_category_id,sum(sales), sum(number_sales) 
from( 
select 'store' channel, i_brand_id,i_class_id 
,i_category_id,sum(ss_quantity*ss_list_price) sales 
, count(*) number_sales 
from store_sales 
,item 
,date_dim 
where ss_item_sk in (select ss_item_sk from cross_items) 
and ss_item_sk = i_item_sk 
and ss_sold_date_sk = d_date_sk 
and d_year = 1998+2 
and d_moy = 11 
group by i_brand_id,i_class_id,i_category_id 
having sum(ss_quantity*ss_list_price) > (select average_sales from avg_sales) 
union all 
select 'catalog' channel, i_brand_id,i_class_id,i_category_id, sum(cs_quantity*cs_list_price) sales, count(*) number_sales 
from catalog_sales 
,item 
,date_dim 
where cs_item_sk in (select ss_item_sk from cross_items) 
and cs_item_sk = i_item_sk 
and cs_sold_date_sk = d_date_sk 
and d_year = 1998+2 
and d_moy = 11 
group by i_brand_id,i_class_id,i_category_id 
having sum(cs_quantity*cs_list_price) > (select average_sales from avg_sales) 
union all 
select 'web' channel, i_brand_id,i_class_id,i_category_id, sum(ws_quantity*ws_list_price) sales , count(*) number_sales 
from web_sales 
,item
,date_dim 
where ws_item_sk in (select ss_item_sk from cross_items) 
and ws_item_sk = i_item_sk 
and ws_sold_date_sk = d_date_sk 
and d_year = 1998+2 
and d_moy = 11 
group by i_brand_id,i_class_id,i_category_id 
having sum(ws_quantity*ws_list_price) > (select average_sales from avg_sales) 
) y 
group by rollup (channel, i_brand_id,i_class_id,i_category_id) 
order by channel,i_brand_id,i_class_id,i_category_id 
fetch first 100 rows only; 
with cross_items as 
(select i_item_sk ss_item_sk 
from item, 
(select iss.i_brand_id brand_id 
,iss.i_class_id class_id 
,iss.i_category_id category_id 
from store_sales 
,item iss 
,date_dim d1 
where ss_item_sk = iss.i_item_sk 
and ss_sold_date_sk = d1.d_date_sk 
and d1.d_year between 1999 AND 1999 + 2 
intersect 
select ics.i_brand_id 
,ics.i_class_id 
,ics.i_category_id 
from catalog_sales 
,item ics 
,date_dim d2 
where cs_item_sk = ics.i_item_sk 
and cs_sold_date_sk = d2.d_date_sk 
and d2.d_year between 1999 AND 1999 + 2 
intersect 
select iws.i_brand_id 
,iws.i_class_id 
,iws.i_category_id 
from web_sales 
,item iws 
,date_dim d3 
where ws_item_sk = iws.i_item_sk 
and ws_sold_date_sk = d3.d_date_sk 
and d3.d_year between 1999 AND 1999 + 2) x 
where i_brand_id = brand_id 
and i_class_id = class_id 
and i_category_id = category_id 
), 
avg_sales as 
(select avg(quantity*list_price) average_sales 
from (select ss_quantity quantity 
,ss_list_price list_price 
from store_sales 
,date_dim 
where ss_sold_date_sk = d_date_sk 
and d_year between 1998 and 1998 + 2 
union all 
select cs_quantity quantity 
,cs_list_price list_price 
from catalog_sales 
,date_dim 
where cs_sold_date_sk = d_date_sk 
and d_year between 1998 and 1998 + 2 
union all 
select ws_quantity quantity 
,ws_list_price list_price 
from web_sales 
,date_dim 
where ws_sold_date_sk = d_date_sk 
and d_year between 1998 and 1998 + 2) x) 
select * from 
(select 'store' channel, i_brand_id,i_class_id,i_category_id 
,sum(ss_quantity*ss_list_price) sales, count(*) number_sales 
from store_sales 
,item 
,date_dim 
where ss_item_sk in (select ss_item_sk from cross_items) 
and ss_item_sk = i_item_sk 
and ss_sold_date_sk = d_date_sk 
and d_week_seq = (select d_week_seq 
from date_dim 
where d_year = 1998 + 1 
and d_moy = 12 
and d_dom = 18) 
group by i_brand_id,i_class_id,i_category_id 
having sum(ss_quantity*ss_list_price) > (select average_sales from avg_sales)) this_year, 
(select 'store' channel, i_brand_id,i_class_id
,i_category_id, sum(ss_quantity*ss_list_price) sales, count(*) number_sales 
from store_sales 
,item 
,date_dim 
where ss_item_sk in (select ss_item_sk from cross_items) 
and ss_item_sk = i_item_sk 
and ss_sold_date_sk = d_date_sk 
and d_week_seq = (select d_week_seq 
from date_dim 
where d_year = 1998 
and d_moy = 12 
and d_dom = 18) 
group by i_brand_id,i_class_id,i_category_id 
having sum(ss_quantity*ss_list_price) > (select average_sales from avg_sales)) last_year 
where this_year.i_brand_id= last_year.i_brand_id 
and this_year.i_class_id = last_year.i_class_id 
and this_year.i_category_id = last_year.i_category_id 
order by this_year.channel, this_year.i_brand_id, this_year.i_class_id, this_year.i_category_id 
fetch first 100 rows only; 
-- end query 69 in stream 0 using template query14.tpl 
-- start query 70 in stream 0 using template query57.tpl and seed 1980588756 
with v1 as( 
select i_category, i_brand, 
cc_name, 
d_year, d_moy, 
sum(cs_sales_price) sum_sales, 
avg(sum(cs_sales_price)) over 
(partition by i_category, i_brand, 
cc_name, d_year) 
avg_monthly_sales, 
rank() over 
(partition by i_category, i_brand, 
cc_name 
order by d_year, d_moy) rn 
from item, catalog_sales, date_dim, call_center 
where cs_item_sk = i_item_sk and 
cs_sold_date_sk = d_date_sk and 
cc_call_center_sk= cs_call_center_sk and 
( 
d_year = 1999 or 
( d_year = 1999-1 and d_moy =12) or 
( d_year = 1999+1 and d_moy =1) 
) 
group by i_category, i_brand, 
cc_name , d_year, d_moy), 
v2 as( 
select v1.i_category, v1.i_brand 
,v1.d_year, v1.d_moy 
,v1.avg_monthly_sales 
,v1.sum_sales, v1_lag.sum_sales psum, v1_lead.sum_sales nsum 
from v1, v1 v1_lag, v1 v1_lead 
where v1.i_category = v1_lag.i_category and 
v1.i_category = v1_lead.i_category and 
v1.i_brand = v1_lag.i_brand and 
v1.i_brand = v1_lead.i_brand and 
v1. cc_name = v1_lag. cc_name and 
v1. cc_name = v1_lead. cc_name and 
v1.rn = v1_lag.rn + 1 and 
v1.rn = v1_lead.rn - 1) 
select * 
from v2 
where d_year = 1999 and 
avg_monthly_sales > 0 and 
case when avg_monthly_sales > 0 then abs(sum_sales - avg_monthly_sales) / avg_monthly_sales else null end > 0.1 
order by sum_sales - avg_monthly_sales, 3 
fetch first 100 rows only; 
-- end query 70 in stream 0 using template query57.tpl 
-- start query 71 in stream 0 using template query65.tpl and seed 398283436 
select 
s_store_name, 
i_item_desc, 
sc.revenue, 
i_current_price, 
i_wholesale_cost, 
i_brand 
from store, item, 
(select ss_store_sk, avg(revenue) as ave 
from 
(select ss_store_sk, ss_item_sk, 
sum(ss_sales_price) as revenue 
from store_sales, date_dim 
where ss_sold_date_sk = d_date_sk and d_month_seq between 1223 and 1223+11
group by ss_store_sk, ss_item_sk) sa 
group by ss_store_sk) sb, 
(select ss_store_sk, ss_item_sk, sum(ss_sales_price) as revenue 
from store_sales, date_dim 
where ss_sold_date_sk = d_date_sk and d_month_seq between 1223 and 1223+11 
group by ss_store_sk, ss_item_sk) sc 
where sb.ss_store_sk = sc.ss_store_sk and 
sc.revenue <= 0.1 * sb.ave and 
s_store_sk = sc.ss_store_sk and 
i_item_sk = sc.ss_item_sk 
order by s_store_name, i_item_desc 
fetch first 100 rows only; 
-- end query 71 in stream 0 using template query65.tpl 
-- start query 72 in stream 0 using template query71.tpl and seed 2112533127 
select i_brand_id brand_id, i_brand brand,t_hour,t_minute, 
sum(ext_price) ext_price 
from item, (select ws_ext_sales_price as ext_price, 
ws_sold_date_sk as sold_date_sk, 
ws_item_sk as sold_item_sk, 
ws_sold_time_sk as time_sk 
from web_sales,date_dim 
where d_date_sk = ws_sold_date_sk 
and d_moy=11 
and d_year=2000 
union all 
select cs_ext_sales_price as ext_price, 
cs_sold_date_sk as sold_date_sk, 
cs_item_sk as sold_item_sk, 
cs_sold_time_sk as time_sk 
from catalog_sales,date_dim 
where d_date_sk = cs_sold_date_sk 
and d_moy=11 
and d_year=2000 
union all 
select ss_ext_sales_price as ext_price, 
ss_sold_date_sk as sold_date_sk, 
ss_item_sk as sold_item_sk, 
ss_sold_time_sk as time_sk 
from store_sales,date_dim 
where d_date_sk = ss_sold_date_sk 
and d_moy=11 
and d_year=2000 
) as tmp,time_dim 
where 
sold_item_sk = i_item_sk 
and i_manager_id=1 
and time_sk = t_time_sk 
and (t_meal_time = 'breakfast' or t_meal_time = 'dinner') 
group by i_brand, i_brand_id,t_hour,t_minute 
order by ext_price desc, i_brand_id 
; 
-- end query 72 in stream 0 using template query71.tpl 
-- start query 73 in stream 0 using template query34.tpl and seed 1754860092 
select c_last_name 
,c_first_name 
,c_salutation 
,c_preferred_cust_flag 
,ss_ticket_number 
,cnt from 
(select ss_ticket_number 
,ss_customer_sk 
,count(*) cnt 
from store_sales,date_dim,store,household_demographics 
where store_sales.ss_sold_date_sk = date_dim.d_date_sk 
and store_sales.ss_store_sk = store.s_store_sk 
and store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk 
and (date_dim.d_dom between 1 and 3 or date_dim.d_dom between 25 and 28) 
and (household_demographics.hd_buy_potential = '1001-5000' or 
household_demographics.hd_buy_potential = '5001-10000') 
and household_demographics.hd_vehicle_count > 0 
and (case when household_demographics.hd_vehicle_count > 0 
then cast(household_demographics.hd_dep_count as double) / cast(household_demographics.hd_vehicle_count as double) 
else null 
end) > 1.2 
and date_dim.d_year in (2000,2000+1,2000+2)
and store.s_county in ('Chambers County','Perry County','Richland County','Sierra County', 
'Perry County','Surry County','Wadena County','Essex County') 
group by ss_ticket_number,ss_customer_sk) dn,customer 
where ss_customer_sk = c_customer_sk 
and cnt between 15 and 20 
order by c_last_name,c_first_name,c_salutation,c_preferred_cust_flag desc; 
-- end query 73 in stream 0 using template query34.tpl 
-- start query 74 in stream 0 using template query48.tpl and seed 511060613 
select sum (ss_quantity) 
from store_sales, store, customer_demographics, customer_address, date_dim 
where s_store_sk = ss_store_sk 
and ss_sold_date_sk = d_date_sk and d_year = 1999 
and 
( 
( 
cd_demo_sk = ss_cdemo_sk 
and 
cd_marital_status = 'M' 
and 
cd_education_status = 'Secondary' 
and 
ss_sales_price between 100.00 and 150.00 
) 
or 
( 
cd_demo_sk = ss_cdemo_sk 
and 
cd_marital_status = 'M' 
and 
cd_education_status = 'Secondary' 
and 
ss_sales_price between 50.00 and 100.00 
) 
or 
( 
cd_demo_sk = ss_cdemo_sk 
and 
cd_marital_status = 'M' 
and 
cd_education_status = 'Secondary' 
and 
ss_sales_price between 150.00 and 200.00 
) 
) 
and 
( 
( 
ss_addr_sk = ca_address_sk 
and 
ca_country = 'United States' 
and 
ca_state in ('TX', 'MN', 'GA') 
and ss_net_profit between 0 and 2000 
) 
or 
(ss_addr_sk = ca_address_sk 
and 
ca_country = 'United States' 
and 
ca_state in ('AR', 'NJ', 'OH') 
and ss_net_profit between 150 and 3000 
) 
or 
(ss_addr_sk = ca_address_sk 
and 
ca_country = 'United States' 
and 
ca_state in ('VA', 'ID', 'KY') 
and ss_net_profit between 50 and 25000 
) 
) 
; 
-- end query 74 in stream 0 using template query48.tpl 
-- start query 75 in stream 0 using template query30.tpl and seed 1376774683 
with customer_total_return as 
(select wr_returning_customer_sk as ctr_customer_sk 
,ca_state as ctr_state, 
sum(wr_return_amt) as ctr_total_return 
from web_returns 
,date_dim 
,customer_address 
where wr_returned_date_sk = d_date_sk
and d_year =2001 
and wr_returning_addr_sk = ca_address_sk 
group by wr_returning_customer_sk 
,ca_state) 
select c_customer_id,c_salutation,c_first_name,c_last_name,c_preferred_cust_flag 
,c_birth_day,c_birth_month,c_birth_year,c_birth_country,c_login,c_email_address 
,c_last_review_date,ctr_total_return 
from customer_total_return ctr1 
,customer_address 
,customer 
where ctr1.ctr_total_return > (select avg(ctr_total_return)*1.2 
from customer_total_return ctr2 
where ctr1.ctr_state = ctr2.ctr_state) 
and ca_address_sk = c_current_addr_sk 
and ca_state = 'NE' 
and ctr1.ctr_customer_sk = c_customer_sk 
order by c_customer_id,c_salutation,c_first_name,c_last_name,c_preferred_cust_flag 
,c_birth_day,c_birth_month,c_birth_year,c_birth_country,c_login,c_email_address 
,c_last_review_date,ctr_total_return 
fetch first 100 rows only; 
-- end query 75 in stream 0 using template query30.tpl 
-- start query 76 in stream 0 using template query74.tpl and seed 860979922 
with year_total as ( 
select c_customer_id customer_id 
,c_first_name customer_first_name 
,c_last_name customer_last_name 
,d_year as year 
,stddev_samp(ss_net_paid) year_total 
,'s' sale_type 
from customer 
,store_sales 
,date_dim 
where c_customer_sk = ss_customer_sk 
and ss_sold_date_sk = d_date_sk 
and d_year in (2000,2000+1) 
group by c_customer_id 
,c_first_name 
,c_last_name 
,d_year 
union all 
select c_customer_id customer_id 
,c_first_name customer_first_name 
,c_last_name customer_last_name 
,d_year as year 
,stddev_samp(ws_net_paid) year_total 
,'w' sale_type 
from customer 
,web_sales 
,date_dim 
where c_customer_sk = ws_bill_customer_sk 
and ws_sold_date_sk = d_date_sk 
and d_year in (2000,2000+1) 
group by c_customer_id 
,c_first_name 
,c_last_name 
,d_year 
) 
select 
t_s_secyear.customer_id, t_s_secyear.customer_first_name, t_s_secyear.customer_last_name 
from year_total t_s_firstyear 
,year_total t_s_secyear 
,year_total t_w_firstyear 
,year_total t_w_secyear 
where t_s_secyear.customer_id = t_s_firstyear.customer_id 
and t_s_firstyear.customer_id = t_w_secyear.customer_id 
and t_s_firstyear.customer_id = t_w_firstyear.customer_id 
and t_s_firstyear.sale_type = 's' 
and t_w_firstyear.sale_type = 'w' 
and t_s_secyear.sale_type = 's' 
and t_w_secyear.sale_type = 'w' 
and t_s_firstyear.year = 2000 
and t_s_secyear.year = 2000+1 
and t_w_firstyear.year = 2000 
and t_w_secyear.year = 2000+1 
and t_s_firstyear.year_total > 0 
and t_w_firstyear.year_total > 0 
and case when t_w_firstyear.year_total > 0 then t_w_secyear.year_total / t_w_firstyear.year_total else null end 
> case when t_s_firstyear.year_total > 0 then t_s_secyear.year_total / t_s_firstyear.year_total else null end 
order by 3,2,1
fetch first 100 rows only; 
-- end query 76 in stream 0 using template query74.tpl 
-- start query 77 in stream 0 using template query87.tpl and seed 1235996660 
select count(*) 
from ((select distinct c_last_name, c_first_name, d_date 
from store_sales, date_dim, customer 
where store_sales.ss_sold_date_sk = date_dim.d_date_sk 
and store_sales.ss_customer_sk = customer.c_customer_sk 
and d_month_seq between 1179 and 1179+11) 
except 
(select distinct c_last_name, c_first_name, d_date 
from catalog_sales, date_dim, customer 
where catalog_sales.cs_sold_date_sk = date_dim.d_date_sk 
and catalog_sales.cs_bill_customer_sk = customer.c_customer_sk 
and d_month_seq between 1179 and 1179+11) 
except 
(select distinct c_last_name, c_first_name, d_date 
from web_sales, date_dim, customer 
where web_sales.ws_sold_date_sk = date_dim.d_date_sk 
and web_sales.ws_bill_customer_sk = customer.c_customer_sk 
and d_month_seq between 1179 and 1179+11) 
) cool_cust 
; 
-- end query 77 in stream 0 using template query87.tpl 
-- start query 78 in stream 0 using template query77.tpl and seed 1736758238 
with ss as 
(select s_store_sk, 
sum(ss_ext_sales_price) as sales, 
sum(ss_net_profit) as profit 
from store_sales, 
date_dim, 
store 
where ss_sold_date_sk = d_date_sk 
and d_date between cast('2002-08-25' as date) 
and (cast('2002-08-25' as date) + 30 days) 
and ss_store_sk = s_store_sk 
group by s_store_sk) 
, 
sr as 
(select s_store_sk, 
sum(sr_return_amt) as returns, 
sum(sr_net_loss) as profit_loss 
from store_returns, 
date_dim, 
store 
where sr_returned_date_sk = d_date_sk 
and d_date between cast('2002-08-25' as date) 
and (cast('2002-08-25' as date) + 30 days) 
and sr_store_sk = s_store_sk 
group by s_store_sk), 
cs as 
(select cs_call_center_sk, 
sum(cs_ext_sales_price) as sales, 
sum(cs_net_profit) as profit 
from catalog_sales, 
date_dim 
where cs_sold_date_sk = d_date_sk 
and d_date between cast('2002-08-25' as date) 
and (cast('2002-08-25' as date) + 30 days) 
group by cs_call_center_sk 
), 
cr as 
(select 
sum(cr_return_amount) as returns, 
sum(cr_net_loss) as profit_loss 
from catalog_returns, 
date_dim 
where cr_returned_date_sk = d_date_sk 
and d_date between cast('2002-08-25' as date) 
and (cast('2002-08-25' as date) + 30 days) 
), 
ws as 
( select wp_web_page_sk, 
sum(ws_ext_sales_price) as sales, 
sum(ws_net_profit) as profit 
from web_sales, 
date_dim, 
web_page
where ws_sold_date_sk = d_date_sk 
and d_date between cast('2002-08-25' as date) 
and (cast('2002-08-25' as date) + 30 days) 
and ws_web_page_sk = wp_web_page_sk 
group by wp_web_page_sk), 
wr as 
(select wp_web_page_sk, 
sum(wr_return_amt) as returns, 
sum(wr_net_loss) as profit_loss 
from web_returns, 
date_dim, 
web_page 
where wr_returned_date_sk = d_date_sk 
and d_date between cast('2002-08-25' as date) 
and (cast('2002-08-25' as date) + 30 days) 
and wr_web_page_sk = wp_web_page_sk 
group by wp_web_page_sk) 
select channel 
, id 
, sum(sales) as sales 
, sum(returns) as returns 
, sum(profit) as profit 
from 
(select 'store channel' as channel 
, ss.s_store_sk as id 
, sales 
, coalesce(returns, 0) as returns 
, (profit - coalesce(profit_loss,0)) as profit 
from ss left join sr 
on ss.s_store_sk = sr.s_store_sk 
union all 
select 'catalog channel' as channel 
, cs_call_center_sk as id 
, sales 
, returns 
, (profit - profit_loss) as profit 
from cs 
, cr 
union all 
select 'web channel' as channel 
, ws.wp_web_page_sk as id 
, sales 
, coalesce(returns, 0) returns 
, (profit - coalesce(profit_loss,0)) as profit 
from ws left join wr 
on ws.wp_web_page_sk = wr.wp_web_page_sk 
) x 
group by rollup (channel, id) 
order by channel 
,id 
fetch first 100 rows only; 
-- end query 78 in stream 0 using template query77.tpl 
-- start query 79 in stream 0 using template query73.tpl and seed 1878398230 
select c_last_name 
,c_first_name 
,c_salutation 
,c_preferred_cust_flag 
,ss_ticket_number 
,cnt from 
(select ss_ticket_number 
,ss_customer_sk 
,count(*) cnt 
from store_sales,date_dim,store,household_demographics 
where store_sales.ss_sold_date_sk = date_dim.d_date_sk 
and store_sales.ss_store_sk = store.s_store_sk 
and store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk 
and date_dim.d_dom between 1 and 2 
and (household_demographics.hd_buy_potential = '>10000' or 
household_demographics.hd_buy_potential = '0-500') 
and household_demographics.hd_vehicle_count > 0 
and case when household_demographics.hd_vehicle_count > 0 then 
household_demographics.hd_dep_count/ household_demographics.hd_vehicle_count else null end > 1 
and date_dim.d_year in (2000,2000+1,2000+2) 
and store.s_county in ('Red River Parish','Wilkinson County','Cocke County','Jack County') 
group by ss_ticket_number,ss_customer_sk) dj,customer 
where ss_customer_sk = c_customer_sk 
and cnt between 1 and 5 
order by cnt desc;
-- end query 79 in stream 0 using template query73.tpl 
-- start query 80 in stream 0 using template query84.tpl and seed 915340114 
select c_customer_id as customer_id 
,c_last_name || ', ' || coalesce(c_first_name,'') as customername 
from customer 
,customer_address 
,customer_demographics 
,household_demographics 
,income_band 
,store_returns 
where ca_city = 'Salem' 
and c_current_addr_sk = ca_address_sk 
and ib_lower_bound >= 55893 
and ib_upper_bound <= 55893 + 50000 
and ib_income_band_sk = hd_income_band_sk 
and cd_demo_sk = c_current_cdemo_sk 
and hd_demo_sk = c_current_hdemo_sk 
and sr_cdemo_sk = cd_demo_sk 
order by c_customer_id 
fetch first 100 rows only; 
-- end query 80 in stream 0 using template query84.tpl 
-- start query 81 in stream 0 using template query54.tpl and seed 87886004 
with my_customers as ( 
select distinct c_customer_sk 
, c_current_addr_sk 
from 
( select cs_sold_date_sk sold_date_sk, 
cs_bill_customer_sk customer_sk, 
cs_item_sk item_sk 
from catalog_sales 
union all 
select ws_sold_date_sk sold_date_sk, 
ws_bill_customer_sk customer_sk, 
ws_item_sk item_sk 
from web_sales 
) cs_or_ws_sales, 
item, 
date_dim, 
customer 
where sold_date_sk = d_date_sk 
and item_sk = i_item_sk 
and i_category = 'Sports' 
and i_class = 'outdoor' 
and c_customer_sk = cs_or_ws_sales.customer_sk 
and d_moy = 7 
and d_year = 2002 
) 
, my_revenue as ( 
select c_customer_sk, 
sum(ss_ext_sales_price) as revenue 
from my_customers, 
store_sales, 
customer_address, 
store, 
date_dim 
where c_current_addr_sk = ca_address_sk 
and ca_county = s_county 
and ca_state = s_state 
and ss_sold_date_sk = d_date_sk 
and c_customer_sk = ss_customer_sk 
and d_month_seq between (select distinct d_month_seq+1 
from date_dim where d_year = 2002 and d_moy = 7) 
and (select distinct d_month_seq+3 
from date_dim where d_year = 2002 and d_moy = 7) 
group by c_customer_sk 
) 
, segments as 
(select cast((revenue/50) as int) as segment 
from my_revenue 
) 
select segment, count(*) as num_customers, segment*50 as segment_base 
from segments 
group by segment 
order by segment, num_customers 
fetch first 100 rows only; 
-- end query 81 in stream 0 using template query54.tpl 
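Query 81 places each customer's store revenue into fixed 50-unit bands. The expression `cast((revenue/50) as int)` gives the band index, and `segment*50` gives that band's lower bound. Below is a minimal Python sketch of the same bucketing. It assumes non-negative revenue and assumes the cast truncates toward zero, as Python's `int()` does. The function name and sample figures are hypothetical and are not benchmark data.

```python
from collections import Counter

def revenue_segments(revenues, width=50):
    """Mimic the query's bucketing: segment = int(revenue / width).

    Assumes the SQL cast truncates toward zero (Python's int() does too).
    Returns (segment, num_customers, segment_base) tuples ordered by segment.
    """
    counts = Counter(int(r / width) for r in revenues)
    return [(seg, n, seg * width) for seg, n in sorted(counts.items())]

# Hypothetical per-customer revenue totals (not benchmark data).
rows = revenue_segments([12.5, 49.99, 50.0, 120.0, 149.0])
print(rows)  # [(0, 2, 0), (1, 1, 50), (2, 2, 100)]
```

A value exactly on a boundary, such as 50.0, begins a new band. This matches the integer division in the query.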
-- start query 82 in stream 0 using template query55.tpl and seed 792107888 
select i_brand_id brand_id, i_brand brand, 
sum(ss_ext_sales_price) ext_price 
from date_dim, store_sales, item 
where d_date_sk = ss_sold_date_sk 
and ss_item_sk = i_item_sk
and i_manager_id=40 
and d_moy=11 
and d_year=2001 
group by i_brand, i_brand_id 
order by ext_price desc, i_brand_id 
fetch first 100 rows only ; 
-- end query 82 in stream 0 using template query55.tpl 
-- start query 83 in stream 0 using template query56.tpl and seed 912538819 
with ss as ( 
select i_item_id,sum(ss_ext_sales_price) total_sales 
from 
store_sales, 
date_dim, 
customer_address, 
item 
where i_item_id in (select 
i_item_id 
from item 
where i_color in ('magenta','tan','turquoise')) 
and ss_item_sk = i_item_sk 
and ss_sold_date_sk = d_date_sk 
and d_year = 2002 
and d_moy = 4 
and ss_addr_sk = ca_address_sk 
and ca_gmt_offset = -6 
group by i_item_id), 
cs as ( 
select i_item_id,sum(cs_ext_sales_price) total_sales 
from 
catalog_sales, 
date_dim, 
customer_address, 
item 
where 
i_item_id in (select 
i_item_id 
from item 
where i_color in ('magenta','tan','turquoise')) 
and cs_item_sk = i_item_sk 
and cs_sold_date_sk = d_date_sk 
and d_year = 2002 
and d_moy = 4 
and cs_bill_addr_sk = ca_address_sk 
and ca_gmt_offset = -6 
group by i_item_id), 
ws as ( 
select i_item_id,sum(ws_ext_sales_price) total_sales 
from 
web_sales, 
date_dim, 
customer_address, 
item 
where 
i_item_id in (select 
i_item_id 
from item 
where i_color in ('magenta','tan','turquoise')) 
and ws_item_sk = i_item_sk 
and ws_sold_date_sk = d_date_sk 
and d_year = 2002 
and d_moy = 4 
and ws_bill_addr_sk = ca_address_sk 
and ca_gmt_offset = -6 
group by i_item_id) 
select i_item_id ,sum(total_sales) total_sales 
from (select * from ss 
union all 
select * from cs 
union all 
select * from ws) tmp1 
group by i_item_id 
order by total_sales 
fetch first 100 rows only; 
-- end query 83 in stream 0 using template query56.tpl 
-- start query 84 in stream 0 using template query2.tpl and seed 1816850892 
with wscs as 
(select sold_date_sk 
,sales_price 
from (select ws_sold_date_sk sold_date_sk 
,ws_ext_sales_price sales_price 
from web_sales) x 
union all 
(select cs_sold_date_sk sold_date_sk 
,cs_ext_sales_price sales_price 
from catalog_sales)), 
wswscs as
(select d_week_seq, 
sum(case when (d_day_name='Sunday') then sales_price else null end) sun_sales, 
sum(case when (d_day_name='Monday') then sales_price else null end) mon_sales, 
sum(case when (d_day_name='Tuesday') then sales_price else null end) tue_sales, 
sum(case when (d_day_name='Wednesday') then sales_price else null end) wed_sales, 
sum(case when (d_day_name='Thursday') then sales_price else null end) thu_sales, 
sum(case when (d_day_name='Friday') then sales_price else null end) fri_sales, 
sum(case when (d_day_name='Saturday') then sales_price else null end) sat_sales 
from wscs 
,date_dim 
where d_date_sk = sold_date_sk 
group by d_week_seq) 
select d_week_seq1 
,round(sun_sales1/sun_sales2,2) 
,round(mon_sales1/mon_sales2,2) 
,round(tue_sales1/tue_sales2,2) 
,round(wed_sales1/wed_sales2,2) 
,round(thu_sales1/thu_sales2,2) 
,round(fri_sales1/fri_sales2,2) 
,round(sat_sales1/sat_sales2,2) 
from 
(select wswscs.d_week_seq d_week_seq1 
,sun_sales sun_sales1 
,mon_sales mon_sales1 
,tue_sales tue_sales1 
,wed_sales wed_sales1 
,thu_sales thu_sales1 
,fri_sales fri_sales1 
,sat_sales sat_sales1 
from wswscs,date_dim 
where date_dim.d_week_seq = wswscs.d_week_seq and 
d_year = 2000) y, 
(select wswscs.d_week_seq d_week_seq2 
,sun_sales sun_sales2 
,mon_sales mon_sales2 
,tue_sales tue_sales2 
,wed_sales wed_sales2 
,thu_sales thu_sales2 
,fri_sales fri_sales2 
,sat_sales sat_sales2 
from wswscs 
,date_dim 
where date_dim.d_week_seq = wswscs.d_week_seq and 
d_year = 2000+1) z 
where d_week_seq1=d_week_seq2-53 
order by d_week_seq1; 
-- end query 84 in stream 0 using template query2.tpl 
-- start query 85 in stream 0 using template query26.tpl and seed 646470989 
select i_item_id, 
avg(cast(cs_quantity as double)) agg1, 
avg(cs_list_price) agg2, 
avg(cs_coupon_amt) agg3, 
avg(cs_sales_price) agg4 
from catalog_sales, customer_demographics, date_dim, item, promotion 
where cs_sold_date_sk = d_date_sk and 
cs_item_sk = i_item_sk and 
cs_bill_cdemo_sk = cd_demo_sk and 
cs_promo_sk = p_promo_sk and 
cd_gender = 'F' and 
cd_marital_status = 'M' and 
cd_education_status = '2 yr Degree' and 
(p_channel_email = 'N' or p_channel_event = 'N') and 
d_year = 1999 
group by i_item_id 
order by i_item_id 
fetch first 100 rows only; 
-- end query 85 in stream 0 using template query26.tpl 
-- start query 86 in stream 0 using template query40.tpl and seed 2129842400 
select 
w_state 
,i_item_id 
,sum(case when (cast(d_date as date) < cast ('1998-06-27' as date)) 
then cs_sales_price - coalesce(cr_refunded_cash,0) else 0 end) as sales_before 
,sum(case when (cast(d_date as date) >= cast ('1998-06-27' as date)) 
then cs_sales_price - coalesce(cr_refunded_cash,0) else 0 end) as sales_after 
from 
catalog_sales left outer join catalog_returns on
(cs_order_number = cr_order_number 
and cs_item_sk = cr_item_sk) 
,warehouse 
,item 
,date_dim 
where 
i_current_price between 0.99 and 1.49 
and i_item_sk = cs_item_sk 
and cs_warehouse_sk = w_warehouse_sk 
and cs_sold_date_sk = d_date_sk 
and d_date between (cast ('1998-06-27' as date) - 30 days) 
and (cast ('1998-06-27' as date) + 30 days) 
group by 
w_state,i_item_id 
order by w_state,i_item_id 
fetch first 100 rows only; 
-- end query 86 in stream 0 using template query40.tpl 
-- start query 87 in stream 0 using template query72.tpl and seed 1882576156 
select i_item_desc 
,w_warehouse_name 
,d1.d_week_seq 
,count(case when p_promo_sk is null then 1 else 0 end) no_promo 
,count(case when p_promo_sk is not null then 1 else 0 end) promo 
,count(*) total_cnt 
from catalog_sales 
join inventory on (cs_item_sk = inv_item_sk) 
join warehouse on (w_warehouse_sk=inv_warehouse_sk) 
join item on (i_item_sk = cs_item_sk) 
join customer_demographics on (cs_bill_cdemo_sk = cd_demo_sk) 
join household_demographics on (cs_bill_hdemo_sk = hd_demo_sk) 
join date_dim d1 on (cs_sold_date_sk = d1.d_date_sk) 
join date_dim d2 on (inv_date_sk = d2.d_date_sk) 
join date_dim d3 on (cs_ship_date_sk = d3.d_date_sk) 
left outer join promotion on (cs_promo_sk=p_promo_sk) 
left outer join catalog_returns on (cr_item_sk = cs_item_sk and cr_order_number = cs_order_number) 
where d1.d_week_seq = d2.d_week_seq 
and inv_quantity_on_hand < cs_quantity 
and d3.d_date > cast(d1.d_date as date) + 5 days 
and hd_buy_potential = '>10000' 
and d1.d_year = 1999 
and hd_buy_potential = '>10000' 
and cd_marital_status = 'U' 
and d1.d_year = 1999 
group by i_item_desc,w_warehouse_name,d1.d_week_seq 
order by total_cnt desc, i_item_desc, w_warehouse_name, d_week_seq 
fetch first 100 rows only; 
-- end query 87 in stream 0 using template query72.tpl 
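As printed, the `no_promo` and `promo` columns of query 87 apply COUNT to a CASE expression that returns 1 or 0. COUNT counts every non-null value, so both columns equal `total_cnt`. A count of only the matching rows would need SUM over the same expression, or an `else null` branch. The check below uses Python's built-in sqlite3 module on a hypothetical two-row table. It does not run Big SQL, but the way COUNT and SUM treat NULL is standard SQL behavior.

```python
import sqlite3

# Hypothetical miniature table: one sale without a promotion, one with.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (p_promo_sk INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?)", [(None,), (7,)])

row = conn.execute(
    """
    SELECT
      -- COUNT skips only NULLs; 0 is not NULL, so both count every row
      COUNT(CASE WHEN p_promo_sk IS NULL THEN 1 ELSE 0 END)     AS count_no_promo,
      COUNT(CASE WHEN p_promo_sk IS NOT NULL THEN 1 ELSE 0 END) AS count_promo,
      -- SUM adds up the 1/0 flags, giving the conditional counts
      SUM(CASE WHEN p_promo_sk IS NULL THEN 1 ELSE 0 END)       AS sum_no_promo,
      SUM(CASE WHEN p_promo_sk IS NOT NULL THEN 1 ELSE 0 END)   AS sum_promo,
      COUNT(*)                                                  AS total_cnt
    FROM sales
    """
).fetchone()
print(row)  # (2, 2, 1, 1, 2)
```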
-- start query 88 in stream 0 using template query53.tpl and seed 1226866603 
select * from 
(select i_manufact_id, 
sum(ss_sales_price) sum_sales, 
avg(sum(ss_sales_price)) over (partition by i_manufact_id) avg_quarterly_sales 
from item, store_sales, date_dim, store 
where ss_item_sk = i_item_sk and 
ss_sold_date_sk = d_date_sk and 
ss_store_sk = s_store_sk and 
d_month_seq in (1178,1178+1,1178+2,1178+3,1178+4,1178+5,1178+6,1178+7,1178+8,1178+9,1178+10,1178+11) and 
((i_category in ('Books','Children','Electronics') and 
i_class in ('personal','portable','reference','self-help') and 
i_brand in ('scholaramalgamalg #14','scholaramalgamalg #7', 
'exportiunivamalg #9','scholaramalgamalg #9')) 
or(i_category in ('Women','Music','Men') and 
i_class in ('accessories','classical','fragrances','pants') and 
i_brand in ('amalgimporto #1','edu packscholar #1','exportiimporto #1', 
'importoamalg #1'))) 
group by i_manufact_id, d_qoy ) tmp1 
where case when avg_quarterly_sales > 0 
then abs (sum_sales - avg_quarterly_sales)/ avg_quarterly_sales 
else null end > 0.1 
order by avg_quarterly_sales, 
sum_sales, 
i_manufact_id 
fetch first 100 rows only;
-- end query 88 in stream 0 using template query53.tpl 
-- start query 89 in stream 0 using template query79.tpl and seed 1825663721 
select 
c_last_name,c_first_name,substr(s_city,1,30),ss_ticket_number,amt,profit 
from 
(select ss_ticket_number 
,ss_customer_sk 
,store.s_city 
,sum(ss_coupon_amt) amt 
,sum(ss_net_profit) profit 
from store_sales,date_dim,store,household_demographics 
where store_sales.ss_sold_date_sk = date_dim.d_date_sk 
and store_sales.ss_store_sk = store.s_store_sk 
and store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk 
and (household_demographics.hd_dep_count = 1 or household_demographics.hd_vehicle_count > -1) 
and date_dim.d_dow = 1 
and date_dim.d_year in (1998,1998+1,1998+2) 
and store.s_number_employees between 200 and 295 
group by ss_ticket_number,ss_customer_sk,ss_addr_sk,store.s_city) ms,customer 
where ss_customer_sk = c_customer_sk 
order by c_last_name,c_first_name,substr(s_city,1,30), profit 
fetch first 100 rows only; 
-- end query 89 in stream 0 using template query79.tpl 
-- start query 90 in stream 0 using template query18.tpl and seed 1875871513 
select i_item_id, 
ca_country, 
ca_state, 
ca_county, 
avg( cast(cs_quantity as numeric(12,2))) agg1, 
avg( cast(cs_list_price as numeric(12,2))) agg2, 
avg( cast(cs_coupon_amt as numeric(12,2))) agg3, 
avg( cast(cs_sales_price as numeric(12,2))) agg4, 
avg( cast(cs_net_profit as numeric(12,2))) agg5, 
avg( cast(c_birth_year as numeric(12,2))) agg6, 
avg( cast(cd1.cd_dep_count as numeric(12,2))) agg7 
from catalog_sales, customer_demographics cd1, 
customer_demographics cd2, customer, customer_address, date_dim, item 
where cs_sold_date_sk = d_date_sk and 
cs_item_sk = i_item_sk and 
cs_bill_cdemo_sk = cd1.cd_demo_sk and 
cs_bill_customer_sk = c_customer_sk and 
cd1.cd_gender = 'F' and 
cd1.cd_education_status = 'Advanced Degree' and 
c_current_cdemo_sk = cd2.cd_demo_sk and 
c_current_addr_sk = ca_address_sk and 
c_birth_month in (2,6,7,11,1,8) and 
d_year = 2001 and 
ca_state in ('OH','MT','PA' 
,'WV','NC','CO','FL') 
group by rollup (i_item_id, ca_country, ca_state, ca_county) 
order by ca_country, 
ca_state, 
ca_county, 
i_item_id 
fetch first 100 rows only; 
-- end query 90 in stream 0 using template query18.tpl 
-- start query 91 in stream 0 using template query13.tpl and seed 512374064 
select avg(ss_quantity) 
,avg(ss_ext_sales_price) 
,avg(ss_ext_wholesale_cost) 
,sum(ss_ext_wholesale_cost) 
from store_sales 
,store 
,customer_demographics 
,household_demographics 
,customer_address 
,date_dim 
where s_store_sk = ss_store_sk 
and ss_sold_date_sk = d_date_sk and d_year = 2001 
and((ss_hdemo_sk=hd_demo_sk 
and cd_demo_sk = ss_cdemo_sk 
and cd_marital_status = 'W' 
and cd_education_status = 'College'
and ss_sales_price between 100.00 and 150.00 
and hd_dep_count = 3 
)or 
(ss_hdemo_sk=hd_demo_sk 
and cd_demo_sk = ss_cdemo_sk 
and cd_marital_status = 'D' 
and cd_education_status = 'Primary' 
and ss_sales_price between 50.00 and 100.00 
and hd_dep_count = 1 
) or 
(ss_hdemo_sk=hd_demo_sk 
and cd_demo_sk = ss_cdemo_sk 
and cd_marital_status = 'S' 
and cd_education_status = 'Advanced Degree' 
and ss_sales_price between 150.00 and 200.00 
and hd_dep_count = 1 
)) 
and((ss_addr_sk = ca_address_sk 
and ca_country = 'United States' 
and ca_state in ('VA', 'WV', 'LA') 
and ss_net_profit between 100 and 200 
) or 
(ss_addr_sk = ca_address_sk 
and ca_country = 'United States' 
and ca_state in ('NV', 'SC', 'IL') 
and ss_net_profit between 150 and 300 
) or 
(ss_addr_sk = ca_address_sk 
and ca_country = 'United States' 
and ca_state in ('TN', 'GA', 'MA') 
and ss_net_profit between 50 and 250 
)) 
; 
-- end query 91 in stream 0 using template query13.tpl 
-- start query 92 in stream 0 using template query24.tpl and seed 1636591044 
with ssales as 
(select c_last_name 
,c_first_name 
,s_store_name 
,ca_state 
,s_state 
,i_color 
,i_current_price 
,i_manager_id 
,i_units 
,i_size 
,sum(ss_net_profit) netpaid 
from store_sales 
,store_returns 
,store 
,item 
,customer 
,customer_address 
where ss_ticket_number = sr_ticket_number 
and ss_item_sk = sr_item_sk 
and ss_customer_sk = c_customer_sk 
and ss_item_sk = i_item_sk 
and ss_store_sk = s_store_sk 
and c_birth_country = upper(ca_country) 
and s_zip = ca_zip 
and s_market_id=5 
group by c_last_name 
,c_first_name 
,s_store_name 
,ca_state 
,s_state 
,i_color 
,i_current_price 
,i_manager_id 
,i_units 
,i_size) 
select c_last_name 
,c_first_name 
,s_store_name 
,sum(netpaid) paid 
from ssales 
where i_color = 'deep' 
group by c_last_name 
,c_first_name 
,s_store_name 
having sum(netpaid) > (select 0.05*avg(netpaid) 
from ssales) 
; 
with ssales as 
(select c_last_name 
,c_first_name 
,s_store_name 
,ca_state 
,s_state 
,i_color
,i_current_price 
,i_manager_id 
,i_units 
,i_size 
,sum(ss_net_profit) netpaid 
from store_sales 
,store_returns 
,store 
,item 
,customer 
,customer_address 
where ss_ticket_number = sr_ticket_number 
and ss_item_sk = sr_item_sk 
and ss_customer_sk = c_customer_sk 
and ss_item_sk = i_item_sk 
and ss_store_sk = s_store_sk 
and c_birth_country = upper(ca_country) 
and s_zip = ca_zip 
and s_market_id = 5 
group by c_last_name 
,c_first_name 
,s_store_name 
,ca_state 
,s_state 
,i_color 
,i_current_price 
,i_manager_id 
,i_units 
,i_size) 
select c_last_name 
,c_first_name 
,s_store_name 
,sum(netpaid) paid 
from ssales 
where i_color = 'blush' 
group by c_last_name 
,c_first_name 
,s_store_name 
having sum(netpaid) > (select 0.05*avg(netpaid) 
from ssales) 
; 
-- end query 92 in stream 0 using template query24.tpl 
-- start query 93 in stream 0 using template query4.tpl and seed 48694754 
with year_total as ( 
select c_customer_id customer_id 
,c_first_name customer_first_name 
,c_last_name customer_last_name 
,c_preferred_cust_flag customer_preferred_cust_flag 
,c_birth_country customer_birth_country 
,c_login customer_login 
,c_email_address customer_email_address 
,d_year dyear 
,sum(((ss_ext_list_price- ss_ext_wholesale_cost- ss_ext_discount_amt)+ss_ext_sales_price)/2) year_total 
,'s' sale_type 
from customer 
,store_sales 
,date_dim 
where c_customer_sk = ss_customer_sk 
and ss_sold_date_sk = d_date_sk 
group by c_customer_id 
,c_first_name 
,c_last_name 
,c_preferred_cust_flag 
,c_birth_country 
,c_login 
,c_email_address 
,d_year 
union all 
select c_customer_id customer_id 
,c_first_name customer_first_name 
,c_last_name customer_last_name 
,c_preferred_cust_flag customer_preferred_cust_flag 
,c_birth_country customer_birth_country 
,c_login customer_login 
,c_email_address customer_email_address 
,d_year dyear 
,sum((((cs_ext_list_price- cs_ext_wholesale_cost- cs_ext_discount_amt)+cs_ext_sales_price)/2) ) year_total 
,'c' sale_type 
from customer 
,catalog_sales 
,date_dim 
where c_customer_sk = cs_bill_customer_sk 
and cs_sold_date_sk = d_date_sk 
group by c_customer_id 
,c_first_name
,c_last_name 
,c_preferred_cust_flag 
,c_birth_country 
,c_login 
,c_email_address 
,d_year 
union all 
select c_customer_id customer_id 
,c_first_name customer_first_name 
,c_last_name customer_last_name 
,c_preferred_cust_flag customer_preferred_cust_flag 
,c_birth_country customer_birth_country 
,c_login customer_login 
,c_email_address customer_email_address 
,d_year dyear 
,sum((((ws_ext_list_price- ws_ext_wholesale_cost- ws_ext_discount_amt)+ws_ext_sales_price)/2) ) year_total 
,'w' sale_type 
from customer 
,web_sales 
,date_dim 
where c_customer_sk = ws_bill_customer_sk 
and ws_sold_date_sk = d_date_sk 
group by c_customer_id 
,c_first_name 
,c_last_name 
,c_preferred_cust_flag 
,c_birth_country 
,c_login 
,c_email_address 
,d_year 
) 
select t_s_secyear.customer_birth_country 
from year_total t_s_firstyear 
,year_total t_s_secyear 
,year_total t_c_firstyear 
,year_total t_c_secyear 
,year_total t_w_firstyear 
,year_total t_w_secyear 
where t_s_secyear.customer_id = t_s_firstyear.customer_id 
and t_s_firstyear.customer_id = t_c_secyear.customer_id 
and t_s_firstyear.customer_id = t_c_firstyear.customer_id 
and t_s_firstyear.customer_id = t_w_firstyear.customer_id 
and t_s_firstyear.customer_id = t_w_secyear.customer_id 
and t_s_firstyear.sale_type = 's' 
and t_c_firstyear.sale_type = 'c' 
and t_w_firstyear.sale_type = 'w' 
and t_s_secyear.sale_type = 's' 
and t_c_secyear.sale_type = 'c' 
and t_w_secyear.sale_type = 'w' 
and t_s_firstyear.dyear = 2000 
and t_s_secyear.dyear = 2000+1 
and t_c_firstyear.dyear = 2000 
and t_c_secyear.dyear = 2000+1 
and t_w_firstyear.dyear = 2000 
and t_w_secyear.dyear = 2000+1 
and t_s_firstyear.year_total > 0 
and t_c_firstyear.year_total > 0 
and t_w_firstyear.year_total > 0 
and case when t_c_firstyear.year_total > 0 then t_c_secyear.year_total / t_c_firstyear.year_total else null end 
> case when t_s_firstyear.year_total > 0 then t_s_secyear.year_total / t_s_firstyear.year_total else null end 
and case when t_c_firstyear.year_total > 0 then t_c_secyear.year_total / t_c_firstyear.year_total else null end 
> case when t_w_firstyear.year_total > 0 then t_w_secyear.year_total / t_w_firstyear.year_total else null end 
order by t_s_secyear.customer_birth_country 
fetch first 100 rows only; 
-- end query 93 in stream 0 using template query4.tpl 
-- start query 94 in stream 0 using template query99.tpl and seed 505379346 
select 
substr(w_warehouse_name,1,20) 
,sm_type 
,cc_name 
,sum(case when (cs_ship_date_sk - cs_sold_date_sk <= 30 ) then 1 else 0 end) as "30 days" 
,sum(case when (cs_ship_date_sk - cs_sold_date_sk > 30) and 
(cs_ship_date_sk - cs_sold_date_sk <= 60) then 1 else 0 end ) as "31-60 days" 
,sum(case when (cs_ship_date_sk - cs_sold_date_sk > 60) and
(cs_ship_date_sk - cs_sold_date_sk <= 90) then 1 else 0 end) as "61-90 days" 
,sum(case when (cs_ship_date_sk - cs_sold_date_sk > 90) and 
(cs_ship_date_sk - cs_sold_date_sk <= 120) then 1 else 0 end) as "91-120 days" 
,sum(case when (cs_ship_date_sk - cs_sold_date_sk > 120) then 1 else 0 end) as ">120 days" 
from 
catalog_sales 
,warehouse 
,ship_mode 
,call_center 
,date_dim 
where 
d_month_seq between 1208 and 1208 + 11 
and cs_ship_date_sk = d_date_sk 
and cs_warehouse_sk = w_warehouse_sk 
and cs_ship_mode_sk = sm_ship_mode_sk 
and cs_call_center_sk = cc_call_center_sk 
group by 
substr(w_warehouse_name,1,20) 
,sm_type 
,cc_name 
order by substr(w_warehouse_name,1,20) 
,sm_type 
,cc_name 
fetch first 100 rows only; 
-- end query 94 in stream 0 using template query99.tpl 
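Query 94 sorts each catalog order into one of five buckets by the difference between its ship-date and sold-date surrogate keys. The query treats that difference as a lag in days, which holds when consecutive dates have consecutive keys. The sketch below reproduces the same CASE boundaries in Python under that assumption. The helper names and key values are hypothetical.

```python
from collections import Counter

# Bucket labels in the same order as the query's output columns.
BUCKETS = ["30 days", "31-60 days", "61-90 days", "91-120 days", ">120 days"]

def lag_bucket(ship_date_sk, sold_date_sk):
    """Return the bucket for one order, mirroring the query's CASE tests."""
    lag = ship_date_sk - sold_date_sk  # assumed to be a number of days
    if lag <= 30:
        return BUCKETS[0]
    if lag <= 60:
        return BUCKETS[1]
    if lag <= 90:
        return BUCKETS[2]
    if lag <= 120:
        return BUCKETS[3]
    return BUCKETS[4]

def bucket_counts(orders):
    """Count (ship_date_sk, sold_date_sk) pairs per bucket, zero-filled."""
    counts = Counter(lag_bucket(ship, sold) for ship, sold in orders)
    return {label: counts.get(label, 0) for label in BUCKETS}

# Hypothetical (ship, sold) key pairs with lags of 3, 45, 45 and 200 days.
print(bucket_counts([(2451003, 2451000), (2451045, 2451000),
                     (2451100, 2451055), (2451200, 2451000)]))
# {'30 days': 1, '31-60 days': 2, '61-90 days': 0, '91-120 days': 0, '>120 days': 1}
```

Each upper bound is inclusive, as in the SQL. For example, a lag of exactly 30 days lands in the first bucket.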
-- start query 95 in stream 0 using template query68.tpl and seed 372107550 
select c_last_name 
,c_first_name 
,ca_city 
,bought_city 
,ss_ticket_number 
,extended_price 
,extended_tax 
,list_price 
from (select ss_ticket_number 
,ss_customer_sk 
,ca_city bought_city 
,sum(ss_ext_sales_price) extended_price 
,sum(ss_ext_list_price) list_price 
,sum(ss_ext_tax) extended_tax 
from store_sales 
,date_dim 
,store 
,household_demographics 
,customer_address 
where store_sales.ss_sold_date_sk = date_dim.d_date_sk 
and store_sales.ss_store_sk = store.s_store_sk 
and store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk 
and store_sales.ss_addr_sk = customer_address.ca_address_sk 
and date_dim.d_dom between 1 and 2 
and (household_demographics.hd_dep_count = 0 or 
household_demographics.hd_vehicle_count= -1) 
and date_dim.d_year in (2000,2000+1,2000+2) 
and store.s_city in ('Arcadia','Friendship') 
group by ss_ticket_number 
,ss_customer_sk 
,ss_addr_sk,ca_city) dn 
,customer 
,customer_address current_addr 
where ss_customer_sk = c_customer_sk 
and customer.c_current_addr_sk = current_addr.ca_address_sk 
and current_addr.ca_city <> bought_city 
order by c_last_name 
,ss_ticket_number 
fetch first 100 rows only; 
-- end query 95 in stream 0 using template query68.tpl 
-- start query 96 in stream 0 using template query83.tpl and seed 1926747028 
with sr_items as 
(select i_item_id item_id, 
sum(sr_return_quantity) sr_item_qty 
from store_returns, 
item, 
date_dim 
where sr_item_sk = i_item_sk 
and d_date in 
(select d_date 
from date_dim 
where d_week_seq in
(select d_week_seq 
from date_dim 
where d_date in ('1999-05-19','1999-08-02','1999-11-08'))) 
and sr_returned_date_sk = d_date_sk 
group by i_item_id), 
cr_items as 
(select i_item_id item_id, 
sum(cr_return_quantity) cr_item_qty 
from catalog_returns, 
item, 
date_dim 
where cr_item_sk = i_item_sk 
and d_date in 
(select d_date 
from date_dim 
where d_week_seq in 
(select d_week_seq 
from date_dim 
where d_date in ('1999-05-19','1999-08-02','1999-11-08'))) 
and cr_returned_date_sk = d_date_sk 
group by i_item_id), 
wr_items as 
(select i_item_id item_id, 
sum(wr_return_quantity) wr_item_qty 
from web_returns, 
item, 
date_dim 
where wr_item_sk = i_item_sk 
and d_date in 
(select d_date 
from date_dim 
where d_week_seq in 
(select d_week_seq 
from date_dim 
where d_date in ('1999-05-19','1999-08-02','1999-11-08'))) 
and wr_returned_date_sk = d_date_sk 
group by i_item_id) 
select sr_items.item_id 
,sr_item_qty 
,cast(sr_item_qty as double)/(cast(sr_item_qty+cr_item_qty+wr_item_qty as double))/3.0 * 100 sr_dev 
,cr_item_qty 
,cast(cr_item_qty as double)/(cast(sr_item_qty+cr_item_qty+wr_item_qty as double))/3.0 * 100 cr_dev 
,wr_item_qty 
,cast(wr_item_qty as double)/(cast(sr_item_qty+cr_item_qty+wr_item_qty as double))/3.0 * 100 wr_dev 
,(sr_item_qty+cr_item_qty+wr_item_qty)/3.0 average 
from sr_items 
,cr_items 
,wr_items 
where sr_items.item_id=cr_items.item_id 
and sr_items.item_id=wr_items.item_id 
order by sr_items.item_id 
,sr_item_qty 
fetch first 100 rows only; 
-- end query 96 in stream 0 using template query83.tpl 
-- start query 97 in stream 0 using template query61.tpl and seed 1235477058 
select promotions,total,cast(promotions as decimal(15,4))/cast(total as decimal(15,4))*100 
from 
(select sum(ss_ext_sales_price) promotions 
from store_sales 
,store 
,promotion 
,date_dim 
,customer 
,customer_address 
,item 
where ss_sold_date_sk = d_date_sk 
and ss_store_sk = s_store_sk 
and ss_promo_sk = p_promo_sk 
and ss_customer_sk= c_customer_sk 
and ca_address_sk = c_current_addr_sk 
and ss_item_sk = i_item_sk 
and ca_gmt_offset = -7 
and i_category = 'Jewelry' 
and (p_channel_dmail = 'Y' or p_channel_email = 'Y' or p_channel_tv = 'Y') 
and s_gmt_offset = -7 
and d_year = 2001 
and d_moy = 11) promotional_sales, 
(select sum(ss_ext_sales_price) total 
from store_sales 
,store 
,date_dim 
,customer 
,customer_address
,item 
where ss_sold_date_sk = d_date_sk 
and ss_store_sk = s_store_sk 
and ss_customer_sk= c_customer_sk 
and ca_address_sk = c_current_addr_sk 
and ss_item_sk = i_item_sk 
and ca_gmt_offset = -7 
and i_category = 'Jewelry' 
and s_gmt_offset = -7 
and d_year = 2001 
and d_moy = 11) all_sales 
order by promotions, total 
fetch first 100 rows only; 
-- end query 97 in stream 0 using template query61.tpl 
-- start query 98 in stream 0 using template query5.tpl and seed 1097248849 
with ssr as 
(select s_store_id, 
sum(sales_price) as sales, 
sum(profit) as profit, 
sum(return_amt) as returns, 
sum(net_loss) as profit_loss 
from 
( select ss_store_sk as store_sk, 
ss_sold_date_sk as date_sk, 
ss_ext_sales_price as sales_price, 
ss_net_profit as profit, 
cast(0 as decimal(7,2)) as return_amt, 
cast(0 as decimal(7,2)) as net_loss 
from store_sales 
union all 
select sr_store_sk as store_sk, 
sr_returned_date_sk as date_sk, 
cast(0 as decimal(7,2)) as sales_price, 
cast(0 as decimal(7,2)) as profit, 
sr_return_amt as return_amt, 
sr_net_loss as net_loss 
from store_returns 
) salesreturns, 
date_dim, 
store 
where date_sk = d_date_sk 
and d_date between cast('2001-08-21' as date) 
and (cast('2001-08-21' as date) + 14 days) 
and store_sk = s_store_sk 
group by s_store_id) 
, 
csr as 
(select cp_catalog_page_id, 
sum(sales_price) as sales, 
sum(profit) as profit, 
sum(return_amt) as returns, 
sum(net_loss) as profit_loss 
from 
( select cs_catalog_page_sk as page_sk, 
cs_sold_date_sk as date_sk, 
cs_ext_sales_price as sales_price, 
cs_net_profit as profit, 
cast(0 as decimal(7,2)) as return_amt, 
cast(0 as decimal(7,2)) as net_loss 
from catalog_sales 
union all 
select cr_catalog_page_sk as page_sk, 
cr_returned_date_sk as date_sk, 
cast(0 as decimal(7,2)) as sales_price, 
cast(0 as decimal(7,2)) as profit, 
cr_return_amount as return_amt, 
cr_net_loss as net_loss 
from catalog_returns 
) salesreturns, 
date_dim, 
catalog_page 
where date_sk = d_date_sk 
and d_date between cast('2001-08-21' as date) 
and (cast('2001-08-21' as date) + 14 days) 
and page_sk = cp_catalog_page_sk 
group by cp_catalog_page_id) 
, 
wsr as 
(select web_site_id, 
sum(sales_price) as sales, 
sum(profit) as profit, 
sum(return_amt) as returns, 
sum(net_loss) as profit_loss 
from 
( select ws_web_site_sk as wsr_web_site_sk, 
ws_sold_date_sk as date_sk, 
ws_ext_sales_price as sales_price, 
ws_net_profit as profit,
cast(0 as decimal(7,2)) as return_amt, 
cast(0 as decimal(7,2)) as net_loss 
from web_sales 
union all 
select ws_web_site_sk as wsr_web_site_sk, 
wr_returned_date_sk as date_sk, 
cast(0 as decimal(7,2)) as sales_price, 
cast(0 as decimal(7,2)) as profit, 
wr_return_amt as return_amt, 
wr_net_loss as net_loss 
from web_returns left outer join web_sales on 
( wr_item_sk = ws_item_sk 
and wr_order_number = ws_order_number) 
) salesreturns, 
date_dim, 
web_site 
where date_sk = d_date_sk 
and d_date between cast('2001-08-21' as date) 
and (cast('2001-08-21' as date) + 14 days) 
and wsr_web_site_sk = web_site_sk 
group by web_site_id) 
select channel 
, id 
, sum(sales) as sales 
, sum(returns) as returns 
, sum(profit) as profit 
from 
(select 'store channel' as channel 
, 'store' || s_store_id as id 
, sales 
, returns 
, (profit - profit_loss) as profit 
from ssr 
union all 
select 'catalog channel' as channel 
, 'catalog_page' || cp_catalog_page_id as id 
, sales 
, returns 
, (profit - profit_loss) as profit 
from csr 
union all 
select 'web channel' as channel 
, 'web_site' || web_site_id as id 
, sales 
, returns 
, (profit - profit_loss) as profit 
from wsr 
) x 
group by rollup (channel, id) 
order by channel 
,id 
fetch first 100 rows only; 
-- end query 98 in stream 0 using template query5.tpl 
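Query 98 stacks the three channels with UNION ALL and then applies `group by rollup (channel, id)`. This produces one row per (channel, id) pair, one subtotal row per channel with `id` NULL, and a single grand-total row with both keys NULL. Query 90 applies the same idea to four columns. The Python sketch below computes that grouping on hypothetical rows, with None standing in for SQL NULL. It assumes no detail row already has a None `id`, since such a row would collide with the subtotal key.

```python
from collections import defaultdict

def rollup_channel_id(rows):
    """Sketch of GROUP BY ROLLUP (channel, id) over (channel, id, sales) rows.

    Returns a dict keyed by (channel, id): detail groups, per-channel
    subtotals keyed (channel, None), and the grand total keyed (None, None).
    """
    totals = defaultdict(float)
    for channel, ident, sales in rows:
        totals[(channel, ident)] += sales  # detail level
        totals[(channel, None)] += sales   # per-channel subtotal
        totals[(None, None)] += sales      # grand total
    return dict(totals)

# Hypothetical input rows (not benchmark data).
result = rollup_channel_id([
    ("store channel", "storeA", 10.0),
    ("store channel", "storeB", 5.0),
    ("web channel", "web_siteX", 2.5),
])
print(result[("store channel", None)], result[(None, None)])  # 15.0 17.5
```

Three detail rows over two channels yield six groups: three detail, two subtotals, and one grand total.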
-- start query 99 in stream 0 using template query76.tpl and seed 164871687 
select channel, col_name, d_year, d_qoy, i_category, COUNT(*) sales_cnt, SUM(ext_sales_price) sales_amt FROM ( 
SELECT 'store' as channel, 'ss_addr_sk' col_name, d_year, d_qoy, i_category, ss_ext_sales_price ext_sales_price 
FROM store_sales, item, date_dim 
WHERE ss_addr_sk IS NULL 
AND ss_sold_date_sk=d_date_sk 
AND ss_item_sk=i_item_sk 
UNION ALL 
SELECT 'web' as channel, 'ws_ship_addr_sk' col_name, d_year, d_qoy, i_category, ws_ext_sales_price ext_sales_price 
FROM web_sales, item, date_dim 
WHERE ws_ship_addr_sk IS NULL 
AND ws_sold_date_sk=d_date_sk 
AND ws_item_sk=i_item_sk 
UNION ALL 
SELECT 'catalog' as channel, 'cs_ship_mode_sk' col_name, d_year, d_qoy, i_category, cs_ext_sales_price ext_sales_price 
FROM catalog_sales, item, date_dim 
WHERE cs_ship_mode_sk IS NULL 
AND cs_sold_date_sk=d_date_sk 
AND cs_item_sk=i_item_sk) foo 
GROUP BY channel, col_name, d_year, d_qoy, i_category 
ORDER BY channel, col_name, d_year, d_qoy, i_category 
fetch first 100 rows only; 
-- end query 99 in stream 0 using template query76.tpl
Appendix F: Attestation Letter 
Benchmark sponsor: 
Berni Schiefer 
IBM 
8200 Warden Avenue 
Markham, Ontario, L6C 1C7 
October 24, 2014 
At IBM's request I verified the implementation and results of a 30TB Big Data Decision Support (Hadoop-DS) benchmark, with most features derived from the TPC-DS Benchmark. 
The Hadoop-DS benchmark was executed on the following configuration: 
Test Platform: IBM x3650BD - 17 Node Cluster 
Query Engine: IBM BigInsights Big SQL v3.0 
Operating System: Red Hat Enterprise Linux 6.4 
Configuration per node: 
CPUs: 2 x Intel Xeon Processor E5-2680 v2 (2.8 GHz, 25MB L3) 
Memory: 128GB (1866MHz DDR3) 
Storage: 10 x 2TB SATA 3.5" HDD & 1 x 128GB SATA 2.5" SSD (swap) 
The results were: 
Single-Stream Performance: 1,023 Hadoop-DS Qph@30TB 
Multi-Stream Performance: 2,274 Hadoop-DS Qph@30TB 
Multi-Stream Concurrency: 4 Streams 
Load Time: 37h 11m 10s 
While these results are for a non-TPC benchmark, they complied with the following subset of requirements from the latest version of the TPC-DS Benchmark standard: 
• The database schema was defined with the proper layout and data types 
• The database population was generated using the TPC provided dsdgen 
• The database was properly scaled to 30TB and populated accordingly 
• The auxiliary data structure requirements were met since none were defined 
• The database load time was properly measured and reported 
• The query input variables were generated by the TPC provided dsqgen 
• The execution times for queries were correctly measured and reported
The following aspects of the Hadoop-DS benchmark were implemented within the spirit of the TPC-DS Benchmark: 
• All 99 queries were executed using the specified and unmodified query text or by applying minor modifications to the queries 
• Query answers were verified against the available validation answer sets 
The following features and requirements from the latest version of the TPC-DS Benchmark standard were not adhered to: 
• The defined referential integrity constraints were not enforced 
• The statistics collection did not meet the required limitations 
• The data persistence properties were not demonstrated 
• The data maintenance functions were neither implemented nor executed 
• A single throughput test was used to measure multi-user performance 
• The system pricing was not provided or reviewed 
• The report did not meet the defined format and content 
The executive summary and the benchmark report documenting the details of this Hadoop-DS benchmark execution were verified for accuracy. 
Respectfully Yours, 
François Raab, President

IBM Hadoop-DS Benchmark Report - 30TB

  • 1. Hadoop-DS Benchmark Report for IBM System x3650 M4 using IBM BigInsights Big SQL 3.0 and Red Hat Enterprise Linux Server Release 6.4 FIRST EDITION Published on Oct 24, 2014
  • 2. Page | 2 Published – Oct 24, 2014 The information contained in this document is distributed on an AS IS basis without any warranty either expressed or implied. The use of this information or the implementation of any of these techniques is the customer‟s responsibility and depends on the customer‟s ability to evaluate and integrate them into the customer‟s operational environment. While each item has been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. Customers attempting to adapt these techniques to their own environment do so at their own risk. Performance data contained in this document were determined in various controlled laboratory environments and are for reference purposes only. Customers should not adapt these performance numbers to their own environments and are for reference purposes only. Customers should not adapt these performance numbers to their own environments as system performance standards. The results that may be obtained in other operating environments may vary significantly. Users of this document should verify the applicable data for their specific environment. In this document, any references made to an IBM licensed program are not intended to state or imply that only IBM‟s licensed program may be used; any functionally equivalent program may be used. This publication was produced in the United States. IBM may not offer the products, services, or features discussed in this document in other countries, and the information is subject to change without notice. Consult your local IBM representative for information on products and services available in your area. © Copyright International Business Machines Corporation 2014. All rights reserved. Permission is hereby granted to reproduce this document in whole or in part, provided the copyright notice as printed above is set forth in full text on the title page of each item reproduced. U.S. 
Government Users - Documentation related to restricted rights: Use, duplication, or disclosure is subject to restrictions set forth in GSA ADP Schedule Contract with IBM Corp. Trademarks IBM, the IBM logo, System x and System Storage are trademarks or registered trademarks of International Business Machines Corporation. The following terms used in this publication are trademarks of other companies as follows: TPC Benchmark and TPC-DS are trademarks of Transaction Processing Performance Council; Intel and Xeon are trademarks or registered trademarks of Intel Corporation. Other company, product, or service names, which may be denoted by two asterisks (**), may be trademarks or service marks of others. Notes 1 GHz and MHz only measures microprocessor internal clock speed, not application performance. Many factors affect application performance. 2 When referring to hard disk capacity, GB, or gigabyte, means one thousand million bytes. Total user-accessible capacity may be less.
  • 3. Page | 3 Authors Simon Harris: Simon is the Big SQL performance lead working in the IBM BigInsights development team. He has 20 years of experience working in information management including MPP RDBMS, federated database technology, tooling and big data. Simon now specializes in SQL over Hadoop technologies. Abhayan Sundararajan: Abhayan is a Performance Analyst on IBM BigInsights with a focus on Big SQL. He has held a variety of roles within the IBM DB2 team, including functional verification test and a brief foray into development before joining the performance team to work on DB2 BLU. John Poelman: John joined the BigInsights performance team in 2011. While at IBM John has worked as a developer or a performance engineer on a variety of Database, Business Intelligence, and now Big Data software products. Matthew Emmerton: Matt Emmerton is a Senior Software Developer in Information Management at the IBM Toronto Software Lab. He has over 10 years of expertise in database performance analysis and scalability testing. He has participated in many successful world-record TPC and SAP benchmarks. His interests include exploring and exploiting key hardware and operating system technologies in DB2, and developing extensible test suites for standard benchmark workloads. 
Special thanks Special thanks to the following people for their contribution to the benchmark and content: Berni Schiefer – Distinguish Engineer, Information Management Performance and Benchmarks, DB2 LUW, Big Data, MDM, Optim Data Studio Performance Tools Adriana Zubiri – Program Director, Big Data Development Mike Ahern – BigInsights Performance Mi Shum – Senior Performance Manager, Big Data Cindy Saracco - Solution Architect, IM technologies - Big Data Avrilia Floratou – IBM Research Fatma Özcan – IBM Research Glen Sheffield – Big Data Competitive Analyst Gord Sissons – BigInsights Product Marketing Manager Stewart Tate – Senior Technical Staff Member, Information Management Performance Benchmarks and Solutions Jo A Ramos - Executive Solutions Architect - Big Data and Analytics
  • 4. Page | 4 Table of Contents Authors .............................................................................................................................................................................................. 3 Introduction ...................................................................................................................................................................................... 5 Motivation ......................................................................................................................................................................................... 6 Benchmark Methodology ................................................................................................................................................................. 7 Design and Implementation ............................................................................................................................................................. 9 Results .............................................................................................................................................................................................. 11 Benchmark Audit ............................................................................................................................................................................ 13 Summary ......................................................................................................................................................................................... 14 Appendix A: Cluster Topology and Hardware Configuration ................................................................................................... 15 Appendix B: Create and Load Tables ........................................................................................................................................... 
16 Create Flat files ............................................................................................................................................................................ 16 Create and Load Tables ............................................................................................................................................................... 16 Collect Statistics .......................................................................................................................................................................... 34 Appendix C: Tuning ....................................................................................................................................................................... 39 Appendix D: Scaling and Database Population ........................................................................................................................... 44 Appendix E: Queries ...................................................................................................................................................................... 45 Query Template Modifications .................................................................................................................................................... 45 Query Execution Order ................................................................................................................................................................ 45 Appendix F: Attestation Letter.................................................................................................................................................... 108
  • 5. Page | 5 Introduction Performance benchmarks are an integral part of software and systems development as they can evaluate systems performance in an objective way. They have also become highly visible components of the exciting world of marketing SQL over Hadoop solutions. IBM has constructed and used the Hadoop Decision Support (Hadoop-DS) benchmark, which was modeled on the industry standard Transaction Processing Performance Council Benchmark DS (TPC-DS)1 and validated by a TPC certified auditor. While adapting the workload for the nature of a Hadoop system IBM worked to ensure the essential attributes of both typical customer requirements and the benchmark were preserved. TPC-DS was released in January 2012 and most recently revised (Revision 1.2.0) in September 20142. The Hadoop-DS Benchmark is a decision support benchmark. It consists of a suite of business-oriented ad hoc queries. The queries and the data populating the database have been chosen to have broad industry wide relevance while maintaining a sufficient degree of ease of implementation. This benchmark illustrates decision support systems that:  Examine large volumes of data;  Execute queries with a high degree of complexity;  Give answers to critical business questions. Benchmarks results are highly dependent upon workload, specific application requirements, and systems design and implementation. Relative system performance will vary as a result of these and other factors. Therefore, Hadoop-DS should not be used as a substitute for specific customer application benchmarking when critical capacity planning and/or product evaluation decisions are contemplated. 1 TPC Benchmark and TPC-DS are trademarks of the Transaction Processing Performance Council (TPC). 2 The latest revision of the TPC-DS specification can be found at http://www.tpc.org/tpcds/default.asp
  • 6. Page | 6 Motivation Good benchmarks reflect, in a practical way, an abstraction of the essential elements of real customer workloads. Consequently, the aim of this project was to create a benchmark for SQL over Hadoop products which reflect a scenario common to many organizations adopting the technology today. The most common scenario we see involves moving subsets of workloads from the traditional relational data warehouse to SQL over Hadoop solutions (a process commonly referred to as Warehouse Augmentation). For this reason our Hadoop-DS workload was modeled on the existing relational TPC-DS benchmark. The TPC-DS benchmark uses relational database management systems (RDBMSs) to model a decision support system that examines large volumes of data and gives answers to real-world business questions by executing queries of various complexity (such as ad-hoc, reporting, OLAP and data mining type queries). It is therefore an ideal fit to mimic the experience of an organization porting parts of their workload from a traditional warehouse housed on an RDBMS to a SQL over Hadoop technology. As highlighted in IBM‟s “Benchmarking SQL over Hadoop Systems: TPC or not TPC?”3 Research paper, SQL over Hadoop solutions are in the “wild west” of benchmarking. Many vendors use the data generators and queries of existing TPC benchmarks, but cherry pick the parts of the benchmark most likely to highlight their own strengths – thus making comparison between results impossible and meaningless. To reflect real-world situations, Hadoop-DS does not cherry pick the parts of the TPC-DS workload that highlight Big SQL‟s strengths. Instead, we included all parts of the TPC-DS workload that are appropriate for SQL over Hadoop solutions; those being data loading, single user performance and multi-user performance. Since TPC-DS is a benchmark designed for relational database engines, some aspects of the benchmark are not applicable to SQL over Hadoop solutions. 
Broadly speaking, those are the “Data Maintenance” and “Data Persistence” sections of the benchmark. Consequently these sections were omitted from our Hadoop-DS workload. The TPC-DS benchmark also defines restrictions related to real-life situations – such as preventing the vendor from changing the queries to include additional predicates based on a customized partitioning schema, employing query specific tuning mechanisms (such as optimizer hints), making configuration changes between the single and multi-user tests etc... We endeavored to stay within the bounds of these restrictions for the Hadoop-DS workload and conducted the comparison with candor and due diligence. To validate our candor, we retained the services of Infosizing4, an established and respected benchmark auditing firm with multiple TPC certified auditors, including one with TPC-DS certification, to review and audit the benchmarking results. It is important to note that this is not an official TPC-DS benchmark result since aspects of the standard benchmark that do not apply to SQL over Hadoop solutions were not implemented. However, the independent review of the environment and results by an official auditor shows IBM commitment to openness and fair play in this arena. All deviations from the TPC-DS standard benchmark are noted in the attached auditor‟s attestation letter in Appendix F. In addition, all the information required to reproduce the environment and the Hadoop-DS workload is published in the various Appendices of this document – thus allowing any vendor or third party the ability to independently execute the benchmark and verify the results. 3 “Benchmarking SQL-on-Hadoop Systems: TPC or not TPC?” http://researcher.ibm.com/researcher/files/us- aflorat/BenchmarkingSQL-on-Hadoop.pdf 4 Infosizing: http://www.infosizing.com
  • 7. Page | 7 Benchmark Methodology In this section we provide a high level overview of the Hadoop- DS benchmark process. There are three key stages in the Hadoop-DS benchmark (which reflect similar stages in the TPC-DS benchmark). They are: 1. Data load 2. Query generation and validation 3. Performance test The flow diagram in Figure 1 outlines these three key stages when conducting the Hadoop-DS benchmark: These stages are outlined below. For a detailed description of each phase refer to the “Design and Implementation” section of this document. 1. Data Load The load phase of the benchmark includes all operations required to bring the System Under Test (SUT) to a state where the Performance Test phase can begin. This includes all hardware provisioning and configuration, storage setup, software installation (inc. the OS), verifying the cluster operation, generating the raw data and copying it to HDFS, cluster tuning and all steps required to create and populate the database in order to bring the system into a state ready to accept queries (the Performance Test phase). All desired tuning of the SUT must be completed before the end of the LOAD phase. Once the tables are created and loaded with the raw data, relationships between tables such as primary-foreign key relationships and corresponding referential integrity constraints can be defined. Finally, statistics are collected. These statistics help the Big SQL query optimizer generate efficient access plans during the performance test. The SUT is now ready for the Performance Test phase. 2. Query Generation and Validation Before the Performance Test phase can be begin, the queries must be generated and validated by executing each query against a qualification database and comparing the result with a pre-defined answer set. There are 99 queries in the Hadoop-DS benchmark. Queries are automatically generated from query templates which contain substitution parameters. 
Specific parameter values depend on both the context the query is run (scale factor, single or multi- stream), and the seed for the random number generator. The seed used was the end time of the timed LOAD phase of the benchmark. Figure 1: High Level Procedure Figure 2: Load Phase
  • 8. Page | 8 3. Performance Test This is the timed phase of the Hadoop-DS benchmark. This phase consists of a single-stream performance test followed immediately by a multi-stream performance test. In the single-stream performance test, a single query stream is executed against the database, and the total elapsed time for all 99 queries is measured. In the multi-stream performance test, multiple query streams are executed against the database, and the total elapsed time from the start of the first query to the completion of the last query is measured. The multi-stream performance test is started immediately following the completion of the single-stream test. There can be no modifications to the system under test, or components restarted between these performance tests. The following steps are used to implement the performance test: Single-Stream Test 1. Stream 0 Execution Multi-Stream (all steps conducted in parallel) 1. Stream 1 Execution 2. Stream 2 Execution 3. Stream 3 Execution 4. Stream 4 Execution Hadoop-DS uses the “Hadoop-DS Qph” metric to report query performance. The Hadoop-DS Qph metric is the effective query throughput, measured as the number of queries executed over a period of time. A primary factor in the Hadoop-DS metric is the scale factor (SF) -- size of data set -- which is used to scale the actual performance numbers. This means that results have a metric scaled to the database size which helps guard against the fact that cluster hardware doesn't always scale linearly and helps differentiate large clusters from small clusters (since performance is typically a factor of cluster size). A Hadoop-DS Qph metric is calculated for each of the single and multi-user runs using the following formula: Hadoop-DS Qph @ SF = ( (SF/100) * Q * S ) / T Where: • SF is the scale factor used in GB (30,000 in our benchmark). SF is divided by 100 in order to normalize the results using 100GB as the baseline. 
• Q is the total number of queries successfully executed • S is the number of streams (1 for the single user run) • T is the duration of the run measured in hours (with a resolution up to one second) Hadoop-DS Qph metrics are reported at a specific scale factor. For example „Hadoop-DS Qph@30TB‟ represents the effective throughput of the SQL over Hadoop solution against a 30TB database.
  • 9. Page | 9 Design and Implementation This section provides a more detailed description of the configuration used in this implementation of the benchmark. Hardware The benchmark was conducted on a 17 node cluster with each node being an IBM x3650BD server. A complete specification of the hardware used can be found in Appendix A: Cluster Topology and Hardware Configuration. Physical Database Design In-line with Big SQL best practices, a single ext4 filesystems was created on each disk used to store data on all nodes in the cluster (including the master). Once mounted, 3 directories were created on each filesystem for the HDFS data directory, the Map-Red cache and the Big SQL Data directory. This configuration simplifies disk layout and evenly spreads the io for all components across all available disks – and consequently provides good performance. For detailed information, refer to the “Installation options”and “OS storage” sections of Appendix C. The default HDFS replication factor of 3 was used to replicate HDFS blocks between nodes in the cluster. No other replication (at the filesystem or database level) was used. Parquet was chosen as the storage format for the Big SQL tables. Parquet is the optimal format for Big SQL 3.0 – both in terms of performance and disk space consumption. The Paquet storage format has Snappy compression enabled by default in Big SQL. Logical Database Design Big SQL‟s Parquet storage format does not support DATE or TIME data types, so VARCHAR(10) and VARCHAR(16) were used respectively. As a result, a small number of queries required these columns to be CAST to the appropriate DATE or TIME types in order for date arithmetic to be performed upon them. Other than the natural scatter partitioning providing by HDFS, no other explicit horizontal or vertical partitioning was implemented. For detailed information on the DDL used to create the tables, see Appendix B. 
Data Generation Although data generation is not a timed operation, generation and copy of the raw data to HDFS was parallelized across the data nodes in the cluster to improve efficiency. After generation, a directory exists on HDFS for each table in the schema. This directory is used as the source for the LOAD command. Database Population Tables were populated using the data generated and stored on HDFS during the data generation phase. The tables were loaded sequentially, one after the other. Tables were populated using the Big SQL LOAD command which uses Map-Reduce jobs to read the source data and populate the target file using the specified storage format (parquet in our case). The number of Map-Reduce tasks used to load each of the large fact tables where individually tuned via the num.map.tasks property in order to improve load performance. Details of the LOAD command used can be found in Appendix B. Referential Integrity / Informational Constraints In a traditional data warehouse, referential integrity (“RI”) constraints are often employed to ensure that relationship between tables are maintained over time, as additional data is loaded and/or existing data is refreshed.
  • 10. Page | 10 While today's “big data” products support RI constraints, they lack the maturity required to enforce them, because their ingest and lookup capabilities are optimized for large parallel scans, not singleton lookups. As a result, care should be taken to enforce RI at the source, or during the ETL process. A good example is moving data from an existing data warehouse to a big data platform: if the RI constraints existed in the source RDBMS, it is safe to create equivalent informational constraints on the big data platform, because the RDBMS has already enforced them. The presence of these constraints in the schema provides valuable information to the query compiler/optimizer, so they are still created, but are left unenforced. Unenforced RI constraints are often called informational constraints (“IC”). Big SQL supports informational constraints, and ICs were created for every PK and FK relationship in the TPC-DS schema. Appendix B provides full details of all informational constraints created.
Statistics
Because Big SQL uses a cost-based query optimizer, accurate statistics are essential. Statistics can be gathered in various forms, ranging from simple cardinality statistics on tables, to distribution statistics for single columns and groups of columns within a table, to “statistical views” that provide cardinality and distribution statistics across join products. This test used a combination of all of these methods:
a. cardinality statistics collected on all tables
b. distribution statistics collected on all columns
c. group distribution statistics collected for the composite PKs of all 7 fact tables
d. statistical views created on a combination of join predicates
See Appendix B for full details of the statistics collected for the Hadoop-DS benchmark.
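The informational constraints described above follow one fixed statement shape for primary keys and one for foreign keys. The sketch below generates statements in the same form as 055.ri.jsq in Appendix B; the helper names `gen_pk` and `gen_fk` are hypothetical. Note that such statements only declare the relationships to the optimizer: nothing checks the data, so correctness still depends on RI having been enforced upstream.

```bash
#!/usr/bin/env bash
# Sketch: generate informational (NOT ENFORCED) constraints in the same
# form as 055.ri.jsq in Appendix B. Function names are hypothetical.

# gen_pk TABLE "KEY_COL[, KEY_COL...]"
gen_pk() {
  printf 'alter table %s add primary key (%s) not enforced enable query optimization; commit work;\n' \
    "$1" "$2"
}

# gen_fk TABLE CONSTRAINT_NAME FK_COLUMN PARENT_TABLE PARENT_COLUMN
gen_fk() {
  printf 'alter table %s add constraint %s foreign key (%s) references %s (%s) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;\n' \
    "$1" "$2" "$3" "$4" "$5"
}

# Composite PK of a fact table and one of its FKs, as listed in Appendix B.
gen_pk store_sales "ss_item_sk, ss_ticket_number"
gen_fk store_sales fk1 ss_sold_date_sk date_dim d_date_sk
```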
Query Generation and Validation
Because there are many variations of SQL dialects, the specification allows the sponsor to make pre-defined minor modifications to the queries so that they can be compiled and executed successfully. In Big SQL, 87 of the 99 queries worked directly as generated from the query templates. The other 12 queries required only minor modifications (mainly type casts), which took less than one hour to make. Chart 1 shows the query breakdown. The full text of all 99 queries used during the single-stream run can be found in Appendix E.
Performance Test
Once the data has been loaded, statistics gathered and queries generated, the performance phase of the benchmark can commence. During the performance phase, a single-stream run is executed, followed immediately by a multi-stream run. Four query streams were used for the multi-stream run.
Chart 1: Big SQL 3.0 query breakdown at 30TB
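The single-stream and multi-stream runs described under Performance Test can be outlined as a small driver. This is a sketch only: `run_stream` is a hypothetical placeholder (it merely sleeps here so the script runs), not the harness used for this benchmark. It illustrates that the multi-stream elapsed time is the wall-clock time from the start of the first stream to the end of the last one.

```bash
#!/usr/bin/env bash
# Sketch: launch N query streams concurrently and time the whole run.
# ASSUMPTION: run_stream is a hypothetical placeholder; here it only sleeps
# so the script is runnable. Replace its body with the command that submits
# one stream's queries in order.

run_stream() {
  local id=$1
  sleep 1                      # stands in for executing the stream's queries
  echo "stream ${id} done"
}

# run_multi_stream N: start N streams at once, wait for all of them, and
# report the wall-clock time from first start to last finish.
run_multi_stream() {
  local n=$1 s
  local start=$SECONDS
  for ((s = 1; s <= n; s++)); do
    run_stream "$s" &
  done
  wait
  echo "${n} streams finished in $((SECONDS - start)) s"
}

# Example: the single-stream run, followed immediately by the 4-stream run.
#   run_stream 0 && run_multi_stream 4
```

With the placeholder body, `run_multi_stream 4` reports roughly 1 to 2 seconds rather than 4, because the four streams overlap instead of running back to back.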
  • 11. Page | 11 Results
Figures 3 and 4 summarize the results of executing the Hadoop-DS benchmark against Big SQL at a scale factor of 30TB.
Figure 3: Big SQL Results for Hadoop-DS @ 30TB
IBM System x3650BD with IBM BigInsights Big SQL v3.0
Hadoop-DS (*)
October 24, 2014
Single-Stream Performance: 1,023 Hadoop-DS Qph @ 30TB
Multi-Stream Performance: 2,274 Hadoop-DS Qph @ 30TB
Database Size: 30 TB
Query Engine: IBM BigInsights Big SQL v3.0
Operating System: Red Hat Enterprise Linux Server Release 6.4
System Components (per cluster node):
Component                  Quantity  Description
Processors/Cores/Threads   2/20/40   Intel Xeon E5-2680 v2, 2.80GHz, 25MB L3 Cache
Memory                     8         16GB ECC DDR3 1866MHz LRDIMM
Disk Controllers           1         IBM ServeRAID-M5210 SAS/SATA Controller
Disk Drives                10        2TB SATA 3.5" HDD (HDFS)
Disk Drives                1         128GB SATA 2.5" SSD (Swap)
Network                    1         Onboard dual-port GigE Adapter
This implementation of the Hadoop-DS benchmark was audited by Francois Raab of Infosizing (www.sizing.com).
(*) The Hadoop-DS benchmark is derived from TPC Benchmark DS (TPC-DS) and is not comparable to published TPC-DS results. TPC Benchmark is a trademark of the Transaction Processing Performance Council.
[Diagram: Master Node and 16 Data Nodes connected by a 10GigE network]
  • 12. Page | 12 IBM System x3650BD with IBM BigInsights Big SQL v3.0
Hadoop-DS (*)
November 14, 2014
Start and End Times
Test           Start Date  Start Time  End Date  End Time  Elapsed Time
Single Stream  10/15/14    16:01:45    10/16/14  21:02:30  29:00:45
Multi-Stream   10/16/14    21:41:50    10/19/14  01:55:03  52:13:13
Number of Query Streams for Multi-Stream Test: 4
Figure 4: Big SQL Elapsed Times for Hadoop-DS@30TB
Of particular note is that 4 concurrent query streams (and therefore 4 times as many queries) take only 1.8 times as long as a single query stream (52:13:13 versus 29:00:45). Chart 2 highlights Big SQL's impressive multi-user scalability.
Chart 2: Big SQL Multi-user Scalability using 4 Query Streams @30TB
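The 1.8x figure can be reproduced from the elapsed times in Figure 4. The same numbers also give the throughput gain: the four-stream run completed four times as many queries in about 1.8 times the time, roughly 2.22 times the single-stream throughput, which matches the ratio of the two published Qph figures (2,274 / 1,023). The helper names below are illustrative.

```bash
#!/usr/bin/env bash
# Reproduce the scalability figures from the elapsed times in Figure 4.

# hms_to_seconds HH:MM:SS -> total seconds (hours may exceed 24)
hms_to_seconds() {
  local h m s
  IFS=: read -r h m s <<< "$1"
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# ratio A B -> A / B rounded to two decimals
ratio() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'; }

single=$(hms_to_seconds 29:00:45)   # 99 queries, 1 stream   (104445 s)
multi=$(hms_to_seconds 52:13:13)    # 4 x 99 queries, 4 streams (187993 s)
streams=4

# Throughput ratio = (streams*99/multi) / (99/single) = streams*single/multi
echo "elapsed ratio:    $(ratio "$multi" "$single")"                 # 1.80
echo "throughput ratio: $(ratio $((streams * single)) "$multi")"     # 2.22
echo "Qph ratio:        $(ratio 2274 1023)"                          # 2.22
```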
  • 13. Page | 13 Benchmark Audit
Auditor
This implementation of the IBM Hadoop-DS Benchmark was audited by Francois Raab of Infosizing. Further information regarding the audit process may be obtained from:
InfoSizing
531 Crystal Hills Boulevard
Crystal Springs, CO 80829
Telephone: (719) 473-7555
Attestation Letter
The auditor's attestation letter can be found in Appendix F.
  • 14. Page | 14 Summary
The Hadoop-DS benchmark takes a data warehousing workload from an RDBMS and ports it to IBM's SQL over Hadoop solution, Big SQL 3.0. Porting workloads from existing warehouses to SQL over Hadoop is a common scenario for organizations seeking to reduce the cost of their existing data warehousing platforms. For this reason, the Hadoop-DS workload was modeled on the Transaction Processing Performance Council Benchmark DS (TPC-DS). The services of a TPC-approved auditor were secured to review the benchmark process and results.
The results of this benchmark demonstrate the ease with which existing data warehouses can be augmented with IBM Big SQL 3.0. They highlight how Big SQL implements rich SQL with outstanding performance on a large data set with multiple concurrent users. These findings will be compelling to organizations augmenting data warehouse environments with Hadoop-based technologies.
Strict SQL compliance can translate into significant cost savings by allowing customers to leverage existing investments in databases, applications and skills, and to take advantage of SQL over Hadoop with minimal disruption to existing environments. Enterprise customers cannot afford to have different dialects of SQL across different data management platforms. This benchmark shows that IBM Big SQL offers a high degree of SQL language compatibility with existing RDBMS workloads.
Not only is IBM Big SQL compatible with existing RDBMSs, it also delivers very good performance and scalability for a SQL over Hadoop solution. This means that customers can realize business results faster, ask more complex questions, and realize great efficiencies per unit of investment in infrastructure. All of these factors help provide a competitive advantage.
The performance and SQL language richness demonstrated throughout this paper show that IBM Big SQL is the industry-leading SQL over Hadoop solution available today.
  • 15. Page | 15 Appendix A: Cluster Topology and Hardware Configuration
Figure 5: IBM System x3650BD M4
The measured configuration was a cluster of 17 identical IBM xSeries x3650BD M4 servers: 1 master node and 16 data nodes. Each contained:
- CPU: Intel Xeon E5-2680 v2 @ 2.80GHz, 2 sockets with 10 cores each, hyper-threading enabled, for a total of 40 logical CPUs
- Memory: 128 GB RAM at 1866 MHz
- Storage: 10 x 2TB 3.5" SATA, 7200 RPM (one disk for the OS, 9 for data)
- Storage: 4 x 128GB SSD (a single SSD was used for OS swap; the other SSDs were not used)
- Network: Dual-port 10 Gb Ethernet
- OS: Red Hat Enterprise Linux Server release 6.4 (Santiago)
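The per-node figures above can be turned into rough cluster totals with simple arithmetic. This is a back-of-the-envelope sketch under stated assumptions: only the 16 data nodes are counted as HDFS capacity (the master also has data filesystems, but whether it stores HDFS blocks is not stated here), and filesystem overhead and the space taken by the Map-Reduce cache and Big SQL directories on the same disks are ignored. It gives only a rough sense of scale relative to the 30TB data set; the generated flat files and the Parquet tables both occupy HDFS space.

```bash
#!/usr/bin/env bash
# Back-of-the-envelope capacity figures from the Appendix A configuration.
# ASSUMPTIONS: only the 16 data nodes contribute HDFS capacity; filesystem
# overhead and the space used by the Map-Reduce cache and Big SQL
# directories (which share the same disks) are ignored.

sockets=2 cores_per_socket=10 threads_per_core=2   # hyper-threading on
data_disks=9 disk_tb=2 data_nodes=16 replication=3

logical_cpus=$(( sockets * cores_per_socket * threads_per_core ))
raw_tb_per_node=$(( data_disks * disk_tb ))
raw_tb_cluster=$(( raw_tb_per_node * data_nodes ))
usable_tb=$(( raw_tb_cluster / replication ))      # after 3-way replication

echo "logical CPUs per node:    ${logical_cpus}"     # 40
echo "raw data TB per node:     ${raw_tb_per_node}"  # 18
echo "raw data TB (16 nodes):   ${raw_tb_cluster}"   # 288
echo "usable TB at 3x replicas: ${usable_tb}"        # 96
```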
  • 16. Page | 16 Appendix B: Create and Load Tables Create Flat files Scripts: 001.gen-data-v3-tpcds.sh 002.gen-data-v3-tpcds-forParquet.sh These scripts are essentially wrappers for dsdgen to generate the TPC-DS flat files. They provide support for parallel data generation, as well as the generation of the data directly in HDFS (through the use of named pipes) rather than first staging the flat files on a local disk. Create and Load Tables 040.create-tables-parquet.jsq: The following DDL was used to create the tables in Big SQL: set schema $schema; create hadoop table call_center ( cc_call_center_sk bigint not null, cc_call_center_id varchar(16) not null, cc_rec_start_date varchar(10) , cc_rec_end_date varchar(10) , cc_closed_date_sk bigint , cc_open_date_sk bigint , cc_name varchar(50) , cc_class varchar(50) , cc_employees bigint , cc_sq_ft bigint , cc_hours varchar(20) , cc_manager varchar(40) , cc_mkt_id bigint , cc_mkt_class varchar(50) , cc_mkt_desc varchar(100) , cc_market_manager varchar(40) , cc_division bigint , cc_division_name varchar(50) , cc_company bigint , cc_company_name varchar(50) , cc_street_number varchar(10) , cc_street_name varchar(60) , cc_street_type varchar(15) , cc_suite_number varchar(10) , cc_city varchar(60) , cc_county varchar(30) , cc_state varchar(2) , cc_zip varchar(10) , cc_country varchar(20) , cc_gmt_offset double , cc_tax_percentage double ) STORED AS PARQUETFILE; create hadoop table catalog_page ( cp_catalog_page_sk bigint not null, cp_catalog_page_id varchar(16) not null, cp_start_date_sk bigint , cp_end_date_sk bigint , cp_department varchar(50) , cp_catalog_number bigint , cp_catalog_page_number bigint , cp_description varchar(100) , cp_type varchar(100) )
  • 17. Page | 17 STORED AS PARQUETFILE; create hadoop table catalog_returns ( cr_returned_date_sk bigint , cr_returned_time_sk bigint , cr_item_sk bigint not null, cr_refunded_customer_sk bigint , cr_refunded_cdemo_sk bigint , cr_refunded_hdemo_sk bigint , cr_refunded_addr_sk bigint , cr_returning_customer_sk bigint , cr_returning_cdemo_sk bigint , cr_returning_hdemo_sk bigint , cr_returning_addr_sk bigint , cr_call_center_sk bigint , cr_catalog_page_sk bigint , cr_ship_mode_sk bigint , cr_warehouse_sk bigint , cr_reason_sk bigint , cr_order_number bigint not null, cr_return_quantity bigint , cr_return_amount double , cr_return_tax double , cr_return_amt_inc_tax double , cr_fee double , cr_return_ship_cost double , cr_refunded_cash double , cr_reversed_charge double , cr_store_credit double , cr_net_loss double ) STORED AS PARQUETFILE; create hadoop table catalog_sales ( cs_sold_date_sk bigint , cs_sold_time_sk bigint , cs_ship_date_sk bigint , cs_bill_customer_sk bigint , cs_bill_cdemo_sk bigint , cs_bill_hdemo_sk bigint , cs_bill_addr_sk bigint , cs_ship_customer_sk bigint , cs_ship_cdemo_sk bigint , cs_ship_hdemo_sk bigint , cs_ship_addr_sk bigint , cs_call_center_sk bigint , cs_catalog_page_sk bigint , cs_ship_mode_sk bigint , cs_warehouse_sk bigint , cs_item_sk bigint not null, cs_promo_sk bigint , cs_order_number bigint not null, cs_quantity bigint , cs_wholesale_cost double , cs_list_price double , cs_sales_price double , cs_ext_discount_amt double , cs_ext_sales_price double , cs_ext_wholesale_cost double , cs_ext_list_price double , cs_ext_tax double , cs_coupon_amt double , cs_ext_ship_cost double , cs_net_paid double , cs_net_paid_inc_tax double , cs_net_paid_inc_ship double , cs_net_paid_inc_ship_tax double , cs_net_profit double ) STORED AS PARQUETFILE; create hadoop table customer ( c_customer_sk bigint not null,
  • 18. Page | 18 c_customer_id varchar(16) not null, c_current_cdemo_sk bigint , c_current_hdemo_sk bigint , c_current_addr_sk bigint , c_first_shipto_date_sk bigint , c_first_sales_date_sk bigint , c_salutation varchar(10) , c_first_name varchar(20) , c_last_name varchar(30) , c_preferred_cust_flag varchar(1) , c_birth_day bigint , c_birth_month bigint , c_birth_year bigint , c_birth_country varchar(20) , c_login varchar(13) , c_email_address varchar(50) , c_last_review_date bigint ) STORED AS PARQUETFILE; create hadoop table customer_address ( ca_address_sk bigint not null, ca_address_id varchar(16) not null, ca_street_number varchar(10) , ca_street_name varchar(60) , ca_street_type varchar(15) , ca_suite_number varchar(10) , ca_city varchar(60) , ca_county varchar(30) , ca_state varchar(2) , ca_zip varchar(10) , ca_country varchar(20) , ca_gmt_offset double , ca_location_type varchar(20) ) STORED AS PARQUETFILE; create hadoop table customer_demographics ( cd_demo_sk bigint not null, cd_gender varchar(1) , cd_marital_status varchar(1) , cd_education_status varchar(20) , cd_purchase_estimate bigint , cd_credit_rating varchar(10) , cd_dep_count bigint , cd_dep_employed_count bigint , cd_dep_college_count bigint ) STORED AS PARQUETFILE; create hadoop table date_dim ( d_date_sk bigint not null, d_date_id varchar(16) not null, d_date varchar(10) , d_month_seq bigint , d_week_seq bigint , d_quarter_seq bigint , d_year bigint , d_dow bigint , d_moy bigint , d_dom bigint , d_qoy bigint , d_fy_year bigint , d_fy_quarter_seq bigint , d_fy_week_seq bigint , d_day_name varchar(9) , d_quarter_name varchar(6) , d_holiday varchar(1) , d_weekend varchar(1) , d_following_holiday varchar(1) , d_first_dom bigint , d_last_dom bigint , d_same_day_ly bigint ,
  • 19. Page | 19 d_same_day_lq bigint , d_current_day varchar(1) , d_current_week varchar(1) , d_current_month varchar(1) , d_current_quarter varchar(1) , d_current_year varchar(1) ) STORED AS PARQUETFILE; create hadoop table household_demographics ( hd_demo_sk bigint not null, hd_income_band_sk bigint , hd_buy_potential varchar(15) , hd_dep_count bigint , hd_vehicle_count bigint ) STORED AS PARQUETFILE; create hadoop table income_band ( ib_income_band_sk bigint not null, ib_lower_bound bigint , ib_upper_bound bigint ) STORED AS PARQUETFILE; create hadoop table inventory ( inv_date_sk bigint not null, inv_item_sk bigint not null, inv_warehouse_sk bigint not null, inv_quantity_on_hand bigint ) STORED AS PARQUETFILE; create hadoop table item ( i_item_sk bigint not null, i_item_id varchar(16) not null, i_rec_start_date varchar(10) , i_rec_end_date varchar(10) , i_item_desc varchar(200) , i_current_price double , i_wholesale_cost double , i_brand_id bigint , i_brand varchar(50) , i_class_id bigint , i_class varchar(50) , i_category_id bigint , i_category varchar(50) , i_manufact_id bigint , i_manufact varchar(50) , i_size varchar(20) , i_formulation varchar(20) , i_color varchar(20) , i_units varchar(10) , i_container varchar(10) , i_manager_id bigint , i_product_name varchar(50) ) STORED AS PARQUETFILE; create hadoop table promotion ( p_promo_sk bigint not null, p_promo_id varchar(16) not null, p_start_date_sk bigint , p_end_date_sk bigint , p_item_sk bigint , p_cost double , p_response_target bigint , p_promo_name varchar(50) , p_channel_dmail varchar(1) , p_channel_email varchar(1) , p_channel_catalog varchar(1) ,
  • 20. Page | 20 p_channel_tv varchar(1) , p_channel_radio varchar(1) , p_channel_press varchar(1) , p_channel_event varchar(1) , p_channel_demo varchar(1) , p_channel_details varchar(100) , p_purpose varchar(15) , p_discount_active varchar(1) ) STORED AS PARQUETFILE; create hadoop table reason ( r_reason_sk bigint not null, r_reason_id varchar(16) not null, r_reason_desc varchar(100) ) STORED AS PARQUETFILE; create hadoop table ship_mode ( sm_ship_mode_sk bigint not null, sm_ship_mode_id varchar(16) not null, sm_type varchar(30) , sm_code varchar(10) , sm_carrier varchar(20) , sm_contract varchar(20) ) STORED AS PARQUETFILE; create hadoop table store ( s_store_sk bigint not null, s_store_id varchar(16) not null, s_rec_start_date varchar(10) , s_rec_end_date varchar(10) , s_closed_date_sk bigint , s_store_name varchar(50) , s_number_employees bigint , s_floor_space bigint , s_hours varchar(20) , s_manager varchar(40) , s_market_id bigint , s_geography_class varchar(100) , s_market_desc varchar(100) , s_market_manager varchar(40) , s_division_id bigint , s_division_name varchar(50) , s_company_id bigint , s_company_name varchar(50) , s_street_number varchar(10) , s_street_name varchar(60) , s_street_type varchar(15) , s_suite_number varchar(10) , s_city varchar(60) , s_county varchar(30) , s_state varchar(2) , s_zip varchar(10) , s_country varchar(20) , s_gmt_offset double , s_tax_precentage double ) STORED AS PARQUETFILE; create hadoop table store_returns ( sr_returned_date_sk bigint , sr_return_time_sk bigint , sr_item_sk bigint not null, sr_customer_sk bigint , sr_cdemo_sk bigint , sr_hdemo_sk bigint , sr_addr_sk bigint , sr_store_sk bigint , sr_reason_sk bigint , sr_ticket_number bigint not null,
  • 21. Page | 21 sr_return_quantity bigint , sr_return_amt double , sr_return_tax double , sr_return_amt_inc_tax double , sr_fee double , sr_return_ship_cost double , sr_refunded_cash double , sr_reversed_charge double , sr_store_credit double , sr_net_loss double ) STORED AS PARQUETFILE; create hadoop table store_sales ( ss_sold_date_sk bigint , ss_sold_time_sk bigint , ss_item_sk bigint not null, ss_customer_sk bigint , ss_cdemo_sk bigint , ss_hdemo_sk bigint , ss_addr_sk bigint , ss_store_sk bigint , ss_promo_sk bigint , ss_ticket_number bigint not null, ss_quantity bigint , ss_wholesale_cost double , ss_list_price double , ss_sales_price double , ss_ext_discount_amt double , ss_ext_sales_price double , ss_ext_wholesale_cost double , ss_ext_list_price double , ss_ext_tax double , ss_coupon_amt double , ss_net_paid double , ss_net_paid_inc_tax double , ss_net_profit double ) STORED AS PARQUETFILE; create hadoop table time_dim ( t_time_sk bigint not null, t_time_id varchar(16) not null, t_time bigint , t_hour bigint , t_minute bigint , t_second bigint , t_am_pm varchar(2) , t_shift varchar(20) , t_sub_shift varchar(20) , t_meal_time varchar(20) ) STORED AS PARQUETFILE; create hadoop table warehouse ( w_warehouse_sk bigint not null, w_warehouse_id varchar(16) not null, w_warehouse_name varchar(20) , w_warehouse_sq_ft bigint , w_street_number varchar(10) , w_street_name varchar(60) , w_street_type varchar(15) , w_suite_number varchar(10) , w_city varchar(60) , w_county varchar(30) , w_state varchar(2) , w_zip varchar(10) , w_country varchar(20) , w_gmt_offset double ) STORED AS PARQUETFILE; create hadoop table web_page
  • 22. Page | 22 ( wp_web_page_sk bigint not null, wp_web_page_id varchar(16) not null, wp_rec_start_date varchar(10) , wp_rec_end_date varchar(10) , wp_creation_date_sk bigint , wp_access_date_sk bigint , wp_autogen_flag varchar(1) , wp_customer_sk bigint , wp_url varchar(100) , wp_type varchar(50) , wp_char_count bigint , wp_link_count bigint , wp_image_count bigint , wp_max_ad_count bigint ) STORED AS PARQUETFILE; create hadoop table web_returns ( wr_returned_date_sk bigint , wr_returned_time_sk bigint , wr_item_sk bigint not null, wr_refunded_customer_sk bigint , wr_refunded_cdemo_sk bigint , wr_refunded_hdemo_sk bigint , wr_refunded_addr_sk bigint , wr_returning_customer_sk bigint , wr_returning_cdemo_sk bigint , wr_returning_hdemo_sk bigint , wr_returning_addr_sk bigint , wr_web_page_sk bigint , wr_reason_sk bigint , wr_order_number bigint not null, wr_return_quantity bigint , wr_return_amt double , wr_return_tax double , wr_return_amt_inc_tax double , wr_fee double , wr_return_ship_cost double , wr_refunded_cash double , wr_reversed_charge double , wr_account_credit double , wr_net_loss double ) STORED AS PARQUETFILE; create hadoop table web_sales ( ws_sold_date_sk bigint , ws_sold_time_sk bigint , ws_ship_date_sk bigint , ws_item_sk bigint not null, ws_bill_customer_sk bigint , ws_bill_cdemo_sk bigint , ws_bill_hdemo_sk bigint , ws_bill_addr_sk bigint , ws_ship_customer_sk bigint , ws_ship_cdemo_sk bigint , ws_ship_hdemo_sk bigint , ws_ship_addr_sk bigint , ws_web_page_sk bigint , ws_web_site_sk bigint , ws_ship_mode_sk bigint , ws_warehouse_sk bigint , ws_promo_sk bigint , ws_order_number bigint not null, ws_quantity bigint , ws_wholesale_cost double , ws_list_price double , ws_sales_price double , ws_ext_discount_amt double , ws_ext_sales_price double , ws_ext_wholesale_cost double , ws_ext_list_price double , ws_ext_tax double ,
  • 23. Page | 23 ws_coupon_amt double , ws_ext_ship_cost double , ws_net_paid double , ws_net_paid_inc_tax double , ws_net_paid_inc_ship double , ws_net_paid_inc_ship_tax double , ws_net_profit double ) STORED AS PARQUETFILE; create hadoop table web_site ( web_site_sk bigint not null, web_site_id varchar(16) not null, web_rec_start_date varchar(10) , web_rec_end_date varchar(10) , web_name varchar(50) , web_open_date_sk bigint , web_close_date_sk bigint , web_class varchar(50) , web_manager varchar(40) , web_mkt_id bigint , web_mkt_class varchar(50) , web_mkt_desc varchar(100) , web_market_manager varchar(40) , web_company_id bigint , web_company_name varchar(50) , web_street_number varchar(10) , web_street_name varchar(60) , web_street_type varchar(15) , web_suite_number varchar(10) , web_city varchar(60) , web_county varchar(30) , web_state varchar(2) , web_zip varchar(10) , web_country varchar(20) , web_gmt_offset double , web_tax_percentage double ) STORED AS PARQUETFILE; commit; 045.load-tables.jsq: The following script was used to load the flatfiles into Big SQL in Parquet format: set schema $schema; load hadoop using file url '/HADOOPDS30000G_PARQ/call_center' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table call_center overwrite WITH LOAD PROPERTIES ('num.map.tasks'='1'); load hadoop using file url '/HADOOPDS30000G_PARQ/catalog_page' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table catalog_page overwrite WITH LOAD PROPERTIES ('num.map.tasks'='1'); load hadoop using file url '/HADOOPDS30000G_PARQ/catalog_returns' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table catalog_returns overwrite WITH LOAD PROPERTIES ('num.map.tasks'='425'); load hadoop using file url '/HADOOPDS30000G_PARQ/catalog_sales' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table catalog_sales overwrite WITH LOAD PROPERTIES 
('num.map.tasks'='4250'); load hadoop using file url '/HADOOPDS30000G_PARQ/customer' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table customer overwrite WITH LOAD PROPERTIES ('num.map.tasks'='1'); load hadoop using file url '/HADOOPDS30000G_PARQ/customer_address' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table customer_address overwrite WITH LOAD PROPERTIES ('num.map.tasks'='1');
  • 24. Page | 24 load hadoop using file url '/HADOOPDS30000G_PARQ/customer_demographics' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table customer_demographics overwrite WITH LOAD PROPERTIES ('num.map.tasks'='1'); load hadoop using file url '/HADOOPDS30000G_PARQ/date_dim' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table date_dim overwrite WITH LOAD PROPERTIES ('num.map.tasks'='1'); load hadoop using file url '/HADOOPDS30000G_PARQ/household_demographics' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table household_demographics overwrite WITH LOAD PROPERTIES ('num.map.tasks'='1'); load hadoop using file url '/HADOOPDS30000G_PARQ/income_band' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table income_band overwrite WITH LOAD PROPERTIES ('num.map.tasks'='1'); load hadoop using file url '/HADOOPDS30000G_PARQ/inventory' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table inventory overwrite WITH LOAD PROPERTIES ('num.map.tasks'='160'); load hadoop using file url '/HADOOPDS30000G_PARQ/item' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table item overwrite WITH LOAD PROPERTIES ('num.map.tasks'='1'); load hadoop using file url '/HADOOPDS30000G_PARQ/promotion' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table promotion overwrite WITH LOAD PROPERTIES ('num.map.tasks'='1'); load hadoop using file url '/HADOOPDS30000G_PARQ/reason' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table reason overwrite WITH LOAD PROPERTIES ('num.map.tasks'='1'); load hadoop using file url '/HADOOPDS30000G_PARQ/ship_mode' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table ship_mode overwrite WITH LOAD PROPERTIES ('num.map.tasks'='1'); load hadoop using file url 
'/HADOOPDS30000G_PARQ/store' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table store overwrite WITH LOAD PROPERTIES ('num.map.tasks'='1'); load hadoop using file url '/HADOOPDS30000G_PARQ/store_returns' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table store_returns overwrite WITH LOAD PROPERTIES ('num.map.tasks'='700'); load hadoop using file url '/HADOOPDS30000G_PARQ/store_sales' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table store_sales overwrite WITH LOAD PROPERTIES ('num.map.tasks'='5500'); load hadoop using file url '/HADOOPDS30000G_PARQ/time_dim' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table time_dim overwrite WITH LOAD PROPERTIES ('num.map.tasks'='1'); load hadoop using file url '/HADOOPDS30000G_PARQ/warehouse/' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table warehouse overwrite WITH LOAD PROPERTIES ('num.map.tasks'='1'); load hadoop using file url '/HADOOPDS30000G_PARQ/web_page' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table web_page overwrite WITH LOAD PROPERTIES ('num.map.tasks'='1'); load hadoop using file url '/HADOOPDS30000G_PARQ/web_returns' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table web_returns overwrite WITH LOAD PROPERTIES ('num.map.tasks'='200'); load hadoop using file url '/HADOOPDS30000G_PARQ/web_sales' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table web_sales overwrite WITH LOAD PROPERTIES ('num.map.tasks'='2000'); load hadoop using file url '/HADOOPDS30000G_PARQ/web_site' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table web_site overwrite WITH LOAD PROPERTIES ('num.map.tasks'='1'); 046.load-files-individually.sh: Since the benchmark uses 1GB block sizes for the Parquet files, any table 
which is smaller than 16 GB but still significant in size (at least 1 GB) will not be spread across all data nodes in the cluster when loaded with the script above, because such a table occupies fewer 1GB blocks than there are data nodes (16). For this reason, the flat files for customer, customer_address and inventory were generated in several pieces (1 file per data node), and each file was then loaded individually to spread the files and blocks across as many of the data nodes as possible. This allows Big SQL to fully parallelize a table scan against these tables across all data nodes. The following script was used to achieve this distribution for the 3 tables mentioned. It writes a .jsq file containing one LOAD statement per flat file; the first statement overwrites the table and the remaining ones append to it:
#!/bin/bash
# Usage: 046.load-files-individually.sh <HDFS flat-file directory> <table>
FLATDIR=$1
TABLE=$2
FILE="046.load-files-individually-${TABLE}.jsq"
schema=HADOOPDS30000G_PARQ
i=0

# Start a fresh .jsq script for this table.
rm -f "${FILE}"
echo "set schema ${schema};" >> "${FILE}"
echo >> "${FILE}"

# List the flat files (skipping the "Found N items" header line) and emit
# one LOAD per file: the first overwrites the table, the rest append to it.
hadoop fs -ls "${FLATDIR}" | grep -v '^Found' | awk '{print $8}' | while read -r f
do
  if [[ $i == 0 ]] ; then
    echo "load hadoop using file url '$f' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table ${TABLE} overwrite ;" >> "${FILE}"
    i=1
  else
    echo "load hadoop using file url '$f' with source properties ('field.delimiter'='|', 'ignore.extra.fields'='true') into table ${TABLE} append ;" >> "${FILE}"
  fi
done
055.ri.jsq: Primary Key (PK) and Foreign Key (FK) constraints cannot be enforced by Big SQL, but “not enforced” constraints can be used to give the optimizer additional information when it is considering access plans. The following informational constraints were used in the environment (all PK and FK relationships outlined in the TPC-DS specification):
set schema $schema;
------------------------------------------------------------
-- primary key definitions
------------------------------------------------------------
alter table call_center add primary key (cc_call_center_sk) not enforced enable query optimization; commit work;
alter table catalog_page add primary key (cp_catalog_page_sk) not enforced enable query optimization; commit work;
alter table catalog_returns add primary key (cr_item_sk, cr_order_number) not enforced enable query optimization; commit work;
alter table catalog_sales add primary key (cs_item_sk, cs_order_number) not enforced enable query optimization; commit work;
alter table customer add primary key (c_customer_sk) not enforced enable query optimization; commit work;
alter table customer_address add primary key (ca_address_sk) not enforced enable query optimization; commit work;
alter table customer_demographics add primary key (cd_demo_sk) not enforced enable query optimization; commit work;
  • 26. Page | 26 alter table date_dim add primary key (d_date_sk) not enforced enable query optimization; commit work; alter table household_demographics add primary key (hd_demo_sk) not enforced enable query optimization; commit work; alter table income_band add primary key (ib_income_band_sk) not enforced enable query optimization; commit work; alter table inventory add primary key (inv_date_sk, inv_item_sk, inv_warehouse_sk) not enforced enable query optimization; commit work; alter table item add primary key (i_item_sk) not enforced enable query optimization; commit work; alter table promotion add primary key (p_promo_sk) not enforced enable query optimization; commit work; alter table reason add primary key (r_reason_sk) not enforced enable query optimization; commit work; alter table ship_mode add primary key (sm_ship_mode_sk) not enforced enable query optimization; commit work; alter table store add primary key (s_store_sk) not enforced enable query optimization; commit work; alter table store_returns add primary key (sr_item_sk, sr_ticket_number) not enforced enable query optimization; commit work; alter table store_sales add primary key (ss_item_sk, ss_ticket_number) not enforced enable query optimization; commit work; alter table time_dim add primary key (t_time_sk) not enforced enable query optimization; commit work; alter table warehouse add primary key (w_warehouse_sk) not enforced enable query optimization; commit work; alter table web_page add primary key (wp_web_page_sk) not enforced enable query optimization; commit work; alter table web_returns add primary key (wr_item_sk, wr_order_number) not enforced enable query optimization; commit work; alter table web_sales
  • 27. Page | 27 add primary key (ws_item_sk, ws_order_number) not enforced enable query optimization; commit work;
alter table web_site add primary key (web_site_sk) not enforced enable query optimization; commit work;
------------------------------------------------------------
-- foreign key definitions
------------------------------------------------------------
-- tables with no FKs
-- customer_address
-- customer_demographics
-- item
-- date_dim
-- warehouse
-- ship_mode
-- time_dim
-- reason
-- income_band
alter table promotion add constraint fk1 foreign key (p_start_date_sk) references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table promotion add constraint fk2 foreign key (p_end_date_sk) references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table promotion add constraint fk3 foreign key (p_item_sk) references item (i_item_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table store add constraint fk foreign key (s_closed_date_sk) references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table call_center add constraint fk1 foreign key (cc_closed_date_sk) references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table call_center add constraint fk2 foreign key (cc_open_date_sk) references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table customer add constraint fk1 foreign key (c_current_cdemo_sk) references customer_demographics (cd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table customer add constraint fk2 foreign key (c_current_hdemo_sk) references household_demographics (hd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table customer add constraint fk3 foreign key (c_current_addr_sk) references customer_address (ca_address_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table customer add constraint fk4
foreign key (c_first_shipto_date_sk) references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table customer
  • 28. Page | 28 add constraint fk5 foreign key (c_first_sales_date_sk) references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table web_site add constraint fk1 foreign key (web_open_date_sk) references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table web_site add constraint fk2 foreign key (web_close_date_sk) references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table catalog_page add constraint fk1 foreign key (cp_start_date_sk) references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table catalog_page add constraint fk2 foreign key (cp_end_date_sk) references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table household_demographics add constraint fk foreign key (hd_income_band_sk) references income_band (ib_income_band_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table web_page add constraint fk1 foreign key (wp_creation_date_sk) references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table web_page add constraint fk2 foreign key (wp_access_date_sk) references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table web_page add constraint fk3 foreign key (wp_customer_sk) references customer (c_customer_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table store_sales add constraint fk1 foreign key (ss_sold_date_sk) references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table store_sales add constraint fk2 foreign key (ss_sold_time_sk) references time_dim (t_time_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table store_sales add constraint fk3a foreign key (ss_item_sk) references item (i_item_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table store_sales add constraint fk4 foreign key (ss_customer_sk) references 
customer (c_customer_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table store_sales add constraint fk5 foreign key (ss_cdemo_sk) references customer_demographics (cd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table store_sales add constraint fk6 foreign key (ss_hdemo_sk) references household_demographics (hd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table store_sales add constraint fk7 foreign key (ss_addr_sk)
  • 29. Page | 29 references customer_address (ca_address_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table store_sales add constraint fk8 foreign key (ss_store_sk) references store (s_store_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table store_sales add constraint fk9 foreign key (ss_promo_sk) references promotion (p_promo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table store_returns add constraint fk1 foreign key (sr_returned_date_sk) references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table store_returns add constraint fk2 foreign key (sr_return_time_sk) references time_dim (t_time_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table store_returns add constraint fk3a foreign key (sr_item_sk) references item (i_item_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table store_returns add constraint fk3b foreign key (sr_item_sk, sr_ticket_number) references store_sales (ss_item_sk, ss_ticket_number) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table store_returns add constraint fk4 foreign key (sr_customer_sk) references customer (c_customer_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table store_returns add constraint fk5 foreign key (sr_cdemo_sk) references customer_demographics (cd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table store_returns add constraint fk6 foreign key (sr_hdemo_sk) references household_demographics (hd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table store_returns add constraint fk7 foreign key (sr_addr_sk) references customer_address (ca_address_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table store_returns add constraint fk8 foreign key (sr_store_sk) references store (s_store_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table store_returns add constraint fk9 foreign key (sr_reason_sk) 
references reason (r_reason_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table catalog_sales add constraint fk1 foreign key (cs_sold_date_sk) references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table catalog_sales add constraint fk2 foreign key (cs_sold_time_sk) references time_dim (t_time_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table catalog_sales add constraint fk3 foreign key (cs_ship_date_sk) references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION;
  • 30. Page | 30 commit work; alter table catalog_sales add constraint fk4 foreign key (cs_bill_customer_sk) references customer (c_customer_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table catalog_sales add constraint fk5 foreign key (cs_bill_cdemo_sk) references customer_demographics (cd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table catalog_sales add constraint fk6 foreign key (cs_bill_hdemo_sk) references household_demographics (hd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table catalog_sales add constraint fk7 foreign key (cs_bill_addr_sk) references customer_address (ca_address_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table catalog_sales add constraint fk8 foreign key (cs_ship_customer_sk) references customer (c_customer_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table catalog_sales add constraint fk9 foreign key (cs_ship_cdemo_sk) references customer_demographics (cd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table catalog_sales add constraint fk10 foreign key (cs_ship_hdemo_sk) references household_demographics (hd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table catalog_sales add constraint fk11 foreign key (cs_ship_addr_sk) references customer_address (ca_address_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table catalog_sales add constraint fk12 foreign key (cs_call_center_sk) references call_center (cc_call_center_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table catalog_sales add constraint fk13 foreign key (cs_catalog_page_sk) references catalog_page (cp_catalog_page_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table catalog_sales add constraint fk14 foreign key (cs_ship_mode_sk) references ship_mode (sm_ship_mode_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table catalog_sales add constraint fk15 foreign key 
(cs_warehouse_sk) references warehouse (w_warehouse_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table catalog_sales add constraint fk16a foreign key (cs_item_sk) references item (i_item_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table catalog_sales add constraint fk17 foreign key (cs_promo_sk) references promotion (p_promo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work; alter table catalog_returns add constraint fk1 foreign key (cr_returned_date_sk) references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table catalog_returns add constraint fk2 foreign key (cr_returned_time_sk) references time_dim (t_time_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table catalog_returns add constraint fk3 foreign key (cr_item_sk, cr_order_number) references catalog_sales (cs_item_sk, cs_order_number) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table catalog_returns add constraint fk4 foreign key (cr_item_sk) references item (i_item_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table catalog_returns add constraint fk5 foreign key (cr_refunded_customer_sk) references customer (c_customer_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table catalog_returns add constraint fk6 foreign key (cr_refunded_cdemo_sk) references customer_demographics (cd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table catalog_returns add constraint fk7 foreign key (cr_refunded_hdemo_sk) references household_demographics (hd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table catalog_returns add constraint fk8 foreign key (cr_refunded_addr_sk) references customer_address (ca_address_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table catalog_returns add constraint fk9 foreign key (cr_returning_customer_sk) references customer (c_customer_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table catalog_returns add constraint fk10 foreign key (cr_returning_cdemo_sk) references customer_demographics (cd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table catalog_returns add constraint fk11 foreign key (cr_returning_hdemo_sk) references household_demographics (hd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table catalog_returns add constraint fk12 foreign key (cr_returning_addr_sk) references customer_address (ca_address_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table catalog_returns add constraint fk13 foreign key (cr_call_center_sk) references call_center (cc_call_center_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table catalog_returns add constraint fk14 foreign key (cr_catalog_page_sk) references catalog_page (cp_catalog_page_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table catalog_returns add constraint fk15 foreign key (cr_ship_mode_sk) references ship_mode (sm_ship_mode_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table catalog_returns add constraint fk16 foreign key (cr_warehouse_sk) references warehouse (w_warehouse_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table catalog_returns add constraint fk17 foreign key (cr_reason_sk) references reason (r_reason_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table web_sales add constraint fk1 foreign key (ws_sold_date_sk) references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table web_sales add constraint fk2 foreign key (ws_sold_time_sk) references time_dim (t_time_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table web_sales add constraint fk3 foreign key (ws_ship_date_sk) references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table web_sales add constraint fk4a foreign key (ws_item_sk) references item (i_item_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table web_sales add constraint fk5 foreign key (ws_bill_customer_sk) references customer (c_customer_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table web_sales add constraint fk6 foreign key (ws_bill_cdemo_sk) references customer_demographics (cd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table web_sales add constraint fk7 foreign key (ws_bill_hdemo_sk) references household_demographics (hd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table web_sales add constraint fk8 foreign key (ws_bill_addr_sk) references customer_address (ca_address_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table web_sales add constraint fk9 foreign key (ws_ship_customer_sk) references customer (c_customer_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table web_sales add constraint fk10 foreign key (ws_ship_cdemo_sk) references customer_demographics (cd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table web_sales add constraint fk11 foreign key (ws_ship_hdemo_sk) references household_demographics (hd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table web_sales add constraint fk12 foreign key (ws_ship_addr_sk) references customer_address (ca_address_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table web_sales add constraint fk13 foreign key (ws_web_page_sk) references web_page (wp_web_page_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table web_sales add constraint fk14 foreign key (ws_web_site_sk) references web_site (web_site_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table web_sales add constraint fk15 foreign key (ws_ship_mode_sk) references ship_mode (sm_ship_mode_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table web_sales add constraint fk16 foreign key (ws_warehouse_sk) references warehouse (w_warehouse_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table web_sales add constraint fk17 foreign key (ws_promo_sk) references promotion (p_promo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table web_returns add constraint fk1 foreign key (wr_returned_date_sk) references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table web_returns add constraint fk2 foreign key (wr_returned_time_sk) references time_dim (t_time_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table web_returns add constraint fk3a foreign key (wr_item_sk) references item (i_item_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table web_returns add constraint fk3b foreign key (wr_item_sk, wr_order_number) references web_sales (ws_item_sk, ws_order_number) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table web_returns add constraint fk4 foreign key (wr_refunded_customer_sk) references customer (c_customer_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table web_returns add constraint fk5 foreign key (wr_refunded_cdemo_sk) references customer_demographics (cd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table web_returns add constraint fk6 foreign key (wr_refunded_hdemo_sk) references household_demographics (hd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table web_returns add constraint fk7 foreign key (wr_refunded_addr_sk) references customer_address (ca_address_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table web_returns add constraint fk8 foreign key (wr_returning_customer_sk) references customer (c_customer_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table web_returns add constraint fk9 foreign key (wr_returning_cdemo_sk) references customer_demographics (cd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table web_returns add constraint fk10 foreign key (wr_returning_hdemo_sk) references household_demographics (hd_demo_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table web_returns add constraint fk11 foreign key (wr_returning_addr_sk) references customer_address (ca_address_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table web_returns add constraint fk12 foreign key (wr_web_page_sk) references web_page (wp_web_page_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table web_returns add constraint fk13 foreign key (wr_reason_sk) references reason (r_reason_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table inventory add constraint fk1 foreign key (inv_date_sk) references date_dim (d_date_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table inventory add constraint fk2 foreign key (inv_item_sk) references item (i_item_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;
alter table inventory add constraint fk3 foreign key (inv_warehouse_sk) references warehouse (w_warehouse_sk) NOT ENFORCED ENABLE QUERY OPTIMIZATION; commit work;

Collect Statistics

060.analyze-withCGS.jsq: The following script was used to collect statistics for the database. Distribution statistics were collected for every column in the database, and column-group distribution statistics were collected for the composite primary keys of the 7 fact tables.
set schema $schema;
ANALYZE TABLE call_center COMPUTE STATISTICS FOR COLUMNS cc_call_center_sk, cc_call_center_id, cc_rec_start_date, cc_rec_end_date, cc_closed_date_sk, cc_open_date_sk, cc_name, cc_class, cc_employees, cc_sq_ft, cc_hours, cc_manager, cc_mkt_id, cc_mkt_class, cc_mkt_desc, cc_market_manager, cc_division, cc_division_name, cc_company, cc_company_name, cc_street_number, cc_street_name, cc_street_type, cc_suite_number, cc_city, cc_county, cc_state, cc_zip, cc_country, cc_gmt_offset, cc_tax_percentage;
ANALYZE TABLE catalog_page COMPUTE STATISTICS FOR COLUMNS cp_catalog_page_sk, cp_catalog_page_id, cp_start_date_sk, cp_end_date_sk, cp_department, cp_catalog_number, cp_catalog_page_number, cp_description, cp_type;
ANALYZE TABLE catalog_returns COMPUTE STATISTICS FOR COLUMNS cr_returned_date_sk, cr_returned_time_sk, cr_item_sk, cr_refunded_customer_sk, cr_refunded_cdemo_sk, cr_refunded_hdemo_sk, cr_refunded_addr_sk, cr_returning_customer_sk, cr_returning_cdemo_sk, cr_returning_hdemo_sk, cr_returning_addr_sk, cr_call_center_sk, cr_catalog_page_sk, cr_ship_mode_sk, cr_warehouse_sk, cr_reason_sk, cr_order_number, cr_return_quantity, cr_return_amount, cr_return_tax, cr_return_amt_inc_tax, cr_fee, cr_return_ship_cost, cr_refunded_cash, cr_reversed_charge, cr_store_credit, cr_net_loss, (cr_item_sk, cr_order_number);
ANALYZE TABLE catalog_sales COMPUTE STATISTICS FOR COLUMNS cs_sold_date_sk, cs_sold_time_sk, cs_ship_date_sk, cs_bill_customer_sk, cs_bill_cdemo_sk, cs_bill_hdemo_sk, cs_bill_addr_sk, cs_ship_customer_sk, cs_ship_cdemo_sk, cs_ship_hdemo_sk, cs_ship_addr_sk, cs_call_center_sk, cs_catalog_page_sk, cs_ship_mode_sk, cs_warehouse_sk, cs_item_sk, cs_promo_sk, cs_order_number, cs_quantity, cs_wholesale_cost, cs_list_price, cs_sales_price, cs_ext_discount_amt, cs_ext_sales_price, cs_ext_wholesale_cost, cs_ext_list_price, cs_ext_tax, cs_coupon_amt, cs_ext_ship_cost, cs_net_paid, cs_net_paid_inc_tax, cs_net_paid_inc_ship, cs_net_paid_inc_ship_tax, cs_net_profit, (cs_item_sk, cs_order_number);
ANALYZE TABLE customer COMPUTE STATISTICS FOR COLUMNS c_customer_sk, c_customer_id, c_current_cdemo_sk, c_current_hdemo_sk, c_current_addr_sk, c_first_shipto_date_sk, c_first_sales_date_sk, c_salutation, c_first_name, c_last_name, c_preferred_cust_flag, c_birth_day, c_birth_month, c_birth_year, c_birth_country, c_login, c_email_address, c_last_review_date;
ANALYZE TABLE customer_address COMPUTE STATISTICS FOR COLUMNS ca_address_sk, ca_address_id, ca_street_number, ca_street_name, ca_street_type, ca_suite_number, ca_city, ca_county, ca_state, ca_zip, ca_country, ca_gmt_offset, ca_location_type;
ANALYZE TABLE customer_demographics COMPUTE STATISTICS FOR COLUMNS cd_demo_sk, cd_gender, cd_marital_status, cd_education_status, cd_purchase_estimate, cd_credit_rating, cd_dep_count, cd_dep_employed_count, cd_dep_college_count;
ANALYZE TABLE date_dim COMPUTE STATISTICS FOR COLUMNS d_date_sk, d_date_id, d_date, d_month_seq, d_week_seq, d_quarter_seq, d_year, d_dow, d_moy, d_dom, d_qoy, d_fy_year, d_fy_quarter_seq, d_fy_week_seq, d_day_name, d_quarter_name, d_holiday, d_weekend, d_following_holiday, d_first_dom, d_last_dom, d_same_day_ly, d_same_day_lq, d_current_day, d_current_week, d_current_month, d_current_quarter, d_current_year;
ANALYZE TABLE household_demographics COMPUTE STATISTICS FOR COLUMNS hd_demo_sk, hd_income_band_sk, hd_buy_potential, hd_dep_count, hd_vehicle_count;
ANALYZE TABLE income_band COMPUTE STATISTICS FOR COLUMNS ib_income_band_sk, ib_lower_bound, ib_upper_bound;
ANALYZE TABLE inventory COMPUTE STATISTICS FOR COLUMNS inv_date_sk, inv_item_sk, inv_warehouse_sk, inv_quantity_on_hand, (inv_date_sk, inv_item_sk, inv_warehouse_sk);
ANALYZE TABLE item COMPUTE STATISTICS FOR COLUMNS i_item_sk, i_item_id, i_rec_start_date, i_rec_end_date, i_item_desc, i_current_price, i_wholesale_cost, i_brand_id, i_brand, i_class_id, i_class, i_category_id, i_category, i_manufact_id, i_manufact, i_size, i_formulation, i_color, i_units, i_container, i_manager_id, i_product_name;
ANALYZE TABLE promotion COMPUTE STATISTICS FOR COLUMNS p_promo_sk, p_promo_id, p_start_date_sk, p_end_date_sk, p_item_sk, p_cost, p_response_target, p_promo_name, p_channel_dmail, p_channel_email, p_channel_catalog, p_channel_tv, p_channel_radio, p_channel_press, p_channel_event, p_channel_demo, p_channel_details, p_purpose, p_discount_active;
ANALYZE TABLE reason COMPUTE STATISTICS FOR COLUMNS r_reason_sk, r_reason_id, r_reason_desc;
ANALYZE TABLE ship_mode COMPUTE STATISTICS FOR COLUMNS sm_ship_mode_sk, sm_ship_mode_id, sm_type, sm_code, sm_carrier, sm_contract;
ANALYZE TABLE store COMPUTE STATISTICS FOR COLUMNS s_store_sk, s_store_id, s_rec_start_date, s_rec_end_date, s_closed_date_sk, s_store_name, s_number_employees, s_floor_space, s_hours, s_manager, s_market_id, s_geography_class, s_market_desc, s_market_manager, s_division_id, s_division_name, s_company_id, s_company_name, s_street_number, s_street_name, s_street_type, s_suite_number, s_city, s_county, s_state, s_zip, s_country, s_gmt_offset, s_tax_precentage;
ANALYZE TABLE store_returns COMPUTE STATISTICS FOR COLUMNS sr_returned_date_sk, sr_return_time_sk, sr_item_sk, sr_customer_sk, sr_cdemo_sk, sr_hdemo_sk, sr_addr_sk, sr_store_sk, sr_reason_sk, sr_ticket_number, sr_return_quantity, sr_return_amt, sr_return_tax, sr_return_amt_inc_tax, sr_fee, sr_return_ship_cost, sr_refunded_cash, sr_reversed_charge, sr_store_credit, sr_net_loss, (sr_item_sk, sr_ticket_number);
ANALYZE TABLE store_sales COMPUTE STATISTICS FOR COLUMNS ss_sold_date_sk, ss_sold_time_sk, ss_item_sk, ss_customer_sk, ss_cdemo_sk, ss_hdemo_sk, ss_addr_sk, ss_store_sk, ss_promo_sk, ss_ticket_number, ss_quantity, ss_wholesale_cost, ss_list_price, ss_sales_price, ss_ext_discount_amt, ss_ext_sales_price, ss_ext_wholesale_cost, ss_ext_list_price, ss_ext_tax, ss_coupon_amt, ss_net_paid, ss_net_paid_inc_tax, ss_net_profit, (ss_item_sk, ss_ticket_number);
ANALYZE TABLE time_dim COMPUTE STATISTICS FOR COLUMNS t_time_sk, t_time_id, t_time, t_hour, t_minute, t_second, t_am_pm, t_shift, t_sub_shift, t_meal_time;
ANALYZE TABLE warehouse COMPUTE STATISTICS FOR COLUMNS w_warehouse_sk, w_warehouse_id, w_warehouse_name, w_warehouse_sq_ft, w_street_number, w_street_name, w_street_type, w_suite_number, w_city, w_county, w_state, w_zip, w_country, w_gmt_offset;
ANALYZE TABLE web_page COMPUTE STATISTICS FOR COLUMNS wp_web_page_sk, wp_web_page_id, wp_rec_start_date, wp_rec_end_date, wp_creation_date_sk, wp_access_date_sk, wp_autogen_flag, wp_customer_sk, wp_url, wp_type, wp_char_count, wp_link_count, wp_image_count, wp_max_ad_count;
ANALYZE TABLE web_returns COMPUTE STATISTICS FOR COLUMNS wr_returned_date_sk, wr_returned_time_sk, wr_item_sk, wr_refunded_customer_sk, wr_refunded_cdemo_sk, wr_refunded_hdemo_sk, wr_refunded_addr_sk, wr_returning_customer_sk, wr_returning_cdemo_sk, wr_returning_hdemo_sk, wr_returning_addr_sk, wr_web_page_sk, wr_reason_sk, wr_order_number, wr_return_quantity, wr_return_amt, wr_return_tax, wr_return_amt_inc_tax, wr_fee, wr_return_ship_cost, wr_refunded_cash, wr_reversed_charge, wr_account_credit, wr_net_loss, (wr_item_sk, wr_order_number);
ANALYZE TABLE web_sales COMPUTE STATISTICS FOR COLUMNS ws_sold_date_sk, ws_sold_time_sk, ws_ship_date_sk, ws_item_sk, ws_bill_customer_sk, ws_bill_cdemo_sk, ws_bill_hdemo_sk, ws_bill_addr_sk, ws_ship_customer_sk, ws_ship_cdemo_sk, ws_ship_hdemo_sk, ws_ship_addr_sk, ws_web_page_sk, ws_web_site_sk, ws_ship_mode_sk, ws_warehouse_sk, ws_promo_sk, ws_order_number, ws_quantity, ws_wholesale_cost, ws_list_price, ws_sales_price, ws_ext_discount_amt, ws_ext_sales_price, ws_ext_wholesale_cost, ws_ext_list_price, ws_ext_tax, ws_coupon_amt, ws_ext_ship_cost, ws_net_paid, ws_net_paid_inc_tax, ws_net_paid_inc_ship, ws_net_paid_inc_ship_tax, ws_net_profit, (ws_item_sk, ws_order_number);
ANALYZE TABLE web_site COMPUTE STATISTICS FOR COLUMNS web_site_sk, web_site_id, web_rec_start_date, web_rec_end_date, web_name, web_open_date_sk, web_close_date_sk, web_class, web_manager, web_mkt_id, web_mkt_class, web_mkt_desc, web_market_manager, web_company_id, web_company_name, web_street_number, web_street_name, web_street_type, web_suite_number, web_city, web_county, web_state, web_zip, web_country, web_gmt_offset, web_tax_percentage;

064.statviews.sh: This script was used to create “statviews” (statistical views) and collect statistics on them. The statviews give the Big SQL optimizer additional information about joins on PK-FK columns; only a subset of these joins is modeled.

DBNAME=$1
schema=$2
db2 connect to ${DBNAME}
db2 -v set schema ${schema}
# workaround for bug with statviews
# need to select from any random table at the beginning of the connection
# or we'll get a -901 during runstats on CS_GVIEW or CR_GVIEW
db2 -v "select count(*) from date_dim"
db2 -v "drop view cr_gview"
db2 -v "drop view cs_gview"
db2 -v "drop view sr_gview"
db2 -v "drop view ss_gview"
db2 -v "drop view wr_gview"
db2 -v "drop view ws_gview"
db2 -v "drop view c_gview"
db2 -v "drop view inv_gview"
db2 -v "drop view sv_date_dim"
db2 -v "create view CR_GVIEW (c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, c21, c22, c23, c24, c25, c26, c27, c28, c29, c30, c31, c32, c33, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56, c57, c58, c59, c60, c61, c62, c63, c64, c65, c66, c67, c68, c69, c70, c71, c72, c73, c74, c75, c76, c77, c78, c79, c80, c81, c82, c83, c84, c85, c86, c87, c88, c89, c90, c91, c92, c93, c94, c95, c96, c97, c98, c99, d_d_date) as ( select T2.*, T3.*, T4.*, T5.*, T6.*, T7.*, DATE(T5.D_DATE) as D_D_DATE from CATALOG_RETURNS as T1, CATALOG_PAGE as T2, CUSTOMER_ADDRESS as T3, CUSTOMER as T4, DATE_DIM as T5, CUSTOMER_ADDRESS as T6, CUSTOMER as T7 where T1.CR_CATALOG_PAGE_SK = T2.CP_CATALOG_PAGE_SK and T1.CR_REFUNDED_ADDR_SK = T3.CA_ADDRESS_SK and T1.CR_REFUNDED_CUSTOMER_SK = T4.C_CUSTOMER_SK and T1.CR_RETURNED_DATE_SK = T5.D_DATE_SK and T1.CR_RETURNING_ADDR_SK = T6.CA_ADDRESS_SK and T1.CR_RETURNING_CUSTOMER_SK = T7.C_CUSTOMER_SK )"
db2 -v "create view CS_GVIEW (c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, c21, c22, c23, c24, c25, c26, c27, c28, c29, c30, c31, c32, c33, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56, c57, c58, c59, c60, c61, c62, c63, c64, c65, c66, c67, c68, c69, c70, c71, c72, c73, c74, c75, c76, c77, c78, c79, c80, c81, c82, c83, c84, c85, c86, c87, c88, c89, c90, c91, c92, c93, c94, c95, c96, c97, c98, c99, c100, c101, d_d_date1, d_d_date2) as ( select T2.*, T3.*, T4.*, T5.*, T6.*, DATE(T4.D_DATE) as D_D_DATE1, DATE(T6.D_DATE) as D_D_DATE2 from CATALOG_SALES as T1, CUSTOMER as T2, CATALOG_PAGE as T3, DATE_DIM as T4, CUSTOMER as T5, DATE_DIM as T6 where T1.CS_BILL_CUSTOMER_SK = T2.C_CUSTOMER_SK and T1.CS_CATALOG_PAGE_SK = T3.CP_CATALOG_PAGE_SK and T1.CS_SHIP_DATE_SK = T4.D_DATE_SK and T1.CS_SHIP_CUSTOMER_SK = T5.C_CUSTOMER_SK and T1.CS_SOLD_DATE_SK = T6.D_DATE_SK )"
db2 -v "create view SR_GVIEW as ( select T2.*, T3.*, T4.*, T5.*, DATE(T3.D_DATE) as D_D_DATE from STORE_RETURNS as T1, CUSTOMER as T2, DATE_DIM as T3, TIME_DIM as T4, STORE as T5 where T1.SR_CUSTOMER_SK = T2.C_CUSTOMER_SK and T1.SR_RETURNED_DATE_SK = T3.D_DATE_SK and T1.SR_RETURN_TIME_SK = T4.T_TIME_SK and T1.SR_STORE_SK = T5.S_STORE_SK )"
db2 -v "create view SS_GVIEW as ( select T2.*, T3.*, T4.*, DATE(T2.D_DATE) as D_D_DATE from STORE_SALES as T1, DATE_DIM as T2, TIME_DIM as T3, STORE as T4 where T1.SS_SOLD_DATE_SK = T2.D_DATE_SK and T1.SS_SOLD_TIME_SK = T3.T_TIME_SK and T1.SS_STORE_SK = T4.S_STORE_SK )"
db2 -v "create view WR_GVIEW (c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, c21, c22, c23, c24, c25, c26, c27, c28, c29, c30, c31, c32, c33, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56, c57, c58, c59, c60, c61, c62, c63, c64, c65, c66, c67, c68, c69, c70, c71, c72, c73, c74, c75, c76, c77, c78, c79, c80, c81, c82, c83, c84, c85, c86, c87, c88, c89, c90, c91, c92, c93, c94, c95, c96, c97, c98, c99, c100, c101, c102, c103, c104, c105, c106, c107, c108, D_D_DATE) as ( select T2.*, T3.*, T4.*, T5.*, T6.*, T7.*, T8.*, DATE(T5.D_DATE) as D_D_DATE from WEB_RETURNS as T1, CUSTOMER_ADDRESS as T2, CUSTOMER_DEMOGRAPHICS as T3, CUSTOMER as T4, DATE_DIM as T5, CUSTOMER_ADDRESS as T6, CUSTOMER_DEMOGRAPHICS as T7, CUSTOMER as T8 where T1.WR_REFUNDED_ADDR_SK = T2.CA_ADDRESS_SK and T1.WR_REFUNDED_CDEMO_SK = T3.CD_DEMO_SK and T1.WR_REFUNDED_CUSTOMER_SK = T4.C_CUSTOMER_SK and T1.WR_RETURNED_DATE_SK = T5.D_DATE_SK and T1.WR_RETURNING_ADDR_SK = T6.CA_ADDRESS_SK and T1.WR_RETURNING_CDEMO_SK = T7.CD_DEMO_SK and T1.WR_RETURNING_CUSTOMER_SK = T8.C_CUSTOMER_SK )"
db2 -v "create view WS_GVIEW (c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, c21, c22, c23, c24, c25, c26, c27, c28, c29, c30, c31, c32, c33, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56, c57, c58, c59, c60, c61, c62, c63, c64, c65, c66, c67, c68, c69, c70, c71, c72, c73, c74, c75, c76, c77, c78, c79, c80, c81, c82, c83, c84, c85, c86, c87, c88, c89, c90, c91, c92, D_D_DATE, E_D_DATE) as ( select T2.*, T3.*, T4.*, T5.*, DATE(T3.D_DATE) as D_D_DATE, DATE(T5.D_DATE) as E_D_DATE from WEB_SALES as T1, CUSTOMER as T2, DATE_DIM as T3, CUSTOMER as T4, DATE_DIM as T5 where T1.WS_BILL_CUSTOMER_SK = T2.C_CUSTOMER_SK and T1.WS_SHIP_CUSTOMER_SK = T4.C_CUSTOMER_SK and T1.WS_SHIP_DATE_SK = T3.D_DATE_SK and T1.WS_SOLD_DATE_SK = T5.D_DATE_SK )"
db2 -v "create view C_GVIEW (c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13, c14, c15, c16, c17, c18, c19, c20, c21, c22, c23, c24, c25, c26, c27, c28, c29, c30, c31, c32, c33, c34, c35, c36, c37, c38, c39, c40, c41, c42, c43, c44, c45, c46, c47, c48, c49, c50, c51, c52, c53, c54, c55, c56, c57, c58, c59, c60, c61, c62, c63, c64, c65, c66, c67, c68, c69, c70, c71, c72, c73, c74, c75, c76, c77, c78, D_D_DATE, E_D_DATE) as ( select T2.*, T3.*, T4.*, T5.*, DATE(T4.D_DATE) as D_D_DATE, DATE(T5.D_DATE) as E_D_DATE from CUSTOMER as T1, CUSTOMER_ADDRESS as T2, CUSTOMER_DEMOGRAPHICS as T3, DATE_DIM as T4, DATE_DIM as T5 where T1.C_CURRENT_ADDR_SK = T2.CA_ADDRESS_SK and T1.C_CURRENT_CDEMO_SK = T3.CD_DEMO_SK and T1.C_FIRST_SALES_DATE_SK = T4.D_DATE_SK and T1.C_FIRST_SHIPTO_DATE_SK = T5.D_DATE_SK )"
db2 -v "create view INV_GVIEW as (select T2.*, DATE(T2.D_DATE) as D_D_DATE from INVENTORY as T1, DATE_DIM as T2 where T1.INV_DATE_SK=T2.D_DATE_SK)"
db2 -v "create view SV_DATE_DIM as (select date(d_date) as d_d_date from DATE_DIM)"
db2 -v "alter view CR_GVIEW enable query optimization"
db2 -v "alter view CS_GVIEW enable query optimization"
db2 -v "alter view SR_GVIEW enable query optimization"
db2 -v "alter view SS_GVIEW enable query optimization"
db2 -v "alter view WR_GVIEW enable query optimization"
db2 -v "alter view WS_GVIEW enable query optimization"
db2 -v "alter view C_GVIEW enable query optimization"
db2 -v "alter view INV_GVIEW enable query optimization"
db2 -v "alter view SV_DATE_DIM enable query optimization"
# workaround for bug with statviews
# need to run first runstats twice or we don't actually get any stats
time db2 -v "runstats on table SV_DATE_DIM with distribution"
time db2 -v "runstats on table SV_DATE_DIM with distribution"
time db2 -v "runstats on table CR_GVIEW with distribution tablesample BERNOULLI(1)"
time db2 -v "runstats on table CS_GVIEW with distribution tablesample BERNOULLI(1)"
time db2 -v "runstats on table SR_GVIEW with distribution tablesample BERNOULLI(1)"
time db2 -v "runstats on table SS_GVIEW with distribution tablesample BERNOULLI(1)"
time db2 -v "runstats on table WR_GVIEW with distribution tablesample BERNOULLI(1)"
time db2 -v "runstats on table WS_GVIEW with distribution tablesample BERNOULLI(1)"
time db2 -v "runstats on table C_GVIEW with distribution tablesample BERNOULLI(1)"
time db2 -v "runstats on table INV_GVIEW with distribution tablesample BERNOULLI(1)"
db2 commit
db2 terminate
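Several of the statview definitions above (CR_GVIEW, CS_GVIEW, WR_GVIEW, WS_GVIEW and C_GVIEW) join the same dimension table more than once, so a `select T2.*, T3.*, ...` list would contain duplicate column names. This is presumably why those views carry an explicit alias list c1 through cN, while SR_GVIEW, SS_GVIEW and INV_GVIEW, which join each table only once, do not. N is the sum of the column counts of the joined tables in join order; for CR_GVIEW that is 9 + 13 + 18 + 28 + 13 + 18 = 99. The bash helper below is a hypothetical sketch, not part of the benchmark kit: `statview_cols` and `NCOLS` are illustrative names, the per-table column counts are taken from the ANALYZE column lists in this appendix, and bash 4 or later is assumed for associative arrays. It only prints the alias list and does not call db2.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the benchmark scripts): builds the column-alias
# list for a statview defined as "select T2.*, T3.*, ..., <extra DATE() columns>".
# Requires bash 4+ for associative arrays.

# Column counts per dimension table, taken from the ANALYZE column lists above.
declare -A NCOLS=(
  [CATALOG_PAGE]=9
  [CUSTOMER]=18
  [CUSTOMER_ADDRESS]=13
  [CUSTOMER_DEMOGRAPHICS]=9
  [DATE_DIM]=28
)

# Usage: statview_cols "EXTRA_ALIAS ..." TABLE...
# Emits c1..cN (N = total columns of the listed tables), then the extra
# alias names, all separated by ", ". Returns 1 for an unknown table.
statview_cols() {
  local extras=$1
  shift
  local n=0 t i out=""
  for t in "$@"; do
    if [[ -z ${NCOLS[$t]:-} ]]; then
      echo "statview_cols: unknown table '$t'" >&2
      return 1
    fi
    n=$(( n + ${NCOLS[$t]} ))
  done
  for (( i = 1; i <= n; i++ )); do
    out+="c$i, "
  done
  for t in $extras; do
    out+="$t, "
  done
  # Strip the trailing ", " before printing.
  printf '%s\n' "${out%, }"
}

# Example: the alias list used by CR_GVIEW (99 dimension columns plus d_d_date).
statview_cols "d_d_date" CATALOG_PAGE CUSTOMER_ADDRESS CUSTOMER DATE_DIM CUSTOMER_ADDRESS CUSTOMER
```

Passing the joined tables in the same order as the view's FROM clause keeps the cK positions aligned with the `T2.*, T3.*, ...` expansion; for WS_GVIEW, for instance, `statview_cols "D_D_DATE E_D_DATE" CUSTOMER DATE_DIM CUSTOMER DATE_DIM` yields c1 through c92 followed by the two date aliases.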
Appendix C: Tuning

Installation options:
During install, the following Big SQL properties were set. Node resources percentage was set to 90% in order to provide as much of the cluster resources as possible to Big SQL:
Big SQL administrator user: bigsql
Big SQL FCM start port: 62000
Big SQL 1 server port: 7052
Scheduler service port: 7053
Scheduler administration port: 7054
Big SQL server port: 51000
Node resources percentage: 90%

The following disk layout is in accordance with current BigInsights and Big SQL 3.0 best practices, which recommend distributing all I/O for the Hadoop cluster across all disks:
BigSQL2 data directory: /data1/db2/bigsql,/data2/db2/bigsql,/data3/db2/bigsql,/data4/db2/bigsql,/data5/db2/bigsql,/data6/db2/bigsql,/data7/db2/bigsql,/data8/db2/bigsql,/data9/db2/bigsql
Cache directory: /data1/hadoop/mapred/local,/data2/hadoop/mapred/local,/data3/hadoop/mapred/local,/data4/hadoop/mapred/local,/data5/hadoop/mapred/local,/data6/hadoop/mapred/local,/data7/hadoop/mapred/local,/data8/hadoop/mapred/local,/data9/hadoop/mapred/local
DataNode data directory: /data1/hadoop/hdfs/data,/data2/hadoop/hdfs/data,/data3/hadoop/hdfs/data,/data4/hadoop/hdfs/data,/data5/hadoop/hdfs/data,/data6/hadoop/hdfs/data,/data7/hadoop/hdfs/data,/data8/hadoop/hdfs/data,/data9/hadoop/hdfs/data

Big SQL tuning options:
## Configured for 128 GB of memory per node
## 30 GB bufferpool
## 3.125 GB sortheap / 50 GB sheapthres_shr
## reader memory: 20% of total memory by default (user can raise it to 30%)
##
## other useful conf changes:
## mapred-site.xml
## mapred.tasktracker.map.tasks.maximum=20
## mapred.tasktracker.reduce.tasks.maximum=6
## mapreduce.map.java.opts="-Xmx3000m ..."
## mapreduce.reduce.java.opts="-Xmx3000m ..."
##
## bigsql-conf.xml
## dfsio.num_scanner_threads=12
## dfsio.read_size=4194304
## dfsio.num_threads_per_disk=2
## scheduler.client.request.timeout=600000
DBNAME=$1
db2 connect to ${DBNAME}
db2 -v "call syshadoop.big_sql_service_mode('on')"
db2 -v "alter bufferpool IBMDEFAULTBP size 891520 "
db2 -v "alter tablespace TEMPSPACE1 no file system caching"
db2 -v "update db cfg for ${DBNAME} using sortheap 819200 sheapthres_shr 13107200"
db2 -v "update db cfg for ${DBNAME} using dft_degree 8"
db2 -v "update dbm cfg using max_querydegree ANY"
db2 -v "update dbm cfg using aslheapsz 15"
db2 -v "update dbm cfg using cpuspeed 1.377671e-07"
db2 -v "update dbm cfg using INSTANCE_MEMORY 85"
db2 -v "update dbm cfg using CONN_ELAPSE 18"
## Disable auto maintenance
db2 -v "update db cfg for bigsql using AUTO_MAINT OFF AUTO_TBL_MAINT OFF AUTO_RUNSTATS OFF AUTO_STMT_STATS OFF"
db2 terminate

BigInsights mapred-site.xml tuning:
The following changes were made to the Hadoop mapred-site.xml file to tune the number of map-reduce slots and the maximum memory allocated to these slots. In the listing, the active <value> elements hold the tuned settings and the commented-out <!--value--> elements show the default expressions they replaced. In Big SQL, MapReduce is used only by the LOAD and ANALYZE commands, not for query execution, so these properties were tuned to get the best possible performance from those commands.
<property>
<!-- The maximum number of map tasks that will be run simultaneously by a task tracker. Default: 2. Recommendations: set relevant to number of CPUs and amount of memory on each data node. -->
<name>mapred.tasktracker.map.tasks.maximum</name>
<!--value><%= Math.max(2, Math.ceil(0.66 * Math.min(numOfDisks, numOfCores, totalMem/1000) * 1.75) - 2) %></value-->
<value>20</value>
</property>
<property>
<!-- The maximum number of reduce tasks that will be run simultaneously by a task tracker. Default: 2. Recommendations: set relevant to number of CPUs and amount of memory on each data node, note that reduces usually take more memory and do more I/O than maps.
-->
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<!--value><%= Math.max(2, Math.ceil(0.33 * Math.min(numOfDisks, numOfCores, totalMem/1000) * 1.75) - 2)%></value-->
<value>6</value>
</property>
<property>
<!-- Max heap of child JVM spawned by tasktracker. Ideally as large as the task machine can afford. The default -Xmx200m is usually too small. -->
<name>mapreduce.map.java.opts</name>
<value>-Xmx3000m -Xms1000m -Xmn100m -Xtune:virtualized -Xshareclasses:name=mrscc_%g,groupAccess,cacheDir=/var/ibm/biginsights/hadoop/tmp,nonFatal -Xscmx20m -Xdump:java:file=/var/ibm/biginsights/hadoop/tmp/javacore.%Y%m%d.%H%M%S.%pid.%seq.txt -Xdump:heap:file=/var/ibm/biginsights/hadoop/tmp/heapdump.%Y%m%d.%H%M%S.%pid.%seq.phd</value>
</property>
<property>
<!-- Max heap of child JVM spawned by tasktracker. Ideally as large as the task machine can afford. The default -Xmx200m is usually too small. -->
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx3000m -Xms1000m -Xmn100m -Xtune:virtualized -Xshareclasses:name=mrscc_%g,groupAccess,cacheDir=/var/ibm/biginsights/hadoop/tmp,nonFatal -Xscmx20m -Xdump:java:file=/var/ibm/biginsights/hadoop/tmp/javacore.%Y%m%d.%H%M%S.%pid.%seq.txt -Xdump:heap:file=/var/ibm/biginsights/hadoop/tmp/heapdump.%Y%m%d.%H%M%S.%pid.%seq.phd</value>
</property>

Big SQL dfs reader options:
The following properties were changed in the Big SQL bigsql-conf.xml file to tune the dfs readers:
<property>
<!-- Number of threads reading from each disk. Set this to 0 to use default values. -->
<name>dfsio.num_threads_per_disk</name>
<value>2</value>
<!--value>0</value-->
</property>
<property>
<!-- Read Size (in bytes) - Size of the reads sent to Hdfs (i.e., also the max I/O read buffer size). Default is 8*1024*1024 = 8388608 bytes -->
<name>dfsio.read_size</name>
<value>4194304</value>
<!--value>8388608</value-->
</property>
…..
<property>
<!-- (Advanced) Cap on the number of scanner threads that will be created. If set to 0, the system decides.
-->
<name>dfsio.num_scanner_threads</name>
<value>12</value>
</property>

Big SQL dfs logging:
The minloglevel property was changed in the Big SQL glog-dfsio.properties file to reduce the amount of logging by the dfs readers:
glog_enabled=true
log_dir=/var/ibm/biginsights/bigsql/logs
log_filename=bigsql-ndfsio.log
# 0 - INFO
# 1 - WARN
# 2 - ERROR
# 3 - FATAL
minloglevel=3

OS Storage:
The following script was used to create ext4 file systems on all disks (to be used to store data) on all nodes in the cluster (including the master), in line with Big SQL best practices. Note: a single SSD was used for swap during the test; the remaining SSDs were unused:
#!/bin/bash
# READ / WRITE Performance tests for EXT4 file systems
# Author - Stewart Tate, [email protected]
# Copyright (C) 2013, IBM Corp. All rights reserved.:
#################################################################
# the following is server unique and MUST be adjusted!          #
#################################################################
drives=(b g h i j k l m n)
SSDdrives=(c d e f)
echo "Create EXT4 file systems, version 130213b"
echo " "
pause() {
  sleep 2
}
# make ext4 file systems on HDDs
echo "Create EXT4 file systems on HDDs"
for dev_range in ${drives[@]}
do
  echo "y" | mkfs.ext4 -b 4096 -O dir_index,extent /dev/sd$dev_range
done
for dev_range in ${drives[@]}
do
  parted /dev/sd$dev_range print
done
pause
# make ext4 file systems on SSDs
echo "Create EXT4 file systems on SSDs"
for dev_range in ${SSDdrives[@]}
do
  echo "y" | mkfs.ext4 -b 4096 -O dir_index,extent /dev/sd$dev_range
done
for dev_range in ${SSDdrives[@]}
do
  parted /dev/sd$dev_range print
  echo "Partitions aligned (important for performance) if following returns 0:"
  blockdev --getalignoff /dev/sd$dev_range
done
exit

The file systems are then mounted using the following script:
#!/bin/bash
# READ / WRITE Performance tests for EXT4 file systems
# Author - Stewart Tate, [email protected]
# Copyright (C) 2013, IBM Corp. All rights reserved.:
#################################################################
# the following is server unique and MUST be adjusted!          #
#################################################################
drives=(b g h i j k l m n)
SSDdrives=(c d e f)
echo "Mount EXT4 file systems, version 130213b"
echo " "
pause() {
  sleep 2
}
j=0
echo "Create EXT4 mount points for HDDs"
for i in ${drives[@]}
do
  let j++
  mkdir /data$j
  mount -vs -t ext4 -o nobarrier,noatime,nodiratime,nobh,nouser_xattr,data=writeback,commit=100 /dev/sd$i /data$j
done
j=0
echo "Create EXT4 mount points for SSDs"
for i in ${SSDdrives[@]}
do
  let j++
  mkdir /datassd$j
  mount -vs -t ext4 -o nobarrier,noatime,nodiratime,discard,nobh,nouser_xattr,data=writeback,commit=100 /dev/sd$i /datassd$j
done
echo "Done."
exit

OS kernel changes:
echo 0 > /proc/sys/vm/swappiness
echo "net.ipv6.conf.all.disable_ipv6 = 1" >> /etc/sysctl.conf

Active Hadoop components:
To free resources on the cluster, only the following BigInsights components were started during the single- and multi-stream runs: bigsql, Hadoop, hive, catalog, zookeeper and console.
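The memory figures in the Big SQL tuning comments and the slot counts in mapred-site.xml can be cross-checked with a little arithmetic. The sketch below assumes that SORTHEAP and SHEAPTHRES_SHR are expressed in 4 KB pages (the DB2 convention for these two parameters) and that totalMem in the commented-out default slot formula is in MB; the function names are hypothetical. The bufferpool is left out because its page size is not stated here. The core count passed to the slot formula is also an illustrative assumption: this appendix gives only the number of data disks (9) and the memory per node (128 GB).

```shell
#!/bin/bash
# Sketch: cross-check tuning numbers from this appendix.
# Assumptions: SORTHEAP and SHEAPTHRES_SHR are in 4 KB pages (DB2
# convention); totalMem in the mapred-site.xml template is in MB.
# Function names are hypothetical.

# Convert a page count of the given page size (bytes) to GiB.
pages_to_gib() {
  awk -v p="$1" -v sz="$2" 'BEGIN { printf "%.3f\n", p * sz / (1024 ^ 3) }'
}

# Evaluate the commented-out default slot formula from mapred-site.xml:
#   max(2, ceil(factor * min(disks, cores, totalMem/1000) * 1.75) - 2)
# factor is 0.66 for map slots and 0.33 for reduce slots.
default_slots() {
  awk -v f="$1" -v d="$2" -v c="$3" -v m="$4" 'BEGIN {
    f += 0; d += 0; c += 0; m += 0         # force numeric comparisons
    n = d
    if (c < n) n = c
    if (m / 1000 < n) n = m / 1000
    x = f * n * 1.75
    up = int(x); if (up < x) up++          # ceil() for positive x
    s = up - 2; if (s < 2) s = 2
    print s
  }'
}

echo "sortheap 819200 pages         = $(pages_to_gib 819200 4096) GiB"
echo "sheapthres_shr 13107200 pages = $(pages_to_gib 13107200 4096) GiB"

# 9 data disks and 128 GB (131072 MB) per node come from this appendix;
# 12 cores is a hypothetical value (any count of 9 or more gives the same result).
echo "default map slots:    $(default_slots 0.66 9 12 131072) (configured: 20)"
echo "default reduce slots: $(default_slots 0.33 9 12 131072) (configured: 6)"
# Upper bound on task heap if all 20 + 6 slots run with -Xmx3000m at once.
echo "max task heap: $(( (20 + 6) * 3000 )) MB"
```

Under these assumptions the page counts give exactly 3.125 GiB and 50 GiB, matching the comments in the tuning script. For nine data disks the template formula yields 9 map and 4 reduce slots, below the 20 and 6 configured above. If all 26 slots ran at the 3000 MB heap limit at once, task heaps could reach 78,000 MB per node, a ceiling that only applies while LOAD or ANALYZE is running.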
Appendix D: Scaling and Database Population
The following table details the cardinality of each table along with its on-disk size when stored in the Parquet format.

Table                     Cardinality    Size on disk (bytes)
call_center               60             13373
catalog_page              46000          2843783
catalog_returns           4319733388     408957106444
catalog_sales             43198299410    4094116501072
customer                  80000000       4512791675
customer_address          40000000       986555270
customer_demographics     1920800        7871742
date_dim                  73049          1832116
household_demographics    7200           30851
income_band               20             692
inventory                 1627857000     8646662210
item                      462000         46060058
promotion                 2300           116753
reason                    72             1890
ship_mode                 20             1497
store                     1704           154852
store_returns             8639847757     656324500809
store_sales               86400432613    5623520800788
time_dim                  86400          1134623
warehouse                 27             3782
web_page                  4602           86318
web_returns               2160007345     205828528963
web_sales                 21600036511    2017966660709
web_site                  84             15977
TOTAL                                    13020920276247
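The TOTAL row can be checked by re-adding the size column. The sketch below copies the rows of the table above into a here-document and sums the byte counts with shell integer arithmetic. It assumes a shell whose arithmetic is 64-bit, which is the case for bash on common platforms and is wide enough for these values; the variable names are illustrative.

```shell
#!/bin/bash
# Sketch: re-add the "Size on disk" column of the Appendix D table and
# compare the sum with the reported TOTAL. Each row below is copied from
# the table as: table name, cardinality, size in bytes.
reported_total=13020920276247
total=0
rows=0

# Shell arithmetic uses the platform's widest integer type (64-bit on
# common systems), which is large enough for these byte counts.
# name and card are read so the columns line up, but are not used.
while read -r name card bytes; do
  total=$((total + bytes))
  rows=$((rows + 1))
done <<'EOF'
call_center 60 13373
catalog_page 46000 2843783
catalog_returns 4319733388 408957106444
catalog_sales 43198299410 4094116501072
customer 80000000 4512791675
customer_address 40000000 986555270
customer_demographics 1920800 7871742
date_dim 73049 1832116
household_demographics 7200 30851
income_band 20 692
inventory 1627857000 8646662210
item 462000 46060058
promotion 2300 116753
reason 72 1890
ship_mode 20 1497
store 1704 154852
store_returns 8639847757 656324500809
store_sales 86400432613 5623520800788
time_dim 86400 1134623
warehouse 27 3782
web_page 4602 86318
web_returns 2160007345 205828528963
web_sales 21600036511 2017966660709
web_site 84 15977
EOF

echo "tables: $rows, computed total: $total bytes"
if [ "$total" -eq "$reported_total" ]; then
  echo "matches the reported TOTAL"
else
  echo "differs from the reported TOTAL ($reported_total)"
fi
```

Run as written, the 24 rows sum to 13020920276247 bytes, equal to the reported TOTAL. That is roughly 13.0 TB in decimal units, or about 11.8 TiB.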
Appendix E: Queries

Query Generation
The queries used in this workload were generated using the TPC dsqgen tool, which applies random parameter substitutions to a set of query templates. Each query “stream” consisted of the 99 queries in a different order, and each had different random parameter substitutions applied.

Query Template Modifications
We modified 12 (out of 99) query templates so that the queries would execute on Big SQL and match the TPC-DS supplied result sets against the 1GB qualification database. The following minor query modifications (MQMs) were made:

Output formatting functions: Scalar functions whose sole purpose is to affect output formatting or intermediate arithmetic result precision (such as CASTs) may be applied to items in the outermost SELECT list of the query.
  Queries using this MQM: query84, query97
Explicit casting: For queries that divide values of two integer columns and compare the results with a decimal value, explicit casting of the integer columns to decimal is permissible for the purpose of matching the qualification output.
  Explicit integer division: query21, query34, query78, query83
  Implicit integer division (e.g. avg()): query7, query22, query26, query27, query39
Date expressions: For queries that include an expression involving manipulation of dates (e.g., adding/subtracting days/months/years, or extracting years from dates), vendor-specific syntax may be used instead of the specified syntax. Replacement syntax must have equivalent semantic behavior. Examples of acceptable implementations include "YEAR(<column>)" to extract the year from a date column or "DATE(<date>) + 3 MONTHS" to add 3 months to a date.
  Queries using this MQM: query72

Query Execution Order
The execution order of queries within all streams is determined by the toolkit and is based upon a random number whose seed is the timestamp of the end of the load.
In this instance, the queries were executed in the following order (each value is a query template number; Stream 0 is the single-stream run, and Streams 1 to 4 make up the multi-stream run):

Stream 0  Stream 1  Stream 2  Stream 3  Stream 4
96        83        56        89        79
7         32        98        5         39
75        30        59        52        93
44        92        24        62        41
39        66        88        53        29
80        84        2         7         32
32        98        5         39        66
19        58        6         80        84
25        16        27        63        8
78        77        87        72        71
86        40        90        18        45
1         96        83        56        89
91        13        91        13        91
21        36        28        69        14
43        95        68        23        46
27        63        8         19        58
94        99        76        74        48
45        3         75        30        59
58        6         80        84        2
64        12        1         96        83
36        28        69        14        21
33        85        26        10        78
46        51        11        86        40
62        41        17        94        99
16        27        63        8         19
10        78        77        87        72
63        8         19        58        6
69        14        21        36        28
60        50        31        37        81
59        52        93        4         44
37        81        54        38        97
98        5         39        66        88
85        26        10        78        77
70        57        15        43        95
67        82        55        22        33
28        69        14        21        36
81        54        38        97        61
97        61        42        47        35
66        88        53        29        60
90        18        45        3         75
17        94        99        76        74
47        35        67        82        55
95        68        23        46        51
92        24        62        41        17
3         75        30        59        52
51        11        86        40        90
35        67        82        55        22
49        9         25        16        27
9         25        16        27        63
31        37        81        54        38
11        86        40        90        18
93        4         44        92        24
29        60        50        31        37
38        97        61        42        47
22        33        85        26        10
89        79        73        34        70
15        43        95        68        23
6         80        84        2         7
52        93        4         44        92
50        31        37        81        54
42        47        35        67        82
41        17        94        99        76
8         19        58        6         80
12        1         96        83        56
20        64        12        1         96
88        53        29        60        50
82        55        22        33        85
23        46        51        11        86
14        21        36        28        69
57        15        43        95        68
65        20        64        12        1
71        65        20        64        12
34        70        57        15        43
48        49        9         25        16
30        59        52        93        4
74        48        49        9         25
87        72        71        65        20
77        87        72        71        65
73        34        70        57        15
84        2         7         32        98
54        38        97        61        42
55        22        33        85        26
56        89        79        73        34
2         7         32        98        5
26        10        78        77        87
40        90        18        45        3
72        71        65        20        64
53        29        60        50        31
79        73        34        70        57
18        45        3         75        30
13        91        13        91        13
24        62        41        17        94
4         44        92        24        62
99        76        74        48        49
68        23        46        51        11
83        56        89        79        73
61        42        47        35        67
5         39        66        88        53
76        74        48        49        9

Query Text:
The following is the full query text for all 99 queries executed in the single-stream run.
-- start query 1 in stream 0 using template query96.tpl and seed 1798621055
select count(*) from store_sales ,household_demographics ,time_dim, store where ss_sold_time_sk = time_dim.t_time_sk and ss_hdemo_sk = household_demographics.hd_demo_sk and ss_store_sk = s_store_sk and time_dim.t_hour = 8 and time_dim.t_minute >= 30 and household_demographics.hd_dep_count = 6 and store.s_store_name = 'ese' order by count(*) fetch first 100 rows only;
-- end query 1 in stream 0 using template query96.tpl
-- start query 2 in stream 0 using template query7.tpl and seed 335100942
select i_item_id, avg(cast(ss_quantity as double)) agg1, avg(ss_list_price) agg2,
avg(ss_coupon_amt) agg3, avg(ss_sales_price) agg4 from store_sales, customer_demographics, date_dim, item, promotion where ss_sold_date_sk = d_date_sk and ss_item_sk = i_item_sk and ss_cdemo_sk = cd_demo_sk and ss_promo_sk = p_promo_sk and cd_gender = 'M' and cd_marital_status = 'W' and cd_education_status = 'College' and (p_channel_email = 'N' or p_channel_event = 'N') and d_year = 2001 group by i_item_id order by i_item_id fetch first 100 rows only;
-- end query 2 in stream 0 using template query7.tpl
-- start query 3 in stream 0 using template query75.tpl and seed 1536248466
WITH all_sales AS ( SELECT d_year ,i_brand_id ,i_class_id ,i_category_id ,i_manufact_id ,SUM(sales_cnt) AS sales_cnt ,SUM(sales_amt) AS sales_amt FROM (SELECT d_year ,i_brand_id ,i_class_id ,i_category_id ,i_manufact_id ,cs_quantity - COALESCE(cr_return_quantity,0) AS sales_cnt ,cs_ext_sales_price - COALESCE(cr_return_amount,0.0) AS sales_amt FROM catalog_sales JOIN item ON i_item_sk=cs_item_sk JOIN date_dim ON d_date_sk=cs_sold_date_sk LEFT JOIN catalog_returns ON (cs_order_number=cr_order_number AND cs_item_sk=cr_item_sk) WHERE i_category='Home' UNION SELECT d_year ,i_brand_id ,i_class_id ,i_category_id ,i_manufact_id ,ss_quantity - COALESCE(sr_return_quantity,0) AS sales_cnt ,ss_ext_sales_price - COALESCE(sr_return_amt,0.0) AS sales_amt FROM store_sales JOIN item ON i_item_sk=ss_item_sk JOIN date_dim ON d_date_sk=ss_sold_date_sk LEFT JOIN store_returns ON (ss_ticket_number=sr_ticket_number AND ss_item_sk=sr_item_sk) WHERE i_category='Home' UNION SELECT d_year ,i_brand_id ,i_class_id ,i_category_id ,i_manufact_id ,ws_quantity - COALESCE(wr_return_quantity,0) AS sales_cnt ,ws_ext_sales_price - COALESCE(wr_return_amt,0.0) AS sales_amt FROM web_sales JOIN item ON i_item_sk=ws_item_sk JOIN date_dim ON d_date_sk=ws_sold_date_sk LEFT JOIN web_returns ON (ws_order_number=wr_order_number AND ws_item_sk=wr_item_sk) WHERE i_category='Home') sales_detail GROUP BY d_year, i_brand_id,
i_class_id, i_category_id, i_manufact_id) SELECT prev_yr.d_year AS prev_year ,curr_yr.d_year AS year ,curr_yr.i_brand_id ,curr_yr.i_class_id ,curr_yr.i_category_id ,curr_yr.i_manufact_id ,prev_yr.sales_cnt AS prev_yr_cnt ,curr_yr.sales_cnt AS curr_yr_cnt ,curr_yr.sales_cnt- prev_yr.sales_cnt AS sales_cnt_diff ,curr_yr.sales_amt- prev_yr.sales_amt AS sales_amt_diff FROM all_sales curr_yr, all_sales prev_yr
WHERE curr_yr.i_brand_id=prev_yr.i_brand_id AND curr_yr.i_class_id=prev_yr.i_class_id AND curr_yr.i_category_id=prev_yr.i_category_id AND curr_yr.i_manufact_id=prev_yr.i_manufact_id AND curr_yr.d_year=1999 AND prev_yr.d_year=1999-1 AND CAST(curr_yr.sales_cnt AS DECIMAL(17,2))/CAST(prev_yr.sales_cnt AS DECIMAL(17,2))<0.9 ORDER BY sales_cnt_diff fetch first 100 rows only;
-- end query 3 in stream 0 using template query75.tpl
-- start query 4 in stream 0 using template query44.tpl and seed 549695959
select asceding.rnk, i1.i_product_name best_performing, i2.i_product_name worst_performing from(select * from (select item_sk,rank() over (order by rank_col asc) rnk from (select ss_item_sk item_sk,avg(ss_net_profit) rank_col from store_sales ss1 where ss_store_sk = 218 group by ss_item_sk having avg(ss_net_profit) > 0.9*(select avg(ss_net_profit) rank_col from store_sales where ss_store_sk = 218 and ss_promo_sk is null group by ss_store_sk))V1)V11 where rnk < 11) asceding, (select * from (select item_sk,rank() over (order by rank_col desc) rnk from (select ss_item_sk item_sk,avg(ss_net_profit) rank_col from store_sales ss1 where ss_store_sk = 218 group by ss_item_sk having avg(ss_net_profit) > 0.9*(select avg(ss_net_profit) rank_col from store_sales where ss_store_sk = 218 and ss_promo_sk is null group by ss_store_sk))V2)V21 where rnk < 11) descending, item i1, item i2 where asceding.rnk = descending.rnk and i1.i_item_sk=asceding.item_sk and i2.i_item_sk=descending.item_sk order by asceding.rnk fetch first 100 rows only;
-- end query 4 in stream 0 using template query44.tpl
-- start query 5 in stream 0 using template query39.tpl and seed 547624773
with inv as (select w_warehouse_name,w_warehouse_sk,i_item_sk,d_moy ,stdev,mean, case mean when 0 then null else stdev/mean end cov from(select w_warehouse_name,w_warehouse_sk,i_item_sk,d_moy ,stddev_samp(inv_quantity_on_hand) stdev,avg(cast(inv_quantity_on_hand as double)) mean from inventory ,item ,warehouse
,date_dim where inv_item_sk = i_item_sk and inv_warehouse_sk = w_warehouse_sk and inv_date_sk = d_date_sk and d_year =1998 group by w_warehouse_name,w_warehouse_sk,i_item_sk,d_moy) foo where case mean when 0 then 0 else stdev/mean end > 1) select inv1.w_warehouse_sk,inv1.i_item_sk,inv1.d_moy,inv1.mean, inv1.cov ,inv2.w_warehouse_sk,inv2.i_item_sk,inv2.d_moy,inv2.mean, inv2.cov from inv inv1,inv inv2 where inv1.i_item_sk = inv2.i_item_sk and inv1.w_warehouse_sk = inv2.w_warehouse_sk and inv1.d_moy=2 and inv2.d_moy=2+1 order by inv1.w_warehouse_sk,inv1.i_item_sk,inv1.d_moy,inv1.mean,inv1.cov ,inv2.d_moy,inv2.mean, inv2.cov ;
with inv as (select w_warehouse_name,w_warehouse_sk,i_item_sk,d_moy ,stdev,mean, case mean when 0 then null else stdev/mean end cov from(select w_warehouse_name,w_warehouse_sk,i_item_sk,d_moy ,stddev_samp(inv_quantity_on_hand) stdev,avg(cast(inv_quantity_on_hand as double)) mean from inventory ,item ,warehouse ,date_dim where inv_item_sk = i_item_sk and inv_warehouse_sk = w_warehouse_sk and inv_date_sk = d_date_sk and d_year =1998 group by w_warehouse_name,w_warehouse_sk,i_item_sk,d_moy) foo where case mean when 0 then 0 else stdev/mean end > 1) select inv1.w_warehouse_sk,inv1.i_item_sk,inv1.d_moy,inv1.mean, inv1.cov ,inv2.w_warehouse_sk,inv2.i_item_sk,inv2.d_moy,inv2.mean, inv2.cov from inv inv1,inv inv2 where inv1.i_item_sk = inv2.i_item_sk and inv1.w_warehouse_sk = inv2.w_warehouse_sk and inv1.d_moy=2 and inv2.d_moy=2+1 and inv1.cov > 1.5 order by inv1.w_warehouse_sk,inv1.i_item_sk,inv1.d_moy,inv1.mean,inv1.cov ,inv2.d_moy,inv2.mean, inv2.cov ;
-- end query 5 in stream 0 using template query39.tpl
-- start query 6 in stream 0 using template query80.tpl and seed 800632380
with ssr as (select s_store_id as store_id, sum(ss_ext_sales_price) as sales, sum(coalesce(sr_return_amt, 0)) as returns, sum(ss_net_profit - coalesce(sr_net_loss, 0)) as profit from store_sales left outer join store_returns on (ss_item_sk = sr_item_sk and ss_ticket_number = sr_ticket_number), date_dim, store, item, promotion where ss_sold_date_sk = d_date_sk and d_date between cast('1999-08-15' as date) and (cast('1999-08-15' as date) + 30 days) and ss_store_sk = s_store_sk and ss_item_sk = i_item_sk and i_current_price > 50 and ss_promo_sk = p_promo_sk and p_channel_tv = 'N' group by s_store_id) , csr as (select cp_catalog_page_id as catalog_page_id, sum(cs_ext_sales_price) as sales, sum(coalesce(cr_return_amount, 0)) as returns, sum(cs_net_profit - coalesce(cr_net_loss, 0)) as profit from catalog_sales left outer join catalog_returns on (cs_item_sk = cr_item_sk and
cs_order_number = cr_order_number), date_dim, catalog_page, item, promotion where cs_sold_date_sk = d_date_sk and d_date between cast('1999-08-15' as date) and (cast('1999-08-15' as date) + 30 days) and cs_catalog_page_sk = cp_catalog_page_sk and cs_item_sk = i_item_sk and i_current_price > 50 and cs_promo_sk = p_promo_sk and p_channel_tv = 'N' group by cp_catalog_page_id) , wsr as (select web_site_id, sum(ws_ext_sales_price) as sales, sum(coalesce(wr_return_amt, 0)) as returns,
sum(ws_net_profit - coalesce(wr_net_loss, 0)) as profit from web_sales left outer join web_returns on (ws_item_sk = wr_item_sk and ws_order_number = wr_order_number), date_dim, web_site, item, promotion where ws_sold_date_sk = d_date_sk and d_date between cast('1999-08-15' as date) and (cast('1999-08-15' as date) + 30 days) and ws_web_site_sk = web_site_sk and ws_item_sk = i_item_sk and i_current_price > 50 and ws_promo_sk = p_promo_sk and p_channel_tv = 'N' group by web_site_id) select channel , id , sum(sales) as sales , sum(returns) as returns , sum(profit) as profit from (select 'store channel' as channel , 'store' || store_id as id , sales , returns , profit from ssr union all select 'catalog channel' as channel , 'catalog_page' || catalog_page_id as id , sales , returns , profit from csr union all select 'web channel' as channel , 'web_site' || web_site_id as id , sales , returns , profit from wsr ) x group by rollup (channel, id) order by channel ,id fetch first 100 rows only;
-- end query 6 in stream 0 using template query80.tpl
-- start query 7 in stream 0 using template query32.tpl and seed 965672451
select sum(cs_ext_discount_amt) as "excess discount amount" from catalog_sales ,item ,date_dim where i_manufact_id = 452 and i_item_sk = cs_item_sk and d_date between '2001-03-31' and (cast('2001-03-31' as date) + 90 days) and d_date_sk = cs_sold_date_sk and cs_ext_discount_amt > ( select 1.3 * avg(cs_ext_discount_amt) from catalog_sales ,date_dim where cs_item_sk = i_item_sk and d_date between '2001-03-31' and (cast('2001-03-31' as date) + 90 days) and d_date_sk = cs_sold_date_sk ) fetch first 100 rows only;
-- end query 7 in stream 0 using template query32.tpl
-- start query 8 in stream 0 using template query19.tpl and seed 98764099
select i_brand_id brand_id, i_brand brand, i_manufact_id, i_manufact, sum(ss_ext_sales_price) ext_price from date_dim, store_sales, item,customer,customer_address,store where d_date_sk = ss_sold_date_sk and
ss_item_sk = i_item_sk and i_manager_id=46 and d_moy=11
and d_year=2002 and ss_customer_sk = c_customer_sk and c_current_addr_sk = ca_address_sk and substr(ca_zip,1,5) <> substr(s_zip,1,5) and ss_store_sk = s_store_sk group by i_brand ,i_brand_id ,i_manufact_id ,i_manufact order by ext_price desc ,i_brand ,i_brand_id ,i_manufact_id ,i_manufact fetch first 100 rows only ;
-- end query 8 in stream 0 using template query19.tpl
-- start query 9 in stream 0 using template query25.tpl and seed 280059134
select i_item_id ,i_item_desc ,s_store_id ,s_store_name ,stddev_samp(ss_net_profit) as store_sales_profit ,stddev_samp(sr_net_loss) as store_returns_loss ,stddev_samp(cs_net_profit) as catalog_sales_profit from store_sales ,store_returns ,catalog_sales ,date_dim d1 ,date_dim d2 ,date_dim d3 ,store ,item where d1.d_moy = 4 and d1.d_year = 2002 and d1.d_date_sk = ss_sold_date_sk and i_item_sk = ss_item_sk and s_store_sk = ss_store_sk and ss_customer_sk = sr_customer_sk and ss_item_sk = sr_item_sk and ss_ticket_number = sr_ticket_number and sr_returned_date_sk = d2.d_date_sk and d2.d_moy between 4 and 10 and d2.d_year = 2002 and sr_customer_sk = cs_bill_customer_sk and sr_item_sk = cs_item_sk and cs_sold_date_sk = d3.d_date_sk and d3.d_moy between 4 and 10 and d3.d_year = 2002 group by i_item_id ,i_item_desc ,s_store_id ,s_store_name order by i_item_id ,i_item_desc ,s_store_id ,s_store_name fetch first 100 rows only;
-- end query 9 in stream 0 using template query25.tpl
-- start query 10 in stream 0 using template query78.tpl and seed 76559093
with ws as (select d_year AS ws_sold_year, ws_item_sk, ws_bill_customer_sk ws_customer_sk, sum(ws_quantity) ws_qty, sum(ws_wholesale_cost) ws_wc, sum(ws_sales_price) ws_sp from web_sales left join web_returns on wr_order_number=ws_order_number and ws_item_sk=wr_item_sk join date_dim on ws_sold_date_sk = d_date_sk where wr_order_number is null group by d_year, ws_item_sk, ws_bill_customer_sk ), cs as (select d_year AS cs_sold_year, cs_item_sk, cs_bill_customer_sk
cs_customer_sk, sum(cs_quantity) cs_qty, sum(cs_wholesale_cost) cs_wc, sum(cs_sales_price) cs_sp from catalog_sales left join catalog_returns on cr_order_number=cs_order_number and cs_item_sk=cr_item_sk
join date_dim on cs_sold_date_sk = d_date_sk where cr_order_number is null group by d_year, cs_item_sk, cs_bill_customer_sk ), ss as (select d_year AS ss_sold_year, ss_item_sk, ss_customer_sk, sum(ss_quantity) ss_qty, sum(ss_wholesale_cost) ss_wc, sum(ss_sales_price) ss_sp from store_sales left join store_returns on sr_ticket_number=ss_ticket_number and ss_item_sk=sr_item_sk join date_dim on ss_sold_date_sk = d_date_sk where sr_ticket_number is null group by d_year, ss_item_sk, ss_customer_sk ) select ss_item_sk, round(cast(ss_qty as double)/cast(coalesce(ws_qty+cs_qty,1) as double),2) ratio, ss_qty store_qty, ss_wc store_wholesale_cost, ss_sp store_sales_price, coalesce(ws_qty,0)+coalesce(cs_qty,0) other_chan_qty, coalesce(ws_wc,0)+coalesce(cs_wc,0) other_chan_wholesale_cost, coalesce(ws_sp,0)+coalesce(cs_sp,0) other_chan_sales_price from ss left join ws on (ws_sold_year=ss_sold_year and ws_item_sk=ss_item_sk and ws_customer_sk=ss_customer_sk) left join cs on (cs_sold_year=ss_sold_year and cs_item_sk=cs_item_sk and cs_customer_sk=ss_customer_sk) where coalesce(ws_qty,0)>0 and coalesce(cs_qty, 0)>0 and ss_sold_year=2001 order by ss_item_sk, ss_qty desc, ss_wc desc, ss_sp desc, other_chan_qty, other_chan_wholesale_cost, other_chan_sales_price, round(ss_qty/(coalesce(ws_qty+cs_qty,1)),2) fetch first 100 rows only;
-- end query 10 in stream 0 using template query78.tpl
-- start query 11 in stream 0 using template query86.tpl and seed 1622352946
select sum(ws_net_paid) as total_sum ,i_category ,i_class ,grouping(i_category)+grouping(i_class) as lochierarchy ,rank() over ( partition by grouping(i_category)+grouping(i_class), case when grouping(i_class) = 0 then i_category end order by sum(ws_net_paid) desc) as rank_within_parent from web_sales ,date_dim d1 ,item where d1.d_month_seq between 1193 and 1193+11 and d1.d_date_sk = ws_sold_date_sk and i_item_sk = ws_item_sk group by rollup(i_category,i_class) order by lochierarchy desc, case when
lochierarchy = 0 then i_category end, rank_within_parent fetch first 100 rows only;
-- end query 11 in stream 0 using template query86.tpl
-- start query 12 in stream 0 using template query1.tpl and seed 1023042618
with customer_total_return as (select sr_customer_sk as ctr_customer_sk ,sr_store_sk as ctr_store_sk ,sum(SR_FEE) as ctr_total_return from store_returns ,date_dim where sr_returned_date_sk = d_date_sk and d_year =2001 group by sr_customer_sk ,sr_store_sk) select c_customer_id from customer_total_return ctr1 ,store ,customer
where ctr1.ctr_total_return > (select avg(ctr_total_return)*1.2 from customer_total_return ctr2 where ctr1.ctr_store_sk = ctr2.ctr_store_sk) and s_store_sk = ctr1.ctr_store_sk and s_state = 'CA' and ctr1.ctr_customer_sk = c_customer_sk order by c_customer_id fetch first 100 rows only;
-- end query 12 in stream 0 using template query1.tpl
-- start query 13 in stream 0 using template query91.tpl and seed 1522589387
select cc_call_center_id Call_Center, cc_name Call_Center_Name, cc_manager Manager, sum(cr_net_loss) Returns_Loss from call_center, catalog_returns, date_dim, customer, customer_address, customer_demographics, household_demographics where cr_call_center_sk = cc_call_center_sk and cr_returned_date_sk = d_date_sk and cr_returning_customer_sk= c_customer_sk and cd_demo_sk = c_current_cdemo_sk and hd_demo_sk = c_current_hdemo_sk and ca_address_sk = c_current_addr_sk and d_year = 2000 and d_moy = 11 and ( (cd_marital_status = 'M' and cd_education_status = 'Unknown') or(cd_marital_status = 'W' and cd_education_status = 'Advanced Degree')) and hd_buy_potential like '5001-10000%' and ca_gmt_offset = -6 group by cc_call_center_id,cc_name,cc_manager,cd_marital_status,cd_education_status order by sum(cr_net_loss) desc;
-- end query 13 in stream 0 using template query91.tpl
-- start query 14 in stream 0 using template query21.tpl and seed 464370483
select * from(select w_warehouse_name ,i_item_id ,sum(case when (cast(d_date as date) < cast ('2000-06-11' as date)) then inv_quantity_on_hand else 0 end) as inv_before ,sum(case when (cast(d_date as date) >= cast ('2000-06-11' as date)) then inv_quantity_on_hand else 0 end) as inv_after from inventory ,warehouse ,item ,date_dim where i_current_price between 0.99 and 1.49 and i_item_sk = inv_item_sk and inv_warehouse_sk = w_warehouse_sk and inv_date_sk = d_date_sk and d_date between (cast ('2000-06-11' as date) - 30 days) and (cast ('2000-06-11' as date) + 30 days) group by w_warehouse_name, i_item_id) x
where (case when inv_before > 0 then cast(inv_after as double) / cast(inv_before as double) else null end) between 2.0/3.0 and 3.0/2.0 order by w_warehouse_name ,i_item_id fetch first 100 rows only;
-- end query 14 in stream 0 using template query21.tpl
-- start query 15 in stream 0 using template query43.tpl and seed 456971165
select s_store_name, s_store_id, sum(case when (d_day_name='Sunday') then ss_sales_price else null end) sun_sales, sum(case when (d_day_name='Monday') then ss_sales_price else null end) mon_sales, sum(case when (d_day_name='Tuesday') then ss_sales_price else null end) tue_sales, sum(case when (d_day_name='Wednesday') then ss_sales_price else null end) wed_sales, sum(case when (d_day_name='Thursday') then ss_sales_price else null end) thu_sales,
sum(case when (d_day_name='Friday') then ss_sales_price else null end) fri_sales, sum(case when (d_day_name='Saturday') then ss_sales_price else null end) sat_sales from date_dim, store_sales, store where d_date_sk = ss_sold_date_sk and s_store_sk = ss_store_sk and s_gmt_offset = -6 and d_year = 2001 group by s_store_name, s_store_id order by s_store_name, s_store_id,sun_sales,mon_sales,tue_sales,wed_sales,thu_sales,fri_sales,sat_sales fetch first 100 rows only;
-- end query 15 in stream 0 using template query43.tpl
-- start query 16 in stream 0 using template query27.tpl and seed 1310184181
select i_item_id, s_state, grouping(s_state) g_state, avg(cast(ss_quantity as double)) agg1, avg(ss_list_price) agg2, avg(ss_coupon_amt) agg3, avg(ss_sales_price) agg4 from store_sales, customer_demographics, date_dim, store, item where ss_sold_date_sk = d_date_sk and ss_item_sk = i_item_sk and ss_store_sk = s_store_sk and ss_cdemo_sk = cd_demo_sk and cd_gender = 'M' and cd_marital_status = 'U' and cd_education_status = '2 yr Degree' and d_year = 1999 and s_state in ('KS','MI', 'TX', 'SC', 'MN', 'WV') group by rollup (i_item_id, s_state) order by i_item_id ,s_state fetch first 100 rows only;
-- end query 16 in stream 0 using template query27.tpl
-- start query 17 in stream 0 using template query94.tpl and seed 471030633
select count(distinct ws_order_number) as "order count" ,sum(ws_ext_ship_cost) as "total shipping cost" ,sum(ws_net_profit) as "total net profit" from web_sales ws1 ,date_dim ,customer_address ,web_site where d_date between '2000-4-01' and (cast('2000-4-01' as date) + 60 days) and ws1.ws_ship_date_sk = d_date_sk and ws1.ws_ship_addr_sk = ca_address_sk and ca_state = 'OR' and ws1.ws_web_site_sk = web_site_sk and web_company_name = 'pri' and exists (select * from web_sales ws2 where ws1.ws_order_number = ws2.ws_order_number and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk) and not exists(select * from web_returns wr1 where ws1.ws_order_number =
wr1.wr_order_number) order by count(distinct ws_order_number) fetch first 100 rows only;
-- end query 17 in stream 0 using template query94.tpl
-- start query 18 in stream 0 using template query45.tpl and seed 454612518
select ca_zip, ca_city, sum(ws_sales_price) from web_sales, customer, customer_address, date_dim, item where ws_bill_customer_sk = c_customer_sk and c_current_addr_sk = ca_address_sk and ws_item_sk = i_item_sk and ( substr(ca_zip,1,5) in ('85669', '86197','88274','83405','86475', '85392', '85460', '80348', '81792') or i_item_id in (select i_item_id from item where i_item_sk in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29) ) ) and ws_sold_date_sk = d_date_sk and d_qoy = 2 and d_year = 2001 group by ca_zip, ca_city order by ca_zip, ca_city
fetch first 100 rows only;
-- end query 18 in stream 0 using template query45.tpl
-- start query 19 in stream 0 using template query58.tpl and seed 171616907
with ss_items as (select i_item_id item_id ,sum(ss_ext_sales_price) ss_item_rev from store_sales ,item ,date_dim where ss_item_sk = i_item_sk and d_date in (select d_date from date_dim where d_week_seq = (select d_week_seq from date_dim where d_date = '2000-07-12')) and ss_sold_date_sk = d_date_sk group by i_item_id), cs_items as (select i_item_id item_id ,sum(cs_ext_sales_price) cs_item_rev from catalog_sales ,item ,date_dim where cs_item_sk = i_item_sk and d_date in (select d_date from date_dim where d_week_seq = (select d_week_seq from date_dim where d_date = '2000-07-12')) and cs_sold_date_sk = d_date_sk group by i_item_id), ws_items as (select i_item_id item_id ,sum(ws_ext_sales_price) ws_item_rev from web_sales ,item ,date_dim where ws_item_sk = i_item_sk and d_date in (select d_date from date_dim where d_week_seq =(select d_week_seq from date_dim where d_date = '2000-07-12')) and ws_sold_date_sk = d_date_sk group by i_item_id) select ss_items.item_id ,ss_item_rev ,ss_item_rev/(ss_item_rev+cs_item_rev+ws_item_rev)/3 * 100 ss_dev ,cs_item_rev ,cs_item_rev/(ss_item_rev+cs_item_rev+ws_item_rev)/3 * 100 cs_dev ,ws_item_rev ,ws_item_rev/(ss_item_rev+cs_item_rev+ws_item_rev)/3 * 100 ws_dev ,(ss_item_rev+cs_item_rev+ws_item_rev)/3 average from ss_items,cs_items,ws_items where ss_items.item_id=cs_items.item_id and ss_items.item_id=ws_items.item_id and ss_item_rev between 0.9 * cs_item_rev and 1.1 * cs_item_rev and ss_item_rev between 0.9 * ws_item_rev and 1.1 * ws_item_rev and cs_item_rev between 0.9 * ss_item_rev and 1.1 * ss_item_rev and cs_item_rev between 0.9 * ws_item_rev and 1.1 * ws_item_rev and ws_item_rev between 0.9 * ss_item_rev and 1.1 * ss_item_rev and ws_item_rev between 0.9 * cs_item_rev and 1.1 * cs_item_rev order by item_id ,ss_item_rev fetch first 100 rows only;
-- end query 19 in stream 0 using template query58.tpl
-- start query 20 in stream 0 using template query64.tpl and seed 479369059
with cs_ui as (select cs_item_sk ,sum(cs_ext_list_price) as sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund from catalog_sales ,catalog_returns where cs_item_sk = cr_item_sk and cs_order_number = cr_order_number group by cs_item_sk
having sum(cs_ext_list_price)>2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit)), cross_sales as (select i_product_name product_name ,i_item_sk item_sk ,s_store_name store_name ,s_zip store_zip ,ad1.ca_street_number b_street_number ,ad1.ca_street_name b_streen_name ,ad1.ca_city b_city ,ad1.ca_zip b_zip ,ad2.ca_street_number c_street_number ,ad2.ca_street_name c_street_name ,ad2.ca_city c_city ,ad2.ca_zip c_zip ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year s2year ,count(*) cnt ,sum(ss_wholesale_cost) s1 ,sum(ss_list_price) s2 ,sum(ss_coupon_amt) s3 FROM store_sales ,store_returns ,cs_ui ,date_dim d1 ,date_dim d2 ,date_dim d3 ,store ,customer ,customer_demographics cd1 ,customer_demographics cd2 ,promotion ,household_demographics hd1 ,household_demographics hd2 ,customer_address ad1 ,customer_address ad2 ,income_band ib1 ,income_band ib2 ,item WHERE ss_store_sk = s_store_sk AND ss_sold_date_sk = d1.d_date_sk AND ss_customer_sk = c_customer_sk AND ss_cdemo_sk= cd1.cd_demo_sk AND ss_hdemo_sk = hd1.hd_demo_sk AND ss_addr_sk = ad1.ca_address_sk and ss_item_sk = i_item_sk and ss_item_sk = sr_item_sk and ss_ticket_number = sr_ticket_number and ss_item_sk = cs_ui.cs_item_sk and c_current_cdemo_sk = cd2.cd_demo_sk AND c_current_hdemo_sk = hd2.hd_demo_sk AND c_current_addr_sk = ad2.ca_address_sk and c_first_sales_date_sk = d2.d_date_sk and c_first_shipto_date_sk = d3.d_date_sk and ss_promo_sk = p_promo_sk and hd1.hd_income_band_sk = ib1.ib_income_band_sk and hd2.hd_income_band_sk = ib2.ib_income_band_sk and cd1.cd_marital_status <> cd2.cd_marital_status and i_color in ('pink','turquoise','peach','powder','floral','bisque') and i_current_price between 51 and 51 + 10 and i_current_price between 51 + 1 and 51 + 15 group by i_product_name ,i_item_sk ,s_store_name ,s_zip ,ad1.ca_street_number ,ad1.ca_street_name ,ad1.ca_city ,ad1.ca_zip ,ad2.ca_street_number ,ad2.ca_street_name ,ad2.ca_city ,ad2.ca_zip ,d1.d_year ,d2.d_year ,d3.d_year ) select
cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number ,cs1.b_streen_name ,cs1.b_city ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city
,cs1.c_zip ,cs1.syear ,cs1.cnt ,cs1.s1 ,cs1.s2 ,cs1.s3 ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt from cross_sales cs1,cross_sales cs2 where cs1.item_sk=cs2.item_sk and cs1.syear = 1999 and cs2.syear = 1999 + 1 and cs2.cnt <= cs1.cnt and cs1.store_name = cs2.store_name and cs1.store_zip = cs2.store_zip order by cs1.product_name ,cs1.store_name ,cs2.cnt;
-- end query 20 in stream 0 using template query64.tpl
-- start query 21 in stream 0 using template query36.tpl and seed 636400895
select sum(ss_net_profit)/sum(ss_ext_sales_price) as gross_margin ,i_category ,i_class ,grouping(i_category)+grouping(i_class) as lochierarchy ,rank() over ( partition by grouping(i_category)+grouping(i_class), case when grouping(i_class) = 0 then i_category end order by sum(ss_net_profit)/sum(ss_ext_sales_price) asc) as rank_within_parent from store_sales ,date_dim d1 ,item ,store where d1.d_year = 1998 and d1.d_date_sk = ss_sold_date_sk and i_item_sk = ss_item_sk and s_store_sk = ss_store_sk and s_state in ('GA','LA','MI','MO', 'FL','OK','KS','WV') group by rollup(i_category,i_class) order by lochierarchy desc ,case when lochierarchy = 0 then i_category end ,rank_within_parent fetch first 100 rows only;
-- end query 21 in stream 0 using template query36.tpl
-- start query 22 in stream 0 using template query33.tpl and seed 961967872
with ss as ( select i_manufact_id,sum(ss_ext_sales_price) total_sales from store_sales, date_dim, customer_address, item where i_manufact_id in (select i_manufact_id from item where i_category in ('Books')) and ss_item_sk = i_item_sk and ss_sold_date_sk = d_date_sk and d_year = 2000 and d_moy = 4 and ss_addr_sk = ca_address_sk and ca_gmt_offset = -6 group by i_manufact_id), cs as ( select i_manufact_id,sum(cs_ext_sales_price) total_sales from catalog_sales, date_dim, customer_address, item where i_manufact_id in (select
i_manufact_id from item where i_category in ('Books')) and cs_item_sk = i_item_sk and cs_sold_date_sk = d_date_sk and d_year = 2000 and d_moy = 4 and cs_bill_addr_sk = ca_address_sk and ca_gmt_offset = -6 group by i_manufact_id), ws as ( select i_manufact_id,sum(ws_ext_sales_price) total_sales from web_sales, date_dim, customer_address, item where i_manufact_id in (select i_manufact_id from item where i_category in ('Books')) and ws_item_sk = i_item_sk and ws_sold_date_sk = d_date_sk and d_year = 2000 and d_moy = 4 and ws_bill_addr_sk = ca_address_sk and ca_gmt_offset = -6 group by i_manufact_id) select i_manufact_id ,sum(total_sales) total_sales from (select * from ss union all select * from cs union all select * from ws) tmp1 group by i_manufact_id order by total_sales fetch first 100 rows only;
-- end query 22 in stream 0 using template query33.tpl
-- start query 23 in stream 0 using template query46.tpl and seed 473844672
select c_last_name ,c_first_name ,ca_city ,bought_city ,ss_ticket_number ,amt,profit from (select ss_ticket_number ,ss_customer_sk ,ca_city bought_city ,sum(ss_coupon_amt) amt ,sum(ss_net_profit) profit from store_sales,date_dim,store,household_demographics, customer_address where store_sales.ss_sold_date_sk = date_dim.d_date_sk and store_sales.ss_store_sk = store.s_store_sk and store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk and store_sales.ss_addr_sk = customer_address.ca_address_sk and (household_demographics.hd_dep_count = 2 or household_demographics.hd_vehicle_count= 0) and date_dim.d_dow in (6,0) and date_dim.d_year in (1999,1999+1,1999+2) and store.s_city in ('Needmore','Henderson','Washington','Monroe','Hamilton') group by ss_ticket_number,ss_customer_sk,ss_addr_sk,ca_city) dn,customer,customer_address current_addr where ss_customer_sk = c_customer_sk and customer.c_current_addr_sk = current_addr.ca_address_sk and current_addr.ca_city <> bought_city order by c_last_name ,c_first_name ,ca_city ,bought_city
,ss_ticket_number fetch first 100 rows only;
-- end query 23 in stream 0 using template query46.tpl
-- start query 24 in stream 0 using template query62.tpl and seed 168156768
select substr(w_warehouse_name,1,20) ,sm_type
,web_name ,sum(case when (ws_ship_date_sk - ws_sold_date_sk <= 30 ) then 1 else 0 end) as "30 days" ,sum(case when (ws_ship_date_sk - ws_sold_date_sk > 30) and (ws_ship_date_sk - ws_sold_date_sk <= 60) then 1 else 0 end ) as "31-60 days" ,sum(case when (ws_ship_date_sk - ws_sold_date_sk > 60) and (ws_ship_date_sk - ws_sold_date_sk <= 90) then 1 else 0 end) as "61-90 days" ,sum(case when (ws_ship_date_sk - ws_sold_date_sk > 90) and (ws_ship_date_sk - ws_sold_date_sk <= 120) then 1 else 0 end) as "91-120 days" ,sum(case when (ws_ship_date_sk - ws_sold_date_sk > 120) then 1 else 0 end) as ">120 days" from web_sales ,warehouse ,ship_mode ,web_site ,date_dim where d_month_seq between 1178 and 1178 + 11 and ws_ship_date_sk = d_date_sk and ws_warehouse_sk = w_warehouse_sk and ws_ship_mode_sk = sm_ship_mode_sk and ws_web_site_sk = web_site_sk group by substr(w_warehouse_name,1,20) ,sm_type ,web_name order by substr(w_warehouse_name,1,20) ,sm_type ,web_name fetch first 100 rows only;
-- end query 24 in stream 0 using template query62.tpl
-- start query 25 in stream 0 using template query16.tpl and seed 357537407
select count(distinct cs_order_number) as "order count" ,sum(cs_ext_ship_cost) as "total shipping cost" ,sum(cs_net_profit) as "total net profit" from catalog_sales cs1 ,date_dim ,customer_address ,call_center where d_date between '2002-4-01' and (cast('2002-4-01' as date) + 60 days) and cs1.cs_ship_date_sk = d_date_sk and cs1.cs_ship_addr_sk = ca_address_sk and ca_state = 'TX' and cs1.cs_call_center_sk = cc_call_center_sk and cc_county in ('Franklin Parish','Levy County','Sierra County','Gage County', 'Gogebic County' ) and exists (select * from catalog_sales cs2 where cs1.cs_order_number = cs2.cs_order_number and cs1.cs_warehouse_sk <> cs2.cs_warehouse_sk) and not exists(select * from catalog_returns cr1 where cs1.cs_order_number = cr1.cr_order_number) order by count(distinct cs_order_number) fetch first 100 rows only;
-- end query 25 in stream 0 using template query16.tpl
-- start query 26 in stream 0 using template query10.tpl and seed 772495384
select cd_gender, cd_marital_status, cd_education_status, count(*) cnt1, cd_purchase_estimate, count(*) cnt2, cd_credit_rating, count(*) cnt3, cd_dep_count, count(*) cnt4, cd_dep_employed_count, count(*) cnt5, cd_dep_college_count, count(*) cnt6 from
customer c,customer_address ca,customer_demographics where c.c_current_addr_sk = ca.ca_address_sk and ca_county in ('Knox County','Twin Falls County','Chautauqua County','Trousdale County','Lawrence County') and cd_demo_sk = c.c_current_cdemo_sk and exists (select * from store_sales,date_dim where c.c_customer_sk = ss_customer_sk and ss_sold_date_sk = d_date_sk and d_year = 2002 and d_moy between 3 and 3+3) and (exists (select * from web_sales,date_dim where c.c_customer_sk = ws_bill_customer_sk and ws_sold_date_sk = d_date_sk and d_year = 2002 and d_moy between 3 ANd 3+3) or exists (select * from catalog_sales,date_dim where c.c_customer_sk = cs_ship_customer_sk and cs_sold_date_sk = d_date_sk and d_year = 2002 and d_moy between 3 and 3+3)) group by cd_gender, cd_marital_status, cd_education_status, cd_purchase_estimate, cd_credit_rating, cd_dep_count, cd_dep_employed_count, cd_dep_college_count order by cd_gender, cd_marital_status, cd_education_status, cd_purchase_estimate, cd_credit_rating, cd_dep_count, cd_dep_employed_count, cd_dep_college_count fetch first 100 rows only;
-- end query 26 in stream 0 using template query10.tpl
-- start query 27 in stream 0 using template query63.tpl and seed 348248518
select * from (select i_manager_id ,sum(ss_sales_price) sum_sales ,avg(sum(ss_sales_price)) over (partition by i_manager_id) avg_monthly_sales from item ,store_sales ,date_dim ,store where ss_item_sk = i_item_sk and ss_sold_date_sk = d_date_sk and ss_store_sk = s_store_sk and d_month_seq in (1186,1186+1,1186+2,1186+3,1186+4,1186+5,1186+6,1186+7,1186+8,1186+9,1186+10,1186+11) and (( i_category in ('Books','Children','Electronics') and i_class in ('personal','portable','refernece','self-help') and i_brand in ('scholaramalgamalg #14','scholaramalgamalg #7', 'exportiunivamalg #9','scholaramalgamalg #9')) or( i_category in ('Women','Music','Men') and i_class in ('accessories','classical','fragrances','pants') and i_brand in ('amalgimporto #1','edu packscholar #1','exportiimporto #1', 'importoamalg #1'))) group by i_manager_id, d_moy) tmp1 where case when avg_monthly_sales > 0 then abs (sum_sales - avg_monthly_sales) / avg_monthly_sales else null end > 0.1 order by i_manager_id ,avg_monthly_sales ,sum_sales fetch first 100 rows only;
-- end query 27 in stream 0 using template query63.tpl
-- start query 28 in stream 0 using template query69.tpl and seed 1875439018
select cd_gender, cd_marital_status, cd_education_status, count(*) cnt1, cd_purchase_estimate, count(*) cnt2,
cd_credit_rating, count(*) cnt3 from customer c,customer_address ca,customer_demographics where c.c_current_addr_sk = ca.ca_address_sk and ca_state in ('VA','PA','LA') and cd_demo_sk = c.c_current_cdemo_sk and exists (select * from store_sales,date_dim where c.c_customer_sk = ss_customer_sk and ss_sold_date_sk = d_date_sk and d_year = 2004 and d_moy between 3 and 3+2) and (not exists (select * from web_sales,date_dim where c.c_customer_sk = ws_bill_customer_sk and ws_sold_date_sk = d_date_sk and d_year = 2004 and d_moy between 3 and 3+2) and not exists (select * from catalog_sales,date_dim where c.c_customer_sk = cs_ship_customer_sk and cs_sold_date_sk = d_date_sk and d_year = 2004 and d_moy between 3 and 3+2)) group by cd_gender, cd_marital_status, cd_education_status, cd_purchase_estimate, cd_credit_rating order by cd_gender, cd_marital_status, cd_education_status, cd_purchase_estimate, cd_credit_rating fetch first 100 rows only;
-- end query 28 in stream 0 using template query69.tpl
-- start query 29 in stream 0 using template query60.tpl and seed 43130453
with ss as ( select i_item_id,sum(ss_ext_sales_price) total_sales from store_sales, date_dim, customer_address, item where i_item_id in (select i_item_id from item where i_category in ('Music')) and ss_item_sk = i_item_sk and ss_sold_date_sk = d_date_sk and d_year = 2001 and d_moy = 10 and ss_addr_sk = ca_address_sk and ca_gmt_offset = -6 group by i_item_id), cs as ( select i_item_id,sum(cs_ext_sales_price) total_sales from catalog_sales, date_dim, customer_address, item where i_item_id in (select i_item_id from item where i_category in ('Music')) and cs_item_sk = i_item_sk and cs_sold_date_sk = d_date_sk and d_year = 2001 and d_moy = 10 and cs_bill_addr_sk = ca_address_sk and ca_gmt_offset = -6 group by i_item_id), ws as ( select i_item_id,sum(ws_ext_sales_price) total_sales from
web_sales, date_dim, customer_address, item where i_item_id in (select i_item_id from item where i_category in ('Music')) and ws_item_sk = i_item_sk and ws_sold_date_sk = d_date_sk and d_year = 2001 and d_moy = 10 and ws_bill_addr_sk = ca_address_sk and ca_gmt_offset = -6 group by i_item_id) select i_item_id ,sum(total_sales) total_sales from (select * from ss union all select * from cs union all select * from ws) tmp1 group by i_item_id order by i_item_id ,total_sales fetch first 100 rows only;
-- end query 29 in stream 0 using template query60.tpl
-- start query 30 in stream 0 using template query59.tpl and seed 835871049
with wss as (select d_week_seq, ss_store_sk, sum(case when (d_day_name='Sunday') then ss_sales_price else null end) sun_sales, sum(case when (d_day_name='Monday') then ss_sales_price else null end) mon_sales, sum(case when (d_day_name='Tuesday') then ss_sales_price else null end) tue_sales, sum(case when (d_day_name='Wednesday') then ss_sales_price else null end) wed_sales, sum(case when (d_day_name='Thursday') then ss_sales_price else null end) thu_sales, sum(case when (d_day_name='Friday') then ss_sales_price else null end) fri_sales, sum(case when (d_day_name='Saturday') then ss_sales_price else null end) sat_sales from store_sales,date_dim where d_date_sk = ss_sold_date_sk group by d_week_seq,ss_store_sk ) select s_store_name1,s_store_id1,d_week_seq1 ,sun_sales1/sun_sales2,mon_sales1/mon_sales2 ,tue_sales1/tue_sales1,wed_sales1/wed_sales2,thu_sales1/thu_sales2 ,fri_sales1/fri_sales2,sat_sales1/sat_sales2 from (select s_store_name s_store_name1,wss.d_week_seq d_week_seq1 ,s_store_id s_store_id1,sun_sales sun_sales1 ,mon_sales mon_sales1,tue_sales tue_sales1 ,wed_sales wed_sales1,thu_sales thu_sales1 ,fri_sales fri_sales1,sat_sales sat_sales1 from wss,store,date_dim d where d.d_week_seq = wss.d_week_seq and ss_store_sk = s_store_sk and d_month_seq between 1176 and 1176 + 11) y, (select s_store_name
s_store_name2,wss.d_week_seq d_week_seq2 ,s_store_id s_store_id2,sun_sales sun_sales2 ,mon_sales mon_sales2,tue_sales tue_sales2 ,wed_sales wed_sales2,thu_sales thu_sales2 ,fri_sales fri_sales2,sat_sales sat_sales2 from wss,store,date_dim d where d.d_week_seq = wss.d_week_seq and ss_store_sk = s_store_sk and d_month_seq between 1176+ 12 and 1176 + 23) x where s_store_id1=s_store_id2 and d_week_seq1=d_week_seq2-52 order by s_store_name1,s_store_id1,d_week_seq1 fetch first 100 rows only;
-- end query 30 in stream 0 using template query59.tpl
-- start query 31 in stream 0 using template query37.tpl and seed 606286915
select i_item_id ,i_item_desc ,i_current_price from item, inventory, date_dim, catalog_sales where i_current_price between 29 and 29 + 30
and inv_item_sk = i_item_sk and d_date_sk=inv_date_sk and d_date between cast('1999-02-22' as date) and (cast('1999-02-22' as date) + 60 days) and i_manufact_id in (701,681,796,684) and inv_quantity_on_hand between 100 and 500 and cs_item_sk = i_item_sk group by i_item_id,i_item_desc,i_current_price order by i_item_id fetch first 100 rows only;
-- end query 31 in stream 0 using template query37.tpl
-- start query 32 in stream 0 using template query98.tpl and seed 1764313172
select i_item_desc ,i_category ,i_class ,i_current_price ,sum(ss_ext_sales_price) as itemrevenue ,sum(ss_ext_sales_price)*100/sum(sum(ss_ext_sales_price)) over (partition by i_class) as revenueratio from store_sales ,item ,date_dim where ss_item_sk = i_item_sk and i_category in ('Women', 'Shoes', 'Jewelry') and ss_sold_date_sk = d_date_sk and d_date between cast('2001-02-15' as date) and (cast('2001-02-15' as date) + 30 days) group by i_item_id ,i_item_desc ,i_category ,i_class ,i_current_price order by i_category ,i_class ,i_item_id ,i_item_desc ,revenueratio;
-- end query 32 in stream 0 using template query98.tpl
-- start query 33 in stream 0 using template query85.tpl and seed 1218061953
select substr(r_reason_desc,1,20) ,avg(ws_quantity) ,avg(wr_refunded_cash) ,avg(wr_fee) from web_sales, web_returns, web_page, customer_demographics cd1, customer_demographics cd2, customer_address, date_dim, reason where ws_web_page_sk = wp_web_page_sk and ws_item_sk = wr_item_sk and ws_order_number = wr_order_number and ws_sold_date_sk = d_date_sk and d_year = 2001 and cd1.cd_demo_sk = wr_refunded_cdemo_sk and cd2.cd_demo_sk = wr_returning_cdemo_sk and ca_address_sk = wr_refunded_addr_sk and r_reason_sk = wr_reason_sk and ( ( cd1.cd_marital_status = 'U' and cd1.cd_marital_status = cd2.cd_marital_status and cd1.cd_education_status = '2 yr Degree' and cd1.cd_education_status = cd2.cd_education_status and ws_sales_price between 100.00 and 150.00 ) or ( cd1.cd_marital_status = 'W' and
cd1.cd_marital_status = cd2.cd_marital_status and cd1.cd_education_status = 'College' and cd1.cd_education_status = cd2.cd_education_status and ws_sales_price between 50.00 and 100.00 ) or
( cd1.cd_marital_status = 'S' and cd1.cd_marital_status = cd2.cd_marital_status and cd1.cd_education_status = 'Secondary' and cd1.cd_education_status = cd2.cd_education_status and ws_sales_price between 150.00 and 200.00 ) ) and ( ( ca_country = 'United States' and ca_state in ('VA', 'KS', 'ND') and ws_net_profit between 100 and 200 ) or ( ca_country = 'United States' and ca_state in ('SD', 'KY', 'GA') and ws_net_profit between 150 and 300 ) or ( ca_country = 'United States' and ca_state in ('TX', 'AL', 'WA') and ws_net_profit between 50 and 250 ) ) group by r_reason_desc order by substr(r_reason_desc,1,20) ,avg(ws_quantity) ,avg(wr_refunded_cash) ,avg(wr_fee) fetch first 100 rows only;
-- end query 33 in stream 0 using template query85.tpl
-- start query 34 in stream 0 using template query70.tpl and seed 255476072
select sum(ss_net_profit) as total_sum ,s_state ,s_county ,grouping(s_state)+grouping(s_county) as lochierarchy ,rank() over ( partition by grouping(s_state)+grouping(s_county), case when grouping(s_county) = 0 then s_state end order by sum(ss_net_profit) desc) as rank_within_parent from store_sales ,date_dim d1 ,store where d1.d_month_seq between 1191 and 1191+11 and d1.d_date_sk = ss_sold_date_sk and s_store_sk = ss_store_sk and s_state in ( select s_state from (select s_state as s_state, rank() over ( partition by s_state order by sum(ss_net_profit) desc) as ranking from store_sales, store, date_dim where d_month_seq between 1191 and 1191+11 and d_date_sk = ss_sold_date_sk and s_store_sk = ss_store_sk group by s_state ) tmp1 where ranking <= 5 ) group by rollup(s_state,s_county) order by lochierarchy desc ,case when lochierarchy = 0 then s_state end ,rank_within_parent fetch first 100 rows only;
-- end query 34 in stream 0 using template query70.tpl
-- start query 35 in stream 0 using template query67.tpl and seed 932833149
select *
from (select i_category ,i_class ,i_brand ,i_product_name ,d_year ,d_qoy ,d_moy ,s_store_id ,sumsales ,rank() over (partition by i_category order by sumsales desc) rk from (select i_category ,i_class ,i_brand ,i_product_name ,d_year ,d_qoy ,d_moy ,s_store_id ,sum(coalesce(ss_sales_price*ss_quantity,0)) sumsales from store_sales ,date_dim ,store ,item where ss_sold_date_sk=d_date_sk and ss_item_sk=i_item_sk and ss_store_sk = s_store_sk and d_month_seq between 1214 and 1214+11 group by rollup(i_category, i_class, i_brand, i_product_name, d_year, d_qoy, d_moy,s_store_id))dw1) dw2 where rk <= 100 order by i_category ,i_class ,i_brand ,i_product_name ,d_year ,d_qoy ,d_moy ,s_store_id ,sumsales ,rk fetch first 100 rows only;
-- end query 35 in stream 0 using template query67.tpl
-- start query 36 in stream 0 using template query28.tpl and seed 1281706826
select * from (select avg(ss_list_price) B1_LP ,count(ss_list_price) B1_CNT ,count(distinct ss_list_price) B1_CNTD from store_sales where ss_quantity between 0 and 5 and (ss_list_price between 57 and 57+10 or ss_coupon_amt between 11751 and 11751+1000 or ss_wholesale_cost between 4 and 4+20)) B1, (select avg(ss_list_price) B2_LP ,count(ss_list_price) B2_CNT ,count(distinct ss_list_price) B2_CNTD from store_sales where ss_quantity between 6 and 10 and (ss_list_price between 166 and 166+10 or ss_coupon_amt between 7402 and 7402+1000 or ss_wholesale_cost between 38 and 38+20)) B2, (select avg(ss_list_price) B3_LP ,count(ss_list_price) B3_CNT ,count(distinct ss_list_price) B3_CNTD from store_sales where ss_quantity between 11 and 15 and (ss_list_price between 8 and 8+10 or ss_coupon_amt between 5854 and 5854+1000 or ss_wholesale_cost between 67 and 67+20)) B3, (select avg(ss_list_price) B4_LP ,count(ss_list_price) B4_CNT ,count(distinct ss_list_price) B4_CNTD from store_sales where ss_quantity between 16 and 20 and (ss_list_price between 103 and 103+10 or ss_coupon_amt between 5165 and 5165+1000 or
ss_wholesale_cost between 14 and 14+20)) B4, (select avg(ss_list_price) B5_LP ,count(ss_list_price) B5_CNT ,count(distinct ss_list_price) B5_CNTD from store_sales where ss_quantity between 21 and 25 and (ss_list_price between 137 and 137+10 or ss_coupon_amt between 12978 and 12978+1000
or ss_wholesale_cost between 30 and 30+20)) B5, (select avg(ss_list_price) B6_LP ,count(ss_list_price) B6_CNT ,count(distinct ss_list_price) B6_CNTD from store_sales where ss_quantity between 26 and 30 and (ss_list_price between 10 and 10+10 or ss_coupon_amt between 5270 and 5270+1000 or ss_wholesale_cost between 79 and 79+20)) B6 fetch first 100 rows only;
-- end query 36 in stream 0 using template query28.tpl
-- start query 37 in stream 0 using template query81.tpl and seed 2018716314
with customer_total_return as (select cr_returning_customer_sk as ctr_customer_sk ,ca_state as ctr_state, sum(cr_return_amt_inc_tax) as ctr_total_return from catalog_returns ,date_dim ,customer_address where cr_returned_date_sk = d_date_sk and d_year =1998 and cr_returning_addr_sk = ca_address_sk group by cr_returning_customer_sk ,ca_state ) select c_customer_id,c_salutation,c_first_name,c_last_name,ca_street_number,ca_street_name ,ca_street_type,ca_suite_number,ca_city,ca_county,ca_state,ca_zip,ca_country,ca_gmt_offset ,ca_location_type,ctr_total_return from customer_total_return ctr1 ,customer_address ,customer where ctr1.ctr_total_return > (select avg(ctr_total_return)*1.2 from customer_total_return ctr2 where ctr1.ctr_state = ctr2.ctr_state) and ca_address_sk = c_current_addr_sk and ca_state = 'GA' and ctr1.ctr_customer_sk = c_customer_sk order by c_customer_id,c_salutation,c_first_name,c_last_name,ca_street_number,ca_street_name ,ca_street_type,ca_suite_number,ca_city,ca_county,ca_state,ca_zip,ca_country,ca_gmt_offset ,ca_location_type,ctr_total_return fetch first 100 rows only;
-- end query 37 in stream 0 using template query81.tpl
-- start query 38 in stream 0 using template query97.tpl and seed 1786889920
with ssci as ( select ss_customer_sk customer_sk ,ss_item_sk item_sk from store_sales,date_dim where ss_sold_date_sk = d_date_sk and d_month_seq between 1204 and 1204 + 11 group by ss_customer_sk ,ss_item_sk), csci as( select cs_bill_customer_sk
customer_sk ,cs_item_sk item_sk from catalog_sales,date_dim where cs_sold_date_sk = d_date_sk and d_month_seq between 1204 and 1204 + 11 group by cs_bill_customer_sk ,cs_item_sk) select sum(case when ssci.customer_sk is not null and csci.customer_sk is null then cast(1 as bigint) else cast(0 as bigint) end) store_only ,sum(case when ssci.customer_sk is null and csci.customer_sk is not null then cast(1 as bigint) else cast(0 as bigint) end) catalog_only ,sum(case when ssci.customer_sk is not null and csci.customer_sk is not null then cast(1 as bigint) else cast(0 as bigint) end) store_and_catalog from ssci full outer join csci on (ssci.customer_sk=csci.customer_sk and ssci.item_sk = csci.item_sk) fetch first 100 rows only;
-- end query 38 in stream 0 using template query97.tpl
-- start query 39 in stream 0 using template query66.tpl and seed 486724873
select w_warehouse_name ,w_warehouse_sq_ft
,w_city ,w_county ,w_state ,w_country ,ship_carriers ,year ,sum(jan_sales) as jan_sales ,sum(feb_sales) as feb_sales ,sum(mar_sales) as mar_sales ,sum(apr_sales) as apr_sales ,sum(may_sales) as may_sales ,sum(jun_sales) as jun_sales ,sum(jul_sales) as jul_sales ,sum(aug_sales) as aug_sales ,sum(sep_sales) as sep_sales ,sum(oct_sales) as oct_sales ,sum(nov_sales) as nov_sales ,sum(dec_sales) as dec_sales ,sum(jan_sales/w_warehouse_sq_ft) as jan_sales_per_sq_foot ,sum(feb_sales/w_warehouse_sq_ft) as feb_sales_per_sq_foot ,sum(mar_sales/w_warehouse_sq_ft) as mar_sales_per_sq_foot ,sum(apr_sales/w_warehouse_sq_ft) as apr_sales_per_sq_foot ,sum(may_sales/w_warehouse_sq_ft) as may_sales_per_sq_foot ,sum(jun_sales/w_warehouse_sq_ft) as jun_sales_per_sq_foot ,sum(jul_sales/w_warehouse_sq_ft) as jul_sales_per_sq_foot ,sum(aug_sales/w_warehouse_sq_ft) as aug_sales_per_sq_foot ,sum(sep_sales/w_warehouse_sq_ft) as sep_sales_per_sq_foot ,sum(oct_sales/w_warehouse_sq_ft) as oct_sales_per_sq_foot ,sum(nov_sales/w_warehouse_sq_ft) as nov_sales_per_sq_foot ,sum(dec_sales/w_warehouse_sq_ft) as dec_sales_per_sq_foot ,sum(jan_net) as jan_net ,sum(feb_net) as feb_net ,sum(mar_net) as mar_net ,sum(apr_net) as apr_net ,sum(may_net) as may_net ,sum(jun_net) as jun_net ,sum(jul_net) as jul_net ,sum(aug_net) as aug_net ,sum(sep_net) as sep_net ,sum(oct_net) as oct_net ,sum(nov_net) as nov_net ,sum(dec_net) as dec_net from ( (select w_warehouse_name ,w_warehouse_sq_ft ,w_city ,w_county ,w_state ,w_country ,'LATVIAN' || ',' || 'BARIAN' as ship_carriers ,d_year as year ,sum(case when d_moy = 1 then ws_ext_sales_price* ws_quantity else 0 end) as jan_sales ,sum(case when d_moy = 2 then ws_ext_sales_price* ws_quantity else 0 end) as feb_sales ,sum(case when d_moy = 3 then ws_ext_sales_price* ws_quantity else 0 end) as mar_sales ,sum(case when d_moy = 4 then ws_ext_sales_price* ws_quantity else 0 end) as apr_sales ,sum(case when d_moy = 5 then ws_ext_sales_price* ws_quantity else
0 end) as may_sales ,sum(case when d_moy = 6 then ws_ext_sales_price* ws_quantity else 0 end) as jun_sales ,sum(case when d_moy = 7 then ws_ext_sales_price* ws_quantity else 0 end) as jul_sales ,sum(case when d_moy = 8 then ws_ext_sales_price* ws_quantity else 0 end) as aug_sales ,sum(case when d_moy = 9 then ws_ext_sales_price* ws_quantity else 0 end) as sep_sales ,sum(case when d_moy = 10 then ws_ext_sales_price* ws_quantity else 0 end) as oct_sales ,sum(case when d_moy = 11 then ws_ext_sales_price* ws_quantity else 0 end) as nov_sales ,sum(case when d_moy = 12 then ws_ext_sales_price* ws_quantity else 0 end) as dec_sales ,sum(case when d_moy = 1
then ws_net_paid_inc_tax * ws_quantity else 0 end) as jan_net ,sum(case when d_moy = 2 then ws_net_paid_inc_tax * ws_quantity else 0 end) as feb_net ,sum(case when d_moy = 3 then ws_net_paid_inc_tax * ws_quantity else 0 end) as mar_net ,sum(case when d_moy = 4 then ws_net_paid_inc_tax * ws_quantity else 0 end) as apr_net ,sum(case when d_moy = 5 then ws_net_paid_inc_tax * ws_quantity else 0 end) as may_net ,sum(case when d_moy = 6 then ws_net_paid_inc_tax * ws_quantity else 0 end) as jun_net ,sum(case when d_moy = 7 then ws_net_paid_inc_tax * ws_quantity else 0 end) as jul_net ,sum(case when d_moy = 8 then ws_net_paid_inc_tax * ws_quantity else 0 end) as aug_net ,sum(case when d_moy = 9 then ws_net_paid_inc_tax * ws_quantity else 0 end) as sep_net ,sum(case when d_moy = 10 then ws_net_paid_inc_tax * ws_quantity else 0 end) as oct_net ,sum(case when d_moy = 11 then ws_net_paid_inc_tax * ws_quantity else 0 end) as nov_net ,sum(case when d_moy = 12 then ws_net_paid_inc_tax * ws_quantity else 0 end) as dec_net from web_sales ,warehouse ,date_dim ,time_dim ,ship_mode where ws_warehouse_sk = w_warehouse_sk and ws_sold_date_sk = d_date_sk and ws_sold_time_sk = t_time_sk and ws_ship_mode_sk = sm_ship_mode_sk and d_year = 2001 and t_time between 46669 and 46669+28800 and sm_carrier in ('LATVIAN','BARIAN') group by w_warehouse_name ,w_warehouse_sq_ft ,w_city ,w_county ,w_state ,w_country ,d_year ) union all (select w_warehouse_name ,w_warehouse_sq_ft ,w_city ,w_county ,w_state ,w_country ,'LATVIAN' || ',' || 'BARIAN' as ship_carriers ,d_year as year ,sum(case when d_moy = 1 then cs_sales_price* cs_quantity else 0 end) as jan_sales ,sum(case when d_moy = 2 then cs_sales_price* cs_quantity else 0 end) as feb_sales ,sum(case when d_moy = 3 then cs_sales_price* cs_quantity else 0 end) as mar_sales ,sum(case when d_moy = 4 then cs_sales_price* cs_quantity else 0 end) as apr_sales ,sum(case when d_moy = 5 then cs_sales_price* cs_quantity else 0 end) as may_sales
,sum(case when d_moy = 6 then cs_sales_price* cs_quantity else 0 end) as jun_sales ,sum(case when d_moy = 7 then cs_sales_price* cs_quantity else 0 end) as jul_sales ,sum(case when d_moy = 8 then cs_sales_price* cs_quantity else 0 end) as aug_sales ,sum(case when d_moy = 9 then cs_sales_price* cs_quantity else 0 end) as sep_sales ,sum(case when d_moy = 10 then cs_sales_price* cs_quantity else 0 end) as oct_sales ,sum(case when d_moy = 11 then cs_sales_price* cs_quantity else 0 end) as nov_sales
,sum(case when d_moy = 12 then cs_sales_price* cs_quantity else 0 end) as dec_sales ,sum(case when d_moy = 1 then cs_net_profit * cs_quantity else 0 end) as jan_net ,sum(case when d_moy = 2 then cs_net_profit * cs_quantity else 0 end) as feb_net ,sum(case when d_moy = 3 then cs_net_profit * cs_quantity else 0 end) as mar_net ,sum(case when d_moy = 4 then cs_net_profit * cs_quantity else 0 end) as apr_net ,sum(case when d_moy = 5 then cs_net_profit * cs_quantity else 0 end) as may_net ,sum(case when d_moy = 6 then cs_net_profit * cs_quantity else 0 end) as jun_net ,sum(case when d_moy = 7 then cs_net_profit * cs_quantity else 0 end) as jul_net ,sum(case when d_moy = 8 then cs_net_profit * cs_quantity else 0 end) as aug_net ,sum(case when d_moy = 9 then cs_net_profit * cs_quantity else 0 end) as sep_net ,sum(case when d_moy = 10 then cs_net_profit * cs_quantity else 0 end) as oct_net ,sum(case when d_moy = 11 then cs_net_profit * cs_quantity else 0 end) as nov_net ,sum(case when d_moy = 12 then cs_net_profit * cs_quantity else 0 end) as dec_net from catalog_sales ,warehouse ,date_dim ,time_dim ,ship_mode where cs_warehouse_sk = w_warehouse_sk and cs_sold_date_sk = d_date_sk and cs_sold_time_sk = t_time_sk and cs_ship_mode_sk = sm_ship_mode_sk and d_year = 2001 and t_time between 46669 AND 46669+28800 and sm_carrier in ('LATVIAN','BARIAN') group by w_warehouse_name ,w_warehouse_sq_ft ,w_city ,w_county ,w_state ,w_country ,d_year ) ) x group by w_warehouse_name ,w_warehouse_sq_ft ,w_city ,w_county ,w_state ,w_country ,ship_carriers ,year order by w_warehouse_name fetch first 100 rows only;
-- end query 39 in stream 0 using template query66.tpl
-- start query 40 in stream 0 using template query90.tpl and seed 1038311841
select cast(amc as decimal(15,4))/cast(pmc as decimal(15,4)) am_pm_ratio from ( select count(*) amc from web_sales, household_demographics , time_dim, web_page where ws_sold_time_sk = time_dim.t_time_sk and ws_ship_hdemo_sk =
household_demographics.hd_demo_sk and ws_web_page_sk = web_page.wp_web_page_sk and time_dim.t_hour between 7 and 7+1 and household_demographics.hd_dep_count = 1 and web_page.wp_char_count between 5000 and 5200) at, ( select count(*) pmc from web_sales, household_demographics , time_dim, web_page where ws_sold_time_sk = time_dim.t_time_sk and ws_ship_hdemo_sk = household_demographics.hd_demo_sk and ws_web_page_sk = web_page.wp_web_page_sk
and time_dim.t_hour between 18 and 18+1 and household_demographics.hd_dep_count = 1 and web_page.wp_char_count between 5000 and 5200) pt order by am_pm_ratio fetch first 100 rows only;
-- end query 40 in stream 0 using template query90.tpl
-- start query 41 in stream 0 using template query17.tpl and seed 2078761835
select i_item_id ,i_item_desc ,s_state ,count(ss_quantity) as store_sales_quantitycount ,avg(ss_quantity) as store_sales_quantityave ,stddev_samp(ss_quantity) as store_sales_quantitystdev ,stddev_samp(ss_quantity)/avg(ss_quantity) as store_sales_quantitycov ,count(sr_return_quantity) as_store_returns_quantitycount ,avg(sr_return_quantity) as_store_returns_quantityave ,stddev_samp(sr_return_quantity) as_store_returns_quantitystdev ,stddev_samp(sr_return_quantity)/avg(sr_return_quantity) as store_returns_quantitycov ,count(cs_quantity) as catalog_sales_quantitycount ,avg(cs_quantity) as catalog_sales_quantityave ,stddev_samp(cs_quantity)/avg(cs_quantity) as catalog_sales_quantitystdev ,stddev_samp(cs_quantity)/avg(cs_quantity) as catalog_sales_quantitycov from store_sales ,store_returns ,catalog_sales ,date_dim d1 ,date_dim d2 ,date_dim d3 ,store ,item where d1.d_quarter_name = '1998Q1' and d1.d_date_sk = ss_sold_date_sk and i_item_sk = ss_item_sk and s_store_sk = ss_store_sk and ss_customer_sk = sr_customer_sk and ss_item_sk = sr_item_sk and ss_ticket_number = sr_ticket_number and sr_returned_date_sk = d2.d_date_sk and d2.d_quarter_name in ('1998Q1','1998Q2','1998Q3') and sr_customer_sk = cs_bill_customer_sk and sr_item_sk = cs_item_sk and cs_sold_date_sk = d3.d_date_sk and d3.d_quarter_name in ('1998Q1','1998Q2','1998Q3') group by i_item_id ,i_item_desc ,s_state order by i_item_id ,i_item_desc ,s_state fetch first 100 rows only;
-- end query 41 in stream 0 using template query17.tpl
-- start query 42 in stream 0 using template query47.tpl and seed 218857573
with v1 as( select i_category, i_brand, s_store_name, s_company_name, d_year,
d_moy, sum(ss_sales_price) sum_sales, avg(sum(ss_sales_price)) over (partition by i_category, i_brand, s_store_name, s_company_name, d_year) avg_monthly_sales, rank() over (partition by i_category, i_brand, s_store_name, s_company_name order by d_year, d_moy) rn from item, store_sales, date_dim, store where ss_item_sk = i_item_sk and ss_sold_date_sk = d_date_sk and ss_store_sk = s_store_sk and ( d_year = 2000 or ( d_year = 2000-1 and d_moy =12) or ( d_year = 2000+1 and d_moy =1) ) group by i_category, i_brand, s_store_name, s_company_name, d_year, d_moy), v2 as( select v1.s_store_name
,v1.d_year ,v1.avg_monthly_sales ,v1.sum_sales, v1_lag.sum_sales psum, v1_lead.sum_sales nsum from v1, v1 v1_lag, v1 v1_lead where v1.i_category = v1_lag.i_category and v1.i_category = v1_lead.i_category and v1.i_brand = v1_lag.i_brand and v1.i_brand = v1_lead.i_brand and v1.s_store_name = v1_lag.s_store_name and v1.s_store_name = v1_lead.s_store_name and v1.s_company_name = v1_lag.s_company_name and v1.s_company_name = v1_lead.s_company_name and v1.rn = v1_lag.rn + 1 and v1.rn = v1_lead.rn - 1) select * from v2 where d_year = 2000 and avg_monthly_sales > 0 and case when avg_monthly_sales > 0 then abs(sum_sales - avg_monthly_sales) / avg_monthly_sales else null end > 0.1 order by sum_sales - avg_monthly_sales, 3 fetch first 100 rows only;
-- end query 42 in stream 0 using template query47.tpl
-- start query 43 in stream 0 using template query95.tpl and seed 2064779767
with ws_wh as (select ws1.ws_order_number,ws1.ws_warehouse_sk wh1,ws2.ws_warehouse_sk wh2 from web_sales ws1,web_sales ws2 where ws1.ws_order_number = ws2.ws_order_number and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk) select count(distinct ws_order_number) as "order count" ,sum(ws_ext_ship_cost) as "total shipping cost" ,sum(ws_net_profit) as "total net profit" from web_sales ws1 ,date_dim ,customer_address ,web_site where d_date between '2002-4-01' and (cast('2002-4-01' as date) + 60 days) and ws1.ws_ship_date_sk = d_date_sk and ws1.ws_ship_addr_sk = ca_address_sk and ca_state = 'SC' and ws1.ws_web_site_sk = web_site_sk and web_company_name = 'pri' and ws1.ws_order_number in (select ws_order_number from ws_wh) and ws1.ws_order_number in (select wr_order_number from web_returns,ws_wh where wr_order_number = ws_wh.ws_order_number) order by count(distinct ws_order_number) fetch first 100 rows only;
-- end query 43 in stream 0 using template query95.tpl
-- start query 44 in stream 0 using template query92.tpl and seed 227248084
select sum(ws_ext_discount_amt) as "Excess Discount Amount" from web_sales ,item ,date_dim where i_manufact_id = 85 and i_item_sk = ws_item_sk and d_date between '1999-01-05' and (cast('1999-01-05' as date) + 90 days) and d_date_sk = ws_sold_date_sk and ws_ext_discount_amt > ( SELECT 1.3 * avg(ws_ext_discount_amt) FROM web_sales ,date_dim WHERE ws_item_sk = i_item_sk and d_date between '1999-01-05' and (cast('1999-01-05' as date) + 90 days) and d_date_sk = ws_sold_date_sk ) order by sum(ws_ext_discount_amt) fetch first 100 rows only;
-- end query 44 in stream 0 using template query92.tpl
-- start query 45 in stream 0 using template query3.tpl and seed 1565567065
select dt.d_year ,item.i_brand_id brand_id ,item.i_brand brand ,sum(ss_sales_price) sum_agg from date_dim dt ,store_sales ,item where dt.d_date_sk = store_sales.ss_sold_date_sk and store_sales.ss_item_sk = item.i_item_sk and item.i_manufact_id = 423 and dt.d_moy=11 group by dt.d_year ,item.i_brand ,item.i_brand_id order by dt.d_year ,sum_agg desc ,brand_id fetch first 100 rows only;
-- end query 45 in stream 0 using template query3.tpl
-- start query 46 in stream 0 using template query51.tpl and seed 1975445669
WITH web_v1 as ( select ws_item_sk item_sk, d_date, sum(sum(ws_sales_price)) over (partition by ws_item_sk order by d_date rows between unbounded preceding and current row) cume_sales from web_sales ,date_dim where ws_sold_date_sk=d_date_sk and d_month_seq between 1188 and 1188+11 and ws_item_sk is not NULL group by ws_item_sk, d_date), store_v1 as ( select ss_item_sk item_sk, d_date, sum(sum(ss_sales_price)) over (partition by ss_item_sk order by d_date rows between unbounded preceding and current row) cume_sales from store_sales ,date_dim where ss_sold_date_sk=d_date_sk and d_month_seq between 1188 and 1188+11 and ss_item_sk is not NULL group by ss_item_sk, d_date) select * from (select item_sk ,d_date ,web_sales ,store_sales ,max(web_sales) over (partition by item_sk order by d_date rows between unbounded preceding and current row) web_cumulative ,max(store_sales) over (partition by item_sk order by d_date rows between unbounded preceding and current row) store_cumulative from (select case when web.item_sk is not null then web.item_sk else store.item_sk end item_sk ,case when web.d_date is not null then web.d_date else store.d_date end d_date ,web.cume_sales web_sales ,store.cume_sales store_sales from web_v1 web full outer join store_v1 store on (web.item_sk = store.item_sk and web.d_date = store.d_date) )x )y where web_cumulative > store_cumulative order by item_sk ,d_date fetch first 100 rows only;
-- end query 46 in stream 0 using template query51.tpl
-- start query 47 in stream 0 using template query35.tpl and seed 1491334050
select ca_state, cd_gender, cd_marital_status, count(*) cnt1, sum(cd_dep_count), sum(cd_dep_count), avg(cd_dep_count), cd_dep_employed_count, count(*) cnt2, sum(cd_dep_employed_count), sum(cd_dep_employed_count), avg(cd_dep_employed_count),
cd_dep_college_count, count(*) cnt3, sum(cd_dep_college_count), sum(cd_dep_college_count), avg(cd_dep_college_count) from customer c,customer_address ca,customer_demographics where c.c_current_addr_sk = ca.ca_address_sk and cd_demo_sk = c.c_current_cdemo_sk and exists (select * from store_sales,date_dim where c.c_customer_sk = ss_customer_sk and ss_sold_date_sk = d_date_sk and d_year = 2001 and d_qoy < 4) and (exists (select * from web_sales,date_dim where c.c_customer_sk = ws_bill_customer_sk and ws_sold_date_sk = d_date_sk and d_year = 2001 and d_qoy < 4) or exists (select * from catalog_sales,date_dim where c.c_customer_sk = cs_ship_customer_sk and cs_sold_date_sk = d_date_sk and d_year = 2001 and d_qoy < 4)) group by ca_state, cd_gender, cd_marital_status, cd_dep_count, cd_dep_employed_count, cd_dep_college_count order by ca_state, cd_gender, cd_marital_status, cd_dep_count, cd_dep_employed_count, cd_dep_college_count fetch first 100 rows only;
-- end query 47 in stream 0 using template query35.tpl
-- start query 48 in stream 0 using template query49.tpl and seed 686314496
select 'web' as channel ,web.item ,web.return_ratio ,web.return_rank ,web.currency_rank from ( select item ,return_ratio ,currency_ratio ,rank() over (order by return_ratio) as return_rank ,rank() over (order by currency_ratio) as currency_rank from ( select ws.ws_item_sk as item ,(cast(sum(coalesce(wr.wr_return_quantity,0)) as dec(15,4))/ cast(sum(coalesce(ws.ws_quantity,0)) as dec(15,4) )) as return_ratio ,(cast(sum(coalesce(wr.wr_return_amt,0)) as dec(15,4))/ cast(sum(coalesce(ws.ws_net_paid,0)) as dec(15,4) )) as currency_ratio from web_sales ws left outer join web_returns wr on (ws.ws_order_number = wr.wr_order_number and ws.ws_item_sk = wr.wr_item_sk) ,date_dim where wr.wr_return_amt > 10000 and ws.ws_net_profit > 1 and ws.ws_net_paid > 0 and ws.ws_quantity > 0 and ws_sold_date_sk = d_date_sk and d_year = 1999 and d_moy = 12 group by ws.ws_item_sk ) in_web ) web where ( web.return_rank <= 10 or web.currency_rank <= 10 ) union select 'catalog' as channel ,catalog.item ,catalog.return_ratio ,catalog.return_rank ,catalog.currency_rank from ( select item ,return_ratio ,currency_ratio ,rank() over (order by return_ratio) as return_rank ,rank() over (order by currency_ratio) as currency_rank from ( select cs.cs_item_sk as item ,(cast(sum(coalesce(cr.cr_return_quantity,0)) as dec(15,4))/ cast(sum(coalesce(cs.cs_quantity,0)) as dec(15,4) )) as return_ratio ,(cast(sum(coalesce(cr.cr_return_amount,0)) as dec(15,4))/ cast(sum(coalesce(cs.cs_net_paid,0)) as dec(15,4) )) as currency_ratio from catalog_sales cs left outer join catalog_returns cr on (cs.cs_order_number = cr.cr_order_number and cs.cs_item_sk = cr.cr_item_sk) ,date_dim where cr.cr_return_amount > 10000 and cs.cs_net_profit > 1 and cs.cs_net_paid > 0 and cs.cs_quantity > 0 and cs_sold_date_sk = d_date_sk and d_year = 1999 and d_moy = 12 group by cs.cs_item_sk ) in_cat ) catalog where ( catalog.return_rank <= 10 or catalog.currency_rank <=10 ) union select 'store' as channel ,store.item ,store.return_ratio ,store.return_rank ,store.currency_rank from ( select item ,return_ratio ,currency_ratio ,rank() over (order by return_ratio) as return_rank ,rank() over (order by currency_ratio) as currency_rank from ( select sts.ss_item_sk as item ,(cast(sum(coalesce(sr.sr_return_quantity,0)) as dec(15,4))/cast(sum(coalesce(sts.ss_quantity,0)) as dec(15,4) )) as return_ratio ,(cast(sum(coalesce(sr.sr_return_amt,0)) as dec(15,4))/cast(sum(coalesce(sts.ss_net_paid,0)) as dec(15,4) )) as currency_ratio from store_sales sts left outer join store_returns sr on (sts.ss_ticket_number = sr.sr_ticket_number and sts.ss_item_sk = sr.sr_item_sk) ,date_dim where sr.sr_return_amt > 10000 and sts.ss_net_profit > 1 and sts.ss_net_paid > 0 and sts.ss_quantity > 0 and ss_sold_date_sk = d_date_sk and d_year = 1999 and d_moy = 12 group by sts.ss_item_sk ) in_store ) store where ( store.return_rank <= 10 or store.currency_rank <= 10 ) order by 1,4,5 fetch first 100 rows only;
-- end query 48 in stream 0 using template query49.tpl
-- start query 49 in stream 0 using template query9.tpl and seed 1544661402
select case when (select count(*) from store_sales where ss_quantity between 1 and 20) > 864784291 then (select avg(ss_ext_list_price) from store_sales where ss_quantity between 1 and 20) else (select avg(ss_net_profit) from store_sales where ss_quantity between 1 and 20) end bucket1 , case when (select count(*) from store_sales where ss_quantity between 21 and 40) > 719640698 then (select avg(ss_ext_list_price) from store_sales where ss_quantity between 21 and 40) else (select avg(ss_net_profit) from store_sales where ss_quantity between 21 and 40) end bucket2, case when (select count(*) from store_sales where ss_quantity between 41 and 60) > 842161476 then (select avg(ss_ext_list_price) from store_sales where ss_quantity between 41 and 60) else (select avg(ss_net_profit) from store_sales where ss_quantity between 41 and 60) end bucket3, case when (select count(*) from store_sales where ss_quantity between 61 and 80) > 918194147 then (select avg(ss_ext_list_price) from store_sales where ss_quantity between 61 and 80) else (select avg(ss_net_profit) from store_sales where ss_quantity between 61 and 80) end bucket4, case when (select count(*) from store_sales where ss_quantity between 81 and 100) > 694910283 then (select avg(ss_ext_list_price) from store_sales where ss_quantity between 81 and 100) else (select avg(ss_net_profit) from store_sales where ss_quantity between 81 and 100) end bucket5 from reason where r_reason_sk = 1 ;
-- end query 49 in stream 0 using template query9.tpl
-- start query 50 in stream 0 using template query31.tpl and seed 535157530
with ss as (select ca_county,d_qoy, d_year,sum(ss_ext_sales_price) as store_sales from
store_sales,date_dim,customer_address where ss_sold_date_sk = d_date_sk and ss_addr_sk=ca_address_sk group by ca_county,d_qoy, d_year), ws as (select ca_county,d_qoy, d_year,sum(ws_ext_sales_price) as web_sales from web_sales,date_dim,customer_address where ws_sold_date_sk = d_date_sk and ws_bill_addr_sk=ca_address_sk group by ca_county,d_qoy, d_year) select /* tt */ ss1.ca_county ,ss1.d_year ,ws2.web_sales/ws1.web_sales web_q1_q2_increase
,ss2.store_sales/ss1.store_sales store_q1_q2_increase ,ws3.web_sales/ws2.web_sales web_q2_q3_increase ,ss3.store_sales/ss2.store_sales store_q2_q3_increase from ss ss1 ,ss ss2 ,ss ss3 ,ws ws1 ,ws ws2 ,ws ws3 where ss1.d_qoy = 1 and ss1.d_year = 1998 and ss1.ca_county = ss2.ca_county and ss2.d_qoy = 2 and ss2.d_year = 1998 and ss2.ca_county = ss3.ca_county and ss3.d_qoy = 3 and ss3.d_year = 1998 and ss1.ca_county = ws1.ca_county and ws1.d_qoy = 1 and ws1.d_year = 1998 and ws1.ca_county = ws2.ca_county and ws2.d_qoy = 2 and ws2.d_year = 1998 and ws1.ca_county = ws3.ca_county and ws3.d_qoy = 3 and ws3.d_year =1998 and case when ws1.web_sales > 0 then ws2.web_sales/ws1.web_sales else null end > case when ss1.store_sales > 0 then ss2.store_sales/ss1.store_sales else null end and case when ws2.web_sales > 0 then ws3.web_sales/ws2.web_sales else null end > case when ss2.store_sales > 0 then ss3.store_sales/ss2.store_sales else null end order by ss1.d_year;
-- end query 50 in stream 0 using template query31.tpl
-- start query 51 in stream 0 using template query11.tpl and seed 1727350231
with year_total as ( select c_customer_id customer_id ,c_first_name customer_first_name ,c_last_name customer_last_name ,c_preferred_cust_flag customer_preferred_cust_flag ,c_birth_country customer_birth_country ,c_login customer_login ,c_email_address customer_email_address ,d_year dyear ,sum(ss_ext_list_price-ss_ext_discount_amt) year_total ,'s' sale_type from customer ,store_sales ,date_dim where c_customer_sk = ss_customer_sk and ss_sold_date_sk = d_date_sk group by c_customer_id ,c_first_name ,c_last_name ,c_preferred_cust_flag ,c_birth_country ,c_login ,c_email_address ,d_year union all select c_customer_id customer_id ,c_first_name customer_first_name ,c_last_name customer_last_name ,c_preferred_cust_flag customer_preferred_cust_flag ,c_birth_country customer_birth_country ,c_login customer_login ,c_email_address customer_email_address ,d_year dyear
,sum(ws_ext_list_price-ws_ext_discount_amt) year_total ,'w' sale_type from customer ,web_sales ,date_dim where c_customer_sk = ws_bill_customer_sk and ws_sold_date_sk = d_date_sk group by c_customer_id ,c_first_name ,c_last_name ,c_preferred_cust_flag ,c_birth_country ,c_login ,c_email_address ,d_year
) select t_s_secyear.customer_last_name from year_total t_s_firstyear ,year_total t_s_secyear ,year_total t_w_firstyear ,year_total t_w_secyear where t_s_secyear.customer_id = t_s_firstyear.customer_id and t_s_firstyear.customer_id = t_w_secyear.customer_id and t_s_firstyear.customer_id = t_w_firstyear.customer_id and t_s_firstyear.sale_type = 's' and t_w_firstyear.sale_type = 'w' and t_s_secyear.sale_type = 's' and t_w_secyear.sale_type = 'w' and t_s_firstyear.dyear = 2001 and t_s_secyear.dyear = 2001+1 and t_w_firstyear.dyear = 2001 and t_w_secyear.dyear = 2001+1 and t_s_firstyear.year_total > 0 and t_w_firstyear.year_total > 0 and case when t_w_firstyear.year_total > 0 then t_w_secyear.year_total / t_w_firstyear.year_total else null end > case when t_s_firstyear.year_total > 0 then t_s_secyear.year_total / t_s_firstyear.year_total else null end order by t_s_secyear.customer_last_name fetch first 100 rows only;
-- end query 51 in stream 0 using template query11.tpl
-- start query 52 in stream 0 using template query93.tpl and seed 1891392271
select ss_customer_sk ,sum(act_sales) sumsales from (select ss_item_sk ,ss_ticket_number ,ss_customer_sk ,case when sr_return_quantity is not null then (ss_quantity- sr_return_quantity)*ss_sales_price else (ss_quantity*ss_sales_price) end act_sales from store_sales left outer join store_returns on (sr_item_sk = ss_item_sk and sr_ticket_number = ss_ticket_number) ,reason where sr_reason_sk = r_reason_sk and r_reason_desc = 'reason 47') t group by ss_customer_sk order by sumsales, ss_customer_sk fetch first 100 rows only;
-- end query 52 in stream 0 using template query93.tpl
-- start query 53 in stream 0 using template query29.tpl and seed 700403081
select i_item_id ,i_item_desc ,s_store_id ,s_store_name ,avg(ss_quantity) as store_sales_quantity ,avg(sr_return_quantity) as store_returns_quantity ,avg(cs_quantity) as catalog_sales_quantity from store_sales ,store_returns ,catalog_sales ,date_dim d1 ,date_dim d2
,date_dim d3 ,store ,item where d1.d_moy = 4 and d1.d_year = 2000 and d1.d_date_sk = ss_sold_date_sk and i_item_sk = ss_item_sk and s_store_sk = ss_store_sk and ss_customer_sk = sr_customer_sk and ss_item_sk = sr_item_sk and ss_ticket_number = sr_ticket_number and sr_returned_date_sk = d2.d_date_sk and d2.d_moy between 4 and 4 + 3 and d2.d_year = 2000 and sr_customer_sk = cs_bill_customer_sk and sr_item_sk = cs_item_sk and cs_sold_date_sk = d3.d_date_sk and d3.d_year in (2000,2000+1,2000+2) group by i_item_id ,i_item_desc ,s_store_id
,s_store_name order by i_item_id ,i_item_desc ,s_store_id ,s_store_name fetch first 100 rows only;
-- end query 53 in stream 0 using template query29.tpl
-- start query 54 in stream 0 using template query38.tpl and seed 179097785
select count(*) from ( select distinct c_last_name, c_first_name, d_date from store_sales, date_dim, customer where store_sales.ss_sold_date_sk = date_dim.d_date_sk and store_sales.ss_customer_sk = customer.c_customer_sk and d_month_seq between 1183 and 1183 + 11 intersect select distinct c_last_name, c_first_name, d_date from catalog_sales, date_dim, customer where catalog_sales.cs_sold_date_sk = date_dim.d_date_sk and catalog_sales.cs_bill_customer_sk = customer.c_customer_sk and d_month_seq between 1183 and 1183 + 11 intersect select distinct c_last_name, c_first_name, d_date from web_sales, date_dim, customer where web_sales.ws_sold_date_sk = date_dim.d_date_sk and web_sales.ws_bill_customer_sk = customer.c_customer_sk and d_month_seq between 1183 and 1183 + 11 ) hot_cust fetch first 100 rows only;
-- end query 54 in stream 0 using template query38.tpl
-- start query 55 in stream 0 using template query22.tpl and seed 1074257943
select i_product_name ,i_brand ,i_class ,i_category ,avg(cast(inv_quantity_on_hand as double)) qoh from inventory ,date_dim ,item ,warehouse where inv_date_sk=d_date_sk and inv_item_sk=i_item_sk and inv_warehouse_sk = w_warehouse_sk and d_month_seq between 1203 and 1203 + 11 group by rollup(i_product_name ,i_brand ,i_class ,i_category) order by qoh, i_product_name, i_brand, i_class, i_category fetch first 100 rows only;
-- end query 55 in stream 0 using template query22.tpl
-- start query 56 in stream 0 using template query89.tpl and seed 1684776629
select * from( select i_category, i_class, i_brand, s_store_name, s_company_name, d_moy, sum(ss_sales_price) sum_sales, avg(sum(ss_sales_price)) over (partition by i_category, i_brand, s_store_name, s_company_name) avg_monthly_sales from item,
store_sales, date_dim, store where ss_item_sk = i_item_sk and ss_sold_date_sk = d_date_sk and ss_store_sk = s_store_sk and d_year in (2002) and ((i_category in ('Shoes','Music','Children') and i_class in ('mens','classical','toddlers') ) or (i_category in ('Home','Electronics','Sports') and i_class in ('lighting','portable','athletic shoes') )) group by i_category, i_class, i_brand, s_store_name, s_company_name, d_moy) tmp1
where case when (avg_monthly_sales <> 0) then (abs(sum_sales - avg_monthly_sales) / avg_monthly_sales) else null end > 0.1 order by sum_sales - avg_monthly_sales, s_store_name fetch first 100 rows only;
-- end query 56 in stream 0 using template query89.tpl
-- start query 57 in stream 0 using template query15.tpl and seed 631273844
select ca_zip ,sum(cs_sales_price) from catalog_sales ,customer ,customer_address ,date_dim where cs_bill_customer_sk = c_customer_sk and c_current_addr_sk = ca_address_sk and ( substr(ca_zip,1,5) in ('85669', '86197','88274','83405','86475', '85392', '85460', '80348', '81792') or ca_state in ('CA','WA','GA') or cs_sales_price > 500) and cs_sold_date_sk = d_date_sk and d_qoy = 1 and d_year = 2002 group by ca_zip order by ca_zip fetch first 100 rows only;
-- end query 57 in stream 0 using template query15.tpl
-- start query 58 in stream 0 using template query6.tpl and seed 327264001
select a.ca_state state, count(*) cnt from customer_address a ,customer c ,store_sales s ,date_dim d ,item i where a.ca_address_sk = c.c_current_addr_sk and c.c_customer_sk = s.ss_customer_sk and s.ss_sold_date_sk = d.d_date_sk and s.ss_item_sk = i.i_item_sk and d.d_month_seq = (select distinct (d_month_seq) from date_dim where d_year = 1999 and d_moy = 3 ) and i.i_current_price > 1.2 * (select avg(j.i_current_price) from item j where j.i_category = i.i_category) group by a.ca_state having count(*) >= 10 order by cnt fetch first 100 rows only;
-- end query 58 in stream 0 using template query6.tpl
-- start query 59 in stream 0 using template query52.tpl and seed 1783319695
select dt.d_year ,item.i_brand_id brand_id ,item.i_brand brand ,sum(ss_ext_sales_price) ext_price from date_dim dt ,store_sales ,item where dt.d_date_sk = store_sales.ss_sold_date_sk and store_sales.ss_item_sk = item.i_item_sk and item.i_manager_id = 1 and dt.d_moy=12 and dt.d_year=1998 group by dt.d_year ,item.i_brand ,item.i_brand_id order by dt.d_year ,ext_price desc ,brand_id fetch first 100 rows only ;
-- end query 59 in stream 0 using template query52.tpl
-- start query 60 in stream 0 using template query50.tpl and seed 499173639
select s_store_name ,s_company_id ,s_street_number ,s_street_name ,s_street_type ,s_suite_number ,s_city ,s_county ,s_state ,s_zip ,sum(case when (sr_returned_date_sk - ss_sold_date_sk <= 30 ) then 1 else 0 end) as "30 days" ,sum(case when (sr_returned_date_sk - ss_sold_date_sk > 30) and (sr_returned_date_sk - ss_sold_date_sk <= 60) then 1 else 0 end ) as "31-60 days" ,sum(case when (sr_returned_date_sk - ss_sold_date_sk > 60) and (sr_returned_date_sk - ss_sold_date_sk <= 90) then 1 else 0 end) as "61-90 days" ,sum(case when (sr_returned_date_sk - ss_sold_date_sk > 90) and (sr_returned_date_sk - ss_sold_date_sk <= 120) then 1 else 0 end) as "91-120 days" ,sum(case when (sr_returned_date_sk - ss_sold_date_sk > 120) then 1 else 0 end) as ">120 days" from store_sales ,store_returns ,store ,date_dim d1 ,date_dim d2 where d2.d_year = 2002 and d2.d_moy = 9 and ss_ticket_number = sr_ticket_number and ss_item_sk = sr_item_sk and ss_sold_date_sk = d1.d_date_sk and sr_returned_date_sk = d2.d_date_sk and ss_customer_sk = sr_customer_sk and ss_store_sk = s_store_sk group by s_store_name ,s_company_id ,s_street_number ,s_street_name ,s_street_type ,s_suite_number ,s_city ,s_county ,s_state ,s_zip order by s_store_name ,s_company_id ,s_street_number ,s_street_name ,s_street_type ,s_suite_number ,s_city ,s_county ,s_state ,s_zip fetch first 100 rows only;
-- end query 60 in stream 0 using template query50.tpl
-- start query 61 in stream 0 using template query42.tpl and seed 801946299
select dt.d_year ,item.i_category_id ,item.i_category ,sum(ss_ext_sales_price) from date_dim dt ,store_sales ,item where dt.d_date_sk = store_sales.ss_sold_date_sk and store_sales.ss_item_sk = item.i_item_sk and item.i_manager_id = 1 and dt.d_moy=12 and dt.d_year=1999 group by dt.d_year ,item.i_category_id ,item.i_category order by sum(ss_ext_sales_price) desc,dt.d_year ,item.i_category_id ,item.i_category fetch first 100 rows only ;
-- end query 61 in stream 0 using template query42.tpl
-- start query 62 in stream 0 using template query41.tpl and seed 1556130879
select distinct(i_product_name) from item i1
where i_manufact_id between 708 and 708+40 and (select count(*) as item_cnt from item where (i_manufact = i1.i_manufact and ((i_category = 'Women' and (i_color = 'blanched' or i_color = 'sandy') and (i_units = 'Gram' or i_units = 'Gross') and (i_size = 'small' or i_size = 'large')
) or (i_category = 'Women' and (i_color = 'peru' or i_color = 'firebrick') and (i_units = 'Lb' or i_units = 'Pound') and (i_size = 'petite' or i_size = 'economy') ) or (i_category = 'Men' and (i_color = 'coral' or i_color = 'ivory') and (i_units = 'Oz' or i_units = 'Dram') and (i_size = 'medium' or i_size = 'extra large') ) or (i_category = 'Men' and (i_color = 'cornflower' or i_color = 'yellow') and (i_units = 'Box' or i_units = 'Dozen') and (i_size = 'small' or i_size = 'large') ))) or (i_manufact = i1.i_manufact and ((i_category = 'Women' and (i_color = 'orchid' or i_color = 'navy') and (i_units = 'Pallet' or i_units = 'Ton') and (i_size = 'small' or i_size = 'large') ) or (i_category = 'Women' and (i_color = 'smoke' or i_color = 'chartreuse') and (i_units = 'Tsp' or i_units = 'Cup') and (i_size = 'petite' or i_size = 'economy') ) or (i_category = 'Men' and (i_color = 'turquoise' or i_color = 'almond') and (i_units = 'N/A' or i_units = 'Carton') and (i_size = 'medium' or i_size = 'extra large') ) or (i_category = 'Men' and (i_color = 'dim' or i_color = 'mint') and (i_units = 'Unknown' or i_units = 'Tbl') and (i_size = 'small' or i_size = 'large') )))) > 0 order by i_product_name fetch first 100 rows only;
-- end query 62 in stream 0 using template query41.tpl
-- start query 63 in stream 0 using template query8.tpl and seed 1332175075
select s_store_name ,sum(ss_net_profit) from store_sales ,date_dim ,store, (select ca_zip from ( (SELECT substr(ca_zip,1,5) ca_zip FROM customer_address WHERE substr(ca_zip,1,5) IN ( '17520','56461','11390','87479','50201','64392', '77741','76113','54207','15320','44569', '35851','46871','50295','14109','70069', '42274','72697','49813','18583','63339', '60505','99432','79884','33972','42525', '35092','25778','22629','64234','29226', '75520','98109','16929','55589','40349', '19272','40489','28727','21155','14808', '49719','90782','96126','56778','31988', '59430','94944','40599','83996','13656',
'56186','43140','61896','41823','37763', '88569','63139','49977','68798','61598', '12149','11627','83980','49908','32429', '12310','95102','68778','28297','44532', '78974','23090','44128','59881','17124', '70629','98394','50450','55883','33325', '85623','38485','49236','16046','86766', '52396','36647','74681','92467','76826', '28698','76613','49428','60613','78399', '32006','56656','50099','62541','24195',
'59554','47479','27633','86644','77196', '11416','25315','69480','55282','21296', '74115','39036','57192','26772','41446', '85594','26170','32014','33686','17417', '38479','39798','26984','15384','19701', '31840','75749','10821','19540','74993', '14695','36295','77284','30705','60499', '88870','22740','93118','24062','89801', '64498','24353','63764','48640','20763', '81686','86801','14510','97250','63328', '14274','84750','37540','13141','43656', '42594','62162','30856','58781','97307', '30425','47381','29354','10208','53823', '34767','45240','92270','20139','32558', '85961','25518','18478','68301','28043', '95864','19684','23565','71884','28618', '83171','68892','91727','77558','42337', '13172','53387','18098','27450','54631', '87025','25044','61857','17079','37192', '56817','24721','86299','21755','11584', '29803','55705','52938','80270','15967', '68622','53938','45389','25599','29162', '31836','26662','13248','21731','12125', '83522','25188','99246','81384','26365', '10509','52595','27999','35400','64795', '80604','11425','62684','40040','63709', '31391','39805','23995','56156','30740', '12236','49117','30327','93204','36708', '99477','15829','19538','30944','41725', '13717','92499','62369','38977','22461', '78617','56289','77625','75530','89141', '82011','12189','86300','13413','49611', '98875','20610','94062','19966','44440', '53620','18960','24320','72802','64177', '96013','44107','48600','76115','59687', '91038','47763','10294','33300','68127', '23375','91206','63518','98214','14559', '12392','35386','10880','47707','63489', '82126','49457','45829','18615','15584', '22508','12710','50721','75233','61958', '16475','55756','69792','35722','27456', '51312','60703','34980','18416','16237', '12628','38465','52536','31266','48258', '86017','97136','50287','23293','90970', '37154','65745','34215','91448','84360', '79510','19040','64301','17062','20644', '54703','29273','13740','56946','20456', '68468','84097','84416','12869','26949',
'52033','96951','31672','81587','95676', '37610','93894','39271','12437','56669', '18231','76214','11829','16427','68398', '33984','63223','71905','88964','44052', '99132','16234','11429','77186','63739', '78627','10605','66822','59934','78689', '15739','48704','66247','88173','75206', '35437','89093','65467','80334','53903', '67968','24348','21275','45914','34383', '30991','71679','15697','30316','30118',
'15177','66847','20352','93605','69947', '73326','86839','62858','89802')) intersect (select ca_zip from (SELECT substr(ca_zip,1,5) ca_zip,count(*) cnt FROM customer_address, customer WHERE ca_address_sk = c_current_addr_sk and c_preferred_cust_flag='Y' group by ca_zip having count(*) > 10)A1))A2) V1 where ss_store_sk = s_store_sk and ss_sold_date_sk = d_date_sk and d_qoy = 2 and d_year = 1998 and (substr(s_zip,1,2) = substr(V1.ca_zip,1,2)) group by s_store_name order by s_store_name fetch first 100 rows only;
-- end query 63 in stream 0 using template query8.tpl
-- start query 64 in stream 0 using template query12.tpl and seed 893042407
select i_item_desc ,i_category ,i_class ,i_current_price ,sum(ws_ext_sales_price) as itemrevenue ,sum(ws_ext_sales_price)*100/sum(sum(ws_ext_sales_price)) over (partition by i_class) as revenueratio from web_sales ,item ,date_dim where ws_item_sk = i_item_sk and i_category in ('Jewelry', 'Men', 'Music') and ws_sold_date_sk = d_date_sk and d_date between cast('2002-05-04' as date) and (cast('2002-05-04' as date) + 30 days) group by i_item_id ,i_item_desc ,i_category ,i_class ,i_current_price order by i_category ,i_class ,i_item_id ,i_item_desc ,revenueratio fetch first 100 rows only;
-- end query 64 in stream 0 using template query12.tpl
-- start query 65 in stream 0 using template query20.tpl and seed 315194146
select i_item_desc ,i_category ,i_class ,i_current_price ,sum(cs_ext_sales_price) as itemrevenue ,sum(cs_ext_sales_price)*100/sum(sum(cs_ext_sales_price)) over (partition by i_class) as revenueratio from catalog_sales ,item ,date_dim where cs_item_sk = i_item_sk and i_category in ('Home', 'Electronics', 'Children') and cs_sold_date_sk = d_date_sk and d_date between cast('2002-03-06' as date) and (cast('2002-03-06' as date) + 30 days) group by i_item_id ,i_item_desc ,i_category ,i_class ,i_current_price order by i_category ,i_class ,i_item_id ,i_item_desc ,revenueratio fetch first 100 rows only;
-- end query 65 in stream 0 using template query20.tpl
-- start query 66 in stream 0 using template query88.tpl and seed 1356399435
select * from (select count(*) h8_30_to_9 from store_sales, household_demographics , time_dim, store where ss_sold_time_sk = time_dim.t_time_sk and ss_hdemo_sk = household_demographics.hd_demo_sk and ss_store_sk = s_store_sk and time_dim.t_hour = 8 and time_dim.t_minute >= 30 and ((household_demographics.hd_dep_count = 4 and household_demographics.hd_vehicle_count<=4+2) or (household_demographics.hd_dep_count = 3 and household_demographics.hd_vehicle_count<=3+2) or (household_demographics.hd_dep_count = 0 and household_demographics.hd_vehicle_count<=0+2)) and store.s_store_name = 'ese') s1, (select count(*) h9_to_9_30 from store_sales, household_demographics , time_dim, store where ss_sold_time_sk = time_dim.t_time_sk and ss_hdemo_sk = household_demographics.hd_demo_sk and ss_store_sk = s_store_sk and time_dim.t_hour = 9 and time_dim.t_minute < 30 and ((household_demographics.hd_dep_count = 4 and household_demographics.hd_vehicle_count<=4+2) or (household_demographics.hd_dep_count = 3 and household_demographics.hd_vehicle_count<=3+2) or (household_demographics.hd_dep_count = 0 and household_demographics.hd_vehicle_count<=0+2)) and store.s_store_name = 'ese') s2, (select count(*) h9_30_to_10 from store_sales, household_demographics , time_dim, store where ss_sold_time_sk = time_dim.t_time_sk and ss_hdemo_sk = household_demographics.hd_demo_sk and ss_store_sk = s_store_sk and time_dim.t_hour = 9 and time_dim.t_minute >= 30 and ((household_demographics.hd_dep_count = 4 and household_demographics.hd_vehicle_count<=4+2) or (household_demographics.hd_dep_count = 3 and household_demographics.hd_vehicle_count<=3+2) or (household_demographics.hd_dep_count = 0 and household_demographics.hd_vehicle_count<=0+2)) and store.s_store_name = 'ese') s3, (select count(*) h10_to_10_30 from store_sales, household_demographics , time_dim, store where ss_sold_time_sk = time_dim.t_time_sk and ss_hdemo_sk = household_demographics.hd_demo_sk and ss_store_sk = s_store_sk
and time_dim.t_hour = 10 and time_dim.t_minute < 30 and ((household_demographics.hd_dep_count = 4 and household_demographics.hd_vehicle_count<=4+2) or (household_demographics.hd_dep_count = 3 and household_demographics.hd_vehicle_count<=3+2) or (household_demographics.hd_dep_count = 0 and household_demographics.hd_vehicle_count<=0+2)) and store.s_store_name = 'ese') s4, (select count(*) h10_30_to_11 from store_sales, household_demographics , time_dim, store where ss_sold_time_sk = time_dim.t_time_sk and ss_hdemo_sk = household_demographics.hd_demo_sk and ss_store_sk = s_store_sk and time_dim.t_hour = 10 and time_dim.t_minute >= 30 and ((household_demographics.hd_dep_count = 4 and household_demographics.hd_vehicle_count<=4+2) or (household_demographics.hd_dep_count = 3 and household_demographics.hd_vehicle_count<=3+2) or (household_demographics.hd_dep_count = 0 and household_demographics.hd_vehicle_count<=0+2)) and store.s_store_name = 'ese') s5, (select count(*) h11_to_11_30 from store_sales, household_demographics , time_dim, store where ss_sold_time_sk = time_dim.t_time_sk and ss_hdemo_sk = household_demographics.hd_demo_sk and ss_store_sk = s_store_sk and time_dim.t_hour = 11 and time_dim.t_minute < 30 and ((household_demographics.hd_dep_count = 4 and household_demographics.hd_vehicle_count<=4+2) or (household_demographics.hd_dep_count = 3 and household_demographics.hd_vehicle_count<=3+2) or (household_demographics.hd_dep_count = 0 and household_demographics.hd_vehicle_count<=0+2)) and store.s_store_name = 'ese') s6,
(select count(*) h11_30_to_12 from store_sales, household_demographics , time_dim, store where ss_sold_time_sk = time_dim.t_time_sk and ss_hdemo_sk = household_demographics.hd_demo_sk and ss_store_sk = s_store_sk and time_dim.t_hour = 11 and time_dim.t_minute >= 30 and ((household_demographics.hd_dep_count = 4 and household_demographics.hd_vehicle_count<=4+2) or (household_demographics.hd_dep_count = 3 and household_demographics.hd_vehicle_count<=3+2) or (household_demographics.hd_dep_count = 0 and household_demographics.hd_vehicle_count<=0+2)) and store.s_store_name = 'ese') s7, (select count(*) h12_to_12_30 from store_sales, household_demographics , time_dim, store where ss_sold_time_sk = time_dim.t_time_sk and ss_hdemo_sk = household_demographics.hd_demo_sk and ss_store_sk = s_store_sk and time_dim.t_hour = 12 and time_dim.t_minute < 30 and ((household_demographics.hd_dep_count = 4 and household_demographics.hd_vehicle_count<=4+2) or (household_demographics.hd_dep_count = 3 and household_demographics.hd_vehicle_count<=3+2) or (household_demographics.hd_dep_count = 0 and household_demographics.hd_vehicle_count<=0+2)) and store.s_store_name = 'ese') s8 ;
-- end query 66 in stream 0 using template query88.tpl
-- start query 67 in stream 0 using template query82.tpl and seed 1485865852
select i_item_id ,i_item_desc ,i_current_price from item, inventory, date_dim, store_sales where i_current_price between 16 and 16+30 and inv_item_sk = i_item_sk and d_date_sk=inv_date_sk and d_date between cast('2000-02-07' as date) and (cast('2000-02-07' as date) + 60 days) and i_manufact_id in (763,359,233,597) and inv_quantity_on_hand between 100 and 500 and ss_item_sk = i_item_sk group by i_item_id,i_item_desc,i_current_price order by i_item_id fetch first 100 rows only;
-- end query 67 in stream 0 using template query82.tpl
-- start query 68 in stream 0 using template query23.tpl and seed 1801862381
with frequent_ss_items as (select substr(i_item_desc,1,30)
itemdesc,i_item_sk item_sk,d_date solddate,count(*) cnt from store_sales ,date_dim ,item where ss_sold_date_sk = d_date_sk and ss_item_sk = i_item_sk and d_year in (2000,2000+1,2000+2,2000+3) group by substr(i_item_desc,1,30),i_item_sk,d_date having count(*) >4), max_store_sales as (select max(csales) tpcds_cmax from (select c_customer_sk,sum(ss_quantity*ss_sales_price) csales from store_sales ,customer ,date_dim where ss_customer_sk = c_customer_sk and ss_sold_date_sk = d_date_sk and d_year in (2000,2000+1,2000+2,2000+3) group by c_customer_sk) x), best_ss_customer as (select c_customer_sk,sum(ss_quantity*ss_sales_price) ssales from store_sales ,customer where ss_customer_sk = c_customer_sk group by c_customer_sk having sum(ss_quantity*ss_sales_price) > (95/100.0) * (select * from max_store_sales)) select sum(sales) from ((select cs_quantity*cs_list_price sales from catalog_sales
,date_dim where d_year = 2000 and d_moy = 5 and cs_sold_date_sk = d_date_sk and cs_item_sk in (select item_sk from frequent_ss_items) and cs_bill_customer_sk in (select c_customer_sk from best_ss_customer)) union all (select ws_quantity*ws_list_price sales from web_sales ,date_dim where d_year = 2000 and d_moy = 5 and ws_sold_date_sk = d_date_sk and ws_item_sk in (select item_sk from frequent_ss_items) and ws_bill_customer_sk in (select c_customer_sk from best_ss_customer))) y fetch first 100 rows only;
with frequent_ss_items as (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date solddate,count(*) cnt from store_sales ,date_dim ,item where ss_sold_date_sk = d_date_sk and ss_item_sk = i_item_sk and d_year in (2000,2000 + 1,2000 + 2,2000 + 3) group by substr(i_item_desc,1,30),i_item_sk,d_date having count(*) >4), max_store_sales as (select max(csales) tpcds_cmax from (select c_customer_sk,sum(ss_quantity*ss_sales_price) csales from store_sales ,customer ,date_dim where ss_customer_sk = c_customer_sk and ss_sold_date_sk = d_date_sk and d_year in (2000,2000+1,2000+2,2000+3) group by c_customer_sk) x), best_ss_customer as (select c_customer_sk,sum(ss_quantity*ss_sales_price) ssales from store_sales ,customer where ss_customer_sk = c_customer_sk group by c_customer_sk having sum(ss_quantity*ss_sales_price) > (95/100.0) * (select * from max_store_sales)) select c_last_name,c_first_name,sales from ((select c_last_name,c_first_name,sum(cs_quantity*cs_list_price) sales from catalog_sales ,customer ,date_dim where d_year = 2000 and d_moy = 5 and cs_sold_date_sk = d_date_sk and cs_item_sk in (select item_sk from frequent_ss_items) and cs_bill_customer_sk in (select c_customer_sk from best_ss_customer) and cs_bill_customer_sk = c_customer_sk group by c_last_name,c_first_name) union all (select c_last_name,c_first_name,sum(ws_quantity*ws_list_price) sales from web_sales ,customer ,date_dim where d_year = 2000 and d_moy = 5 and ws_sold_date_sk =
d_date_sk and ws_item_sk in (select item_sk from frequent_ss_items) and ws_bill_customer_sk in (select c_customer_sk from best_ss_customer) and ws_bill_customer_sk = c_customer_sk group by c_last_name,c_first_name)) y order by c_last_name,c_first_name,sales fetch first 100 rows only;
-- end query 68 in stream 0 using template query23.tpl
-- start query 69 in stream 0 using template query14.tpl and seed 290166045
with cross_items as (select i_item_sk ss_item_sk from item, (select iss.i_brand_id brand_id ,iss.i_class_id class_id ,iss.i_category_id category_id
from store_sales ,item iss ,date_dim d1 where ss_item_sk = iss.i_item_sk and ss_sold_date_sk = d1.d_date_sk and d1.d_year between 1999 AND 1999 + 2 intersect select ics.i_brand_id ,ics.i_class_id ,ics.i_category_id from catalog_sales ,item ics ,date_dim d2 where cs_item_sk = ics.i_item_sk and cs_sold_date_sk = d2.d_date_sk and d2.d_year between 1999 AND 1999 + 2 intersect select iws.i_brand_id ,iws.i_class_id ,iws.i_category_id from web_sales ,item iws ,date_dim d3 where ws_item_sk = iws.i_item_sk and ws_sold_date_sk = d3.d_date_sk and d3.d_year between 1999 AND 1999 + 2) x where i_brand_id = brand_id and i_class_id = class_id and i_category_id = category_id ), avg_sales as (select avg(quantity*list_price) average_sales from (select ss_quantity quantity ,ss_list_price list_price from store_sales ,date_dim where ss_sold_date_sk = d_date_sk and d_year between 1999 and 2001 union all select cs_quantity quantity ,cs_list_price list_price from catalog_sales ,date_dim where cs_sold_date_sk = d_date_sk and d_year between 1998 and 1998 + 2 union all select ws_quantity quantity ,ws_list_price list_price from web_sales ,date_dim where ws_sold_date_sk = d_date_sk and d_year between 1998 and 1998 + 2) x) select channel, i_brand_id,i_class_id,i_category_id,sum(sales), sum(number_sales) from( select 'store' channel, i_brand_id,i_class_id ,i_category_id,sum(ss_quantity*ss_list_price) sales , count(*) number_sales from store_sales ,item ,date_dim where ss_item_sk in (select ss_item_sk from cross_items) and ss_item_sk = i_item_sk and ss_sold_date_sk = d_date_sk and d_year = 1998+2 and d_moy = 11 group by i_brand_id,i_class_id,i_category_id having sum(ss_quantity*ss_list_price) > (select average_sales from avg_sales) union all select 'catalog' channel, i_brand_id,i_class_id,i_category_id, sum(cs_quantity*cs_list_price) sales, count(*) number_sales from catalog_sales ,item ,date_dim where cs_item_sk in (select ss_item_sk from cross_items) and cs_item_sk = i_item_sk
and cs_sold_date_sk = d_date_sk and d_year = 1998+2 and d_moy = 11 group by i_brand_id,i_class_id,i_category_id having sum(cs_quantity*cs_list_price) > (select average_sales from avg_sales) union all select 'web' channel, i_brand_id,i_class_id,i_category_id, sum(ws_quantity*ws_list_price) sales , count(*) number_sales from web_sales ,item
,date_dim where ws_item_sk in (select ss_item_sk from cross_items) and ws_item_sk = i_item_sk and ws_sold_date_sk = d_date_sk and d_year = 1998+2 and d_moy = 11 group by i_brand_id,i_class_id,i_category_id having sum(ws_quantity*ws_list_price) > (select average_sales from avg_sales) ) y group by rollup (channel, i_brand_id,i_class_id,i_category_id) order by channel,i_brand_id,i_class_id,i_category_id fetch first 100 rows only;
with cross_items as (select i_item_sk ss_item_sk from item, (select iss.i_brand_id brand_id ,iss.i_class_id class_id ,iss.i_category_id category_id from store_sales ,item iss ,date_dim d1 where ss_item_sk = iss.i_item_sk and ss_sold_date_sk = d1.d_date_sk and d1.d_year between 1999 AND 1999 + 2 intersect select ics.i_brand_id ,ics.i_class_id ,ics.i_category_id from catalog_sales ,item ics ,date_dim d2 where cs_item_sk = ics.i_item_sk and cs_sold_date_sk = d2.d_date_sk and d2.d_year between 1999 AND 1999 + 2 intersect select iws.i_brand_id ,iws.i_class_id ,iws.i_category_id from web_sales ,item iws ,date_dim d3 where ws_item_sk = iws.i_item_sk and ws_sold_date_sk = d3.d_date_sk and d3.d_year between 1999 AND 1999 + 2) x where i_brand_id = brand_id and i_class_id = class_id and i_category_id = category_id ), avg_sales as (select avg(quantity*list_price) average_sales from (select ss_quantity quantity ,ss_list_price list_price from store_sales ,date_dim where ss_sold_date_sk = d_date_sk and d_year between 1998 and 1998 + 2 union all select cs_quantity quantity ,cs_list_price list_price from catalog_sales ,date_dim where cs_sold_date_sk = d_date_sk and d_year between 1998 and 1998 + 2 union all select ws_quantity quantity ,ws_list_price list_price from web_sales ,date_dim where ws_sold_date_sk = d_date_sk and d_year between 1998 and 1998 + 2) x) select * from (select 'store' channel, i_brand_id,i_class_id,i_category_id ,sum(ss_quantity*ss_list_price) sales, count(*) number_sales from store_sales ,item ,date_dim where ss_item_sk
in (select ss_item_sk from cross_items) and ss_item_sk = i_item_sk and ss_sold_date_sk = d_date_sk and d_week_seq = (select d_week_seq from date_dim where d_year = 1998 + 1 and d_moy = 12 and d_dom = 18) group by i_brand_id,i_class_id,i_category_id having sum(ss_quantity*ss_list_price) > (select average_sales from avg_sales)) this_year, (select 'store' channel, i_brand_id,i_class_id
,i_category_id, sum(ss_quantity*ss_list_price) sales, count(*) number_sales from store_sales ,item ,date_dim where ss_item_sk in (select ss_item_sk from cross_items) and ss_item_sk = i_item_sk and ss_sold_date_sk = d_date_sk and d_week_seq = (select d_week_seq from date_dim where d_year = 1998 and d_moy = 12 and d_dom = 18) group by i_brand_id,i_class_id,i_category_id having sum(ss_quantity*ss_list_price) > (select average_sales from avg_sales)) last_year where this_year.i_brand_id= last_year.i_brand_id and this_year.i_class_id = last_year.i_class_id and this_year.i_category_id = last_year.i_category_id order by this_year.channel, this_year.i_brand_id, this_year.i_class_id, this_year.i_category_id fetch first 100 rows only;
-- end query 69 in stream 0 using template query14.tpl
-- start query 70 in stream 0 using template query57.tpl and seed 1980588756
with v1 as( select i_category, i_brand, cc_name, d_year, d_moy, sum(cs_sales_price) sum_sales, avg(sum(cs_sales_price)) over (partition by i_category, i_brand, cc_name, d_year) avg_monthly_sales, rank() over (partition by i_category, i_brand, cc_name order by d_year, d_moy) rn from item, catalog_sales, date_dim, call_center where cs_item_sk = i_item_sk and cs_sold_date_sk = d_date_sk and cc_call_center_sk= cs_call_center_sk and ( d_year = 1999 or ( d_year = 1999-1 and d_moy =12) or ( d_year = 1999+1 and d_moy =1) ) group by i_category, i_brand, cc_name , d_year, d_moy), v2 as( select v1.i_category, v1.i_brand ,v1.d_year, v1.d_moy ,v1.avg_monthly_sales ,v1.sum_sales, v1_lag.sum_sales psum, v1_lead.sum_sales nsum from v1, v1 v1_lag, v1 v1_lead where v1.i_category = v1_lag.i_category and v1.i_category = v1_lead.i_category and v1.i_brand = v1_lag.i_brand and v1.i_brand = v1_lead.i_brand and v1. cc_name = v1_lag. cc_name and v1. cc_name = v1_lead. cc_name and v1.rn = v1_lag.rn + 1 and v1.rn = v1_lead.rn - 1) select * from v2 where d_year = 1999 and avg_monthly_sales > 0 and case when avg_monthly_sales > 0 then abs(sum_sales - avg_monthly_sales) / avg_monthly_sales else null end > 0.1 order by sum_sales - avg_monthly_sales, 3 fetch first 100 rows only;
-- end query 70 in stream 0 using template query57.tpl
-- start query 71 in stream 0 using template query65.tpl and seed 398283436
select s_store_name, i_item_desc, sc.revenue, i_current_price, i_wholesale_cost, i_brand from store, item, (select ss_store_sk, avg(revenue) as ave from (select ss_store_sk, ss_item_sk, sum(ss_sales_price) as revenue from store_sales, date_dim where ss_sold_date_sk = d_date_sk and d_month_seq between 1223 and 1223+11
group by ss_store_sk, ss_item_sk) sa group by ss_store_sk) sb, (select ss_store_sk, ss_item_sk, sum(ss_sales_price) as revenue from store_sales, date_dim where ss_sold_date_sk = d_date_sk and d_month_seq between 1223 and 1223+11 group by ss_store_sk, ss_item_sk) sc where sb.ss_store_sk = sc.ss_store_sk and sc.revenue <= 0.1 * sb.ave and s_store_sk = sc.ss_store_sk and i_item_sk = sc.ss_item_sk order by s_store_name, i_item_desc fetch first 100 rows only;
-- end query 71 in stream 0 using template query65.tpl
-- start query 72 in stream 0 using template query71.tpl and seed 2112533127
select i_brand_id brand_id, i_brand brand,t_hour,t_minute, sum(ext_price) ext_price from item, (select ws_ext_sales_price as ext_price, ws_sold_date_sk as sold_date_sk, ws_item_sk as sold_item_sk, ws_sold_time_sk as time_sk from web_sales,date_dim where d_date_sk = ws_sold_date_sk and d_moy=11 and d_year=2000 union all select cs_ext_sales_price as ext_price, cs_sold_date_sk as sold_date_sk, cs_item_sk as sold_item_sk, cs_sold_time_sk as time_sk from catalog_sales,date_dim where d_date_sk = cs_sold_date_sk and d_moy=11 and d_year=2000 union all select ss_ext_sales_price as ext_price, ss_sold_date_sk as sold_date_sk, ss_item_sk as sold_item_sk, ss_sold_time_sk as time_sk from store_sales,date_dim where d_date_sk = ss_sold_date_sk and d_moy=11 and d_year=2000 ) as tmp,time_dim where sold_item_sk = i_item_sk and i_manager_id=1 and time_sk = t_time_sk and (t_meal_time = 'breakfast' or t_meal_time = 'dinner') group by i_brand, i_brand_id,t_hour,t_minute order by ext_price desc, i_brand_id ;
-- end query 72 in stream 0 using template query71.tpl
-- start query 73 in stream 0 using template query34.tpl and seed 1754860092
select c_last_name ,c_first_name ,c_salutation ,c_preferred_cust_flag ,ss_ticket_number ,cnt from (select ss_ticket_number ,ss_customer_sk ,count(*) cnt from store_sales,date_dim,store,household_demographics where store_sales.ss_sold_date_sk =
date_dim.d_date_sk and store_sales.ss_store_sk = store.s_store_sk and store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk and (date_dim.d_dom between 1 and 3 or date_dim.d_dom between 25 and 28) and (household_demographics.hd_buy_potential = '1001-5000' or household_demographics.hd_buy_potential = '5001-10000') and household_demographics.hd_vehicle_count > 0 and (case when household_demographics.hd_vehicle_count > 0 then cast(household_demographics.hd_dep_count as double) / cast(household_demographics.hd_vehicle_count as double) else null end) > 1.2 and date_dim.d_year in (2000,2000+1,2000+2)
and store.s_county in ('Chambers County','Perry County','Richland County','Sierra County', 'Perry County','Surry County','Wadena County','Essex County') group by ss_ticket_number,ss_customer_sk) dn,customer where ss_customer_sk = c_customer_sk and cnt between 15 and 20 order by c_last_name,c_first_name,c_salutation,c_preferred_cust_flag desc;
-- end query 73 in stream 0 using template query34.tpl
-- start query 74 in stream 0 using template query48.tpl and seed 511060613
select sum (ss_quantity) from store_sales, store, customer_demographics, customer_address, date_dim where s_store_sk = ss_store_sk and ss_sold_date_sk = d_date_sk and d_year = 1999 and ( ( cd_demo_sk = ss_cdemo_sk and cd_marital_status = 'M' and cd_education_status = 'Secondary' and ss_sales_price between 100.00 and 150.00 ) or ( cd_demo_sk = ss_cdemo_sk and cd_marital_status = 'M' and cd_education_status = 'Secondary' and ss_sales_price between 50.00 and 100.00 ) or ( cd_demo_sk = ss_cdemo_sk and cd_marital_status = 'M' and cd_education_status = 'Secondary' and ss_sales_price between 150.00 and 200.00 ) ) and ( ( ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('TX', 'MN', 'GA') and ss_net_profit between 0 and 2000 ) or (ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('AR', 'NJ', 'OH') and ss_net_profit between 150 and 3000 ) or (ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('VA', 'ID', 'KY') and ss_net_profit between 50 and 25000 ) ) ;
-- end query 74 in stream 0 using template query48.tpl
-- start query 75 in stream 0 using template query30.tpl and seed 1376774683
with customer_total_return as (select wr_returning_customer_sk as ctr_customer_sk ,ca_state as ctr_state, sum(wr_return_amt) as ctr_total_return from web_returns ,date_dim ,customer_address where wr_returned_date_sk = d_date_sk
and d_year =2001 and wr_returning_addr_sk = ca_address_sk group by wr_returning_customer_sk ,ca_state) select c_customer_id,c_salutation,c_first_name,c_last_name,c_preferred_cust_flag ,c_birth_day,c_birth_month,c_birth_year,c_birth_country,c_login,c_email_address ,c_last_review_date,ctr_total_return from customer_total_return ctr1 ,customer_address ,customer where ctr1.ctr_total_return > (select avg(ctr_total_return)*1.2 from customer_total_return ctr2 where ctr1.ctr_state = ctr2.ctr_state) and ca_address_sk = c_current_addr_sk and ca_state = 'NE' and ctr1.ctr_customer_sk = c_customer_sk order by c_customer_id,c_salutation,c_first_name,c_last_name,c_preferred_cust_flag ,c_birth_day,c_birth_month,c_birth_year,c_birth_country,c_login,c_email_address ,c_last_review_date,ctr_total_return fetch first 100 rows only;
-- end query 75 in stream 0 using template query30.tpl
-- start query 76 in stream 0 using template query74.tpl and seed 860979922
with year_total as ( select c_customer_id customer_id ,c_first_name customer_first_name ,c_last_name customer_last_name ,d_year as year ,stddev_samp(ss_net_paid) year_total ,'s' sale_type from customer ,store_sales ,date_dim where c_customer_sk = ss_customer_sk and ss_sold_date_sk = d_date_sk and d_year in (2000,2000+1) group by c_customer_id ,c_first_name ,c_last_name ,d_year union all select c_customer_id customer_id ,c_first_name customer_first_name ,c_last_name customer_last_name ,d_year as year ,stddev_samp(ws_net_paid) year_total ,'w' sale_type from customer ,web_sales ,date_dim where c_customer_sk = ws_bill_customer_sk and ws_sold_date_sk = d_date_sk and d_year in (2000,2000+1) group by c_customer_id ,c_first_name ,c_last_name ,d_year ) select t_s_secyear.customer_id, t_s_secyear.customer_first_name, t_s_secyear.customer_last_name from year_total t_s_firstyear ,year_total t_s_secyear ,year_total t_w_firstyear ,year_total t_w_secyear where t_s_secyear.customer_id = t_s_firstyear.customer_id and
t_s_firstyear.customer_id = t_w_secyear.customer_id and t_s_firstyear.customer_id = t_w_firstyear.customer_id and t_s_firstyear.sale_type = 's' and t_w_firstyear.sale_type = 'w' and t_s_secyear.sale_type = 's' and t_w_secyear.sale_type = 'w' and t_s_firstyear.year = 2000 and t_s_secyear.year = 2000+1 and t_w_firstyear.year = 2000 and t_w_secyear.year = 2000+1 and t_s_firstyear.year_total > 0 and t_w_firstyear.year_total > 0 and case when t_w_firstyear.year_total > 0 then t_w_secyear.year_total / t_w_firstyear.year_total else null end > case when t_s_firstyear.year_total > 0 then t_s_secyear.year_total / t_s_firstyear.year_total else null end order by 3,2,1
fetch first 100 rows only;
-- end query 76 in stream 0 using template query74.tpl
-- start query 77 in stream 0 using template query87.tpl and seed 1235996660
select count(*) from ((select distinct c_last_name, c_first_name, d_date from store_sales, date_dim, customer where store_sales.ss_sold_date_sk = date_dim.d_date_sk and store_sales.ss_customer_sk = customer.c_customer_sk and d_month_seq between 1179 and 1179+11) except (select distinct c_last_name, c_first_name, d_date from catalog_sales, date_dim, customer where catalog_sales.cs_sold_date_sk = date_dim.d_date_sk and catalog_sales.cs_bill_customer_sk = customer.c_customer_sk and d_month_seq between 1179 and 1179+11) except (select distinct c_last_name, c_first_name, d_date from web_sales, date_dim, customer where web_sales.ws_sold_date_sk = date_dim.d_date_sk and web_sales.ws_bill_customer_sk = customer.c_customer_sk and d_month_seq between 1179 and 1179+11) ) cool_cust ;
-- end query 77 in stream 0 using template query87.tpl
-- start query 78 in stream 0 using template query77.tpl and seed 1736758238
with ss as (select s_store_sk, sum(ss_ext_sales_price) as sales, sum(ss_net_profit) as profit from store_sales, date_dim, store where ss_sold_date_sk = d_date_sk and d_date between cast('2002-08-25' as date) and (cast('2002-08-25' as date) + 30 days) and ss_store_sk = s_store_sk group by s_store_sk) , sr as (select s_store_sk, sum(sr_return_amt) as returns, sum(sr_net_loss) as profit_loss from store_returns, date_dim, store where sr_returned_date_sk = d_date_sk and d_date between cast('2002-08-25' as date) and (cast('2002-08-25' as date) + 30 days) and sr_store_sk = s_store_sk group by s_store_sk), cs as (select cs_call_center_sk, sum(cs_ext_sales_price) as sales, sum(cs_net_profit) as profit from catalog_sales, date_dim where cs_sold_date_sk = d_date_sk and d_date between cast('2002-08-25' as date) and (cast('2002-08-25' as date) + 30 days) group by cs_call_center_sk ), cr as (select
sum(cr_return_amount) as returns, sum(cr_net_loss) as profit_loss from catalog_returns, date_dim where cr_returned_date_sk = d_date_sk and d_date between cast('2002-08-25' as date) and (cast('2002-08-25' as date) + 30 days) ), ws as ( select wp_web_page_sk, sum(ws_ext_sales_price) as sales, sum(ws_net_profit) as profit from web_sales, date_dim, web_page
where ws_sold_date_sk = d_date_sk and d_date between cast('2002-08-25' as date) and (cast('2002-08-25' as date) + 30 days) and ws_web_page_sk = wp_web_page_sk group by wp_web_page_sk), wr as (select wp_web_page_sk, sum(wr_return_amt) as returns, sum(wr_net_loss) as profit_loss from web_returns, date_dim, web_page where wr_returned_date_sk = d_date_sk and d_date between cast('2002-08-25' as date) and (cast('2002-08-25' as date) + 30 days) and wr_web_page_sk = wp_web_page_sk group by wp_web_page_sk) select channel , id , sum(sales) as sales , sum(returns) as returns , sum(profit) as profit from (select 'store channel' as channel , ss.s_store_sk as id , sales , coalesce(returns, 0) as returns , (profit - coalesce(profit_loss,0)) as profit from ss left join sr on ss.s_store_sk = sr.s_store_sk union all select 'catalog channel' as channel , cs_call_center_sk as id , sales , returns , (profit - profit_loss) as profit from cs , cr union all select 'web channel' as channel , ws.wp_web_page_sk as id , sales , coalesce(returns, 0) returns , (profit - coalesce(profit_loss,0)) as profit from ws left join wr on ws.wp_web_page_sk = wr.wp_web_page_sk ) x group by rollup (channel, id) order by channel ,id fetch first 100 rows only;
-- end query 78 in stream 0 using template query77.tpl
-- start query 79 in stream 0 using template query73.tpl and seed 1878398230
select c_last_name ,c_first_name ,c_salutation ,c_preferred_cust_flag ,ss_ticket_number ,cnt from (select ss_ticket_number ,ss_customer_sk ,count(*) cnt from store_sales,date_dim,store,household_demographics where store_sales.ss_sold_date_sk = date_dim.d_date_sk and store_sales.ss_store_sk = store.s_store_sk and store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk and date_dim.d_dom between 1 and 2 and (household_demographics.hd_buy_potential = '>10000' or household_demographics.hd_buy_potential = '0-500') and household_demographics.hd_vehicle_count > 0 and case when
household_demographics.hd_vehicle_count > 0 then household_demographics.hd_dep_count/ household_demographics.hd_vehicle_count else null end > 1 and date_dim.d_year in (2000,2000+1,2000+2) and store.s_county in ('Red River Parish','Wilkinson County','Cocke County','Jack County') group by ss_ticket_number,ss_customer_sk) dj,customer where ss_customer_sk = c_customer_sk and cnt between 1 and 5 order by cnt desc;
-- end query 79 in stream 0 using template query73.tpl
-- start query 80 in stream 0 using template query84.tpl and seed 915340114
select c_customer_id as customer_id ,c_last_name || ', ' || coalesce(c_first_name,'') as customername from customer ,customer_address ,customer_demographics ,household_demographics ,income_band ,store_returns where ca_city = 'Salem' and c_current_addr_sk = ca_address_sk and ib_lower_bound >= 55893 and ib_upper_bound <= 55893 + 50000 and ib_income_band_sk = hd_income_band_sk and cd_demo_sk = c_current_cdemo_sk and hd_demo_sk = c_current_hdemo_sk and sr_cdemo_sk = cd_demo_sk order by c_customer_id fetch first 100 rows only;
-- end query 80 in stream 0 using template query84.tpl
-- start query 81 in stream 0 using template query54.tpl and seed 87886004
with my_customers as ( select distinct c_customer_sk , c_current_addr_sk from ( select cs_sold_date_sk sold_date_sk, cs_bill_customer_sk customer_sk, cs_item_sk item_sk from catalog_sales union all select ws_sold_date_sk sold_date_sk, ws_bill_customer_sk customer_sk, ws_item_sk item_sk from web_sales ) cs_or_ws_sales, item, date_dim, customer where sold_date_sk = d_date_sk and item_sk = i_item_sk and i_category = 'Sports' and i_class = 'outdoor' and c_customer_sk = cs_or_ws_sales.customer_sk and d_moy = 7 and d_year = 2002 ) , my_revenue as ( select c_customer_sk, sum(ss_ext_sales_price) as revenue from my_customers, store_sales, customer_address, store, date_dim where c_current_addr_sk = ca_address_sk and ca_county = s_county and ca_state = s_state and ss_sold_date_sk = d_date_sk and c_customer_sk = ss_customer_sk and d_month_seq between (select distinct d_month_seq+1 from date_dim where d_year = 2002 and d_moy = 7) and (select distinct d_month_seq+3 from date_dim where d_year = 2002 and d_moy = 7) group by c_customer_sk ) , segments as (select cast((revenue/50) as int) as segment from my_revenue ) select segment, count(*) as num_customers, segment*50 as segment_base from
segments group by segment order by segment, num_customers fetch first 100 rows only; -- end query 81 in stream 0 using template query54.tpl -- start query 82 in stream 0 using template query55.tpl and seed 792107888 select i_brand_id brand_id, i_brand brand, sum(ss_ext_sales_price) ext_price from date_dim, store_sales, item where d_date_sk = ss_sold_date_sk and ss_item_sk = i_item_sk
 and i_manager_id=40 and d_moy=11 and d_year=2001
group by i_brand, i_brand_id
order by ext_price desc, i_brand_id
fetch first 100 rows only ;
-- end query 82 in stream 0 using template query55.tpl
-- start query 83 in stream 0 using template query56.tpl and seed 912538819
with ss as (
 select i_item_id,sum(ss_ext_sales_price) total_sales
 from store_sales, date_dim, customer_address, item
 where i_item_id in (select i_item_id from item where i_color in ('magenta','tan','turquoise')) and ss_item_sk = i_item_sk and ss_sold_date_sk = d_date_sk and d_year = 2002 and d_moy = 4 and ss_addr_sk = ca_address_sk and ca_gmt_offset = -6
 group by i_item_id),
cs as (
 select i_item_id,sum(cs_ext_sales_price) total_sales
 from catalog_sales, date_dim, customer_address, item
 where i_item_id in (select i_item_id from item where i_color in ('magenta','tan','turquoise')) and cs_item_sk = i_item_sk and cs_sold_date_sk = d_date_sk and d_year = 2002 and d_moy = 4 and cs_bill_addr_sk = ca_address_sk and ca_gmt_offset = -6
 group by i_item_id),
ws as (
 select i_item_id,sum(ws_ext_sales_price) total_sales
 from web_sales, date_dim, customer_address, item
 where i_item_id in (select i_item_id from item where i_color in ('magenta','tan','turquoise')) and ws_item_sk = i_item_sk and ws_sold_date_sk = d_date_sk and d_year = 2002 and d_moy = 4 and ws_bill_addr_sk = ca_address_sk and ca_gmt_offset = -6
 group by i_item_id)
select i_item_id ,sum(total_sales) total_sales
from (select * from ss union all select * from cs union all select * from ws) tmp1
group by i_item_id
order by total_sales
fetch first 100 rows only;
-- end query 83 in stream 0 using template query56.tpl
-- start query 84 in stream 0 using template query2.tpl and seed 1816850892
with wscs as
 (select sold_date_sk ,sales_price
 from (select ws_sold_date_sk sold_date_sk ,ws_ext_sales_price sales_price from web_sales) x
 union all
 (select cs_sold_date_sk sold_date_sk ,cs_ext_sales_price sales_price from catalog_sales)),
wswscs as
 (select d_week_seq,
 sum(case when (d_day_name='Sunday') then sales_price else null end) sun_sales,
 sum(case when (d_day_name='Monday') then sales_price else null end) mon_sales,
 sum(case when (d_day_name='Tuesday') then sales_price else null end) tue_sales,
 sum(case when (d_day_name='Wednesday') then sales_price else null end) wed_sales,
 sum(case when (d_day_name='Thursday') then sales_price else null end) thu_sales,
 sum(case when (d_day_name='Friday') then sales_price else null end) fri_sales,
 sum(case when (d_day_name='Saturday') then sales_price else null end) sat_sales
 from wscs ,date_dim
 where d_date_sk = sold_date_sk
 group by d_week_seq)
select d_week_seq1 ,round(sun_sales1/sun_sales2,2) ,round(mon_sales1/mon_sales2,2) ,round(tue_sales1/tue_sales2,2) ,round(wed_sales1/wed_sales2,2) ,round(thu_sales1/thu_sales2,2) ,round(fri_sales1/fri_sales2,2) ,round(sat_sales1/sat_sales2,2)
from (select wswscs.d_week_seq d_week_seq1 ,sun_sales sun_sales1 ,mon_sales mon_sales1 ,tue_sales tue_sales1 ,wed_sales wed_sales1 ,thu_sales thu_sales1 ,fri_sales fri_sales1 ,sat_sales sat_sales1
 from wswscs,date_dim
 where date_dim.d_week_seq = wswscs.d_week_seq and d_year = 2000) y,
 (select wswscs.d_week_seq d_week_seq2 ,sun_sales sun_sales2 ,mon_sales mon_sales2 ,tue_sales tue_sales2 ,wed_sales wed_sales2 ,thu_sales thu_sales2 ,fri_sales fri_sales2 ,sat_sales sat_sales2
 from wswscs ,date_dim
 where date_dim.d_week_seq = wswscs.d_week_seq and d_year = 2000+1) z
where d_week_seq1=d_week_seq2-53
order by d_week_seq1;
-- end query 84 in stream 0 using template query2.tpl
-- start query 85 in stream 0 using template query26.tpl and seed 646470989
select i_item_id, avg(cast(cs_quantity as double)) agg1, avg(cs_list_price) agg2, avg(cs_coupon_amt) agg3, avg(cs_sales_price) agg4
from catalog_sales, customer_demographics, date_dim, item, promotion
where cs_sold_date_sk = d_date_sk and cs_item_sk = i_item_sk and cs_bill_cdemo_sk = cd_demo_sk and cs_promo_sk = p_promo_sk and cd_gender = 'F' and cd_marital_status = 'M' and cd_education_status = '2 yr Degree' and (p_channel_email = 'N' or p_channel_event = 'N') and d_year = 1999
group by i_item_id
order by i_item_id
fetch first 100 rows only;
-- end query 85 in stream 0 using template query26.tpl
-- start query 86 in stream 0 using template query40.tpl and seed 2129842400
select w_state ,i_item_id
 ,sum(case when (cast(d_date as date) < cast ('1998-06-27' as date)) then cs_sales_price - coalesce(cr_refunded_cash,0) else 0 end) as sales_before
 ,sum(case when (cast(d_date as date) >= cast ('1998-06-27' as date)) then cs_sales_price - coalesce(cr_refunded_cash,0) else 0 end) as sales_after
from catalog_sales left outer join catalog_returns on
 (cs_order_number = cr_order_number and cs_item_sk = cr_item_sk) ,warehouse ,item ,date_dim
where i_current_price between 0.99 and 1.49 and i_item_sk = cs_item_sk and cs_warehouse_sk = w_warehouse_sk and cs_sold_date_sk = d_date_sk and d_date between (cast ('1998-06-27' as date) - 30 days) and (cast ('1998-06-27' as date) + 30 days)
group by w_state,i_item_id
order by w_state,i_item_id
fetch first 100 rows only;
-- end query 86 in stream 0 using template query40.tpl
-- start query 87 in stream 0 using template query72.tpl and seed 1882576156
select i_item_desc ,w_warehouse_name ,d1.d_week_seq ,count(case when p_promo_sk is null then 1 else 0 end) no_promo ,count(case when p_promo_sk is not null then 1 else 0 end) promo ,count(*) total_cnt
from catalog_sales
 join inventory on (cs_item_sk = inv_item_sk)
 join warehouse on (w_warehouse_sk=inv_warehouse_sk)
 join item on (i_item_sk = cs_item_sk)
 join customer_demographics on (cs_bill_cdemo_sk = cd_demo_sk)
 join household_demographics on (cs_bill_hdemo_sk = hd_demo_sk)
 join date_dim d1 on (cs_sold_date_sk = d1.d_date_sk)
 join date_dim d2 on (inv_date_sk = d2.d_date_sk)
 join date_dim d3 on (cs_ship_date_sk = d3.d_date_sk)
 left outer join promotion on (cs_promo_sk=p_promo_sk)
 left outer join catalog_returns on (cr_item_sk = cs_item_sk and cr_order_number = cs_order_number)
where d1.d_week_seq = d2.d_week_seq and inv_quantity_on_hand < cs_quantity and d3.d_date > cast(d1.d_date as date) + 5 days and hd_buy_potential = '>10000' and d1.d_year = 1999 and hd_buy_potential = '>10000' and cd_marital_status = 'U' and d1.d_year = 1999
group by i_item_desc,w_warehouse_name,d1.d_week_seq
order by total_cnt desc, i_item_desc, w_warehouse_name, d_week_seq
fetch first 100 rows only;
-- end query 87 in stream 0 using template query72.tpl
-- start query 88 in stream 0 using template query53.tpl and seed 1226866603
select * from
 (select i_manufact_id, sum(ss_sales_price) sum_sales, avg(sum(ss_sales_price)) over (partition by i_manufact_id) avg_quarterly_sales
 from item, store_sales, date_dim, store
 where ss_item_sk = i_item_sk and ss_sold_date_sk = d_date_sk and ss_store_sk = s_store_sk and d_month_seq in (1178,1178+1,1178+2,1178+3,1178+4,1178+5,1178+6,1178+7,1178+8,1178+9,1178+10,1178+11)
 and ((i_category in ('Books','Children','Electronics') and i_class in ('personal','portable','reference','self-help') and i_brand in ('scholaramalgamalg #14','scholaramalgamalg #7', 'exportiunivamalg #9','scholaramalgamalg #9'))
 or(i_category in ('Women','Music','Men') and i_class in ('accessories','classical','fragrances','pants') and i_brand in ('amalgimporto #1','edu packscholar #1','exportiimporto #1', 'importoamalg #1')))
 group by i_manufact_id, d_qoy ) tmp1
where case when avg_quarterly_sales > 0 then abs (sum_sales - avg_quarterly_sales)/ avg_quarterly_sales else null end > 0.1
order by avg_quarterly_sales, sum_sales, i_manufact_id
fetch first 100 rows only;
-- end query 88 in stream 0 using template query53.tpl
-- start query 89 in stream 0 using template query79.tpl and seed 1825663721
select c_last_name,c_first_name,substr(s_city,1,30),ss_ticket_number,amt,profit
from (select ss_ticket_number ,ss_customer_sk ,store.s_city ,sum(ss_coupon_amt) amt ,sum(ss_net_profit) profit
 from store_sales,date_dim,store,household_demographics
 where store_sales.ss_sold_date_sk = date_dim.d_date_sk and store_sales.ss_store_sk = store.s_store_sk and store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk and (household_demographics.hd_dep_count = 1 or household_demographics.hd_vehicle_count > -1) and date_dim.d_dow = 1 and date_dim.d_year in (1998,1998+1,1998+2) and store.s_number_employees between 200 and 295
 group by ss_ticket_number,ss_customer_sk,ss_addr_sk,store.s_city) ms,customer
where ss_customer_sk = c_customer_sk
order by c_last_name,c_first_name,substr(s_city,1,30), profit
fetch first 100 rows only;
-- end query 89 in stream 0 using template query79.tpl
-- start query 90 in stream 0 using template query18.tpl and seed 1875871513
select i_item_id, ca_country, ca_state, ca_county, avg( cast(cs_quantity as numeric(12,2))) agg1, avg( cast(cs_list_price as numeric(12,2))) agg2, avg( cast(cs_coupon_amt as numeric(12,2))) agg3, avg( cast(cs_sales_price as numeric(12,2))) agg4, avg( cast(cs_net_profit as numeric(12,2))) agg5, avg( cast(c_birth_year as numeric(12,2))) agg6, avg( cast(cd1.cd_dep_count as numeric(12,2))) agg7
from catalog_sales, customer_demographics cd1, customer_demographics cd2, customer, customer_address, date_dim, item
where cs_sold_date_sk = d_date_sk and cs_item_sk = i_item_sk and cs_bill_cdemo_sk = cd1.cd_demo_sk and cs_bill_customer_sk = c_customer_sk and cd1.cd_gender = 'F' and cd1.cd_education_status = 'Advanced Degree' and c_current_cdemo_sk = cd2.cd_demo_sk and c_current_addr_sk = ca_address_sk and c_birth_month in (2,6,7,11,1,8) and d_year = 2001 and ca_state in ('OH','MT','PA','WV','NC','CO','FL')
group by rollup (i_item_id, ca_country, ca_state, ca_county)
order by ca_country, ca_state, ca_county, i_item_id
fetch first 100 rows only;
-- end query 90 in stream 0 using template query18.tpl
-- start query 91 in stream 0 using template query13.tpl and seed 512374064
select avg(ss_quantity) ,avg(ss_ext_sales_price) ,avg(ss_ext_wholesale_cost) ,sum(ss_ext_wholesale_cost)
from store_sales ,store ,customer_demographics ,household_demographics ,customer_address ,date_dim
where s_store_sk = ss_store_sk and ss_sold_date_sk = d_date_sk and d_year = 2001
 and((ss_hdemo_sk=hd_demo_sk and cd_demo_sk = ss_cdemo_sk and cd_marital_status = 'W' and cd_education_status = 'College'
 and ss_sales_price between 100.00 and 150.00 and hd_dep_count = 3 )
 or (ss_hdemo_sk=hd_demo_sk and cd_demo_sk = ss_cdemo_sk and cd_marital_status = 'D' and cd_education_status = 'Primary' and ss_sales_price between 50.00 and 100.00 and hd_dep_count = 1 )
 or (ss_hdemo_sk=hd_demo_sk and cd_demo_sk = ss_cdemo_sk and cd_marital_status = 'S' and cd_education_status = 'Advanced Degree' and ss_sales_price between 150.00 and 200.00 and hd_dep_count = 1 ))
 and((ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('VA', 'WV', 'LA') and ss_net_profit between 100 and 200 )
 or (ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('NV', 'SC', 'IL') and ss_net_profit between 150 and 300 )
 or (ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('TN', 'GA', 'MA') and ss_net_profit between 50 and 250 )) ;
-- end query 91 in stream 0 using template query13.tpl
-- start query 92 in stream 0 using template query24.tpl and seed 1636591044
with ssales as
 (select c_last_name ,c_first_name ,s_store_name ,ca_state ,s_state ,i_color ,i_current_price ,i_manager_id ,i_units ,i_size ,sum(ss_net_profit) netpaid
 from store_sales ,store_returns ,store ,item ,customer ,customer_address
 where ss_ticket_number = sr_ticket_number and ss_item_sk = sr_item_sk and ss_customer_sk = c_customer_sk and ss_item_sk = i_item_sk and ss_store_sk = s_store_sk and c_birth_country = upper(ca_country) and s_zip = ca_zip and s_market_id=5
 group by c_last_name ,c_first_name ,s_store_name ,ca_state ,s_state ,i_color ,i_current_price ,i_manager_id ,i_units ,i_size)
select c_last_name ,c_first_name ,s_store_name ,sum(netpaid) paid
from ssales
where i_color = 'deep'
group by c_last_name ,c_first_name ,s_store_name
having sum(netpaid) > (select 0.05*avg(netpaid) from ssales) ;
with ssales as
 (select c_last_name ,c_first_name ,s_store_name ,ca_state ,s_state ,i_color
 ,i_current_price ,i_manager_id ,i_units ,i_size ,sum(ss_net_profit) netpaid
 from store_sales ,store_returns ,store ,item ,customer ,customer_address
 where ss_ticket_number = sr_ticket_number and ss_item_sk = sr_item_sk and ss_customer_sk = c_customer_sk and ss_item_sk = i_item_sk and ss_store_sk = s_store_sk and c_birth_country = upper(ca_country) and s_zip = ca_zip and s_market_id = 5
 group by c_last_name ,c_first_name ,s_store_name ,ca_state ,s_state ,i_color ,i_current_price ,i_manager_id ,i_units ,i_size)
select c_last_name ,c_first_name ,s_store_name ,sum(netpaid) paid
from ssales
where i_color = 'blush'
group by c_last_name ,c_first_name ,s_store_name
having sum(netpaid) > (select 0.05*avg(netpaid) from ssales) ;
-- end query 92 in stream 0 using template query24.tpl
-- start query 93 in stream 0 using template query4.tpl and seed 48694754
with year_total as (
 select c_customer_id customer_id ,c_first_name customer_first_name ,c_last_name customer_last_name ,c_preferred_cust_flag customer_preferred_cust_flag ,c_birth_country customer_birth_country ,c_login customer_login ,c_email_address customer_email_address ,d_year dyear ,sum(((ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt)+ss_ext_sales_price)/2) year_total ,'s' sale_type
 from customer ,store_sales ,date_dim
 where c_customer_sk = ss_customer_sk and ss_sold_date_sk = d_date_sk
 group by c_customer_id ,c_first_name ,c_last_name ,c_preferred_cust_flag ,c_birth_country ,c_login ,c_email_address ,d_year
 union all
 select c_customer_id customer_id ,c_first_name customer_first_name ,c_last_name customer_last_name ,c_preferred_cust_flag customer_preferred_cust_flag ,c_birth_country customer_birth_country ,c_login customer_login ,c_email_address customer_email_address ,d_year dyear ,sum((((cs_ext_list_price-cs_ext_wholesale_cost-cs_ext_discount_amt)+cs_ext_sales_price)/2) ) year_total ,'c' sale_type
 from customer ,catalog_sales ,date_dim
 where c_customer_sk = cs_bill_customer_sk and cs_sold_date_sk = d_date_sk
 group by c_customer_id ,c_first_name
 ,c_last_name ,c_preferred_cust_flag ,c_birth_country ,c_login ,c_email_address ,d_year
 union all
 select c_customer_id customer_id ,c_first_name customer_first_name ,c_last_name customer_last_name ,c_preferred_cust_flag customer_preferred_cust_flag ,c_birth_country customer_birth_country ,c_login customer_login ,c_email_address customer_email_address ,d_year dyear ,sum((((ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt)+ws_ext_sales_price)/2) ) year_total ,'w' sale_type
 from customer ,web_sales ,date_dim
 where c_customer_sk = ws_bill_customer_sk and ws_sold_date_sk = d_date_sk
 group by c_customer_id ,c_first_name ,c_last_name ,c_preferred_cust_flag ,c_birth_country ,c_login ,c_email_address ,d_year )
select t_s_secyear.customer_birth_country
from year_total t_s_firstyear ,year_total t_s_secyear ,year_total t_c_firstyear ,year_total t_c_secyear ,year_total t_w_firstyear ,year_total t_w_secyear
where t_s_secyear.customer_id = t_s_firstyear.customer_id and t_s_firstyear.customer_id = t_c_secyear.customer_id and t_s_firstyear.customer_id = t_c_firstyear.customer_id and t_s_firstyear.customer_id = t_w_firstyear.customer_id and t_s_firstyear.customer_id = t_w_secyear.customer_id
 and t_s_firstyear.sale_type = 's' and t_c_firstyear.sale_type = 'c' and t_w_firstyear.sale_type = 'w' and t_s_secyear.sale_type = 's' and t_c_secyear.sale_type = 'c' and t_w_secyear.sale_type = 'w'
 and t_s_firstyear.dyear = 2000 and t_s_secyear.dyear = 2000+1 and t_c_firstyear.dyear = 2000 and t_c_secyear.dyear = 2000+1 and t_w_firstyear.dyear = 2000 and t_w_secyear.dyear = 2000+1
 and t_s_firstyear.year_total > 0 and t_c_firstyear.year_total > 0 and t_w_firstyear.year_total > 0
 and case when t_c_firstyear.year_total > 0 then t_c_secyear.year_total / t_c_firstyear.year_total else null end > case when t_s_firstyear.year_total > 0 then t_s_secyear.year_total / t_s_firstyear.year_total else null end
 and case when t_c_firstyear.year_total > 0 then t_c_secyear.year_total / t_c_firstyear.year_total else null end > case when t_w_firstyear.year_total > 0 then t_w_secyear.year_total / t_w_firstyear.year_total else null end
order by t_s_secyear.customer_birth_country
fetch first 100 rows only;
-- end query 93 in stream 0 using template query4.tpl
-- start query 94 in stream 0 using template query99.tpl and seed 505379346
select substr(w_warehouse_name,1,20) ,sm_type ,cc_name
 ,sum(case when (cs_ship_date_sk - cs_sold_date_sk <= 30 ) then 1 else 0 end) as "30 days"
 ,sum(case when (cs_ship_date_sk - cs_sold_date_sk > 30) and (cs_ship_date_sk - cs_sold_date_sk <= 60) then 1 else 0 end ) as "31-60 days"
 ,sum(case when (cs_ship_date_sk - cs_sold_date_sk > 60) and
 (cs_ship_date_sk - cs_sold_date_sk <= 90) then 1 else 0 end) as "61-90 days"
 ,sum(case when (cs_ship_date_sk - cs_sold_date_sk > 90) and (cs_ship_date_sk - cs_sold_date_sk <= 120) then 1 else 0 end) as "91-120 days"
 ,sum(case when (cs_ship_date_sk - cs_sold_date_sk > 120) then 1 else 0 end) as ">120 days"
from catalog_sales ,warehouse ,ship_mode ,call_center ,date_dim
where d_month_seq between 1208 and 1208 + 11 and cs_ship_date_sk = d_date_sk and cs_warehouse_sk = w_warehouse_sk and cs_ship_mode_sk = sm_ship_mode_sk and cs_call_center_sk = cc_call_center_sk
group by substr(w_warehouse_name,1,20) ,sm_type ,cc_name
order by substr(w_warehouse_name,1,20) ,sm_type ,cc_name
fetch first 100 rows only;
-- end query 94 in stream 0 using template query99.tpl
-- start query 95 in stream 0 using template query68.tpl and seed 372107550
select c_last_name ,c_first_name ,ca_city ,bought_city ,ss_ticket_number ,extended_price ,extended_tax ,list_price
from (select ss_ticket_number ,ss_customer_sk ,ca_city bought_city ,sum(ss_ext_sales_price) extended_price ,sum(ss_ext_list_price) list_price ,sum(ss_ext_tax) extended_tax
 from store_sales ,date_dim ,store ,household_demographics ,customer_address
 where store_sales.ss_sold_date_sk = date_dim.d_date_sk and store_sales.ss_store_sk = store.s_store_sk and store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk and store_sales.ss_addr_sk = customer_address.ca_address_sk and date_dim.d_dom between 1 and 2 and (household_demographics.hd_dep_count = 0 or household_demographics.hd_vehicle_count= -1) and date_dim.d_year in (2000,2000+1,2000+2) and store.s_city in ('Arcadia','Friendship')
 group by ss_ticket_number ,ss_customer_sk ,ss_addr_sk,ca_city) dn ,customer ,customer_address current_addr
where ss_customer_sk = c_customer_sk and customer.c_current_addr_sk = current_addr.ca_address_sk and current_addr.ca_city <> bought_city
order by c_last_name ,ss_ticket_number
fetch first 100 rows only;
-- end query 95 in stream 0 using template query68.tpl
-- start query 96 in stream 0 using template query83.tpl and seed 1926747028
with sr_items as
 (select i_item_id item_id, sum(sr_return_quantity) sr_item_qty
 from store_returns, item, date_dim
 where sr_item_sk = i_item_sk and d_date in (select d_date from date_dim where d_week_seq in
 (select d_week_seq from date_dim where d_date in ('1999-05-19','1999-08-02','1999-11-08'))) and sr_returned_date_sk = d_date_sk
 group by i_item_id),
cr_items as
 (select i_item_id item_id, sum(cr_return_quantity) cr_item_qty
 from catalog_returns, item, date_dim
 where cr_item_sk = i_item_sk and d_date in (select d_date from date_dim where d_week_seq in (select d_week_seq from date_dim where d_date in ('1999-05-19','1999-08-02','1999-11-08'))) and cr_returned_date_sk = d_date_sk
 group by i_item_id),
wr_items as
 (select i_item_id item_id, sum(wr_return_quantity) wr_item_qty
 from web_returns, item, date_dim
 where wr_item_sk = i_item_sk and d_date in (select d_date from date_dim where d_week_seq in (select d_week_seq from date_dim where d_date in ('1999-05-19','1999-08-02','1999-11-08'))) and wr_returned_date_sk = d_date_sk
 group by i_item_id)
select sr_items.item_id
 ,sr_item_qty
 ,cast(sr_item_qty as double)/(cast(sr_item_qty+cr_item_qty+wr_item_qty as double))/3.0 * 100 sr_dev
 ,cr_item_qty
 ,cast(cr_item_qty as double)/(cast(sr_item_qty+cr_item_qty+wr_item_qty as double))/3.0 * 100 cr_dev
 ,wr_item_qty
 ,cast(wr_item_qty as double)/(cast(sr_item_qty+cr_item_qty+wr_item_qty as double))/3.0 * 100 wr_dev
 ,(sr_item_qty+cr_item_qty+wr_item_qty)/3.0 average
from sr_items ,cr_items ,wr_items
where sr_items.item_id=cr_items.item_id and sr_items.item_id=wr_items.item_id
order by sr_items.item_id ,sr_item_qty
fetch first 100 rows only;
-- end query 96 in stream 0 using template query83.tpl
-- start query 97 in stream 0 using template query61.tpl and seed 1235477058
select promotions,total,cast(promotions as decimal(15,4))/cast(total as decimal(15,4))*100
from (select sum(ss_ext_sales_price) promotions
 from store_sales ,store ,promotion ,date_dim ,customer ,customer_address ,item
 where ss_sold_date_sk = d_date_sk and ss_store_sk = s_store_sk and ss_promo_sk = p_promo_sk and ss_customer_sk= c_customer_sk and ca_address_sk = c_current_addr_sk and ss_item_sk = i_item_sk and ca_gmt_offset = -7 and i_category = 'Jewelry' and (p_channel_dmail = 'Y' or p_channel_email = 'Y' or p_channel_tv = 'Y') and s_gmt_offset = -7 and d_year = 2001 and d_moy = 11) promotional_sales,
 (select sum(ss_ext_sales_price) total
 from store_sales ,store ,date_dim ,customer ,customer_address
 ,item
 where ss_sold_date_sk = d_date_sk and ss_store_sk = s_store_sk and ss_customer_sk= c_customer_sk and ca_address_sk = c_current_addr_sk and ss_item_sk = i_item_sk and ca_gmt_offset = -7 and i_category = 'Jewelry' and s_gmt_offset = -7 and d_year = 2001 and d_moy = 11) all_sales
order by promotions, total
fetch first 100 rows only;
-- end query 97 in stream 0 using template query61.tpl
-- start query 98 in stream 0 using template query5.tpl and seed 1097248849
with ssr as
 (select s_store_id, sum(sales_price) as sales, sum(profit) as profit, sum(return_amt) as returns, sum(net_loss) as profit_loss
 from ( select ss_store_sk as store_sk, ss_sold_date_sk as date_sk, ss_ext_sales_price as sales_price, ss_net_profit as profit, cast(0 as decimal(7,2)) as return_amt, cast(0 as decimal(7,2)) as net_loss from store_sales
 union all
 select sr_store_sk as store_sk, sr_returned_date_sk as date_sk, cast(0 as decimal(7,2)) as sales_price, cast(0 as decimal(7,2)) as profit, sr_return_amt as return_amt, sr_net_loss as net_loss from store_returns ) salesreturns, date_dim, store
 where date_sk = d_date_sk and d_date between cast('2001-08-21' as date) and (cast('2001-08-21' as date) + 14 days) and store_sk = s_store_sk
 group by s_store_id) ,
csr as
 (select cp_catalog_page_id, sum(sales_price) as sales, sum(profit) as profit, sum(return_amt) as returns, sum(net_loss) as profit_loss
 from ( select cs_catalog_page_sk as page_sk, cs_sold_date_sk as date_sk, cs_ext_sales_price as sales_price, cs_net_profit as profit, cast(0 as decimal(7,2)) as return_amt, cast(0 as decimal(7,2)) as net_loss from catalog_sales
 union all
 select cr_catalog_page_sk as page_sk, cr_returned_date_sk as date_sk, cast(0 as decimal(7,2)) as sales_price, cast(0 as decimal(7,2)) as profit, cr_return_amount as return_amt, cr_net_loss as net_loss from catalog_returns ) salesreturns, date_dim, catalog_page
 where date_sk = d_date_sk and d_date between cast('2001-08-21' as date) and (cast('2001-08-21' as date) + 14 days) and page_sk = cp_catalog_page_sk
 group by cp_catalog_page_id) ,
wsr as
 (select web_site_id, sum(sales_price) as sales, sum(profit) as profit, sum(return_amt) as returns, sum(net_loss) as profit_loss
 from ( select ws_web_site_sk as wsr_web_site_sk, ws_sold_date_sk as date_sk, ws_ext_sales_price as sales_price, ws_net_profit as profit,
 cast(0 as decimal(7,2)) as return_amt, cast(0 as decimal(7,2)) as net_loss from web_sales
 union all
 select ws_web_site_sk as wsr_web_site_sk, wr_returned_date_sk as date_sk, cast(0 as decimal(7,2)) as sales_price, cast(0 as decimal(7,2)) as profit, wr_return_amt as return_amt, wr_net_loss as net_loss from web_returns left outer join web_sales on ( wr_item_sk = ws_item_sk and wr_order_number = ws_order_number) ) salesreturns, date_dim, web_site
 where date_sk = d_date_sk and d_date between cast('2001-08-21' as date) and (cast('2001-08-21' as date) + 14 days) and wsr_web_site_sk = web_site_sk
 group by web_site_id)
select channel , id , sum(sales) as sales , sum(returns) as returns , sum(profit) as profit
from (select 'store channel' as channel , 'store' || s_store_id as id , sales , returns , (profit - profit_loss) as profit from ssr
 union all
 select 'catalog channel' as channel , 'catalog_page' || cp_catalog_page_id as id , sales , returns , (profit - profit_loss) as profit from csr
 union all
 select 'web channel' as channel , 'web_site' || web_site_id as id , sales , returns , (profit - profit_loss) as profit from wsr ) x
group by rollup (channel, id)
order by channel ,id
fetch first 100 rows only;
-- end query 98 in stream 0 using template query5.tpl
-- start query 99 in stream 0 using template query76.tpl and seed 164871687
select channel, col_name, d_year, d_qoy, i_category, COUNT(*) sales_cnt, SUM(ext_sales_price) sales_amt
FROM ( SELECT 'store' as channel, 'ss_addr_sk' col_name, d_year, d_qoy, i_category, ss_ext_sales_price ext_sales_price FROM store_sales, item, date_dim WHERE ss_addr_sk IS NULL AND ss_sold_date_sk=d_date_sk AND ss_item_sk=i_item_sk
 UNION ALL
 SELECT 'web' as channel, 'ws_ship_addr_sk' col_name, d_year, d_qoy, i_category, ws_ext_sales_price ext_sales_price FROM web_sales, item, date_dim WHERE ws_ship_addr_sk IS NULL AND ws_sold_date_sk=d_date_sk AND ws_item_sk=i_item_sk
 UNION ALL
 SELECT 'catalog' as channel, 'cs_ship_mode_sk' col_name, d_year, d_qoy, i_category, cs_ext_sales_price ext_sales_price FROM catalog_sales, item, date_dim WHERE cs_ship_mode_sk IS NULL AND cs_sold_date_sk=d_date_sk AND cs_item_sk=i_item_sk) foo
GROUP BY channel, col_name, d_year, d_qoy, i_category
ORDER BY channel, col_name, d_year, d_qoy, i_category
fetch first 100 rows only;
-- end query 99 in stream 0 using template query76.tpl
Appendix F: Attestation Letter

Benchmark sponsor:
Berni Schiefer
IBM
8200 Warden Avenue
Markham, Ontario, L6C 1C7

October 24, 2014

At IBM's request I verified the implementation and results of a 30TB Big Data Decision Support (Hadoop-DS) benchmark, with most features derived from the TPC-DS Benchmark.

The Hadoop-DS benchmark was executed on the following configuration:

Test Platform: IBM x3650BD - 17 Node Cluster
Query Engine: IBM BigInsights Big SQL v3.0
Operating System: Red Hat Enterprise Linux 6.4
Configuration per node:
 CPUs: 2 x Intel Xeon Processor E5-2680 v2 (2.8 GHz, 25MB L3)
 Memory: 128GB (1866MHz DDR3)
 Storage: 10 x 2TB SATA 3.5” HDD & 1 x 128GB SATA 2.5” SSD (swap)

The results were:

Single-Stream Performance: 1,023 Hadoop-DS Qph@30TB
Multi-Stream Performance: 2,274 Hadoop-DS Qph@30TB
Multi-Stream Concurrency: 4 Streams
Load Time: 37h 11m 10s

While these results are for a non-TPC benchmark, they complied with the following subset of requirements from the latest version of the TPC-DS Benchmark standard:
• The database schema was defined with the proper layout and data types
• The database population was generated using the TPC provided dsdgen
• The database was properly scaled to 30TB and populated accordingly
• The auxiliary data structure requirements were met since none were defined
• The database load time was properly measured and reported
• The query input variables were generated by the TPC provided dsqgen
• The execution times for queries were correctly measured and reported
The following aspects of the Hadoop-DS benchmark were implemented within the spirit of the TPC-DS Benchmark:
• All 99 queries were executed using the specified and unmodified query text or by applying minor modifications to the queries
• Query answers were verified against the available validation answer sets

The following features and requirements from the latest version of the TPC-DS Benchmark standard were not adhered to:
• The defined referential integrity constraints were not enforced
• The statistics collection did not meet the required limitations
• The data persistence properties were not demonstrated
• The data maintenance functions were neither implemented nor executed
• A single throughput test was used to measure multi-user performance
• The system pricing was not provided or reviewed
• The report did not meet the defined format and content

The executive summary and the benchmark report documenting the details of this Hadoop-DS benchmark execution were verified for accuracy.

Respectfully Yours,

François Raab, President