欢迎您访问 最编程 本站为您分享编程语言代码,编程技术文章!
您现在的位置是: 首页

介绍和使用大埔卫生院

最编程 2024-06-11 14:00:11
...

一、TPCH相关知识

1.tpch简介

TPCH是国际事务处理性能委员会(TPC,Transaction Processing Performance Council)于 1994 年指定的标准,是一款面向商品零售业务的决策支持系统测试基准,TPCH包含8张表,22个查询,遵循SQL92。
tpc事务处理性能委员会( Transaction ProcessingPerformance Council )
tpc-h 模拟决策支持系统中的数据库操作,测试数据库系统复杂查询的响应时间,以每小时执行的查询数(TPC-H QphH@Siz)作为度量指标.

tpc-h规范:
Query:  22条sql
Modify: RF1, RF2 
3个测试:数据装载、顺序执行一次query+modify、模拟生产

模拟生产:在22个查询执行的同时,还有一对更新操作RF1 和RF2 并发地执行,RF1向Order 表和Lineitem 表中插入原行数的0.1%的新行,模拟新销售业务的数据加入到数据库中;RF2 从Order 表和Lineitem表中删除等量与RF1 增加的数据,模拟旧的销售数据被淘汰

2.tpch关系图

TPC-H包括8张表(表上有些约束等需要满足,参见TPC-H规范),如下:
PART:表示零件的信息,如表4-1所示。
SUPPLIER:表示供货商的信息,如表4-2所示。
PARTSUPP:表示供货商的零件的信息,如表4-3所示。
CUSTOMER:表示消费者的信息,如表4-4所示。
ORDERS:表示订单的信息,如表4-5所示。
LINEITEM:表示在线商品的信息,如表4-6所示。
NATION:表示国家的信息,如表4-7所示。
REGION:表示地区的信息,如表4-8所示。
这8张表之间的关系,如图4-1所示或图4-2所示。
image.png

3.行存与列存

行存

https://www.cnblogs.com/gc65/p/11011885.html
TOAST表,这种存储方式叫行外存储。一般当一条记录压缩后的大小大于TOAST_TUPLE_THRESHOLD(通常是page_size/4即2kB)这个值时,会存储到TOAST表。

create table t5(id int,name text);
select relname,relfilenode,reltoastrelid from pg_class where relname='t5';
 relname | relfilenode | reltoastrelid
---------+-------------+---------------
 t5      |       42069 |         42072
select * from pg_toast.pg_toast_42069;
 chunk_id | chunk_seq | chunk_data
----------+-----------+------------

PLAIN:避免压缩和行外存储。只有那些不需要TOAST策略就能存放的数据类型允许选择(例如int类型),而对于text这类要求存储长度超过页大小的类型,是不允许采用此策略的
EXTENDED:允许压缩和行外存储。一般会先压缩,如果还是太大,就会行外存储(默认)
EXTERNAL:允许行外存储,但不许压缩。类似字符串这种会对数据的一部分进行操作的字段,采用此策略可能获得更高的性能,因为不需要读取出整行数据再解压。
MAIN:允许压缩,但不许行外存储。不过实际上,为了保证过大数据的存储,行外存储在其它方式(例如压缩)都无法满足需求的情况下,作为最后手段还是会被启动。因此理解为:尽量不使用行外存储更贴切。一般只有在变长的字段上且存储模式不是PLAIN时才能启用TOAST。

#列存
列存表以CU为单位,默认60000一个CU,自带min/max稀疏索引。
列的联系:col_id + cuid + offset

set search_path=cstore;
\d
 Schema |      Name       | Type  |  Owner   |        Storage
--------+-----------------+-------+----------+-----------------------
 cstore | pg_cudesc_41964 | table | vastbase | {internal_mask=33031}
 cstore | pg_cudesc_41977 | table | vastbase | {internal_mask=33031}
 cstore | pg_cudesc_41990 | table | vastbase | {internal_mask=33031}
 cstore | pg_cudesc_42003 | table | vastbase | {internal_mask=33031}
 cstore | pg_cudesc_42016 | table | vastbase | {internal_mask=33031}
 cstore | pg_cudesc_42029 | table | vastbase | {internal_mask=33031}
 cstore | pg_cudesc_42042 | table | vastbase | {internal_mask=33031}
 cstore | pg_cudesc_42055 | table | vastbase | {internal_mask=33031}
 cstore | pg_delta_41964  | table | vastbase | {internal_mask=32768}
 cstore | pg_delta_41977  | table | vastbase | {internal_mask=32768}
 cstore | pg_delta_41990  | table | vastbase | {internal_mask=32768}
 cstore | pg_delta_42003  | table | vastbase | {internal_mask=32768}
 cstore | pg_delta_42016  | table | vastbase | {internal_mask=32768}
 cstore | pg_delta_42029  | table | vastbase | {internal_mask=32768}
 cstore | pg_delta_42042  | table | vastbase | {internal_mask=32768}
 cstore | pg_delta_42055  | table | vastbase | {internal_mask=32768}
\d from pg_cudesc_42055;
   Column   |  Type   | Modifiers | Storage  | Stats target | Description
------------+---------+-----------+----------+--------------+-------------
 col_id     | integer |           | plain    |              |
 cu_id      | oid     |           | plain    |              |
 min        | text    |           | extended |              |
 max        | text    |           | extended |              |
 row_count  | integer |           | plain    |              |
 cu_mode    | integer |           | plain    |              |
 size       | bigint  |           | plain    |              |
 cu_pointer | text    |           | extended |              |
 magic      | integer |           | plain    |              |
 extra      | text    |           | extended |              |
image.png
--局部聚簇约束(Partial Cluster Key)
列存下的一种技术。这种技术可以通过min/max稀疏索引较快的实现基表扫描的filter过滤。Partial Cluster Key可以指定多列,但是一般不建议超过2列。  简单讲就是按照clusterkey进行有序存储。
Partial Cluster Key的选取原则:
受基表中的简单表达式约束。这种约束一般形如col op const,其中col为列名,op为操作符 =、>、>=、<=、<,const为常量值。
尽量采用选择度比较高(过滤掉更多数据)的简单表达式中的列。
尽量把选择度比较低的约束col放在Partial Cluster Key中的前面。
尽量把枚举类型的列放在Partial Cluster Key中的前面。

--psort索引
PSort CU临近有序,查询交集尽量减少,提升查询的效率。
列数据min/max之间的数据比较离散时,可以建立psort索引

--btree索引
b树索引和行存基本类似,不同之处在于行存使用ctid,列存使用cuid(n)+偏移量

--delta表:
列存表附带的行存表,在做小批量数据插入时候,为了提升查询性能,减少cu空间消耗
开启delta表:ALTER TABLE table_name SET (enable_delta=TRUE);
关闭delta表:ALTER TABLE table_name SET (enable_delta=FALSE);
GaussDB提供两种合并操作:自动的auto-deltaMerge和手动的deltaMerge。auto-deltaMerge会将默认60000行以上的数据合并到列存储,而手动deltaMerge会将全部数据从Delta存储合并至列存储。但手动deltaMerge依赖高级别的锁,需谨慎使用

--列存相关参数
shared_buffers
列存cudesc和delta信息存储需要
cstore_buffers
列存所使用的共享缓冲区的大小,默认值:32768KB。
max_loaded_cudesc
设置列存表在做扫描时,缓存cudesc信息的个数。增大设置会提高查询性能,但也会增加内存占用,特别是当列存表的列非常多时。
cstore_prefetch_quantity
描述列存储使用ADIO预取IO量的大小
cstore_backwrite_quantity
描述列存储使用ADIO写入IO量的大小
cstore_backwrite_max_threshold
描述列存储使用ADIO写入数据库可缓存最大的IO量
fast_extend_file_size
描述列存储使用ADIO预扩展磁盘的大小
psort_work_mem
设置列存表在进行局部排序中,在开始写入临时磁盘文件之前使用的内存大小。带partial cluster key的表、带索引的表插入、创建表索引、删除表和更新表都会用到
partition_mem_batch
指定缓存个数,为了优化对列存分区表的批量插入,在批量插入过程中会对数据进行缓存后再批量写盘。默认值:256 。
partition_max_cache_size
指定数据缓存区大小,为了优化对列存分区表的批量插入,在批量插入过程中会对数据进行缓存后再批量写盘。默认值:2GB。
enable_delta_store
为了增强列存单条数据导入的性能和解决磁盘冗余问题,是否需要开启列存delta表功能,与参数DELTAROW_THRESHOLD 配合使用。默认值:off。

4.并行与连接

并行:
query_dop本质
并行查询不是指多个会话的并发,而是服务器为单个查询开启至少2n+1个并行线程,其中n个线程用于生产数据,n个线程用于消费数据,1个线程用于协调汇总,最后反馈给客户端。因此并行度越大,带来的资源消耗则是成倍增长,可能会导致主机资源耗尽。
select id from t order by id;
生产者thred1: 1,3,5 -> 消费者thred3: 1,3,5 -> 协调线程:1,2,3,4,5,6  
生产者thred2: 4,6,2 -> 消费者thred4: 2,4,6
1.生产者划分n个数据区。提交给消费者后就可以做其他事了
2.消费者处理数据,必须要等待所有消费线程处理完成
3.协调线程汇总消费者数据,返回给客户端

#例外
1.数据倾斜:数据倾斜会导致生产者数据不均匀,消费者并行效果差。
2.资源竞争:高并发场景下,开启并行,会导致各查询之间严重的资源竞争问题
3.索引扫描:不支持并行

表的连接:
nestloop_join: 驱动表较小,且内表连接字段必须有索引
hash_join: 两表数据量相差很大。小表连接字段建立散列表,大表去探测散列表
merge_join: 没有索引,数据源已经排序。
image.png

5.执行计划

explain [ (  option  [, ...] )  ] statement
    ANALYZE [ boolean ] |
    ANALYSE [ boolean ] |
    VERBOSE [ boolean ] | 默认为false 参数可以显示执行集合更多信息
    COSTS [ boolean ] |  默认为TRUE
    CPU [ boolean ] |
    DETAIL [ boolean ] |
    NODES [ boolean ] |
    NUM_NODES [ boolean ] |
    BUFFERS [ boolean ] | 缺省为FALSE
    TIMING [ boolean ] |  缺省为TRUE
    PLAN [ boolean ] |
    FORMAT { TEXT | XML | JSON | YAML }
explain + statement    
explain analyze + statement
explain (ANALYZE true,buffers true) + statement

1.costs总代价最大的子句 
2.预估rows与实际rows相差是否过大
3.使用buffers查看是否使用了磁盘
4.为什么没有走索引扫描,索引建立不合适?表较小?可能结果集比较大?(一般结果集超过total*40%就可能走全表了,感兴趣可以研究costs计算方式或者直接强制走索引对比)
5.多表连接如何选择,优化器正常会走合适的连接方式,也可以强制关闭其他连接开关。
6.。。。

二、安装TPCH工具

1.安装工具下载:

1).TPC-H工具下载网站:http://www.tpc.org/tpch/
2)链接:https://pan.baidu.com/s/1VuB-reZ_1EiWT8cj_wrGhw 
提取码:rqgg

2.解压安装包

unzip tpc-h-tool-3.0.0.zip -d /home/tpcc/

3.编辑makefile.suite

cd /home/tpcc/TPC-H_Tools_v3.0.0/dbgen
vi makefile.suite 
CC      = gcc
DATABASE= POSTGRESQL
MACHINE = LINUX
WORKLOAD = TPCH

4.编辑tpcd.h

vi tpcd.h
#ifdef POSTGRESQL
#define GEN_QUERY_PLAN  "EXPLAIN PLAN"
#define START_TRAN      "SET TRANSACTION"
#define END_TRAN        "COMMIT;"
#define SET_OUTPUT      ""
#define SET_ROWCOUNT    "LIMIT %d\n"
#define SET_DBASE      ""
#endif

5.执行编译安装

make -f makefile.suite

6.生成裸数据

dbgen参数详解

-v 详细信息
-s 表示生成G数据
-1 SF,Scale Factor ,数据库的比例因子。TPC-H标准规定,测试数据库的比例因子必须从下列固定值中选择:1,10,30,100,1000,3000,10000 (相当于1GB,10GB,30GB,100GB,1000GB,3000GB,10000GB)。数据库的大小缺省定义为1(例如:SF=1;近似于1GB)。
-S 切分数据
-f 覆盖之前的文件
更多参数可使用 ./dbgen -help 查看

./dbgen -s xxx ---这里的xxx表示想要生成的数据量(单位GB)
注:这里的-s 参数指定生成测试数据的仓库数,建议设置100以上,这里为了方便,选取较小的值,真实业务场景测试可设置1000以上。
生成的数据文件存放在/home/tpcc/TPC-H_Tools_v3.0.0/dbgen目录下,以tbl结尾,一共8个文件,对应的是8个表

7.创建TPCH相关表

详见行存/列存第2步创建数据表。
# 注意:dss.ddl 这个文件存储的是建表的语句

8.执行对应sql

将22条sql分别命名为1~22.sql文件,并将执行脚本与sql存放在同一目录,并在每个sql文件前加上 \timing on开启执行时间记录。

***脚本和sql文件属组必须是数据库安装用户,且具有创建文件权限。

22条sql文件及执行脚本获取地址:
链接:https://pan.baidu.com/s/1fi1nihHMMmsKYfeXyegbwA
提取码:zhzk

vi tpch_sql.sh

for i in `seq 1 22`
do
su - vastbase -c "vsql -d tpch -f ${i}.sql"
done

还可以将运行结果写入到日志中,日志将加载详细运行过程及运行时间,方便对sql运行结果进行分析。

for i in `seq 1 22`
do
su - vastbase -c "vsql -d tpch -f ${i}.sql >>${i}sql.log"
done

三、行存测试

1.创建database

create database tpch;
--create tablespace tpch location '/data/tpch';
--alter database tpch set tablespace tpch;

2. 创建数据表

CREATE TABLE NATION  ( N_NATIONKEY  INTEGER NOT NULL,
                            N_NAME       CHAR(25) NOT NULL,
                            N_REGIONKEY  INTEGER NOT NULL,
                            N_COMMENT    VARCHAR(152),
                            N_NULL    VARCHAR(10)
                            );
CREATE TABLE REGION  ( R_REGIONKEY  INTEGER NOT NULL,
                            R_NAME       CHAR(25) NOT NULL,
                            R_COMMENT    VARCHAR(152),
                            R_NULL VARCHAR(10)  
                            );
CREATE TABLE PART  ( P_PARTKEY     INTEGER NOT NULL,
                          P_NAME        VARCHAR(55) NOT NULL,
                          P_MFGR        CHAR(25) NOT NULL,
                          P_BRAND       CHAR(10) NOT NULL,
                          P_TYPE        VARCHAR(25) NOT NULL,
                          P_SIZE        INTEGER NOT NULL,
                          P_CONTAINER   CHAR(10) NOT NULL,
                          P_RETAILPRICE DECIMAL(15,2) NOT NULL,
                          P_COMMENT     VARCHAR(23) NOT NULL,
                          P_NULL VARCHAR(10) 
                          );
                        
CREATE TABLE SUPPLIER ( S_SUPPKEY     INTEGER NOT NULL,
                             S_NAME        CHAR(25) NOT NULL,
                             S_ADDRESS     VARCHAR(40) NOT NULL,
                             S_NATIONKEY   INTEGER NOT NULL,
                             S_PHONE       CHAR(15) NOT NULL,
                             S_ACCTBAL     DECIMAL(15,2) NOT NULL,
                             S_COMMENT     VARCHAR(101) NOT NULL,
                             S_NULL VARCHAR(10)         
                             );
CREATE TABLE PARTSUPP ( PS_PARTKEY     INTEGER NOT NULL,
                             PS_SUPPKEY     INTEGER NOT NULL,
                             PS_AVAILQTY    INTEGER NOT NULL,
                             PS_SUPPLYCOST  DECIMAL(15,2)  NOT NULL,
                             PS_COMMENT     VARCHAR(199) NOT NULL,
                             PS_NULL  VARCHAR(10)   
                             );
CREATE TABLE CUSTOMER ( C_CUSTKEY     INTEGER NOT NULL,
                             C_NAME        VARCHAR(25) NOT NULL,
                             C_ADDRESS     VARCHAR(40) NOT NULL,
                             C_NATIONKEY   INTEGER NOT NULL,
                             C_PHONE       CHAR(15) NOT NULL,
                             C_ACCTBAL     DECIMAL(15,2)   NOT NULL,
                             C_MKTSEGMENT  CHAR(10) NOT NULL,
                             C_COMMENT     VARCHAR(117) NOT NULL,
                             C_NULL  VARCHAR(10)        
                             );
CREATE TABLE ORDERS  ( O_ORDERKEY       INTEGER NOT NULL,
                           O_CUSTKEY        INTEGER NOT NULL,
                           O_ORDERSTATUS    CHAR(1) NOT NULL,
                           O_TOTALPRICE     DECIMAL(15,2) NOT NULL,
                           O_ORDERDATE      DATE NOT NULL,
                           O_ORDERPRIORITY  CHAR(15) NOT NULL,  
                           O_CLERK          CHAR(15) NOT NULL, 
                           O_SHIPPRIORITY   INTEGER NOT NULL,
                           O_COMMENT        VARCHAR(79) NOT NULL,
                           O_NULL   VARCHAR(10)       
                           );
CREATE TABLE LINEITEM ( L_ORDERKEY    INTEGER NOT NULL,
                             L_PARTKEY     INTEGER NOT NULL,
                             L_SUPPKEY     INTEGER NOT NULL,
                             L_LINENUMBER  INTEGER NOT NULL,
                             L_QUANTITY    DECIMAL(15,2) NOT NULL,
                             L_EXTENDEDPRICE  DECIMAL(15,2) NOT NULL,
                             L_DISCOUNT    DECIMAL(15,2) NOT NULL,
                             L_TAX         DECIMAL(15,2) NOT NULL,
                             L_RETURNFLAG  CHAR(1) NOT NULL,
                             L_LINESTATUS  CHAR(1) NOT NULL,
                             L_SHIPDATE    DATE NOT NULL,
                             L_COMMITDATE  DATE NOT NULL,
                             L_RECEIPTDATE DATE NOT NULL,
                             L_SHIPINSTRUCT CHAR(25) NOT NULL,
                             L_SHIPMODE     CHAR(10) NOT NULL,
                             L_COMMENT      VARCHAR(44) NOT NULL,
                             L_NULL  VARCHAR(10)  
                             );

3. 导入数据

copy customer from '/home/tpch/TPC-H_Tools_v3.0.0/dbgen/customer.tbl' DELIMITERS '|';
copy lineitem from '/home/tpch/TPC-H_Tools_v3.0.0/dbgen/lineitem.tbl' DELIMITERS '|';
copy nation from '/home/tpch/TPC-H_Tools_v3.0.0/dbgen/nation.tbl' DELIMITERS '|';
copy orders from '/home/tpch/TPC-H_Tools_v3.0.0/dbgen/orders.tbl' DELIMITERS '|';
copy partsupp from '/home/tpch/TPC-H_Tools_v3.0.0/dbgen/partsupp.tbl' DELIMITERS '|';
copy part from '/home/tpch/TPC-H_Tools_v3.0.0/dbgen/part.tbl' DELIMITERS '|';
copy region from '/home/tpch/TPC-H_Tools_v3.0.0/dbgen/region.tbl' DELIMITERS '|';
copy supplier from '/home/tpch/TPC-H_Tools_v3.0.0/dbgen/supplier.tbl' DELIMITERS '|';

3. 创建索引

ALTER TABLE REGION ADD PRIMARY KEY (R_REGIONKEY);
ALTER TABLE NATION ADD PRIMARY KEY (N_NATIONKEY);
ALTER TABLE PART ADD PRIMARY KEY (P_PARTKEY);
ALTER TABLE SUPPLIER ADD PRIMARY KEY (S_SUPPKEY);
ALTER TABLE PARTSUPP ADD PRIMARY KEY (PS_PARTKEY,PS_SUPPKEY);
ALTER TABLE CUSTOMER ADD PRIMARY KEY (C_CUSTKEY);
ALTER TABLE LINEITEM ADD PRIMARY KEY (L_ORDERKEY,L_LINENUMBER);
ALTER TABLE ORDERS ADD PRIMARY KEY (O_ORDERKEY);
ALTER TABLE PARTSUPP ADD FOREIGN KEY (PS_PARTKEY) REFERENCES PART(P_PARTKEY);
ALTER TABLE LINEITEM ADD FOREIGN KEY (L_ORDERKEY) REFERENCES ORDERS(O_ORDERKEY);
CREATE INDEX l_shipdate_idx ON LINEITEM(L_SHIPDATE);
vacuum ANALYZE ;

四、列存测试

1.创建测试表

CREATE TABLE NATION  ( N_NATIONKEY  INTEGER NOT NULL,
                            N_NAME       CHAR(25) NOT NULL,
                            N_REGIONKEY  INTEGER NOT NULL,
                            N_COMMENT    VARCHAR(152),
                            N_NULL    VARCHAR(10)
                            )with (ORIENTATION = COLUMN);
CREATE TABLE REGION  ( R_REGIONKEY  INTEGER NOT NULL,
                            R_NAME       CHAR(25) NOT NULL,
                            R_COMMENT    VARCHAR(152),
                            R_NULL VARCHAR(10)  
                            )with (ORIENTATION = COLUMN);
CREATE TABLE PART  ( P_PARTKEY     INTEGER NOT NULL,
                          P_NAME        VARCHAR(55) NOT NULL,
                          P_MFGR        CHAR(25) NOT NULL,
                          P_BRAND       CHAR(10) NOT NULL,
                          P_TYPE        VARCHAR(25) NOT NULL,
                          P_SIZE        INTEGER NOT NULL,
                          P_CONTAINER   CHAR(10) NOT NULL,
                          P_RETAILPRICE DECIMAL(15,2) NOT NULL,
                          P_COMMENT     VARCHAR(23) NOT NULL,
                          P_NULL VARCHAR(10) 
                          )with (ORIENTATION = COLUMN);
                        
CREATE TABLE SUPPLIER ( S_SUPPKEY     INTEGER NOT NULL,
                             S_NAME        CHAR(25) NOT NULL,
                             S_ADDRESS     VARCHAR(40) NOT NULL,
                             S_NATIONKEY   INTEGER NOT NULL,
                             S_PHONE       CHAR(15) NOT NULL,
                             S_ACCTBAL     DECIMAL(15,2) NOT NULL,
                             S_COMMENT     VARCHAR(101) NOT NULL,
                             S_NULL VARCHAR(10)         
                             )with (ORIENTATION = COLUMN);
CREATE TABLE PARTSUPP ( PS_PARTKEY     INTEGER NOT NULL,
                             PS_SUPPKEY     INTEGER NOT NULL,
                             PS_AVAILQTY    INTEGER NOT NULL,
                             PS_SUPPLYCOST  DECIMAL(15,2)  NOT NULL,
                             PS_COMMENT     VARCHAR(199) NOT NULL,
                             PS_NULL  VARCHAR(10)   
                             )with (ORIENTATION = COLUMN);
CREATE TABLE CUSTOMER ( C_CUSTKEY     INTEGER NOT NULL,
                             C_NAME        VARCHAR(25) NOT NULL,
                             C_ADDRESS     VARCHAR(40) NOT NULL,
                             C_NATIONKEY   INTEGER NOT NULL,
                             C_PHONE       CHAR(15) NOT NULL,
                             C_ACCTBAL     DECIMAL(15,2)   NOT NULL,
                             C_MKTSEGMENT  CHAR(10) NOT NULL,
                             C_COMMENT     VARCHAR(117) NOT NULL,
                             C_NULL  VARCHAR(10)        
                             )with (ORIENTATION = COLUMN);
CREATE TABLE ORDERS  ( O_ORDERKEY       INTEGER NOT NULL,
                           O_CUSTKEY        INTEGER NOT NULL,
                           O_ORDERSTATUS    CHAR(1) NOT NULL,
                           O_TOTALPRICE     DECIMAL(15,2) NOT NULL,
                           O_ORDERDATE      DATE NOT NULL,
                           O_ORDERPRIORITY  CHAR(15) NOT NULL,  
                           O_CLERK          CHAR(15) NOT NULL, 
                           O_SHIPPRIORITY   INTEGER NOT NULL,
                           O_COMMENT        VARCHAR(79) NOT NULL,
                           O_NULL   VARCHAR(10)       
                           )with (ORIENTATION = COLUMN);
CREATE TABLE LINEITEM ( L_ORDERKEY    INTEGER NOT NULL,
                             L_PARTKEY     INTEGER NOT NULL,
                             L_SUPPKEY     INTEGER NOT NULL,
                             L_LINENUMBER  INTEGER NOT NULL,
                             L_QUANTITY    DECIMAL(15,2) NOT NULL,
                             L_EXTENDEDPRICE  DECIMAL(15,2) NOT NULL,
                             L_DISCOUNT    DECIMAL(15,2) NOT NULL,
                             L_TAX         DECIMAL(15,2) NOT NULL,
                             L_RETURNFLAG  CHAR(1) NOT NULL,
                             L_LINESTATUS  CHAR(1) NOT NULL,
                             L_SHIPDATE    DATE NOT NULL,
                             L_COMMITDATE  DATE NOT NULL,
                             L_RECEIPTDATE DATE NOT NULL,
                             L_SHIPINSTRUCT CHAR(25) NOT NULL,
                             L_SHIPMODE     CHAR(10) NOT NULL,
                             L_COMMENT      VARCHAR(44) NOT NULL,
                             L_NULL  VARCHAR(10)  
                             )with (ORIENTATION = COLUMN);

2.创建约束

ALTER TABLE LINEITEM ADD PARTIAL CLUSTER KEY (L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE);
ALTER TABLE ORDERS ADD PARTIAL CLUSTER KEY (O_ORDERDATE);

3.导入数据

copy customer from '/home/tpch/TPC-H_Tools_v3.0.0/dbgen/customer.tbl' DELIMITERS '|';
copy lineitem from '/home/tpch/TPC-H_Tools_v3.0.0/dbgen/lineitem.tbl' DELIMITERS '|';
copy nation from '/home/tpch/TPC-H_Tools_v3.0.0/dbgen/nation.tbl' DELIMITERS '|';
copy orders from '/home/tpch/TPC-H_Tools_v3.0.0/dbgen/orders.tbl' DELIMITERS '|';
copy partsupp from '/home/tpch/TPC-H_Tools_v3.0.0/dbgen/partsupp.tbl' DELIMITERS '|';
copy part from '/home/tpch/TPC-H_Tools_v3.0.0/dbgen/part.tbl' DELIMITERS '|';
copy region from '/home/tpch/TPC-H_Tools_v3.0.0/dbgen/region.tbl' DELIMITERS '|';
copy supplier from '/home/tpch/TPC-H_Tools_v3.0.0/dbgen/supplier.tbl' DELIMITERS '|';
vacuum ANALYZE ;

五、执行22条测试sql

Q1:定价汇总报表查询
定价汇总报表查询是分组、排序、聚集操作并存的单表查询操作,报告已开票、发货和退回的业务量; 为给定日期发货的所有订单项提供汇总定价报告,按l_returnflag和l_linestatus分组并排序,列出了扩展价格、折扣扩展价格、折扣扩展价格加税、平均数量、平均扩展价格、平均折扣、各个分组的总行数。

Q1语句的特点是:带有分组、排序、聚集操作并存的单表查询操作。这个查询会导致表上的数据有95%到97%行被读取到。

--语句详解

select 
    l_returnflag, //返回标志
    l_linestatus, 
    sum(l_quantity) as sum_qty, //总的数量
    sum(l_extendedprice) as sum_base_price, //聚集函数操作
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, 
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, 
    avg(l_quantity) as avg_qty, 
    avg(l_extendedprice) as avg_price, 
    avg(l_discount) as avg_disc, 
    count(*) as count_order //每个分组所包含的行数
from 
    lineitem
where 
    l_shipdate <= date'1998-12-01' - interval '90' day //时间段是随机生成的
group by //分组操作
    l_returnflag, 
    l_linestatus
order by //排序操作
    l_returnflag, 
    l_linestatus;

示例sql:

select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from
    lineitem
where
    l_shipdate <= date '1998-12-01' - interval '103' day
group by
    l_returnflag,
    l_linestatus
order by
    l_returnflag,
    l_linestatus
LIMIT 1;

Q2:最低成本供应商查询
最低成本供应商查询是带有排序、聚集操作、子查询并存的多表查询操作,查找应该选择那个供应商来为给定地区的给定零件下订单;在给定的区域中,针对特定类型和尺寸的每个零件,找到能够以最低成本供应的供应商。如果该地区的多个供应商以相同的(最低)成本供货。查询列出了供应商的账户余额、名称和国家,零部件编号、制造商,供应商的地址、电话号码和评论信息。

Q2语句的特点是:带有排序、聚集操作、子查询并存的多表查询操作。查询语句没有从语法上限制返回多少条元组,但是TPC-H标准规定,查询结果只返回前100行(通常依赖于应用程序实现)。

--语句详解

select
    s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment 
  /*查询供应者的帐户余额、名字、国家、零件的号码、生产者、供应者的地址、电话号码、备注信息 */
from
    part, supplier, partsupp, nation, region //五表连接
where
    p_partkey = ps_partkey
    and s_suppkey = ps_suppkey
    and p_size = [SIZE] //指定大小,在区间[1, 50]内随机选择
    and p_type like '%[TYPE]' //指定类型,在TPC-H标准指定的范围内随机选择
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = '[REGION]' //指定地区,在TPC-H标准指定的范围内随机选择
    and ps_supplycost = ( //子查询
        select
            min(ps_supplycost) //聚集函数
        from
            partsupp, supplier, nation, region //与父查询的表有重叠
        where
            p_partkey = ps_partkey
            and s_suppkey = ps_suppkey
            and s_nationkey = n_nationkey
            and n_regionkey = r_regionkey
            and r_name = '[REGION]'
    )
order by //排序
    s_acctbal desc,
    n_name,
    s_name,
    p_partkey;

示例sql:

select
    s_acctbal,
    s_name,
    n_name,
    p_partkey,
    p_mfgr,
    s_address,
    s_phone,
    s_comment
from
    part,
    supplier,
    partsupp,
    nation,
    region,
    (
        select
            ps_partkey,
            min(ps_supplycost) as min
        from
            partsupp,
            supplier,
            nation,
            region
        where
            s_suppkey = ps_suppkey
            and s_nationkey = n_nationkey
            and n_regionkey = r_regionkey
            and r_name = 'EUROPE'
        group by
            ps_partkey
    ) temp
where
    p_partkey = partsupp.ps_partkey
    and s_suppkey = ps_suppkey
    and p_size = 47
    and p_type like '%BRASS'
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = 'EUROPE'
    and ps_supplycost = temp.min
    and p_partkey = temp.ps_partkey
order by
    s_acctbal desc,
    n_name,
    s_name,
    p_partkey
LIMIT 100;

Q3:运输优先级查询
运输优先级查询是带有分组、排序、聚集操作并存的三表查询操作,检索价值最高的未发货订单;在给定日期尚未发货的订单中收入最大的订单的运输优先级和潜在收入(l_extendedprice * (1-l_discount)的总和),订单按照收入的降序排序。

Q3语句的特点是:带有分组、排序、聚集操作并存的三表查询操作。查询语句没有从语法上限制返回多少条元组,但是TPC-H标准规定,查询结果只返回前10行(通常依赖于应用程序实现)。

--语句详解

select
    l_orderkey,
    sum(l_extendedprice*(1-l_discount)) as revenue, //潜在的收入,聚集操作
    o_orderdate,
    o_shippriority
from
    customer, orders, lineitem //三表连接
where
    c_mktsegment = '[SEGMENT]' //在TPC-H标准指定的范围内随机选择
    and c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and o_orderdate < date '[DATE]' //指定日期段,在在[1995-03-01, 1995-03-31]中随机选择
    and l_shipdate > date '[DATE]'
group by //分组操作
    l_orderkey, //订单标识
    o_orderdate, //订单日期
    o_shippriority //运输优先级
order by //排序操作
    revenue desc, //降序排序,把潜在最大收入列在前面
    o_orderdate;

示例sql:

select
    l_orderkey,
    sum(l_extendedprice * (1 - l_discount)) as revenue,
    o_orderdate,
    o_shippriority
from
    customer,
    orders,
    lineitem
where
    c_mktsegment = 'BUILDING'
    and c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and o_orderdate < date '1995-03-20'
    and l_shipdate > date '1995-03-20'
group by
    l_orderkey,
    o_orderdate,
    o_shippriority
order by
    revenue desc,
    o_orderdate
LIMIT 10;

Q4:订单优先级查询
订单优先级查询是带有分组、排序、聚集操作、子查询并存的单表查询操作,确定订单优先系统的工作情况并评估客户满意度;在给定季度中的订购数量,其中客户在承诺日志之后收到至少一个订单项,列出了按优先级升序排序的每个订单优先级的此类订单的计数。

Q4语句的特点是:带有分组、排序、聚集操作、子查询并存的单表查询操作。子查询是相关子查询。

--语句详解

select
    o_orderpriority, //订单优先级
    count(*) as order_count //订单优先级计数
from orders //单表查询
where
    o_orderdate >= date '[DATE]'
    and o_orderdate < date '[DATE]' + interval '3' month //指定订单的时间段+三个月,DATE是在1993年1月和1997年10月之间随机选择的一个月的第一天
    and exists ( //子查询
        select
            *
        from
            lineitem
        where
            l_orderkey = o_orderkey
            and l_commitdate < l_receiptdate
    )
group by //按订单优先级分组
    o_orderpriority
order by //按订单优先级排序
    o_orderpriority;

示例sql:

select
    o_orderpriority,
    count(*) as order_count
from
    orders
where
    o_orderdate >= date '1995-12-01'
    and o_orderdate < date '1995-12-01' + interval '3' month
    and exists 
    (
        select
            *
        from
            lineitem
        where
            l_orderkey = o_orderkey
            and l_commitdate < l_receiptdate 
        limit 1
    )
group by
    o_orderpriority
order by
    o_orderpriority
LIMIT 1;

Q5:本地供应商收入量查询
本地供应商收入量查询是带有分组、排序、聚集操作、子查询并存的多表连接查询操作,列出了通过本地供应商完成的收入量;为一个地区的每个国家列出了由订单项交易产生的收入量,以确定是否在给定区域建立本地配送中心。

Q5语句的特点是:带有分组、排序、聚集操作、子查询并存的多表连接查询操作。

--语句详解

select
    n_name,
    sum(l_extendedprice * (1 - l_discount)) as revenue //聚集操作
from
    customer,orders,lineitem,supplier,nation,region //六表连接
where
    c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and l_suppkey = s_suppkey
    and c_nationkey = s_nationkey
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = '[REGION]' //指定地区,在TPC-H标准指定的范围内随机选择
    and o_orderdate >= date '[DATE]' //DATE是从1993年到1997年中随机选择的一年的1月1日
    and o_orderdate < date '[DATE]' + interval '1' year
group by //按名字分组
    n_name
order by //按收入降序排序,注意分组和排序子句不同
    revenue desc;

示例sql:

select
    n_name,
    sum(l_extendedprice * (1 - l_discount)) as revenue
from
    customer,
    orders,
    lineitem,
    supplier,
    nation,
    region
where
    c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and l_suppkey = s_suppkey
    and c_nationkey = s_nationkey
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = 'AMERICA'
    and o_orderdate >= date '1997-01-01'
    and o_orderdate < date '1997-01-01' + interval '1' year
group by
    n_name
order by
    revenue desc
LIMIT 1;

Q6:预测收入变化查询
预测收入变化查询是带有聚集操作的单表查询操作,量化了在给定年份中给定百分比范围内消除某些公司范围内的折扣所导致的收入增长量,用于寻找增加收入的方法;在给定的年份发货的所有订单项,折扣介于“DISCOUNT-0.01”和“DISCOUNT+0.01”之间,如果l_quantity小于quantity的订单的折扣被消除后总收入增加的数量。

Q6语句的特点是:带有聚集操作的单表查询操作。查询语句使用了BETWEEN-AND操作符,有的数据库可以对BETWEEN-AND进行优化。

--语句详解

select
    sum(l_extendedprice*l_discount) as revenue //潜在的收入增加量
from
    lineitem //单表查询
where
    l_shipdate >= date '[DATE]' //DATE是从[1993, 1997]中随机选择的一年的1月1日
    and l_shipdate < date '[DATE]' + interval '1' year //一年内
    and l_discount between [DISCOUNT] - 0.01 and [DISCOUNT] + 0.01 //between
    and l_quantity < [QUANTITY]; // QUANTITY在区间[24, 25]中随机选择

示例sql:

select
    sum(l_extendedprice * l_discount) as revenue
from
    lineitem
where
    l_shipdate >= date '1997-01-01'
    and l_shipdate < date '1997-01-01' + interval '1' year
    and l_discount between 0.07 - 0.01 and 0.07 + 0.01
    and l_quantity < 24
LIMIT 1;

Q7:批量出货查询
批量出货查询是带有分组、排序、聚集、子查询操作并存的多表查询操作,确定在某些国家直接运输的货物的价值以帮助重新谈判运输合同;对于两个给定的国家,在1995年到1996年期间,从其中一个国家的供应商想另一个国家的客户运送零件的订单项获得的总折扣收入。

Q7语句的特点是:带有分组、排序、聚集、子查询操作并存的多表查询操作。子查询的父层查询不存在其他查询对象,是格式相对简单的子查询。

--语句详解

select
    supp_nation, //供货商国家
    cust_nation, //顾客国家
    l_year, sum(volume) as revenue //年度、年度的货运收入
from ( //子查询
    select
        n1.n_name as supp_nation,
        n2.n_name as cust_nation,
        extract(year from l_shipdate) as l_year,
        l_extendedprice * (1 - l_discount) as volume
    from
        supplier,lineitem,orders,customer,nation n1,nation n2 //六表连接
    where
        s_suppkey = l_suppkey
        and o_orderkey = l_orderkey
        and c_custkey = o_custkey
        and s_nationkey = n1.n_nationkey
        and c_nationkey = n2.n_nationkey
        and ( // NATION2和NATION1的值不同,表示查询的是跨国的货运情况
            (n1.n_name = '[NATION1]' and n2.n_name = '[NATION2]')
            or (n1.n_name = '[NATION2]' and n2.n_name = '[NATION1]')
        )
        and l_shipdate between date '1995-01-01' and date '1996-12-31'
    ) as shipping
group by
    supp_nation,
    cust_nation,
    l_year
order by
    supp_nation,
    cust_nation,
    l_year;

示例sql:

select
    supp_nation,
    cust_nation,
    l_year,
    sum(volume) as revenue
from
(
    select
        n1.n_name as supp_nation,
        n2.n_name as cust_nation,
        extract(year from l_shipdate) as l_year,
        l_extendedprice * (1 - l_discount) as volume
    from
        supplier,
        lineitem,
        orders,
        customer,
        nation n1,
        nation n2
    where
        s_suppkey = l_suppkey
        and o_orderkey = l_orderkey
        and c_custkey = o_custkey
        and s_nationkey = n1.n_nationkey
        and c_nationkey = n2.n_nationkey
        and 
        (
            (n1.n_name = 'CHINA' and n2.n_name = 'INDONESIA')
            or (n1.n_name = 'INDONESIA' and n2.n_name = 'CHINA')
        )
        and l_shipdate between date '1995-01-01' and date '1996-12-31'
) as shipping
group by
    supp_nation,
    cust_nation,
    l_year
order by
    supp_nation,
    cust_nation,
    l_year
LIMIT 1;

Q8:全国市场份额查询
全国市场份额查询带有分组、排序、聚集、子查询操作并存的查询操作,确定给定区域内给定国家的市场份额在两年内对于给定零件类型的变化情况; 给定区域内给定国家的市场份额被定义为收入的一部分,即 l_extendedprice * (1-l_discount)的总和。

Q8语句的特点是:带有分组、排序、聚集、子查询操作并存的查询操作。子查询的父层查询不存在其他查询对象,是格式相对简单的子查询,但子查询自身是多表连接的查询。

--语句详解:

select
    o_year, //年份
    sum(case
    when nation = '[NATION]'//指定国家,在TPC-H标准指定的范围内随机选择
    then volume
    else 0
    end) / sum(volume) as mkt_share //市场份额:特定种类的产品收入的百分比;聚集操作
from ( //子查询
    select
        extract(year from o_orderdate) as o_year, //分解出年份
        l_extendedprice * (1-l_discount) as volume, //特定种类的产品收入
        n2.n_name as nation
    from
        part,supplier,lineitem,orders,customer,nation n1,nation n2,region //八表连接
    where
        p_partkey = l_partkey
        and s_suppkey = l_suppkey
        and l_orderkey = o_orderkey
        and o_custkey = c_custkey
        and c_nationkey = n1.n_nationkey
        and n1.n_regionkey = r_regionkey
        and r_name = '[REGION]' //指定地区,在TPC-H标准指定的范围内随机选择
        and s_nationkey = n2.n_nationkey
        and o_orderdate between date '1995-01-01' and date '1996-12-31' //只查95、96年的情况
        and p_type = '[TYPE]' //指定零件类型,在TPC-H标准指定的范围内随机选择
) as all_nations
group by //按年分组
    o_year
order by //按年排序
    o_year;

示例sql:

select
    o_year,
    sum(case
    when nation = 'INDONESIA' then volume
    else 0
    end)   / sum(volume) as mkt_share
from
(
    select
        extract(year from o_orderdate) as o_year,
        l_extendedprice * (1 - l_discount) as volume,
        n2.n_name as nation
        from
        part,
        supplier,
        lineitem,
        orders,
        customer,
        nation n1,
        nation n2,
        region
    where
        p_partkey = l_partkey
        and s_suppkey = l_suppkey
        and l_orderkey = o_orderkey
        and o_custkey = c_custkey
        and c_nationkey = n1.n_nationkey
        and n1.n_regionkey = r_regionkey
        and r_name = 'ASIA'
        and s_nationkey = n2.n_nationkey
        and o_orderdate between date '1995-01-01' and date '1996-12-31'
        and p_type = 'PROMO POLISHED TIN'
) as all_nations
group by
    o_year
order by
    o_year
LIMIT 1;

Q9:产品类型利润度量查询
产品类型利润度量查询是带有分组、排序、聚集、子查询操作并存的查询操作,确定按供应商国家和年份细分的给定零件系列的利润。

Q9语句的特点是:带有分组、排序、聚集、子查询操作并存的查询操作。子查询的父层查询不存在其他查询对象,是格式相对简单的子查询,但子查询自身是多表连接的查询。子查询中使用了LIKE操作符,有的查询优化器不支持对LIKE操作符进行优化。

--语句详解

select
    nation,
    o_year,
    sum(amount) as sum_profit //每个国家每一年所有被定购的零件在一年中的总利润
from(
  select
        n_name as nation, //国家
        extract(year from o_orderdate) as o_year, //取出年份
        l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount //利润
    from
        part,supplier,lineitem,partsupp,orders,nation //六表连接
    where
        s_suppkey = l_suppkey
        and ps_suppkey = l_suppkey
        and ps_partkey = l_partkey
        and p_partkey = l_partkey
        and o_orderkey = l_orderkey
        and s_nationkey = n_nationkey
        and p_name like '%[COLOR]%' //LIKE操作,查询优化器可能进行优化
) as profit
group by //按国家和年份分组
    nation,
    o_year
order by //按国家和年份排序,年份大者靠前
    nation,
    o_year desc;

示例sql:

select
    nation,
    o_year,
    sum(amount) as sum_profit
from
(
    select
        n_name as nation,
        extract(year from o_orderdate) as o_year,
        l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
    from
        part,
        supplier,
        lineitem,
        partsupp,
        orders,
        nation
    where
        s_suppkey = l_suppkey
        and ps_suppkey = l_suppkey
        and ps_partkey = l_partkey
        and p_partkey = l_partkey
        and o_orderkey = l_orderkey
        and s_nationkey = n_nationkey
        and p_name like '%thistle%'
) as profit
group by
    nation,
    o_year
order by
    nation,
    o_year desc
LIMIT 1;

Q10:退货报告查询
退货报告查询是带有分组、排序、聚集操作并存的多表连接查询操作,识别可能对运送给他们的零件有问题的客户;根据对给定季度收入损失的影响,找出排名靠前的已退回零件的客户。

Q10语句的特点是:带有分组、排序、聚集操作并存的多表连接查询操作。查询语句没有从语法上限制返回多少条元组,但是TPC-H标准规定,查询结果只返回前10行(通常依赖于应用程序实现)。

--语句详解:

select
    c_custkey, c_name, //客户信息
    sum(l_extendedprice * (1 - l_discount)) as revenue, //收入损失
    c_acctbal,
    n_name, c_address, c_phone, c_comment //国家、地址、电话、意见信息等
from
    customer, orders, lineitem, nation
where
    c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and o_orderdate >= date '[DATE]' // DATE是位于1993年一月到1994年十二月中任一月的一号
    and o_orderdate < date '[DATE]' + interval '3' month //3个月内
    and l_returnflag = 'R' //货物被回退
    and c_nationkey = n_nationkey
group by
    c_custkey,
    c_name,
    c_acctbal,
    c_phone,
    n_name,
    c_address,
    c_comment
order by
    revenue desc;

示例sql:

select
    c_custkey,
    c_name,
    sum(l_extendedprice * (1 - l_discount)) as revenue,
    c_acctbal,
    n_name,
    c_address,
    c_phone,
    c_comment
from
    customer,
    orders,
    lineitem,
    nation
where
    c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and o_orderdate >= date '1994-06-01'
    and o_orderdate < date '1994-06-01' + interval '3' month
    and l_returnflag = 'R'
    and c_nationkey = n_nationkey
group by
    c_custkey,
    c_name,
    c_acctbal,
    c_phone,
    n_name,
    c_address,
    c_comment
order by
    revenue desc
LIMIT 20;

Q11:库存价值查询
库存价值查询是带有分组、排序、聚集、子查询操作并存的多表连接查询操作,查找给定国家/地区供应商的库存价值; 扫描给定国家的供应商的可用库存来查找占有所有可用零件总价值的重要百分比的所有零件。

Q11语句的特点是:带有分组、排序、聚集、子查询操作并存的多表连接查询操作。子查询位于分组操作的HAVING条件中。

--语句详解

select
    ps_partkey,
    sum(ps_supplycost * ps_availqty) as value //聚集操作,商品的总价值
from
    partsupp, supplier, nation
where
    ps_suppkey = s_suppkey
    and s_nationkey = n_nationkey
    and n_name = '[NATION]'
group by
    ps_partkey 
having //带有HAVING子句的分组操作
    sum(ps_supplycost * ps_availqty) > ( //HAVING子句中包括有子查询
        select
            sum(ps_supplycost * ps_availqty) * [FRACTION] //子查询中存在聚集操作;FRACTION为0.0001/SF1
        from
            partsupp, supplier, nation //与父查询的表连接一致
        where //与父查询的WHEWR条件一致
            ps_suppkey = s_suppkey
            and s_nationkey = n_nationkey
            and n_name = '[NATION]' //指定国家
    )
order by //按商品的价值降序排序
    value desc;

示例sql:

select
    ps_partkey,
    sum(ps_supplycost * ps_availqty) as value
from
    partsupp,
    supplier,
    nation
where
    ps_suppkey = s_suppkey
    and s_nationkey = n_nationkey
    and n_name = 'ALGERIA'
group by
    ps_partkey having
    sum(ps_supplycost * ps_availqty) > 
    (
        select
            sum(ps_supplycost * ps_availqty) * 0.0001000000
        from
            partsupp,
            supplier,
            nation
        where
            ps_suppkey = s_suppkey
            and s_nationkey = n_nationkey
            and n_name = 'ALGERIA'
    )
order by
    value desc
LIMIT 1;

Q12:运送方式和订单优先级查询
运送方式和订单优先级查询是带有分组、排序、聚集操作并存的两表连接查询操作,确定选择较便宜的运送方式是否会对关键优先级订单产生负面影响,导致消费者更多的在合同日期之后收到货物。

Q12语句的特点是:带有分组、排序、聚集操作并存的两表连接查询操作。

--语句详解

select
    l_shipmode,
    sum(//聚集操作
        case when 
      o_orderpriority ='1-URGENT' //OR运算,二者满足其一即可,选出URGENT或HIGH的
            or o_orderpriority ='2-HIGH'
        then 1
    else 0
        end) as high_line_count,
    sum(
    case when 
        o_orderpriority <> '1-URGENT' //AND运算,二者都不满足,非URGENT非HIGH的
            and o_orderpriority <> '2-HIGH'
        then 1
        else 0
        end) as low_line_count
from
    orders,lineitem
where
    o_orderkey = l_orderkey
    and l_shipmode in ('[SHIPMODE1]', '[SHIPMODE2]') 
  /* 指定货运模式的类型,在TPC-H标准指定的范围内随机选择,SHIPMODE2必须有别于SHIPMODE1 */
    and l_commitdate < l_receiptdate
    and l_shipdate < l_commitdate
    and l_receiptdate >= date '[DATE]' //从1993年到1997年中任一年的一月一号
    and l_receiptdate < date '[DATE]' + interval '1' year //1年之内
group by //分组操作
    l_shipmode
order by //排序操作
    l_shipmode;

示例sql:

select
    l_shipmode,
    sum(case when o_orderpriority = '1-URGENT' or o_orderpriority = '2-HIGH' then 1 else 0 end) as high_line_count,
    sum(case when o_orderpriority <> '1-URGENT' and o_orderpriority <> '2-HIGH' then 1 else 0 end) as low_line_count
from
    orders,
    lineitem
where
    o_orderkey = l_orderkey
    and l_shipmode in ('RAIL', 'SHIP')
    and l_commitdate < l_receiptdate
    and l_shipdate < l_commitdate
    and l_receiptdate >= date '1994-01-01'
    and l_receiptdate < date '1994-01-01' + interval '1' year
group by
    l_shipmode
order by
    l_shipmode
LIMIT 1;

Q13:客户分布查询
客户分布查询是带有分组、排序、聚集、子查询、左外连接操作并存的查询操作;根究客户的订单数量确定客户的分布,计算并报告有多少客户没有订单,有多个客户有1/2/3 等个订单。

Q13语句的特点是:带有分组、排序、聚集、子查询、左外连接操作并存的查询操作。

--语句详解

select
    c_count, count(*) as custdist //聚集操作,统计每个组的个数
from (//子查询
    select
        c_custkey,
        count(o_orderkey)
    from
        customer left outer join orders 
  on //子查询中包括左外连接操作
        c_custkey = o_custkey
        and o_comment not like ‘%[WORD1]%[WORD2]%’ //LIKE操作
            //WORD1 为以下四个可能值中任意一个:special、pending、unusual、express
            //WORD2 为以下四个可能值中任意一个:packages、requests、accounts、deposits
    group by //子查询中的分组操作
        c_custkey
) as c_orders (c_custkey, c_count)
group by //分组操作
    c_count
order by //排序操作
    custdist desc, //从大到小降序排序
    c_count desc;

TPC-H标准定义了Q13语句等价的变形SQL,与上述查询语句格式上不相同,上述语句使用子查询作为查询的对象,变形的SQL把子查询部分变为视图,然后基于视图做查询,这种做法的意义在于有些数据库不支持如上语法,但存在等价的其他语法,如MySQL就不支持如上语法,需要使用如下等价形式。

create view orders_per_cust:s (custkey, ordercount) as //创建视图,相当与标准Q13的子查询内容
select
c_custkey,
count(o_orderkey)
from
customer left outer join orders on
c_custkey = o_custkey
and o_comment not like '%:1%:2%'
group by
c_custkey;

select
ordercount,
count(*) as custdist
from
orders_per_cust:s //对视图进行查询
group by
ordercount
order by
custdist desc,
ordercount desc;
drop view orders_per_cust:s;

示例sql:

select
    c_count,
    count(*) as custdist
from
(
    select
        c_custkey,
        count(o_orderkey)
    from
        customer left outer join orders on
        c_custkey = o_custkey
        and o_comment not like '%express%packages%'
    group by
        c_custkey
) as c_orders (c_custkey, c_count)
group by
    c_count
order by
    custdist desc,
    c_count desc
LIMIT 1;

Q14:促销效果查询
促销效果查询是带有分组、排序、聚集、子查询、左外连接操作并存的查询操作,监控市场对促销的反应;确定在给定年份和月份的收入中有多少百分比来之与促销部分。

Q14语句的特点是:带有分组、排序、聚集、子查询、左外连接操作并存的查询操作。

语句详解:

select
    100.00 * sum(
    case when p_type like 'PROMO%' //促销零件
        then l_extendedprice*(1-l_discount) //某一特定时间的收入
        else 0
        end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
from
    lineitem, part
where
    l_partkey = p_partkey
    and l_shipdate >= date '[DATE]' // DATE是从1993年到1997年中任一年的任一月的一号
    and l_shipdate < date '[DATE]' + interval '1' month;

示例sql:

select
    100.00 * sum(case when p_type like 'PROMO%' then l_extendedprice * (1 - l_discount) else 0 end)  / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
from
    lineitem,
    part
where
    l_partkey = p_partkey
    and l_shipdate >= date '1994-02-01'
    and l_shipdate < date '1994-02-01' + interval '1' month
LIMIT 1;

Q15:*供应商查询
*供应商查询是带有分排序、聚集、聚集子查询操作并存的普通表与视图的连接操作,确定*供应商以对其进行奖励、给予更多业务或获得特殊认可;在给定的年费的给定季度对总出货量贡献最大的供应商。

Q15语句的特点是:带有分排序、聚集、聚集子查询操作并存的普通表与视图的连接操作。

--语句详解:

create view revenue[STREAM_ID](supplier_no, total_revenue) as //创建复杂视图(带有分组操作)
select
    l_suppkey,
    sum(l_extendedprice * (1 - l_discount)) //获取供货商为公司带来的总利润
from
    lineitem
where
    l_shipdate >= date '[DATE]' //DATE 是从1993年一月到1997年十月中任一月的一号
    and l_shipdate < date '[DATE]' + interval '3' month //3个月内
group by //分组键与查询对象之一相同
    l_suppkey;


//查询语句
select
    s_suppkey,
    s_name,
    s_address,
    s_phone,
    total_revenue
from
    supplier,revenue[STREAM_ID] //普通表与复杂视图进行连接操作
where
    s_suppkey = supplier_no
    and total_revenue = (//聚集子查询
        select
            max(total_revenue)
        from
            revenue[STREAM_ID] //聚集子查询从视图获得数据
    )
order by
    s_suppkey;
  
//删除视图
drop view revenue[STREAM_ID];

TPC-H标准定义了Q15语句等价的变形SQL,使用了WITH子句,然后用WITH的对象与表进行连接。变形SQL的语句如下:

WITH revenue (supplier_no, total_revenue) as (
    SELECT
        l_suppkey,
        SUM(l_extendedprice * (1-l_discount))
    FROM
        lineitem
    WHERE
        l_shipdate >= date ':1'
        AND l_shipdate < date ':1' + interval '3' month
    GROUP BY
        l_suppkey
)

SELECT
    s_suppkey,
    s_name,
    s_address,
    s_phone,
    total_revenue
FROM
    supplier,
    revenue
WHERE
    s_suppkey = supplier_no
    AND total_revenue = (
        SELECT
            MAX(total_revenue)
        FROM
            revenue
    )
ORDER BY
    s_suppkey;

示例sql:

create or replace view revenue0 (supplier_no, total_revenue) as
select
    l_suppkey,
    sum(l_extendedprice * (1 - l_discount))
from
    lineitem
where
    l_shipdate >= date '1994-02-01'
    and l_shipdate < date '1994-02-01' + interval '3' month
group by
    l_suppkey;
select
    s_suppkey,
    s_name,
    s_address,
    s_phone,
    total_revenue
from
    supplier,
    revenue0
where
    s_suppkey = supplier_no
    and total_revenue = 
    (
        select
            max(total_revenue)
        from
            revenue0
    )
order by
    s_suppkey
LIMIT 1;

Q16:零部件/供货商关系查询
零部件/供货商关系查询是带有分组、排序、聚集、去重、NOT IN子查询操作并存的两表连接操作;找出有多少供应商可以提供具有给定属性的零件。

Q16语句的特点是:带有分组、排序、聚集、去重、NOT IN子查询操作并存的两表连接操作。

--语句详解

select
    p_brand,
    p_type,
    p_size,
    count(distinct ps_suppkey) as supplier_cnt //聚集、去重操作
from
    partsupp,
    part
where
    p_partkey = ps_partkey
    and p_brand <> '[BRAND]'
        // BRAND=Brand#MN ,M和N是两个字母,代表两个数值,相互独立,取值在1到5之间
    and p_type not like '[TYPE]%' //消费者不感兴趣的类型和尺寸
    and p_size in ([SIZE1], [SIZE2], [SIZE3], [SIZE4], [SIZE5], [SIZE6], [SIZE7], [SIZE8])
        //TYPEX是在1到50之间任意选择的一组八个不同的值
    and ps_suppkey not in ( //NOT IN子查询,消费者排除某些供货商
        select
            s_suppkey
        from
            supplier
        where
            s_comment like '%Customer%Complaints%'
    )
group by //分组操作
    p_brand,
    p_type,
    p_size
order by //排序操作
    supplier_cnt desc, //按数量降序排列,按品牌、种类、尺寸升序排列
    p_brand,
    p_type,
    p_size;

示例sql:

select
    p_brand,
    p_type,
    p_size,
    count(distinct ps_suppkey) as supplier_cnt
from
    partsupp,
    part
where
     p_partkey = ps_partkey
     and p_brand <> 'Brand#55'
     and p_type not like 'MEDIUM PLATED%'
     and p_size in (35, 17, 43, 49, 19, 41, 30, 47)
     and ps_suppkey not in 
     (
         select
             s_suppkey
         from
             supplier
         where
             s_comment like '%Customer%Complaints%'
    )
group by
    p_brand,
    p_type,
    p_size
order by
    supplier_cnt desc,
    p_brand,
    p_type,
    p_size
LIMIT 1;

Q17:小额订单收入查询
小额订单收入查询是带有聚集、聚集子查询操作并存的两表连接操作;确定如果不在为少量特定零件完成订单,平均每年将损失多少收入,通过集中在更大的出货量上来减少管理费用;给定品牌和给定容器类型的零部件,确定在7年内数据库中所有订单订购的此类零件的平均订单数量,如果不再接收低于20%平均水平的订单,平均每年的总收入损失多少。

Q17语句的特点是:带有聚集、聚集子查询操作并存的两表连接操作。

--语句详解

select
    sum(l_extendedprice) / 7.0 as avg_yearly //聚集操作
from
    lineitem, part
where
    p_partkey = l_partkey
    and p_brand = '[BRAND]' /*指定品牌。 BRAND=’Brand#MN’ ,M和N是两个字母,代表两个数值,相互独立,取值在1到5之间 */
    and p_container = '[CONTAINER]' //指定包装类型。在TPC-H标准指定的范围内随机选择
    and l_quantity < ( //聚集子查询
        select
            0.2 * avg(l_quantity)
        from
            lineitem
        where
            l_partkey = p_partkey
);

示例sql:

select
    sum(l_extendedprice) / 7.0 as avg_yearly
from
   lineitem,
   part,
   (SELECT l_partkey AS agg_partkey, 0.2 * avg(l_quantity) AS avg_quantity FROM lineitem GROUP BY l_partkey) part_agg
where
    p_partkey = l_partkey
    and agg_partkey = l_partkey
    and p_brand = 'Brand#53'
    and p_container = 'JUMBO JAR'
    and l_quantity < avg_quantity    
LIMIT 1;

Q18:大批量客户查询
大批量客户查询是带有分组、排序、聚集、IN子查询操作并存的三表连接操作,根据客户的大批量订单对客户进行排名。

Q18语句的特点是:带有分组、排序、聚集、IN子查询操作并存的三表连接操作。查询语句没有从语法上限制返回多少条元组,但是TPC-H标准规定,查询结果只返回前100行(通常依赖于应用程序实现)。

--语句详解

select
    c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, //基本信息
    sum(l_quantity) //订货总数
from
    customer, orders, lineitem
where
    o_orderkey in ( //带有分组操作的IN子查询
        select
            l_orderkey
        from
            lineitem
        group by 
            l_orderkey 
    having
            sum(l_quantity) > [QUANTITY] // QUANTITY是位于312到315之间的任意值
    )
    and c_custkey = o_custkey
    and o_orderkey = l_orderkey 
group by
    c_name,
    c_custkey,
    o_orderkey,
    o_orderdate,
    o_totalprice
order by
    o_totalprice desc,
    o_orderdate;

示例sql:

select
    c_name,
    c_custkey,
    o_orderkey,
    o_orderdate,
    o_totalprice,
    sum(l_quantity)
from
    customer,
    orders,
    lineitem
where
    o_orderkey in 
    (
        select
            l_orderkey
        from
            lineitem
        group by
            l_orderkey having
            sum(l_quantity) > 313
    )
    and c_custkey = o_custkey
    and o_orderkey = l_orderkey
group by
    c_name,
    c_custkey,
    o_orderkey,
    o_orderdate,
    o_totalprice
order by
    o_totalprice desc,
    o_orderdate
LIMIT 100;

Q19:折扣收入查询
折扣收入查询是带有分组、排序、聚集、IN子查询操作并存的三表连接操作,归因于以特定方式处理的选定零部件的总折扣收入。本查询是用数据挖掘工具产生格式化代码的一个例子;对一些空运或人工运输零件三个不同种类的所有订单的总折扣收入。零件的选择考虑特定品牌、包装和尺寸范围。

Q19语句的特点是:带有分组、排序、聚集、IN子查询操作并存的三表连

推荐阅读