欢迎您访问 最编程 本站为您分享编程语言代码,编程技术文章!
您现在的位置是: 首页

应对数据库面试的技巧:理解窗口函数OVER(PARTITION BY)

最编程 2024-07-23 07:58:27
...

本文已参与「新人创作礼」活动,一起开启掘金创作之路。

一、开窗函数的概念:

开窗函数是计算基于组的某种聚合值,但是它不像普通聚合函数那样每组只返回一个值,开窗函数可以为每组返回多个值,因为开窗函数所执行聚合计算的行集组是窗口。在ISO SQL规定了这样的函数为开窗函数,在 Oracle中则被称为分析函数,而在DB2中则被称为OLAP函数。

有两类:一类是排序开窗函数,一类是聚合开窗函数。

二、开窗函数的语法:

函数名(列名) over(partition by 列名1 order by 列名2 ) ,括号中的两个关键词partition by 和order by 可以只出现一个。over() 前面是一个函数,如果是聚合函数,那么order by 不能一起使用。

三、开窗函数和聚合函数的区别:

(1)SQL 标准允许将所有聚合函数用作开窗函数,用OVER 关键字区分开窗函数和聚合函数。

(2)聚合函数每组只返回一个值,开窗函数每组可返回多个值。

注:常见主流数据库MSSQLServer、Oracle、DB2等都支持开窗函数,MySQL8.0支持开窗函数,MySQL8.0之前的版本不支持。

四、排序开窗函数row_number()、rank()、dense_rank()、ntile()的区别:

row_number() over():对相等的值不进行区分,相等的值对应的排名相同,序号从1到n连续。

rank() over():相等的值排名相同,但若有相等的值,则序号从1到n不连续。如果有两个人都排在第3名,则没有第4名。

dense_rank() over():对相等的值排名相同,但序号从1到n连续。如果有两个人都排在第一名,则排在第2名(假设仅有1个第二名)的人是第3个人。

ntile( n ) over():可以看作是把有序的数据集合平均分配到指定的数量n的桶中,将桶号分配给每一行,排序对应的数字为桶号。如果不能平均分配,则较小桶号的桶分配额外的行,并且各个桶中能放的数据条数最多相差1。

实操举例:

1、首先创建表 students_grades:

-- 创建表
create table students_grades
(
student_id int,
student_name varchar(20),
course_name varchar(20),
grades decimal(10,2),
class_name varchar(10)
);

2、插入测试数据:

-- 插入测试数据
insert into students_grades values (1, 'zhangsan', 'Chinese', 99, 'class_one');
insert into students_grades values (1, 'zhangsan', 'English', 97, 'class_one');
insert into students_grades values (1, 'zhangsan', 'mathematics', 87, 'class_one');
insert into students_grades values (1, 'zhangsan', 'Physics', 99, 'class_one');
insert into students_grades values (1, 'zhangsan', 'Chemistry', 67, 'class_one');
insert into students_grades values (1, 'zhangsan', 'biology', 98, 'class_one');
                     
insert into students_grades values (2, 'lisi', 'Chinese', 98, 'class_one');
insert into students_grades values (2, 'lisi', 'English', 98, 'class_one');
insert into students_grades values (2, 'lisi', 'mathematics', 97, 'class_one');
insert into students_grades values (2, 'lisi', 'Physics', 96, 'class_one');
insert into students_grades values (2, 'lisi', 'Chemistry', 99, 'class_one');
insert into students_grades values (2, 'lisi', 'biology', 87, 'class_one');
                        
insert into students_grades values (3, 'wangwu', 'Chinese', 99, 'class_one');
insert into students_grades values (3, 'wangwu', 'English', 98, 'class_one');
insert into students_grades values (3, 'wangwu', 'mathematics', 90, 'class_one');
insert into students_grades values (3, 'wangwu', 'Physics', 98, 'class_one');
insert into students_grades values (3, 'wangwu', 'Chemistry', 99, 'class_one');
insert into students_grades values (3, 'wangwu', 'biology', 99, 'class_one');
                           
insert into students_grades values (4, 'zhaoliu', 'Chinese', 97, 'class_one');
insert into students_grades values (4, 'zhaoliu', 'English', 97, 'class_one');
insert into students_grades values (4, 'zhaoliu', 'mathematics', 99, 'class_one');
insert into students_grades values (4, 'zhaoliu', 'Physics', 98, 'class_one');
insert into students_grades values (4, 'zhaoliu', 'Chemistry', 97, 'class_one');
insert into students_grades values (4, 'zhaoliu', 'biology', 98, 'class_one');

3、查询语句如下:

select 
student_name, 
course_name, 
grades, 
row_number() over(partition by course_name order by grades desc) as rownumber,
rank() over(partition by course_name order by grades desc) as rank,
dense_rank() over(partition by course_name order by grades desc) as denserank,
ntile(3) over(partition by course_name order by grades desc) as ntile
from students_grades;

五、聚合开窗函数:

1、sum(salary) over(partition by city)

--根据城市分组进行工资求和:
select name, city, salary, sum(salary) over(partition by city) from t_person;

image.png

2、sum(salary) over(order by name)

--按照姓名排序求工资累加和:
select name, city, salary, sum(salary) over(order by name) from t_person;

image.png

3、sum(salary) over(partition by city order by name)

--根据城市分组,每个城市按照姓名排序求工资累加和,(也可以求每个人一年中每个月工资的累计)
select name, city, salary, sum(salary) over(partition by city order by name) 
from t_person;

image.png

4、其他类似sum()的聚合函数:

count() over(partition by ... order by ...):求分组后的总数。

max() over(partition by ... order by ...):求分组后的最大值。

min() over(partition by ... order by ...):求分组后的最小值。

avg() over(partition by ... order by ...):求分组后的平均值。

5、first_value(salary)和last_value(salary)

--first_value:是在窗口里面取到第一个值
--last_value:是在窗口里面取到最后一个值
 
--根据city分组后,取salary的第一个值
select name, city, salary, first_value(salary) over(partition by city order by name) 
from t_person;
--根据city分组后,取salary的最后一个值
select name, city, salary, last_value(salary) over(partition by city order by name) 
from t_person;

image.png

image.png

6、lead(salary,1,0)和lag(salary,2,9)

--lag(column, n, 0) over(partition by ... order by ...):取出前n行数据,没有默认值为0--lead(column, n, 0) over(partition by ... order by ...):取出后n行数据,没有默认值为0。
 
----根据city分区(组),salary列当前行的上面1行,如果没有就为默认值0
select name, city, salary, lead(salary,1,0) over(partition by city order by name) 
from t_person;
----根据city分区(组),salary列当前行的下面2行,如果没有就为默认值9
select name, city, salary, lag(salary,2,9) over(partition by city order by name) 
from t_person;

image.png

image.png

测试数据:

--创建数据表t_person
create table t_person (name varchar2(20),city varchar2(20),age int,salary int);
 
--插入测试数据
insert into t_person(name,city,age,salary)
values('tom','beijing',20,3000);
insert into t_person(name,city,age,salary)
values('tim','chengdu',21,4000);
insert into t_person(name,city,age,salary)
values('jim','beijing',22,3500);
insert into t_person(name,city,age,salary)
values('lily','london',21,2000);
insert into t_person(name,city,age,salary)
values('john','newyork',22,1000);
insert into t_person(name,city,age,salary)
values('yaoming','beijing',20,3000);
insert into t_person(name,city,age,salary)
values('swing','london',22,2000);
insert into t_person(name,city,age,salary)
values('guo','newyork',20,2800);
insert into t_person(name,city,age,salary)
values('yuqian','beijing',24,8000);
insert into t_person(name,city,age,salary)
values('ketty','london',25,8500);
insert into t_person(name,city,age,salary)
values('kitty','chengdu',25,3000);
insert into t_person(name,city,age,salary)
values('merry','beijing',23,3500);
insert into t_person(name,city,age,salary)
values('smith','chengdu',30,3000);
insert into t_person(name,city,age,salary)
values('bill','beijing',25,2000);
insert into t_person(name,city,age,salary)
values('jerry','newyork',24,3300);