3,491
社区成员
发帖
与我相关
我的任务
分享
-- 近几天,观察了一下公司的统计平台,经常看到类似如下的ROW_NUMBER()函数相关的SQL语句:
SELECT ...
ROW_NUMBER() OVER (PARTITION BY T74.WEEK_NAME, T5816.SESSION_ID ORDER BY T74.WEEK_NAME DESC, T5816.SESSION_ID DESC)
...
FROM ...
-- 很明显:ROW_NUMBER() OVER()函数中的 ORDER BY 部分完全与 PARTITION BY 部分相同。
-- 我当时很疑惑:这样查询出来的结果能否准确呢?是否是你想要的数据呢?
-- 个人提示:用 ROW_NUMBER() OVER()函数的时候,其ORDER BY 所指定的字段,不应该与 PARTITION BY 所指定的字段完全相同,
-- 准确地说:已经在PARTITION BY 字段中指定的字段,就不应该在后续的 ORDER BY 字段中出现。
-- 举例:
-- Emp表字段说明:
-----------------------------------------------------------------------------
EMPNO NOT NULL NUMBER(4) -- 员工号
ENAME VARCHAR2(10) -- 员工名称
JOB VARCHAR2(9) -- 员工职位
MGR NUMBER(4) -- 员工上级领导工号
HIREDATE DATE -- 员工入职日期
SAL NUMBER(7,2) -- 员工薪水
COMM NUMBER(7,2) -- 员工提成
DEPTNO NUMBER(2) -- 员工所在部门的部门号
scott@TDODS> SELECT * FROM EMP;
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
7369 SMITH CLERK 7902 17-DEC-80 800 20
7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30
7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30
7566 JONES MANAGER 7839 02-APR-81 2975 20
7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30
7698 BLAKE MANAGER 7839 01-MAY-81 2850 30
7782 CLARK MANAGER 7839 09-JUN-81 2450 10
7788 SCOTT ANALYST 7566 19-APR-87 3000 20
7839 KING PRESIDENT 17-NOV-81 5000 10
7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30
7876 ADAMS CLERK 7788 23-MAY-87 1100 20
7900 JAMES CLERK 7698 03-DEC-81 950 30
7902 FORD ANALYST 7566 03-DEC-81 3000 20
7934 MILLER CLERK 7782 23-JAN-82 1300 10
14 rows selected.
SELECT EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO,
ROW_NUMBER() OVER (PARTITION BY DEPTNO ORDER BY DEPTNO DESC) AS cnt
FROM EMP;
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO CNT
---------- ---------- --------- ---------- --------- ---------- ---------- ---------- ----------
7782 CLARK MANAGER 7839 09-JUN-81 2450 10 1
7839 KING PRESIDENT 17-NOV-81 5000 10 2
7934 MILLER CLERK 7782 23-JAN-82 1300 10 3
7566 JONES MANAGER 7839 02-APR-81 2975 20 1
7902 FORD ANALYST 7566 03-DEC-81 3000 20 2
7876 ADAMS CLERK 7788 23-MAY-87 1100 20 3
7369 SMITH CLERK 7902 17-DEC-80 800 20 4
7788 SCOTT ANALYST 7566 19-APR-87 3000 20 5
7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30 1
7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30 2
7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30 3
7900 JAMES CLERK 7698 03-DEC-81 950 30 4
7698 BLAKE MANAGER 7839 01-MAY-81 2850 30 5
7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30 6
-- 上面的查询,我一个一个仔细看,这个 ORDER BY DEPTNO DESC 是起作用啦,将查询的整体结果按DEPTNO升序排序,也不符合要求:ORDER BY DEPTNO DESC
-- 进一步分析上面的结果:CNT 字段到底是按什么顺序排序的呢?
-- 是按 EMPNO 排序? NO
-- 是按 ENAME 排序? NO
-- 是按 JOB 排序? NO
-- 是按 MGR 排序? NO
-- 是按 HIREDATE 排序? NO
-- 是按 SAL 排序? NO
-- 是按 COMM 排序? NO
-- 除了整体结果是按的DEPTNO 升序排序外,看不出什么其他规律。
-- 那么:是不是按照 rowid排序的呢?我们再来验证一下:
-- 验证是否是按照rowid全局排序:
SELECT EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO,
rowid
FROM EMP
ORDER BY rowid ASC;
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO ROWID
---------- ---------- --------- ---------- --------- ---------- ---------- ---------- ------------------
7369 SMITH CLERK 7902 17-DEC-80 800 20 AAAR3xAAEAAAACXAAA
7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30 AAAR3xAAEAAAACXAAB
7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30 AAAR3xAAEAAAACXAAC
7566 JONES MANAGER 7839 02-APR-81 2975 20 AAAR3xAAEAAAACXAAD
7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30 AAAR3xAAEAAAACXAAE
7698 BLAKE MANAGER 7839 01-MAY-81 2850 30 AAAR3xAAEAAAACXAAF
7782 CLARK MANAGER 7839 09-JUN-81 2450 10 AAAR3xAAEAAAACXAAG
7788 SCOTT ANALYST 7566 19-APR-87 3000 20 AAAR3xAAEAAAACXAAH
7839 KING PRESIDENT 17-NOV-81 5000 10 AAAR3xAAEAAAACXAAI
7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30 AAAR3xAAEAAAACXAAJ
7876 ADAMS CLERK 7788 23-MAY-87 1100 20 AAAR3xAAEAAAACXAAK
7900 JAMES CLERK 7698 03-DEC-81 950 30 AAAR3xAAEAAAACXAAL
7902 FORD ANALYST 7566 03-DEC-81 3000 20 AAAR3xAAEAAAACXAAM
7934 MILLER CLERK 7782 23-JAN-82 1300 10 AAAR3xAAEAAAACXAAN
14 rows selected.
-- 验证是否是按照DEPTNO分组,然后各组按照rowid排序:
SELECT EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO,
ROW_NUMBER() OVER (PARTITION BY DEPTNO ORDER BY rowid DESC) AS cnt
FROM EMP;
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO CNT
---------- ---------- --------- ---------- --------- ---------- ---------- ---------- ----------
7934 MILLER CLERK 7782 23-JAN-82 1300 10 1
7839 KING PRESIDENT 17-NOV-81 5000 10 2
7782 CLARK MANAGER 7839 09-JUN-81 2450 10 3
7902 FORD ANALYST 7566 03-DEC-81 3000 20 1
7876 ADAMS CLERK 7788 23-MAY-87 1100 20 2
7788 SCOTT ANALYST 7566 19-APR-87 3000 20 3
7566 JONES MANAGER 7839 02-APR-81 2975 20 4
7369 SMITH CLERK 7902 17-DEC-80 800 20 5
7900 JAMES CLERK 7698 03-DEC-81 950 30 1
7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30 2
7698 BLAKE MANAGER 7839 01-MAY-81 2850 30 3
7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30 4
7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30 5
7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30 6
14 rows selected.
-- 果然也不是按照 rowid 排序的,我们都清楚,rowid 只是行记录的一个物理地址,所以就算是按照rowid这样排序出来的CNT字段已经失去了意义。
-- 综上所述:
-- 所以:用 ROW_NUMBER() OVER()函数的时候,其ORDER BY 所指定的字段,不应该与 PARTITION BY 所指定的字段完全相同,
-- 准确地说:已经在PARTITION BY 字段中指定的字段,就不应该在后续的 ORDER BY 字段中出现。
-- 再举例:例如:我要查询员工的详细信息,且按部门分组,按工资排名:
SELECT EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO,
ROW_NUMBER() OVER (PARTITION BY DEPTNO ORDER BY sal DESC) AS cnt
FROM EMP;
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO CNT
---------- ---------- --------- ---------- --------- ---------- ---------- ---------- ----------
7839 KING PRESIDENT 17-NOV-81 5000 10 1
7782 CLARK MANAGER 7839 09-JUN-81 2450 10 2
7934 MILLER CLERK 7782 23-JAN-82 1300 10 3
7788 SCOTT ANALYST 7566 19-APR-87 3000 20 1
7902 FORD ANALYST 7566 03-DEC-81 3000 20 2
7566 JONES MANAGER 7839 02-APR-81 2975 20 3
7876 ADAMS CLERK 7788 23-MAY-87 1100 20 4
7369 SMITH CLERK 7902 17-DEC-80 800 20 5
7698 BLAKE MANAGER 7839 01-MAY-81 2850 30 1
7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30 2
7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30 3
7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30 4
7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30 5
7900 JAMES CLERK 7698 03-DEC-81 950 30 6
14 rows selected.
-- 可以看到: 数据非常准确:
10号部门工资排在第1位的是 KING 员工,薪水是5000;
20号部门工资排在第1位的是 SCOTT 员工,薪水是3000;
30号部门工资排在第1位的是 BLAKE 员工,薪水是2850;
INSERT INTO EMP2 SELECT * FROM EMP ORDER BY dbms_random.value()
SELECT EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO,
ROW_NUMBER() OVER (PARTITION BY DEPTNO ORDER BY DEPTNO DESC) AS cnt
FROM EMP2;
-- 当然:这里我们没有考虑相同部门薪水相同时的排名问题,如果相同部门存在多个人的薪水一样,那么排名应该相同的话,
-- 这里就不应该用ROW_NUMBER()函数啦,详细处理方法这里略。
-- 如果按提成降序排序(COMM字段排序,结果又会怎么样呢?)
SELECT EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO,
ROW_NUMBER() OVER (PARTITION BY DEPTNO ORDER BY comm DESC) AS cnt
FROM EMP;
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO CNT
---------- ---------- --------- ---------- --------- ---------- ---------- ---------- ----------
7782 CLARK MANAGER 7839 09-JUN-81 2450 10 1
7839 KING PRESIDENT 17-NOV-81 5000 10 2
7934 MILLER CLERK 7782 23-JAN-82 1300 10 3
7566 JONES MANAGER 7839 02-APR-81 2975 20 1
7902 FORD ANALYST 7566 03-DEC-81 3000 20 2
7876 ADAMS CLERK 7788 23-MAY-87 1100 20 3
7369 SMITH CLERK 7902 17-DEC-80 800 20 4
7788 SCOTT ANALYST 7566 19-APR-87 3000 20 5
7900 JAMES CLERK 7698 03-DEC-81 950 30 1
7698 BLAKE MANAGER 7839 01-MAY-81 2850 30 2
7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30 3
7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30 4
7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30 5
7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30 6
14 rows selected.
-- 我们由DEPTNO=30的数据可以看到:COMM为空的排在非空数据的前面。
-- 所以:如果我们把COMM为空的,看作提成为0的话,我们应该还需要进一步处理,排名才相对正确:
SELECT EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO,
ROW_NUMBER() OVER (PARTITION BY DEPTNO ORDER BY nvl(comm,0) DESC) AS cnt
FROM EMP;
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO CNT
---------- ---------- --------- ---------- --------- ---------- ---------- ---------- ----------
7782 CLARK MANAGER 7839 09-JUN-81 2450 10 1
7839 KING PRESIDENT 17-NOV-81 5000 10 2
7934 MILLER CLERK 7782 23-JAN-82 1300 10 3
7566 JONES MANAGER 7839 02-APR-81 2975 20 1
7902 FORD ANALYST 7566 03-DEC-81 3000 20 2
7876 ADAMS CLERK 7788 23-MAY-87 1100 20 3
7369 SMITH CLERK 7902 17-DEC-80 800 20 4
7788 SCOTT ANALYST 7566 19-APR-87 3000 20 5
7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30 1
7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30 2
7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30 3
7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30 4
7900 JAMES CLERK 7698 03-DEC-81 950 30 5
7698 BLAKE MANAGER 7839 01-MAY-81 2850 30 6
14 rows selected.