关于优化sql的疑惑

suixingliu 2010-11-15 04:04:54

SQL> select * from scott.emp e1 where e1.sal>(select avg(sal) from scott.emp e2 where e1.deptno=e2.deptno);

已用时间: 00: 00: 00.00

执行计划
----------------------------------------------------------
Plan hash value: 2649664444

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 348 | 6 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL | EMP | 15 | 1305 | 3 (0)| 00:00:01 |
| 3 | SORT AGGREGATE | | 1 | 26 | | |
|* 4 | TABLE ACCESS FULL| EMP | 1 | 26 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("E1"."SAL"> (SELECT AVG("SAL") FROM "SCOTT"."EMP" "E2"
WHERE "E2"."DEPTNO"=:B1))
4 - filter("E2"."DEPTNO"=:B1)

Note
-----
- dynamic sampling used for this statement

统计信息
----------------------------------------------------------
0 recursive calls
0 db block gets
28 consistent gets
0 physical reads
0 redo size
992 bytes sent via SQL*Net to client
415 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
4 rows processed

SQL> select * from scott.emp e1,(select e2.deptno deptno,avg(e2.sal) avg_sal from scott.emp e2 group by deptno) dept2 where e1.deptno=dept2.deptno and e1.sal>dept2.avg_sal;

已用时间: 00: 00: 00.00

执行计划
----------------------------------------------------------
Plan hash value: 2230095667

-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 452 | 8 (25)| 00:00:01 |
|* 1 | HASH JOIN | | 4 | 452 | 8 (25)| 00:00:01 |
| 2 | TABLE ACCESS FULL | EMP | 15 | 1305 | 3 (0)| 00:00:01 |
| 3 | VIEW | | 15 | 390 | 4 (25)| 00:00:01 |
| 4 | HASH GROUP BY | | 15 | 390 | 4 (25)| 00:00:01 |
| 5 | TABLE ACCESS FULL| EMP | 15 | 390 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("E1"."DEPTNO"="DEPT2"."DEPTNO")
filter("E1"."SAL">"DEPT2"."AVG_SAL")

Note
-----
- dynamic sampling used for this statement

统计信息
----------------------------------------------------------
350 recursive calls
2 db block gets
125 consistent gets
0 physical reads
0 redo size
1184 bytes sent via SQL*Net to client
415 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
12 sorts (memory)
0 sorts (disk)
4 rows processed

理论上应该是第一种方法效率低才对，为什么我的结果是这样？请高手指点。

...全文

226 19 打赏收藏转发到动态举报

写回复

用AI写文章

19 条回复

切换为时间正序

请发表友善的回复…

发表回复

suixingliu 2010-11-16

打赏
举报

[Quote=引用 10 楼 xman_78tom 的回复:]
lz 说的还是从语句上获取的执行步骤，不是实际查询计划的步骤，而 ORACLE 是按查询计划的步骤执行 SQL 语句的。

从前一条语句的查询计划看，SQL 语句的执行顺序为：
1、对 EMP 表执行全表扫描，获取每一行数据并将 deptno 值作为条件用于子查询；
2、对 EMP 表执行全表扫描，并使用步骤 1 获取 deptno 值对结果进行筛选；
3、对步骤 3 获取的查询结果执行……
[/Quote]
我今天一开机就试了一下昨天的东西，第一遍的结果和我的推测相同，就是红色的部分，第二遍的结果与第一遍的不同，蓝色的。必定是从数据库缓冲区得到的结果，没有价值。

昨天之所以会出现不正常的情况，是我操作的失误。所比较的结果必须都是在数据库缓冲区中没有的，也就是从来没有执行过的才行。为了验证我的想法，我又重启了服务，结果得到的结果和上面的基本相同。

suixingliu 2010-11-16

打赏
举报

SQL> select * from scott.emp e1,(select deptno dno,avg(sal) a_sal from scott.emp group by deptno) e2 where e1.deptno=e2.dno and e1.sal>e2.a_sal;

执行计划
----------------------------------------------------------
Plan hash value: 269884559

-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 58 | 8 (25)| 00:00:01 |
|* 1 | HASH JOIN | | 1 | 58 | 8 (25)| 00:00:01 |
| 2 | VIEW | | 3 | 78 | 4 (25)| 00:00:01 |
| 3 | HASH GROUP BY | | 3 | 15 | 4 (25)| 00:00:01 |
| 4 | TABLE ACCESS FULL| EMP | 15 | 75 | 3 (0)| 00:00:01 |
| 5 | TABLE ACCESS FULL | EMP | 15 | 480 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("E1"."DEPTNO"="E2"."DNO")
filter("E1"."SAL">"E2"."A_SAL")

统计信息
----------------------------------------------------------
0 recursive calls
0 db block gets
14 consistent gets
0 physical reads
0 redo size
1178 bytes sent via SQL*Net to client
416 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
4 rows processed

SQL>

suixingliu 2010-11-16

打赏
举报

SQL> select * from scott.emp e1 where e1.sal>(select avg(sal) from scott.emp e2 where e1.deptno=e2.deptno);

执行计划
----------------------------------------------------------
Plan hash value: 1245077725

--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 58 | 8 (25)| 00:00:01 |
|* 1 | HASH JOIN | | 1 | 58 | 8 (25)| 00:00:01 |
| 2 | VIEW | VW_SQ_1 | 3 | 78 | 4 (25)| 00:00:01 |
| 3 | HASH GROUP BY | | 3 | 21 | 4 (25)| 00:00:01 |
| 4 | TABLE ACCESS FULL| EMP | 15 | 105 | 3 (0)| 00:00:01 |
| 5 | TABLE ACCESS FULL | EMP | 15 | 480 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("E1"."DEPTNO"="ITEM_1")
filter("E1"."SAL">"AVG(SAL)")

统计信息
----------------------------------------------------------
1 recursive calls
0 db block gets
14 consistent gets
0 physical reads
0 redo size
992 bytes sent via SQL*Net to client
416 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
4 rows processed

suixingliu 2010-11-16

打赏
举报

SQL> select * from scott.emp e1,(select deptno dno,avg(sal) a_sal from scott.emp group by deptno) e2 where e1.deptno=e2.dno and e1.sal>e2.a_sal;
执行计划
----------------------------------------------------------
Plan hash value: 269884559

-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 58 | 8 (25)| 00:00:01 |
|* 1 | HASH JOIN | | 1 | 58 | 8 (25)| 00:00:01 |
| 2 | VIEW | | 3 | 78 | 4 (25)| 00:00:01 |
| 3 | HASH GROUP BY | | 3 | 15 | 4 (25)| 00:00:01 |
| 4 | TABLE ACCESS FULL| EMP | 15 | 75 | 3 (0)| 00:00:01 |
| 5 | TABLE ACCESS FULL | EMP | 15 | 480 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("E1"."DEPTNO"="E2"."DNO")
filter("E1"."SAL">"E2"."A_SAL")

统计信息
----------------------------------------------------------
256 recursive calls
0 db block gets
60 consistent gets
0 physical reads
0 redo size
1178 bytes sent via SQL*Net to client
416 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
4 rows processed

suixingliu 2010-11-16

打赏
举报

连接到:
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> set autotrace traceonly
SQL> select * from scott.emp e1 where e1.sal>(select avg(sal) from scott.emp e2 where e1.deptno=e2.deptno);

执行计划
----------------------------------------------------------
Plan hash value: 1245077725

--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 58 | 8 (25)| 00:00:01 |
|* 1 | HASH JOIN | | 1 | 58 | 8 (25)| 00:00:01 |
| 2 | VIEW | VW_SQ_1 | 3 | 78 | 4 (25)| 00:00:01 |
| 3 | HASH GROUP BY | | 3 | 21 | 4 (25)| 00:00:01 |
| 4 | TABLE ACCESS FULL| EMP | 15 | 105 | 3 (0)| 00:00:01 |
| 5 | TABLE ACCESS FULL | EMP | 15 | 480 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("E1"."DEPTNO"="ITEM_1")
filter("E1"."SAL">"AVG(SAL)")

统计信息
----------------------------------------------------------
720 recursive calls
0 db block gets
136 consistent gets
15 physical reads
0 redo size
992 bytes sent via SQL*Net to client
416 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
9 sorts (memory)
0 sorts (disk)
4 rows processed

xman_78tom 2010-11-16

打赏
举报

我也觉得很纳闷，在基于代价的优化器上，这两条语句的执行计划应该是相同的。只有在基于规则的优化器上，前一条语句的执行计划才会如 1 楼。

不过使用基于规则的优化器，子查询一般情况下要比关联效率更高，这个可能与访问路径选择有关。

xman_78tom 2010-11-15

打赏
举报

lz 说的还是从语句上获取的执行步骤，不是实际查询计划的步骤，而 ORACLE 是按查询计划的步骤执行 SQL 语句的。

从前一条语句的查询计划看，SQL 语句的执行顺序为：
1、对 EMP 表执行全表扫描，获取每一行数据并将 deptno 值作为条件用于子查询；
2、对 EMP 表执行全表扫描，并使用步骤 1 获取 deptno 值对结果进行筛选；
3、对步骤 3 获取的查询结果执行聚合操作，获取 avg(sal) 值用于外层查询；
4、使用步骤 3 获取的 avg(sal) 值对步骤 1 获取的表行进行筛选。

从查询计划看对 EMP 表只执行了 2 次表扫描；而各个步骤所产生的结果缓存在 PGA 中供下一步使用。

suixingliu 2010-11-15

打赏
举报

[Quote=引用 8 楼 xman_78tom 的回复:]
应该从查询计划上分析执行步骤，而不能单从语句让分析执行步骤。

从查询计划上看，两条语句有相似的执行步骤，是 (select avg(sal) from scott.emp e2 where e1.deptno=e2.deptno) 查询结果与 scott.emp 表进行关联，而不是每从 scott.emp 表取一条记录，就执行一次子查询。

其实，在 oracle 10g 上使用“基于代……
[/Quote]
我觉得你说的不对，使用子查询怎么能使效率变高呢？我觉得是变低。因为sql是从右往左执行的，所以子查询要执行很多次才能和主查询完全匹配完毕。对表的扫描次数多，所以效率低。相反，使用内嵌视图的话，二者的查询都只用执行一次，对表的扫描只进行2次。多以效率高。
你说“由于使用了子查询，前条语句的性能更好。”，你说出你的理由。

xman_78tom 2010-11-15

打赏
举报

应该从查询计划上分析执行步骤，而不能单从语句让分析执行步骤。

从查询计划上看，两条语句有相似的执行步骤，是 (select avg(sal) from scott.emp e2 where e1.deptno=e2.deptno) 查询结果与 scott.emp 表进行关联，而不是每从 scott.emp 表取一条记录，就执行一次子查询。

其实，在 oracle 10g 上使用“基于代价”的优化规则，两条语句的查询计划和代价都相同；使用“基于规则”的优化规则，由于使用了子查询，前条语句的性能更好。

suixingliu 2010-11-15

打赏
举报

我想可能是我的数据量不够多，才会导致这样的结果。要不就是我的Oracle傻了。要不就是我的cpu傻了。不然不会出现这种情况的。

suixingliu 2010-11-15

打赏
举报

[Quote=引用 5 楼 wkc168 的回复:]
引用楼主 suixingliu 的回复:
SQL> select * from scott.emp e1 where e1.sal>(select avg(sal) from scott.emp e2 where e1.deptno=e2.deptno);

已用时间: 00: 00: 00.00

执行计划
--------------------------------------……
[/Quote]
我已经看到了，你看我一楼的黑色部分。

心中的彩虹 2010-11-15

打赏
举报

[Quote=引用楼主 suixingliu 的回复:]
SQL> select * from scott.emp e1 where e1.sal>(select avg(sal) from scott.emp e2 where e1.deptno=e2.deptno);

已用时间: 00: 00: 00.00

执行计划
----------------------------------------------------------
……
[/Quote]
照你这情况以及数据量第一个快些
看红色部分

1.
0 recursive calls
0 db block gets
28 consistent gets
0 physical reads
0 redo size
992 bytes sent via SQL*Net to client
415 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
4 rows processed

2
350 recursive calls
2 db block gets
125 consistent gets
0 physical reads
0 redo size
1184 bytes sent via SQL*Net to client
415 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
12 sorts (memory)
0 sorts (disk)
4 rows processed

suixingliu 2010-11-15

打赏
举报

[Quote=引用 3 楼 gelyon 的回复:]
按我来分析，正常情况下第一种效率高，
因为我外层每一个deptno到内层去查，只查询我该deptno对应的avg(sal)，不会查询出其它deptno的信息，因此内层的全表扫描读取数据块就少！这就是你第一个执行计划的 0 db block gets 比第二个少的原因。

对于第二个SQL，虽然执行计划和第一个一样，但读取的数据块却多了，就是因为你两次全表扫描都是扫描出所有deptno的信息，……
[/Quote]
第一种是嵌套子查询，第二种是内嵌式图查询。理论上是第二种效率高。也就是说：第一种是执行M*N次，第二种是执行M+N次。所以理论上是第二种效率高。但是我就不明白了，为什么我跟踪的结果是相反的？

gelyon 2010-11-15

打赏
举报

按我来分析，正常情况下第一种效率高，
因为我外层每一个deptno到内层去查，只查询我该deptno对应的avg(sal)，不会查询出其它deptno的信息，因此内层的全表扫描读取数据块就少！这就是你第一个执行计划的 0 db block gets 比第二个少的原因。

对于第二个SQL，虽然执行计划和第一个一样，但读取的数据块却多了，就是因为你两次全表扫描都是扫描出所有deptno的信息，然后才关联过滤资料的，再求avg(sal)的时候，第二个SQL显然读取的数据块多于第一个，它是查询的所有deptno对应的avg(sal)

suixingliu 2010-11-15

打赏
举报

[Quote=引用 1 楼 gelyon 的回复:]
引用楼主 suixingliu 的回复:
理论上应该是第一种方法效率低才对，为什么我的结果是这样？请高手指点。

SQL code

1、select * from scott.emp e1 where e1.sal>(select avg(sal) from scott.emp e2 where e1.deptno=e2.deptno);

2、select * from ……
[/Quote]

不明白你的意思。

gelyon 2010-11-15

打赏
举报

[Quote=引用楼主 suixingliu 的回复:]
理论上应该是第一种方法效率低才对，为什么我的结果是这样？请高手指点。
[/Quote]



1、select * from scott.emp e1 where e1.sal>(select avg(sal) from scott.emp e2 where e1.deptno=e2.deptno);



2、select * from scott.emp e1,(select e2.deptno deptno,avg(e2.sal) avg_sal from scott.emp e2 group by deptno) dept2 where e1.deptno=dept2.deptno and e1.sal>dept2.avg_sal;





这话你从哪里得到的结果啊？