一道面试题目

huangcaibing 2003-10-20 10:29:03

两个表A、B
结构相同，都有主键列ID,
每个表都有20万条以上得数据，如何快速获得在B表中存在而在A表不存在的数据？

...全文

88 26 打赏收藏转发到动态举报

写回复

用AI写文章

26 条回复

切换为时间正序

请发表友善的回复…

发表回复

isaacchang 2004-01-12

打赏
举报

我觉得这个最快
select * from b
where id in
(select id from b minus select id from a);

sunluo 2004-01-12

打赏
举报

为什么没有支持 robixiao(阿喜) ?
select b.*
from a,
b
where a.id(+) = b.id
and a.id is null
如果ID有索引的,数据库环境、系统环境都相同的话,这样是最快的!

sagac 2004-01-12

打赏
举报

select id from b where not exists (select 1 from a where id=b.id);

抄也要抄个正确的。呵呵！

zzzstar 2004-01-12

打赏
举报

看看这个吧！答案就在这里......

---------------------------------------------------------------
You Asked (Jump to Tom's latest followup)

Tom:

can you give me some example at which situation
IN is better than exist, and vice versa.

and we said...

Well, the two are processed very very differently.

Select * from T1 where x in ( select y from T2 )

is typically processed as:

select *
from t1, ( select distinct y from t2 ) t2
where t1.x = t2.y;

The subquery is evaluated, distinct'ed, indexed (or hashed or sorted) and then
joined to the original table -- typically.

As opposed to

select * from t1 where exists ( select null from t2 where y = x )

That is processed more like:

for x in ( select * from t1 )
loop
if ( exists ( select null from t2 where y = x.x )
then
OUTPUT THE RECORD
end if
end loop

It always results in a full scan of T1 whereas the first query can make use of
an index on T1(x).

So, when is where exists appropriate and in appropriate?

Lets say the result of the subquery
( select y from T2 )

is "huge" and takes a long time. But the table T1 is relatively small and
executing ( select null from t2 where y = x.x ) is very very fast (nice index on
t2(y)). Then the exists will be faster as the time to full scan T1 and do the
index probe into T2 could be less then the time to simply full scan T2 to build
the subquery we need to distinct on.

Lets say the result of the subquery is small -- then IN is typicaly more
appropriate.

If both the subquery and the outer table are huge -- either might work as well
as the other -- depends on the indexes and other factors.

S海鸥 2004-01-12

打赏
举报

study!

chanet 2004-01-11

打赏
举报

2.select * from b
minus
select * from a

diamondex 2003-10-25

打赏
举报

mark

csyw 2003-10-25

打赏
举报

yuxuan 2003-10-24

打赏
举报

三种方法
1.select a.id from a,b where b.id=a.id(+) and a.id is null

2.select * from b
minus
select * from a

3.select id from b where not exists (select 1 from a where id=b.id);

oracs 2003-10-24

打赏
举报

我觉得最简单的办法就是
select * from b
minus
select * from a

aceplus 2003-10-24

打赏
举报

mark

yangqingdelphi 2003-10-22

打赏
举报

oralce性能优化中讲应该是用exists比in快.
select id from b where not exists (select 1 from a where id=b.id);
to: youren537(youren537 你的测试时间不能作为依据.因为你第一次用的是exist然后再用in，其实数据都在缓存中了.in当然快了。你要看执行计划.

yzy 2003-10-22

打赏
举报

我们一致裁定
select id from b where not exists (select 1 from a where id=b.id);
是标准答案!
大家鼓掌~~~~啪.啪.啪.啪.啪.啪.啪.啪.

huangcaibing 2003-10-22

打赏
举报

我也是一头雾水,我测试一下再揭贴啊

北极星2013 2003-10-21

打赏
举报

pengdali(大力 V3.0) 的办法不错，应该是最快的啦。

suleen 2003-10-21

打赏
举报

select id from a 是全表扫描,应该是用not exists快.

youren537 2003-10-21

打赏
举报

写倒了，应该是
select id from b
where not exists (select a.id from a where a.id = b.id);

两表各超过35万条记录，执行时间9.133seconds 出结果。

select id from b where not in (select id from a);

两表各超过35万条记录，执行时间
Executed in 2.514 seconds 出结果。

robixiao 2003-10-21