SQL SERVER 随机取不重复数据问题

VcLuoPneg 上海华伊网络技术有限公司 ASP.NET工程师  2015-12-17 01:28:30
假设[A]表有100万条数据,具体数据为1到1000000,如下

表 [A] 数据
-----------------------------------------------------------------------
1
2
3
4
.....
.....
.....
1000000
----------------------------------------------------------------------
另外有[B]表,[B]表有若干不重复数据数据,最小数据为1最大为1000000,如下
表 [B] 数据
-----------------------------------------------------------------------
5
55
153
14
32
500000
333
.....
N条不重复数据数据,最小为1,最大为1000000
-----------------------------------------------------------------------
现在要从[A]表每隔500条数据中要取1条数据插入[B]表,且该数据在[B]表不存在
如;
----------------------------------------------------------------------
1到500取一条
501到1000取一条
1001到1500取一条
1501到2000取一条
.....
.....
.....
999501到1000000取一条
----------------------------------------------------------------------
现在是用 WHILE 循环2000次取得数据
效率非常低,核心代码如下
INSERT INTO [B] (NUMBER) SELECT TOP 1 NUMBER FROM A where NUMBER NOT IN(SELECT NUMBER FROM B ) and BETWEEN (N*500) +1 AND ((N+1)*500)+1 ORDER BY NewID() ASC;--如果没有结果需要插入,则跳过不管
SET N+=1;
请问有没有优化的可能,现在执行时间有点久大概要45秒左右.
先感谢各位.
...全文
221 点赞 收藏 5
写回复
5 条回复
切换为时间正序
当前发帖距今超过3年,不再开放新的回复
发表回复
VcLuoPneg 2015-12-17
引用 3 楼 Tiger_Zhao 的回复:
-- 两个参数,例子减小了规模
DECLARE @max_number int -- 总数
DECLARE @group_size int -- 分组大小
SET @max_number = 100
SET @group_size = 10

-- 测试数据
DECLARE @A TABLE(number int)
DECLARE @B TABLE(number int)

INSERT INTO @A
SELECT number
  FROM master..spt_values
 WHERE type = 'p'
   AND number BETWEEN 1 AND @max_number

INSERT INTO @B
    SELECT TOP 7 number
      FROM master..spt_values
     WHERE type = 'p'
       AND number BETWEEN 1 AND @max_number
  ORDER BY NEWID()

SELECT * FROM @b ORDER BY number

-- 把下面INSERT前的注释去掉就是你要的语句
;WITH t1 AS (
    SELECT number - 1 AS group_no
      FROM master..spt_values
     WHERE type = 'p'
       AND number BETWEEN 1 AND (@max_number/@group_size)
)
,t2 AS (
    SELECT (a.number - 1)/@group_size AS group_no,
           a.number
      FROM @A a
     WHERE NOT EXISTS (SELECT *
                         FROM @B b
                        WHERE b.number = a.number)
)
--INSERT INTO @B (NUMBER) 
        SELECT tb.number
          FROM t1
   CROSS APPLY (
                    SELECT TOP 1 *
                      FROM t2
                     WHERE t2.group_no = t1.group_no
                  ORDER BY NEWID()
               ) tb
      ORDER BY tb.number
已有的B表数据
     number
-----------
         21
         22
         25
         33
         34
         43
         54
将要插入的B表数据
     number
-----------
          6
         11
         23
         37
         50
         51
         62
         71
         83
         97
同样感谢,小数据量快,但是到100万效率比WHILE 循环还慢,不过依然很感谢.
回复
VcLuoPneg 2015-12-17
引用 1 楼 zbdzjx 的回复:
试试看效果,再根据实际情况改。
with table1 as  --生成1000000连续数
(
select a.number*1000+b.number+1 allnum from 
(select * from master..spt_values where type='P' and number<1000)a,
(select * from master..spt_values where type='P' and number<1000)b
)
, table2 as  --随机生成近50000条记录
(
select distinct round(rand(checksum(newid()))*1000000,0) sumenum from 
(select * from master..spt_values where type='P' and number<50)a,
(select * from master..spt_values where type='P' and number<1000)b
)
select * from (
select allnum
, ROW_NUMBER() over(partition by round((allnum-1)/500, 0) order by rand(checksum(newid()))) rn
from table1 
where allnum not in (select sumenum from table2) 
) aa
where rn=1
order by allnum
一秒不到就能完成,非常感谢.虽然对代码不是非常理解. 我的SQLEXPRESS 竟然不能执行.感谢.
回复
Tiger_Zhao 2015-12-17
-- 两个参数,例子减小了规模
DECLARE @max_number int -- 总数
DECLARE @group_size int -- 分组大小
SET @max_number = 100
SET @group_size = 10

-- 测试数据
DECLARE @A TABLE(number int)
DECLARE @B TABLE(number int)

INSERT INTO @A
SELECT number
FROM master..spt_values
WHERE type = 'p'
AND number BETWEEN 1 AND @max_number

INSERT INTO @B
SELECT TOP 7 number
FROM master..spt_values
WHERE type = 'p'
AND number BETWEEN 1 AND @max_number
ORDER BY NEWID()

SELECT * FROM @b ORDER BY number

-- 把下面INSERT前的注释去掉就是你要的语句
;WITH t1 AS (
SELECT number - 1 AS group_no
FROM master..spt_values
WHERE type = 'p'
AND number BETWEEN 1 AND (@max_number/@group_size)
)
,t2 AS (
SELECT (a.number - 1)/@group_size AS group_no,
a.number
FROM @A a
WHERE NOT EXISTS (SELECT *
FROM @B b
WHERE b.number = a.number)
)
--INSERT INTO @B (NUMBER)
SELECT tb.number
FROM t1
CROSS APPLY (
SELECT TOP 1 *
FROM t2
WHERE t2.group_no = t1.group_no
ORDER BY NEWID()
) tb
ORDER BY tb.number

已有的B表数据
     number
-----------
21
22
25
33
34
43
54

将要插入的B表数据
     number
-----------
6
11
23
37
50
51
62
71
83
97
回复
xdashewan 2015-12-17
首先去重复我不多说了,无非就是not exist,然后对去重后的[A],按NUMBER / 100后整数做分组排序,order by NewID(),取每个分组的top 1
回复
zbdzjx 2015-12-17
试试看效果,再根据实际情况改。
with table1 as  --生成1000000连续数
(
select a.number*1000+b.number+1 allnum from 
(select * from master..spt_values where type='P' and number<1000)a,
(select * from master..spt_values where type='P' and number<1000)b
)
, table2 as  --随机生成近50000条记录
(
select distinct round(rand(checksum(newid()))*1000000,0) sumenum from 
(select * from master..spt_values where type='P' and number<50)a,
(select * from master..spt_values where type='P' and number<1000)b
)
select * from (
select allnum
, ROW_NUMBER() over(partition by round((allnum-1)/500, 0) order by rand(checksum(newid()))) rn
from table1 
where allnum not in (select sumenum from table2) 
) aa
where rn=1
order by allnum
回复
相关推荐
发帖
疑难问题
创建于2007-09-28

2.1w+

社区成员

MS-SQL Server 疑难问题
申请成为版主
帖子事件
创建了帖子
2015-12-17 01:28
社区公告
暂无公告