On reducing a many-to-many mapping to a one-to-one mapping

Generics 2007-10-27 12:24:59

For a detailed description of the problem, see:

http://topic.csdn.net/u/20070928/01/bca1ef8c-511e-41f7-a254-31c3208cf802.html

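It looks like wzxhm has basically solved this problem. His stored procedure does have one flaw, though: it can't handle rows with equal distances. (I did say that when distances are equal, any one of the rows may be kept, but his query keeps all of them.)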

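I modified his query and yesterday tested it on a table with 200,000 rows. A C# program needed at least ten-odd minutes (and I had already optimized it, using a binary tree to speed up searching and inserting, and deleting rows from the source table while processing, but it was still slow; nothing more I could do), whereas wzxhm's approach took less than a minute.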

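I'm not very familiar with cadenza7's method, but I suspect it isn't quite right, because I don't see dist playing any role in cadenza7's code. Without dist, on what basis does it decide which rows to delete?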
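The rule being asked for is spelled out in Generics's replies further down (2007-11-05): sort by dist, and keep a row only when neither its ID1 nor its ID2 already appears in a row that has been kept. Assuming that reading, a minimal in-memory sketch in Python looks like this. The function name `greedy_one_to_one` and the (dist, ID1, ID2) tie-break are illustrative assumptions, not code from the thread; the sample rows are winjay84's 4x4 test data from later in the thread.

```python
from collections.abc import Iterable

Row = tuple[int, int, float]  # (ID1, ID2, dist)


def greedy_one_to_one(rows: Iterable[Row]) -> list[Row]:
    """Scan rows by ascending dist and keep a row only if neither its ID1
    nor its ID2 already occurs in a previously kept row."""
    used_id1: set[int] = set()
    used_id2: set[int] = set()
    kept: list[Row] = []
    # Assumed tie-break: rows with equal dist are ordered by ID1, then ID2
    for id1, id2, dist in sorted(rows, key=lambda r: (r[2], r[0], r[1])):
        if id1 not in used_id1 and id2 not in used_id2:
            kept.append((id1, id2, dist))
            used_id1.add(id1)
            used_id2.add(id2)
    return kept


# winjay84's 4x4 test data: DIST[i][j] is dist for ID1 = 10001 + i, ID2 = 11101 + j
DIST = [
    [0.01, 0.21, 0.31, 0.41],
    [0.12, 0.32, 0.52, 0.72],
    [0.23, 0.43, 0.63, 0.83],
    [0.64, 0.44, 0.24, 0.14],
]
TA: list[Row] = [(10001 + i, 11101 + j, d) for i, r in enumerate(DIST) for j, d in enumerate(r)]

for row in greedy_one_to_one(TA):
    print(row)
```

On this data it returns the same four pairs as winjay84's cursor result. The set lookups are O(1), so the pass is dominated by the O(n log n) sort; it is the same row-by-row logic as the cursor, just without a query against the database for every row.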


15 replies


fcuandy 2007-11-06


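A
That is, does the word "appeared" refer to all records, or only to the records that have been kept?
B
If it refers to all records, row 3 is deleted; if only to kept records, row 3 is kept.

Going by B would be somewhat faster; with A it can't be made fast at all and can only be processed row by row.
As for rows with duplicate dist values: dist alone can't tell them apart, which is the same problem as "insufficient key information". Whether you keep one at random or keep one by some rule, you need something extra to delete by, and adding that extra something will certainly cost an order of magnitude in efficiency.
I know I can't write anything efficient, so I won't attempt it.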
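fcuandy's point about tied dist values can be made concrete. The NOT EXISTS pattern used in this thread (Limpire's query tests `dist<a.dist`) drops a row only when a competitor is strictly smaller, so two tied rows sharing an ID both survive. The Python sketch below (the function name `local_minima` and the three sample rows are hypothetical) shows that, and shows that adding ID1 and ID2 as secondary sort keys makes every comparison strict. In SQL, with the inner table aliased as `b`, the same tie-break would read `b.dist < a.dist or (b.dist = a.dist and (b.ID1 < a.ID1 or (b.ID1 = a.ID1 and b.ID2 < a.ID2)))`; how much that costs depends on indexing and is not measured here.

```python
from typing import Callable

Row = tuple[int, int, float]  # (ID1, ID2, dist)


def local_minima(rows: list[Row], key: Callable[[Row], tuple]) -> list[Row]:
    """Keep a row if no row sharing its ID1 or ID2 has a strictly smaller key.

    Mirrors: NOT EXISTS (SELECT 1 ... WHERE key < a.key AND (ID1 = a.ID1 OR ID2 = a.ID2))
    """
    return [
        a for a in rows
        if not any((b[0] == a[0] or b[1] == a[1]) and key(b) < key(a) for b in rows)
    ]


# Hypothetical data: the first two rows tie on dist and share ID1 = 1
ROWS: list[Row] = [(1, 10, 0.5), (1, 20, 0.5), (2, 30, 0.9)]

by_dist = local_minima(ROWS, key=lambda r: (r[2],))
by_dist_then_ids = local_minima(ROWS, key=lambda r: (r[2], r[0], r[1]))

print(by_dist)           # both tied rows kept, so ID1 = 1 is mapped twice
print(by_dist_then_ids)  # tie broken: only (1, 10, 0.5) survives for ID1 = 1
```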

Generics 2007-11-06


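Is there a time limit for closing a thread? None of the answers here fully satisfies me, but I'd still like to give some points to a few people. What should I do?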

Generics 2007-11-06


Thanks for your efforts, everyone. I'd guess wzxhm's method is the best answer. wzxhm, come back in and collect another 50 points.

Generics 2007-11-05


Otherwise, id1 = 1003 could never be assigned a value.

Generics 2007-11-05


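Row 2's id2 duplicates one in an earlier row, so of course it is deleted; row 3's id1 and id2 do not appear in any previously kept row, so of course it is kept.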

fcuandy 2007-11-01


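Suppose that after sorting by dist the records are:
id1   id2
1001  1002
1003  1002
1003  1004
Then row 1 is kept.
Row 2 is deleted because its id2 value has already appeared.
Row 3: its id1 value has not appeared in the kept records, but it has appeared in a deleted record. How should this case be handled? That's all I want to know.

That is, does the word "appeared" refer to all records, or only to the records that have been kept?

If it refers to all records, row 3 is deleted; if only to kept records, row 3 is kept.

Could the original poster please clarify this?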
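The difference between the two readings is easy to check on fcuandy's three rows. In this Python sketch (the function `pick` and its `count_deleted_rows` flag are hypothetical names), the rows are assumed to be already in ascending dist order, since the example gives no dist values. With `count_deleted_rows=True`, an ID counts as "appeared" once any earlier row used it; with `False`, only kept rows count, which is the reading Generics confirms (2007-11-05).

```python
Pair = tuple[int, int]  # (id1, id2), already in ascending dist order


def pick(rows: list[Pair], count_deleted_rows: bool) -> list[Pair]:
    """Scan rows in order and keep those whose id1 and id2 have not 'appeared'.

    count_deleted_rows=True : ids of every earlier row count as appeared.
    count_deleted_rows=False: only ids of earlier kept rows count.
    """
    seen_id1: set[int] = set()
    seen_id2: set[int] = set()
    kept: list[Pair] = []
    for id1, id2 in rows:
        keep = id1 not in seen_id1 and id2 not in seen_id2
        if keep:
            kept.append((id1, id2))
        if keep or count_deleted_rows:
            seen_id1.add(id1)
            seen_id2.add(id2)
    return kept


EXAMPLE: list[Pair] = [(1001, 1002), (1003, 1002), (1003, 1004)]
print(pick(EXAMPLE, count_deleted_rows=True))   # [(1001, 1002)]
print(pick(EXAMPLE, count_deleted_rows=False))  # [(1001, 1002), (1003, 1004)]
```

Given distinct dist values, the first reading is the same as "no earlier row shares this row's ID1 or ID2", which a single NOT EXISTS query can express. The second makes each decision depend on the earlier decisions, which is why the cursor and C# versions in this thread walk the rows in order.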

winjay84 2007-11-01


OK, got it. Thanks!
So the cursor's performance is now plain to see.

honey52570 2007-10-31


Generics 2007-10-30


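Sorry, I just tested it, and it doesn't seem workable. My test data had 180,000 rows, 80,000 of which had to be deleted. wzxhm's method took only a minute (maybe because of the index it builds? I also noticed that its LOOP only needs to iterate three or four times). My C# program took 15 minutes (multi-threaded, querying and deleting at the same time, 1,000 rows per delete). With your method I waited 30 minutes without getting a result, and I don't even know whether the result would be correct (the first two methods give the same result).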
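Generics notes that wzxhm's LOOP needed only three or four passes. wzxhm's code is not reproduced in this thread, so the following is only one plausible set-based scheme with that property, sketched in Python rather than SQL (the function name `match_in_rounds` is hypothetical). Each round keeps every row that is the smallest, by (dist, ID1, ID2), among the remaining rows with its ID1 and also among the remaining rows with its ID2, then deletes all remaining rows that clash with a kept row. With this strict ordering it selects the same rows as the sorted greedy scan, and every round keeps at least the smallest remaining row, so the loop terminates.

```python
Row = tuple[int, int, float]  # (ID1, ID2, dist)


def row_key(r: Row) -> tuple[float, int, int]:
    # dist first, IDs as tie-breakers so no two distinct rows compare equal
    return (r[2], r[0], r[1])


def match_in_rounds(rows: list[Row]) -> tuple[list[Row], int]:
    remaining = list(range(len(rows)))  # indices of rows still in play
    kept: list[Row] = []
    rounds = 0
    while remaining:
        rounds += 1
        # Best remaining row (by index) for each ID1 and for each ID2
        best_id1: dict[int, int] = {}
        best_id2: dict[int, int] = {}
        for i in remaining:
            id1, id2, _ = rows[i]
            if id1 not in best_id1 or row_key(rows[i]) < row_key(rows[best_id1[id1]]):
                best_id1[id1] = i
            if id2 not in best_id2 or row_key(rows[i]) < row_key(rows[best_id2[id2]]):
                best_id2[id2] = i
        # A row wins this round if it is the best for both of its IDs
        winners = [i for i in remaining
                   if best_id1[rows[i][0]] == i and best_id2[rows[i][1]] == i]
        kept.extend(rows[i] for i in winners)
        used_id1 = {rows[i][0] for i in winners}
        used_id2 = {rows[i][1] for i in winners}
        # Delete every remaining row that reuses an ID taken this round
        remaining = [i for i in remaining
                     if rows[i][0] not in used_id1 and rows[i][1] not in used_id2]
    return sorted(kept, key=row_key), rounds


# winjay84's 4x4 test data: DIST[i][j] is dist for ID1 = 10001 + i, ID2 = 11101 + j
DIST = [
    [0.01, 0.21, 0.31, 0.41],
    [0.12, 0.32, 0.52, 0.72],
    [0.23, 0.43, 0.63, 0.83],
    [0.64, 0.44, 0.24, 0.14],
]
TA: list[Row] = [(10001 + i, 11101 + j, d) for i, r in enumerate(DIST) for j, d in enumerate(r)]

result, passes = match_in_rounds(TA)
print(passes)  # 3
for row in result:
    print(row)
```

On winjay84's data this finishes in three rounds and returns the same four pairs as the cursor. In SQL each round would be one set-based select plus one delete, so the work is driven by the number of rounds rather than the number of rows; in the worst case (long chains of competing rows) the number of rounds can still grow with the table size.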

winjay84 2007-10-30


Is the original poster not coming back?
I'd like to know whether it works, and if it does, how well it performs.

winjay84 2007-10-29


I tried doing it with a cursor, and it does remove the duplicates. Please test how well it performs.

-- Test data (the typical example taken from the link you gave)
create table TA(ID1 numeric(18,0), ID2 numeric(18,0), dist float)

insert into TA
select 10001, 11101, 0.01
union all select 10001, 11102, 0.21
union all select 10001, 11103, 0.31
union all select 10001, 11104, 0.41
union all select 10002, 11101, 0.12
union all select 10002, 11102, 0.32
union all select 10002, 11103, 0.52
union all select 10002, 11104, 0.72
union all select 10003, 11101, 0.23
union all select 10003, 11102, 0.43
union all select 10003, 11103, 0.63
union all select 10003, 11104, 0.83
union all select 10004, 11101, 0.64
union all select 10004, 11102, 0.44
union all select 10004, 11103, 0.24
union all select 10004, 11104, 0.14
go

-- Create the stored procedure
-- (CREATE PROC must be the first statement in its batch, hence the GO above)
create proc proc_ta
as
declare @ID1 numeric(18,0), @ID2 numeric(18,0), @dist float

begin
	-- Temp tables live in tempdb, so look the name up there
	if object_id('tempdb..#ta') is not null
		drop table #ta

	-- Holds the rows kept so far
	create table #ta(tmp_ID1 numeric(18,0), tmp_ID2 numeric(18,0), tmp_dist float)

	-- Walk the rows from smallest to largest dist (read-only, forward-only cursor)
	declare ta_cursor cursor local fast_forward for
	select ID1, ID2, dist
	from TA
	order by dist

	open ta_cursor
	fetch next from ta_cursor into @ID1, @ID2, @dist

	while @@fetch_status = 0
	begin
		-- Keep the row only if neither its ID1 nor its ID2 appears in a kept row
		if not exists (select 1 from #ta where tmp_ID1 = @ID1 or tmp_ID2 = @ID2)
			insert into #ta(tmp_ID1, tmp_ID2, tmp_dist)
			values (@ID1, @ID2, @dist)

		fetch next from ta_cursor into @ID1, @ID2, @dist
	end

	close ta_cursor
	deallocate ta_cursor

	select * from #ta
end
go

-- Execute the stored procedure
exec proc_ta

-- Result
tmp_ID1 tmp_ID2 tmp_dist
-------------------- -------------------- -------------------------
10001 11101 0.01
10004 11104 0.14000000000000001
10002 11102 0.32000000000000001
10003 11103 0.63

(4 row(s) affected)

Limpire 2007-10-27


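I now understand the problem completely. The query I posted earlier has a big hole; hold on, and I'll post again once it has been fully tested and is correct.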

Limpire 2007-10-27


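Please provide some data with equal distances, together with the result you want, and I'll test again. This problem shouldn't be hard to handle.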

Limpire 2007-10-27


declare @T table (ID1 numeric(18,0), ID2 numeric(18,0), dist decimal(18,2))

insert @T
select 10001,1110001,0.0
UNION ALL select 10002,1110003,1.0
UNION ALL select 10003,1110004,3.0
UNION ALL select 10004,1110005,10.0
UNION ALL select 10008,1110007,0.5
UNION ALL select 10005,1110003,0.8
UNION ALL select 10007,1110002,0.2
UNION ALL select 10009,1110008,7.0
UNION ALL select 10008,1110006,0.8
UNION ALL select 10006,1110007,0.51

/*
The final result should be:
10001,   1110001,   0
10007,   1110002,   0.2
10008,   1110007,   0.5
10005,   1110003,   0.8
10003,   1110004,   3
10009,   1110008,   7
10004,   1110005,   10
*/

-- Keep a row only if no row with a smaller dist shares its ID1 or ID2
select * from @T a where not exists (select 1 from @T where dist<a.dist and (ID1=a.ID1 or ID2=a.ID2)) order by dist

/*
ID1                  ID2                  dist
-------------------- -------------------- --------------------
10001                1110001              .00
10007                1110002              .20
10008                1110007              .50
10005                1110003              .80
10003                1110004              3.00
10009                1110008              7.00
10004                1110005              10.00
*/
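Limpire's later post (2007-10-27) says this query has a big hole. One way to see it is to run the same query through Python's built-in sqlite3 module on winjay84's 4x4 data. This assumes SQLite evaluates this plain correlated NOT EXISTS the same way SQL Server does; the table variable `@T` is renamed to a regular table `TA`.

```python
import sqlite3

# winjay84's 4x4 test data: DIST[i][j] is dist for ID1 = 10001 + i, ID2 = 11101 + j
DIST = [
    [0.01, 0.21, 0.31, 0.41],
    [0.12, 0.32, 0.52, 0.72],
    [0.23, 0.43, 0.63, 0.83],
    [0.64, 0.44, 0.24, 0.14],
]
rows = [(10001 + i, 11101 + j, d) for i, r in enumerate(DIST) for j, d in enumerate(r)]

conn = sqlite3.connect(":memory:")
conn.execute("create table TA (ID1 integer, ID2 integer, dist real)")
conn.executemany("insert into TA values (?, ?, ?)", rows)

# Limpire's one-pass query; unqualified columns in the subquery refer to the inner TA
ONE_PASS_SQL = """
select * from TA a
where not exists (select 1 from TA
                  where dist < a.dist and (ID1 = a.ID1 or ID2 = a.ID2))
order by dist
"""
one_pass = conn.execute(ONE_PASS_SQL).fetchall()
conn.close()

print(one_pass)  # [(10001, 11101, 0.01), (10004, 11104, 0.14)]
```

Only two rows come back, while winjay84's cursor keeps four. A row such as (10002, 11102, 0.32) is rejected because (10002, 11101, 0.12) is smaller, even though that competitor is itself discarded. This is the "all records" reading from fcuandy's question; on Limpire's own sample the single pass happens to give the expected result.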