删除重复数据
看看TOM怎么删除重复记录的.
plz explain how to remove duplicate records from a large table containing about
5 million records in a single run and with a lesser time.
i tried it with following query but it takes 10 hours of time.
delete from test1 where rowid not in (select min(rowid) from test1 group by
rc_no);
even after incraesing the rollback segs tablespace to 7gb
we are not getting desired results and while using not in clause and cursor we
generally come across this kind of problem
thanks
and we said...
I'd generate the set of rowids to delete using analytics and then delete them..
like this:
ops$tkyte@ORA9IR2> create table t as select * from cust;
Table created.
Elapsed: 00:00:03.64
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select count(*), count(distinct cust_seg_nbr) from t;
Elapsed: 00:00:05.30
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> delete from t
2 where rowid in ( select rid
3 from ( select rowid rid,
4 row_number() over
5 (partition by cust_seg_nbr order by
rowid) rn
6 from t
7 )
8 where rn <> 1 )
9 /
1114985 rows deleted.
Elapsed: 00:01:46.06
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select count(*), count(distinct cust_seg_nbr) from t;
As for the RBS -- it'll get as big as it needs to be in order to process the
delete -- every index will make it "larger" and take longer as well (index
maintainence is expensive)
if you are deleting "alot of the rows" you might be better off disabling
indexes, doing the delete and rebuilding them.
OR, creating a new table that just keeps the "right records" and dropping the
old table:
ops$tkyte@ORA9IR2> create table t as select * from cust;
Table created.
Elapsed: 00:00:02.41
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select count(*), count(distinct cust_seg_nbr) from t;
Elapsed: 00:00:04.60
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create table t2
2 as
3 select cust_seg_nbr
4 from ( select t.*, row_number() over (partition by cust_seg_nbr order by
rowid) rn
5 from t
6 )
7 where rn = 1
8 /
Table created.
Elapsed: 00:00:10.93
ops$tkyte@ORA9IR2> drop table t;
Table dropped.
Elapsed: 00:00:00.56
ops$tkyte@ORA9IR2> rename t2 to t;
Table renamed.
Elapsed: 00:00:00.01
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select count(*), count(distinct cust_seg_nbr) from t;