Daily exercise: approaches to table design and index design

flairsky 2009-03-03 02:26:49

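Requirements:
1. The query condition is a URL with no particular pattern, for example: https://forum.csdn.net/PointForum/Forum/PostTopic.aspx
2. The table holds tens of millions of rows
3. The table is updated very frequently

Questions:
1. How should the table be designed: normalized, deliberately redundant, or some other more effective approach?
2. How should the indexes be designed to be effective?

Meeting the requirements is not hard; the hard part is meeting them in the best possible way.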

36 replies

htl258_Tony 2009-03-05

Following along to learn.

flairsky 2009-03-05

[Quote=Quoting reply #27 by sigmod:]
Use a hash:

-- Add a hash value column
alter table TAB
add URLHash as checksum(URL);
go

-- Create a clustered index on the hash value
-- create a non-unique clustered index on the hashed column for collisions.
create clustered index IX_URLHash
on TAB(URLHash);
go

-- Query
select customerId
from TAB
where URLHash = checksum(@queryURL)
and URL = @queryURL

Any index boils down to the filter-and-refine principle, …
[/Quote]

That's it, this is exactly what I was waiting for. I kept feeling there was a way to do this but just couldn't recall it.

nzperfect 2009-03-04

Here for the points.

一品梅 2009-03-04

Actually, table design is the hardest part.

orochi_gao 2009-03-04

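Bump. You can't have it both ways, and weighing that trade-off is the DBA's job. Without an index on the url column, query efficiency on tens of millions of rows is out of the question.
OP, could you add a condition on the data import side so that URLs longer than some threshold go into a separate table, and have the UI apply the same length check to decide which table to query?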

flairsky 2009-03-04

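Is nobody considering how well an index on url would actually perform?

The URLs are long. I don't think indexing the url column directly is a good approach, but indexing it indirectly seems to cost even more.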
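
To put a rough number on the concern about long URLs, the following Python sketch compares the bytes a URL would occupy as an index key with a 4-byte integer hash key. The 900-byte figure is the index key size limit of the SQL Server versions current when this thread was written; it is not stated in the thread. The one-byte-per-character (varchar) and two-bytes-per-character (nvarchar) sizes are simplifying assumptions that hold for ASCII URLs, and the function names are made up for the illustration.

```python
# SQL Server index key size limit in the 2005/2008 era (not from the thread).
INDEX_KEY_LIMIT_BYTES = 900
# An int hash such as the one CHECKSUM returns takes 4 bytes.
HASH_KEY_BYTES = 4


def key_bytes(url: str, unicode_column: bool = False) -> int:
    """Bytes the URL would occupy as an index key.

    Assumes varchar stores one byte per ASCII character and
    nvarchar stores UTF-16, i.e. two bytes per ASCII character.
    """
    encoding = "utf-16-le" if unicode_column else "utf-8"
    return len(url.encode(encoding))


def fits_in_index_key(url: str, unicode_column: bool = False) -> bool:
    return key_bytes(url, unicode_column) <= INDEX_KEY_LIMIT_BYTES


sample = "https://forum.csdn.net/PointForum/Forum/PostTopic.aspx"
for label, wide in (("varchar", False), ("nvarchar", True)):
    size = key_bytes(sample, wide)
    ratio = size / HASH_KEY_BYTES
    print(f"{label}: {size} bytes per key, {ratio:.1f}x the hash key")

# A URL with a long query string can exceed the key limit entirely.
long_url = "https://example.com/?q=" + "x" * 1000
print(fits_in_index_key(sample), fits_in_index_key(long_url))
```

Wider keys mean fewer entries per index page and more pages to read and maintain, which is the cost being weighed against the extra work of an indirect index.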

orochi_gao 2009-03-04

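Bump. I haven't dealt with a frequently updated (or rather, frequently inserted into) table of tens of millions of rows on MSSQL.
From the OP's description, though, if there is nothing more to it, I think second normal form is enough and MSSQL should be up to the job. A primary key plus an index on url, or a covering index, should both work.
Aside: Oracle can handle this. (I once built an index on the time column of a 60-million-row production log table to make queries easier, and it worked quite well. That table gained 300,000 to 2,000,000 new rows per day. I've forgotten the machine specs; it was an HP blade.)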
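
As an illustration of the covering-index option mentioned above, here is a small sketch using Python and SQLite instead of MSSQL. The schema (`id`, `url`, `customer_id`) and the index name are assumptions made for the example. The idea is that when the index contains every column the query needs, the lookup can be answered from the index alone without visiting the table rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema for illustration only.
conn.execute(
    "CREATE TABLE tab (id INTEGER PRIMARY KEY, url TEXT, customer_id INTEGER)"
)
# The index holds both the search column and the selected column,
# so it covers the query below.
conn.execute("CREATE INDEX ix_url_customer ON tab (url, customer_id)")


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple) -> str:
    """Return SQLite's query plan text for the given statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable plan detail.
    return " ".join(row[-1] for row in rows)


plan = query_plan(
    conn,
    "SELECT customer_id FROM tab WHERE url = ?",
    ("https://forum.csdn.net/PointForum/Forum/PostTopic.aspx",),
)
print(plan)
```

The trade-off is that a covering index on the full URL is even wider than a plain url index, so it makes every insert more expensive on a table that is updated this often.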

yeah86 2009-03-04

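Sometimes the normal forms don't have to be satisfied; what matters is meeting your own query requirements. There is no need to follow them rigidly.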

lingyin55 2009-03-04

mark

flairsky 2009-03-04

Bump. Is there any better approach?

ljluck7687 2009-03-04

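Third normal form is enough for the table design; avoid redundant data as much as possible (split the table where data is redundant).
The URL column can be indexed.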

ws_hgo 2009-03-04

Following this thread closely.

Zoezs 2009-03-04

Learned something.

Garnett_KG 2009-03-04

[Quote=Quoting reply #27 by sigmod:]
Use a hash:

-- Add a hash value column
alter table TAB
add URLHash as checksum(URL);
go

-- Create a clustered index on the hash value
-- create a non-unique clustered index on the hashed column for collisions.
create clustered index IX_URLHash
on TAB(URLHash);
go

-- Query
select customerId
from TAB
where URLHash = checksum(@queryURL)
and URL = @queryURL

Any index boils down to filter and…
[/Quote]

Agreed.

Upvoting you! (And not just for your avatar.)

sigmod 2009-03-04

Use a hash:

-- Add a hash value column
alter table TAB
add URLHash as checksum(URL);
go

-- Create a clustered index on the hash value
-- create a non-unique clustered index on the hashed column for collisions.
create clustered index IX_URLHash
on TAB(URLHash);
go

-- Query: filter on the hash, then compare the full URL to rule out collisions
select *
from TAB
where URLHash = checksum(@queryURL)
and URL = @queryURL
go

-- Any index boils down to the filter-and-refine principle. A hash suits URLs well, and there won't be too many collisions.
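
For readers who want to try the filter-and-refine idea outside SQL Server, here is a minimal sketch in Python with SQLite. It is not the T-SQL above: `zlib.crc32` stands in for `CHECKSUM`, the hash is stored as an ordinary column instead of a computed one, and SQLite's secondary index stands in for the clustered index. The lowercase table and column names are assumptions chosen to mirror `TAB`, `URL`, `URLHash` and `customerId`.

```python
import sqlite3
import zlib


def url_hash(url: str) -> int:
    """32-bit hash of the URL (CRC32 here, standing in for CHECKSUM)."""
    return zlib.crc32(url.encode("utf-8"))


def build_table(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE tab (customer_id INTEGER, url TEXT, url_hash INTEGER)"
    )
    # Non-unique index on the narrow hash column; collisions are allowed.
    conn.execute("CREATE INDEX ix_url_hash ON tab (url_hash)")


def insert_url(conn: sqlite3.Connection, customer_id: int, url: str) -> None:
    # The hash is computed by the application on every insert.
    conn.execute(
        "INSERT INTO tab VALUES (?, ?, ?)", (customer_id, url, url_hash(url))
    )


def find_customers(conn: sqlite3.Connection, url: str) -> list:
    # Filter: seek on the indexed hash. Refine: compare the full URL
    # so that rows with a colliding hash are discarded.
    rows = conn.execute(
        "SELECT customer_id FROM tab WHERE url_hash = ? AND url = ?",
        (url_hash(url), url),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
build_table(conn)
target = "https://forum.csdn.net/PointForum/Forum/PostTopic.aspx"
insert_url(conn, 1, target)
insert_url(conn, 2, "https://example.com/some/other/page")
print(find_customers(conn, target))
```

The second predicate on the full URL is what keeps the result correct: two different URLs can share a hash, so the hash alone only narrows the candidates.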

sigmod 2009-03-04