一个性能优化的问题
前次发帖询问了一个查询“购买了A物品的用户也购买了××物品”的问题,帖子见:
http://community.csdn.net/Expert/topic/4440/4440680.xml?temp=.4613764
经 rardge(Rardge) 解答,已经写出了一个可以工作的版本。
不过在数据表中有20万条数据时,查询一次耗时400毫秒左右,不加LIMIT返回999行,实际可能不止这么多行,999可能是数据库的设置吧。
数据表的模式:
CREATE TABLE mm_view_scene (
scene_id BIGINT NOT NULL REFERENCES scene(id),
member_id BIGINT NOT NULL REFERENCES member(id),
PRIMARY KEY (scene_id, member_id)
);
查询语句:
SELECT scene_id, count(*) AS view_count FROM mm_view_scene WHERE member_id IN
(
SELECT member_id FROM mm_view_scene WHERE scene_id = 30196
)
AND scene_id <> 30196
GROUP BY (scene_id)
HAVING COUNT(*) >= 1
ORDER BY COUNT(*) DESC
LIMIT 5000;
我用explain的结果:
"Limit (cost=5396.39..5396.40 rows=5 width=8)"
" -> Sort (cost=5396.39..5396.89 rows=199 width=8)"
" Sort Key: count(*)"
" -> HashAggregate (cost=5385.31..5388.79 rows=199 width=8)"
" Filter: (count(*) >= 1)"
" -> Hash IN Join (cost=1197.10..5370.97 rows=1912 width=8)"
" Hash Cond: ("outer".member_id = "inner".member_id)"
" -> Seq Scan on mm_view_scene (cost=0.00..3581.21 rows=191216 width=16)"
" Filter: (scene_id <> 30196)"
" -> Hash (cost=1194.69..1194.69 rows=961 width=8)"
" -> Bitmap Heap Scan on mm_view_scene (cost=10.36..1194.69 rows=961 width=8)"
" Recheck Cond: (scene_id = 30196)"
" -> Bitmap Index Scan on mm_view_scene_pkey (cost=0.00..10.36 rows=961 width=0)"
" Index Cond: (scene_id = 30196)"
我想慢的原因应该是WHERE member_id IN...这个地方吧,IN需要一条一条去判断。不知道如何才能优化一下速度,实际使用时,表中记录可能达到千万条。