Hive bucket for parallelism?
yfk 2012-08-02 09:30:10 http://www.slideshare.net/ragho/hive-user-meeting-august-2009-facebook
Slide 8 says:
buckets: split data based on hash of a column -- mainly for parallelism
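My reading of that line, as a rough Python sketch rather than Hive's actual code. The function name `bucket_rows`, the `user_id` column, and the bucket count of 4 are all made up for illustration, and Python's built-in `hash` is only a stand-in for whatever hash function Hive really uses:

```python
from collections import defaultdict


def bucket_rows(rows, column, num_buckets):
    """Group rows into buckets by hash(column value) % num_buckets.

    Hypothetical helper: it only illustrates the idea of hash bucketing,
    it is not how Hive is implemented.
    """
    buckets = defaultdict(list)
    for row in rows:
        # Rows with the same column value always land in the same bucket.
        bucket_id = hash(row[column]) % num_buckets
        buckets[bucket_id].append(row)
    return dict(buckets)


# Made-up sample data: eight rows with integer user ids 0..7.
rows = [{"user_id": i} for i in range(8)]
buckets = bucket_rows(rows, "user_id", 4)
for bucket_id in sorted(buckets):
    print(bucket_id, [r["user_id"] for r in buckets[bucket_id]])
```

So each bucket would hold every row whose column value hashes to that bucket number, which is a different kind of split from cutting a file into fixed-size blocks.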
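I don't understand this part. Hive runs on Hadoop, and Hadoop already splits files into blocks on its own, so why does the slide say that bucketing by the hash of a column is mainly there to increase parallelism?
Could any of the experts here explain? Thanks!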