Hive table creation: runtime efficiency of group by vs. over (partition by)
The data is about 40 million rows. I want to aggregate statistics along several dimensions. Put simply, I want:
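the number of items sold by each company itself;
the average number of items sold by companies in the same industry;
the average number of items sold by companies in the same industry in the same province;
the average number of items sold by companies in the same industry in the same city.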
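To pin down what the four numbers mean, here is a minimal plain-Python sketch. It assumes each row of `t` is one item sold with columns company, category (the industry), province and city, and it assumes "average in the same industry" means items sold in that group divided by the number of distinct companies in that group. Both the sample rows and the helper `average_per_company` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows of t: (company, category, province, city),
# one row per item sold.
rows = [
    ("A", "food", "ZJ", "Hangzhou"),
    ("A", "food", "ZJ", "Hangzhou"),
    ("B", "food", "ZJ", "Ningbo"),
    ("C", "food", "JS", "Nanjing"),
    ("C", "food", "JS", "Nanjing"),
    ("C", "food", "JS", "Nanjing"),
    ("D", "toys", "ZJ", "Hangzhou"),
]


def average_per_company(rows, key):
    """Assumed definition: items sold in a group / distinct companies in it."""
    items = Counter(key(r) for r in rows)
    companies = defaultdict(set)
    for r in rows:
        companies[key(r)].add(r[0])
    return {k: items[k] / len(companies[k]) for k in items}


company_cnt = Counter(r[0] for r in rows)                         # metric 1
industry_avg = average_per_company(rows, lambda r: r[1])           # metric 2
province_avg = average_per_company(rows, lambda r: (r[2], r[1]))   # metric 3
city_avg = average_per_company(rows, lambda r: (r[3], r[1]))       # metric 4

print(company_cnt)
print(industry_avg)
print(province_avg)
print(city_avg)
```

Note that the `count(1)` queries in the two approaches return raw group totals; under this definition the averages would still need a division by the distinct company count per group.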
Two approaches:
1. group by
Four separate aggregate subqueries, left-joined one after another, roughly:
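select t0.company,
       t1.cnt as company_cnt,
       t2.cnt as category_cnt,
       t3.cnt as province_category_cnt,
       t4.cnt as city_category_cnt
-- assumed base: one row per distinct company/category/province/city
from (select distinct company, category, province, city from t) t0
left join
(select company, count(1) as cnt from t
group by company) t1
on t0.company = t1.company
left join
(select category, count(1) as cnt from t
group by category) t2
on t0.category = t2.category
left join
(select province, category, count(1) as cnt from t
group by province, category) t3
on t0.province = t3.province and t0.category = t3.category
left join
(select city, category, count(1) as cnt from t
group by city, category) t4
on t0.city = t4.city and t0.category = t4.category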
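A small sketch of the group-by approach, run against an in-memory SQLite table instead of Hive, only to check what it returns. The join conditions on the grouping keys and the `t0` base subquery are assumptions, and the five sample rows are made up. As written, the query references `t` five times (the base plus four aggregates).

```python
import sqlite3

# In-memory stand-in for the Hive table t (hypothetical sample data).
conn = sqlite3.connect(":memory:")
conn.execute("create table t (company text, category text, province text, city text)")
conn.executemany(
    "insert into t values (?, ?, ?, ?)",
    [
        ("A", "food", "ZJ", "Hangzhou"),
        ("A", "food", "ZJ", "Hangzhou"),
        ("B", "food", "ZJ", "Ningbo"),
        ("C", "food", "JS", "Nanjing"),
        ("D", "toys", "ZJ", "Hangzhou"),
    ],
)

# Approach 1: four independent GROUP BY subqueries left-joined to a base.
GROUP_BY_SQL = """
select t0.company,
       t1.cnt as company_cnt,
       t2.cnt as category_cnt,
       t3.cnt as province_category_cnt,
       t4.cnt as city_category_cnt
from (select distinct company, category, province, city from t) t0
left join (select company, count(1) as cnt from t group by company) t1
  on t0.company = t1.company
left join (select category, count(1) as cnt from t group by category) t2
  on t0.category = t2.category
left join (select province, category, count(1) as cnt from t
           group by province, category) t3
  on t0.province = t3.province and t0.category = t3.category
left join (select city, category, count(1) as cnt from t
           group by city, category) t4
  on t0.city = t4.city and t0.category = t4.category
order by t0.company
"""

result = conn.execute(GROUP_BY_SQL).fetchall()
for row in result:
    print(row)
```

The output has one row per base row (here one per company), with each aggregate computed independently before the joins.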
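2. over (partition by)
select company,
count(1) over (partition by company) as company_cnt,
count(1) over (partition by category) as category_cnt,
count(1) over (partition by province, category) as province_category_cnt,
count(1) over (partition by city, category) as city_category_cnt
from t
-- returns one row per row of t (about 40 million rows), not one row per company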
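The window version can be checked the same way. This sketch again uses an in-memory SQLite table with hypothetical sample rows, and assumes the bundled SQLite is 3.25 or newer (needed for window functions). It shows that the query emits one row per input row, and that deduplicating those rows gives the same numbers as the group-by version on this data.

```python
import sqlite3

# In-memory stand-in for the Hive table t (hypothetical sample data).
conn = sqlite3.connect(":memory:")
conn.execute("create table t (company text, category text, province text, city text)")
conn.executemany(
    "insert into t values (?, ?, ?, ?)",
    [
        ("A", "food", "ZJ", "Hangzhou"),
        ("A", "food", "ZJ", "Hangzhou"),
        ("B", "food", "ZJ", "Ningbo"),
        ("C", "food", "JS", "Nanjing"),
        ("D", "toys", "ZJ", "Hangzhou"),
    ],
)

# Approach 2: one pass over t with four window counts.
WINDOW_SQL = """
select company,
       count(1) over (partition by company) as company_cnt,
       count(1) over (partition by category) as category_cnt,
       count(1) over (partition by province, category) as province_category_cnt,
       count(1) over (partition by city, category) as city_category_cnt
from t
"""

per_row = conn.execute(WINDOW_SQL).fetchall()
# One output row per input row: company A appears twice here.
print(len(per_row))

# Deduplicate to one row per company/location, like the GROUP BY version.
deduped = sorted(set(per_row))
for row in deduped:
    print(row)
```

In Hive the equivalent step would be a `select distinct` (or an outer `group by`) around the window query if only one row per company is wanted.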
Which one is more efficient, and why? If I only cared about writing less code, I would definitely use partition by...