Hive动态分区insert 数据写入HDFS时分区字段为NULL失败

MR_青木 2018-12-05 11:34:56
问题描述:
将全国信息按照省级单位动态分区,然后设置分桶,在最后将结果写入HDFS文件系统中失败,只生成了一个分区

建表语句:
create table position_fenqufentong (id int,city_id bigint,city_name string,county_id bigint,county_name string,town_id bigint,town_name string,village_id bigint,village_name string) partitioned by (province_name string) clustered by (city_name) sorted by(city_id) into 3 buckets row format delimited fields terminated by '\t' collection items terminated by '\t' lines terminated by '\n' stored as textfile location '/user/hive/warehouse/chinamap.db/fenqufentong_position';

装载数据
from position_all insert into table position_fenqufentong partition(province_name) select id,city_id,city_name,county_id,county_name,town_id,town_name,village_id,village_name,province_name;

运行结果
Query ID = hadoop_20181205104902_9ed34916-2369-46fa-9186-41a3d9f83911
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1543976396107_0001, Tracking URL = http://hadoop3:8088/proxy/application_1543976396107_0001/
Kill Command = /opt/modules/hadoop-2.6.5/bin/hadoop job -kill job_1543976396107_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3
2018-12-05 10:49:43,280 Stage-1 map = 0%, reduce = 0%
2018-12-05 10:50:17,859 Stage-1 map = 67%, reduce = 0%, Cumulative CPU 9.04 sec
2018-12-05 10:50:18,908 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 9.09 sec
2018-12-05 10:51:19,553 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 9.09 sec
2018-12-05 10:51:56,830 Stage-1 map = 100%, reduce = 22%, Cumulative CPU 10.1 sec
2018-12-05 10:52:04,219 Stage-1 map = 100%, reduce = 44%, Cumulative CPU 11.81 sec
2018-12-05 10:52:12,488 Stage-1 map = 100%, reduce = 56%, Cumulative CPU 19.93 sec
2018-12-05 10:52:25,281 Stage-1 map = 100%, reduce = 67%, Cumulative CPU 28.27 sec
2018-12-05 10:52:39,520 Stage-1 map = 100%, reduce = 78%, Cumulative CPU 29.07 sec
2018-12-05 10:52:41,611 Stage-1 map = 100%, reduce = 89%, Cumulative CPU 29.78 sec
2018-12-05 10:53:12,294 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 37.19 sec
MapReduce Total cumulative CPU time: 37 seconds 190 msec
Ended Job = job_1543976396107_0001
Loading data to table chinamap.position_fenqufentong partition (province_name=null)
Failed with exception org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter partition. alter is not possible
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 3 Cumulative CPU: 38.31 sec HDFS Read: 82878437 HDFS Write: 72514675 SUCCESS
Total MapReduce CPU Time Spent: 38 seconds 310 msec


...全文
1088 4 打赏 收藏 转发到动态 举报
写回复
用AI写文章
4 条回复
切换为时间正序
请发表友善的回复…
发表回复
JagerZhang 2021-11-19
  • 打赏
  • 举报
回复

动态分区分区字段是根据指定的province_name动态生成的,数据源position_all表中的province_name字段都是非空的,所以不知道这个null是从哪里来的

n421529963 2021-03-27
  • 打赏
  • 举报
回复
请问如何解决Unable to alter partition. alter is not possible,我是用spark写入分区表,相同的数据执行了两次,第二次就报这个错。
LinkSe7en 2018-12-05
  • 打赏
  • 举报
回复
分区字段还敢给null,你心也是够大。IFNULL(partition_column,'未知') 应该不难吧
MR_青木 2018-12-05
  • 打赏
  • 举报
回复
动态分区分区字段是根据指定的province_name动态生成的,数据源position_all表中的province_name字段都是非空的,所以不知道这个null是从哪里来的

20,810

社区成员

发帖
与我相关
我的任务
社区描述
Hadoop生态大数据交流社区,致力于有Hadoop,hive,Spark,Hbase,Flink,ClickHouse,Kafka,数据仓库,大数据集群运维技术分享和交流等。致力于收集优质的博客
社区管理员
  • 分布式计算/Hadoop社区
  • 涤生大数据
加入社区
  • 近7日
  • 近30日
  • 至今
社区公告
暂无公告

试试用AI创作助手写篇文章吧