Problems when using Bulkload with my own MR program

the_gunner 2014-06-20 02:21:30
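I wrote my own MR program to generate HFiles and then used the bulkload command to import them into an HBase table.
With small data volumes there was no problem. Once the data grew somewhat larger, however, the import became quite slow, the HBase web UI could not be opened during the import, and the following messages were reported: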


14/06/20 14:01:32 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x46b795deb9000b, likely server has closed socket, closing socket connection and attempting reconnect
14/06/20 14:01:33 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)

I wasn't sure whether HBase had gone down, but jps shows that the HBase-related processes (HQuorumPeer, HRegionServer) are still running on every node.
How can I fix this?
4 replies
wfp458113181wfp 2014-09-15
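Most likely a GC pause ran too long and broke the DFS connection; writing the HDFS block then failed, and the region server went down. My guess is that because you imported a large amount of data, the regions started compacting, and that eventually caused the long GC pauses. I suggest tuning the parameters to turn off automatic compaction.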
the_gunner 2014-06-30
Could any experts please help?
the_gunner 2014-06-20
I checked the error log of the RegionServer that went down. Here is the part just before it died:
2014-06-20 14:50:05,624 WARN  [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 15143ms
2014-06-20 14:50:19,068 INFO  [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 3778ms
2014-06-20 14:50:51,934 INFO  [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 7547ms
2014-06-20 14:50:58,050 INFO  [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 3453ms
2014-06-20 14:51:18,298 DEBUG [LruStats #0] hfile.LruBlockCache: Total=1.90 MB, free=402.60 MB, max=404.50 MB, blocks=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0,evictions=0, evicted=0, evictedPerRun=NaN
2014-06-20 14:51:49,857 WARN  [regionserver60020.compactionChecker] util.Sleeper: We slept 27923ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2014-06-20 14:51:49,860 WARN  [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 21199ms
2014-06-20 14:51:49,858 WARN  [regionserver60020.periodicFlusher] util.Sleeper: We slept 27925ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2014-06-20 14:52:15,099 INFO  [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 3436ms
2014-06-20 14:52:34,288 INFO  [regionserver60020-SendThread(u07:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 62530ms for sessionid 0x646b7dda0af0006, closing socket connection and attempting reconnect
2014-06-20 14:52:34,288 INFO  [regionserver60020-SendThread(u04:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 84753ms for sessionid 0x146b7dd7e790006, closing socket connection and attempting reconnect
2014-06-20 14:52:34,288 INFO  [regionserver60020-SendThread(u03:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 69742ms for sessionid 0x346b7dd5ce70003, closing socket connection and attempting reconnect
2014-06-20 14:52:37,330 INFO  [regionserver60020-SendThread(u05:2181)] zookeeper.ClientCnxn: Opening socket connection to server u05/192.168.85.132:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2014-06-20 14:52:37,366 INFO  [regionserver60020-SendThread(u05:2181)] zookeeper.ClientCnxn: Socket connection established to u05/192.168.85.132:2181, initiating session
2014-06-20 14:52:37,415 INFO  [regionserver60020-SendThread(u05:2181)] zookeeper.ClientCnxn: Opening socket connection to server u05/192.168.85.132:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2014-06-20 14:52:37,492 INFO  [regionserver60020-SendThread(u05:2181)] zookeeper.ClientCnxn: Socket connection established to u05/192.168.85.132:2181, initiating session
2014-06-20 14:52:37,555 INFO  [regionserver60020-SendThread(u04:2181)] zookeeper.ClientCnxn: Opening socket connection to server u04/192.168.85.131:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2014-06-20 14:52:37,569 INFO  [regionserver60020-SendThread(u04:2181)] zookeeper.ClientCnxn: Socket connection established to u04/192.168.85.131:2181, initiating session
2014-06-20 14:52:37,919 INFO  [regionserver60020-SendThread(u04:2181)] zookeeper.ClientCnxn: Session establishment complete on server u04/192.168.85.131:2181, sessionid = 0x646b7dda0af0006, negotiated timeout = 90000
2014-06-20 14:53:02,580 WARN  [regionserver60020.periodicFlusher] util.Sleeper: We slept 29465ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2014-06-20 14:53:07,787 WARN  [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 27198ms
2014-06-20 14:53:07,499 INFO  [regionserver60020-SendThread(u05:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 30168ms for sessionid 0x146b7dd7e790006, closing socket connection and attempting reconnect
2014-06-20 14:53:07,499 INFO  [regionserver60020-SendThread(u05:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 30084ms for sessionid 0x346b7dd5ce70003, closing socket connection and attempting reconnect
2014-06-20 14:53:07,496 WARN  [regionserver60020.compactionChecker] util.Sleeper: We slept 34024ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2014-06-20 14:53:24,962 INFO  [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 7518ms
2014-06-20 14:53:40,244 INFO  [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 8767ms
2014-06-20 14:53:44,101 WARN  [ResponseProcessor for block BP-2107977179-192.168.85.128-1402023732925:blk_1073755470_16255] hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block BP-2107977179-192.168.85.128-1402023732925:blk_1073755470_16255
2014-06-20 14:54:12,415 FATAL [regionserver60020] regionserver.HRegionServer: ABORTING region server u05,60020,1403244375098: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing u05,60020,1403244375098 as dead server
Could this be related to JVM GC?
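Whether this is GC-related can be checked by pulling the numbers out of the log. Below is a minimal Java sketch (standard library only) that extracts the JvmPauseMonitor pause lengths and the ZooKeeper "have not heard from server" gaps from RegionServer log lines. The class and method names are hypothetical, and the regular expressions assume the exact message wording shown in the log above; other HBase or ZooKeeper versions may phrase these messages differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: pulls GC-related numbers out of RegionServer log lines.
 * The regexes assume the message wording shown in the log above.
 */
public class GcPauseScan {
    // JvmPauseMonitor: "... pause of approximately 15143ms"
    private static final Pattern PAUSE =
            Pattern.compile("pause of approximately (\\d+)ms");
    // ZooKeeper client: "... have not heard from server in 62530ms ..."
    private static final Pattern ZK_SILENCE =
            Pattern.compile("have not heard from server in (\\d+)ms");

    /** JvmPauseMonitor pause lengths in milliseconds, in log order. */
    public static List<Long> pauses(List<String> lines) {
        return extract(lines, PAUSE);
    }

    /** ZooKeeper "not heard from server" gaps in milliseconds, in log order. */
    public static List<Long> zkSilences(List<String> lines) {
        return extract(lines, ZK_SILENCE);
    }

    /** Number of values greater than or equal to thresholdMs. */
    public static long countAtLeast(List<Long> valuesMs, long thresholdMs) {
        return valuesMs.stream().filter(v -> v >= thresholdMs).count();
    }

    // Collects the first numeric capture group of each matching line.
    private static List<Long> extract(List<String> lines, Pattern pattern) {
        List<Long> out = new ArrayList<>();
        for (String line : lines) {
            Matcher m = pattern.matcher(line);
            if (m.find()) {
                out.add(Long.parseLong(m.group(1)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A few lines taken from the log above (prefixes shortened).
        List<String> log = Arrays.asList(
            "WARN [JvmPauseMonitor] Detected pause in JVM or host machine (eg GC): pause of approximately 15143ms",
            "WARN [JvmPauseMonitor] Detected pause in JVM or host machine (eg GC): pause of approximately 27198ms",
            "INFO zookeeper.ClientCnxn: Client session timed out, have not heard from server in 84753ms for sessionid 0x146b7dd7e790006");

        List<Long> p = pauses(log);
        long total = p.stream().mapToLong(Long::longValue).sum();
        System.out.println("GC pauses (ms): " + p + ", total " + total);
        // 10000 ms is the chore interval named in the Sleeper warnings above.
        System.out.println("pauses >= 10000 ms: " + countAtLeast(p, 10000L));
        // 90000 ms is the negotiated ZooKeeper session timeout from the log.
        System.out.println("ZK gaps (ms): " + zkSilences(log) + ", timeout 90000");
    }
}
```

Run over the full log above, it would report several pauses well beyond the 10000 ms chore interval (15143, 21199 and 27198 ms among them) and ZooKeeper gaps of up to 84753 ms against a negotiated session timeout of 90000 ms. Pauses of that size are consistent with the ZooKeeper session expiring, after which the master treats the RegionServer as dead; the final YouAreDeadException fits that pattern.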
the_gunner 2014-06-20
Also, even though HBase became unreachable, the MR job did not stop.