Slow bulk data import into HBase when running on YARN
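随遇_羽翔 2016-02-18 08:20:13 I have recently been working on importing a large table from Oracle into HBase, and I have run into some problems that I would like to ask about and discuss with everyone.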
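Here is the scenario: there is a table with 80 million rows, and its data has already been exported to a CSV file of a little over 69 GB. I am loading that file into HBase using the mapper stage of a Hadoop MapReduce job. The Hadoop version is 2.5.2 and the HBase version is 1.1.2, and the target table was pre-split into 100 regions.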
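For readers who want to reason about the pre-split, here is a minimal sketch of one way 100 region boundaries could be derived. It is not the scheme used in this post, which does not say how the row keys or split points were chosen. It assumes a hypothetical two-digit salt prefix computed from a CRC32 of the original key; the names `split_keys` and `salted_row_key` and the `|` separator are made up for illustration.

```python
import zlib

NUM_REGIONS = 100  # matches the 100 pre-split regions mentioned in the post


def split_keys(num_regions=NUM_REGIONS):
    """Return the boundaries between regions as zero-padded strings.

    N regions need N - 1 boundaries, e.g. "01", "02", ..., "99" for 100 regions.
    """
    width = len(str(num_regions - 1))
    return [str(i).zfill(width) for i in range(1, num_regions)]


def salted_row_key(original_key, num_regions=NUM_REGIONS):
    """Prefix the key with a deterministic bucket number (hypothetical scheme).

    CRC32 is used only because it is stable across runs and processes,
    so the same original key always maps to the same bucket.
    """
    width = len(str(num_regions - 1))
    bucket = zlib.crc32(original_key.encode("utf-8")) % num_regions
    return f"{bucket:0{width}d}|{original_key}"


if __name__ == "__main__":
    keys = split_keys()
    print(len(keys), keys[0], keys[-1])
    print(salted_row_key("some-original-key"))
```

If the original keys are sequential (for example, IDs exported in order from Oracle), unsalted writes tend to land on one region at a time, whereas a prefix like this spreads them across all regions. Whether that applies here depends on the actual key design, which is not shown.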
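The test environment on site was a small cluster built from roughly three desktop machines, each with 8 GB of RAM and 4 cores. Watching the job in YARN, I found that progress became abnormally slow from about 10% onward. Going through the logs, I found that the slow YARN tasks were all reporting messages like the following. My initial guess is that something is getting blocked somewhere.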
2016-02-18 12:59:57,937 INFO [htable-pool1-t5] org.apache.hadoop.hbase.client.AsyncProcess: #2, table=sb_spxx_rk, attempt=16/35 failed=148ops, last exception: null on hadoop1,16020,1455766013395, tracking started null, retrying after=20157ms, replay=148ops
2016-02-18 13:00:17,180 INFO [htable-pool1-t8] org.apache.hadoop.hbase.client.AsyncProcess: #2, table=sb_spxx_rk, attempt=17/35 failed=141ops, last exception: null on hadoop1,16020,1455766013395, tracking started null, retrying after=20133ms, replay=141ops
2016-02-18 13:00:18,397 INFO [htable-pool1-t5] org.apache.hadoop.hbase.client.AsyncProcess: #2, table=sb_spxx_rk, attempt=17/35 failed=148ops, last exception: null on hadoop1,16020,1455766013395, tracking started null, retrying after=20169ms, replay=148ops
2016-02-18 13:00:37,420 INFO [htable-pool1-t8] org.apache.hadoop.hbase.client.AsyncProcess: #2, table=sb_spxx_rk, attempt=18/35 succeeded on hadoop1,16020,1455766013395, tracking started Thu Feb 18 12:57:42 CST 2016
2016-02-18 13:00:38,773 INFO [htable-pool1-t5] org.apache.hadoop.hbase.client.AsyncProcess: #2, table=sb_spxx_rk, attempt=18/35 succeeded on hadoop1,16020,1455766013395, tracking started Thu Feb 18 12:57:42 CST 2016
2016-02-18 13:01:07,053 INFO [htable-pool1-t5] org.apache.hadoop.hbase.client.AsyncProcess: #2, table=sb_spxx_rk, attempt=10/35 failed=19ops, last exception: null on hadoop4,16020,1455766014721, tracking started null, retrying after=10045ms, replay=19ops
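To see which region server the retries concentrate on, the log lines above can be tallied with a short script. This is only a diagnostic sketch: the regular expression is written against the exact line format pasted in this post and may need adjusting for other HBase versions, and the function name `summarize` is made up for illustration.

```python
import re
from collections import defaultdict

# Matches the "failed=...ops" retry lines from AsyncProcess as pasted above.
# Lines reporting "succeeded" do not match and are skipped.
FAILED = re.compile(
    r"table=(?P<table>\S+), attempt=(?P<attempt>\d+)/(?P<max>\d+) "
    r"failed=(?P<ops>\d+)ops, .*? on (?P<host>[^,]+),\d+,\d+, .*?"
    r"retrying after=(?P<wait>\d+)ms"
)

# Three lines copied verbatim from the post, used as sample input.
SAMPLE = [
    "2016-02-18 12:59:57,937 INFO [htable-pool1-t5] org.apache.hadoop.hbase.client.AsyncProcess: #2, table=sb_spxx_rk, attempt=16/35 failed=148ops, last exception: null on hadoop1,16020,1455766013395, tracking started null, retrying after=20157ms, replay=148ops",
    "2016-02-18 13:00:37,420 INFO [htable-pool1-t8] org.apache.hadoop.hbase.client.AsyncProcess: #2, table=sb_spxx_rk, attempt=18/35 succeeded on hadoop1,16020,1455766013395, tracking started Thu Feb 18 12:57:42 CST 2016",
    "2016-02-18 13:01:07,053 INFO [htable-pool1-t5] org.apache.hadoop.hbase.client.AsyncProcess: #2, table=sb_spxx_rk, attempt=10/35 failed=19ops, last exception: null on hadoop4,16020,1455766014721, tracking started null, retrying after=10045ms, replay=19ops",
]


def summarize(lines):
    """Aggregate failed attempts per region server host."""
    stats = defaultdict(
        lambda: {"failures": 0, "max_attempt": 0, "wait_ms": 0, "ops": 0}
    )
    for line in lines:
        m = FAILED.search(line)
        if m is None:
            continue
        s = stats[m["host"]]
        s["failures"] += 1
        s["max_attempt"] = max(s["max_attempt"], int(m["attempt"]))
        s["wait_ms"] += int(m["wait"])  # total time spent backing off
        s["ops"] += int(m["ops"])       # operations that had to be replayed
    return dict(stats)


if __name__ == "__main__":
    for host, s in sorted(summarize(SAMPLE).items()):
        print(host, s)
```

In the excerpt above, most retries target `hadoop1` and each one waits about 20 seconds before replaying, so running the same tally over the full task logs would show whether a single region server accounts for most of the stalled time.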
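I searched online for a long time without finding a good answer. If anyone has seen a similar situation, could you help me analyze it, or offer some suggestions about what might be causing it?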