Spark throws java.lang.NoClassDefFoundError: org/apache/htrace/Trace when reading from HBase

西红小柿 2017-10-18 05:34:24
I have two HBase tables, each holding several hundred million rows. After reading them into an RDD and processing it, the job fails with the following exception:

17/10/17 10:39:25 INFO zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x54c03dba connecting to ZooKeeper ensemble=10.20.224.43:2181,10.20.224.19:2181,10.20.224.65:2181,10.20.224.49:2181
17/10/17 10:39:25 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=10.20.224.43:2181,10.20.224.19:2181,10.20.224.65:2181,10.20.224.49:2181 sessionTimeout=90000 watcher=hconnection-0x54c03dba0x0, quorum=10.20.224.43:2181,10.20.224.19:2181,10.20.224.65:2181,10.20.224.49:2181, baseZNode=/hbase
17/10/17 10:39:25 INFO zookeeper.ClientCnxn: Opening socket connection to server hadoap01plk019/10.20.224.19:2181. Will not attempt to authenticate using SASL (unknown error)
17/10/17 10:39:25 ERROR mapreduce.TableInputFormat: java.io.IOException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:218)
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:119)
at org.apache.hadoop.hbase.mapreduce.TableInputFormat.initialize(TableInputFormat.java:186)
at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.createRecordReader(TableInputFormatBase.java:165)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:166)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:133)
at org.apache.hadoop.hbase.spark.NewHBaseRDD.compute(NewHBaseRDD.scala:34)
at org.apache.hadoop.hbase.spark.NewHBaseRDD.compute(NewHBaseRDD.scala:25)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
... 19 more
Caused by: java.lang.NoClassDefFoundError: org/apache/htrace/Trace
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:217)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:419)
at org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:65)
at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:105)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.retrieveClusterId(ConnectionManager.java:919)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:657)
... 24 more

17/10/17 10:39:25 INFO zookeeper.ClientCnxn: Socket connection established, initiating session, client: /10.20.224.35:39364, server: hadoap01plk019/10.20.224.19:2181
17/10/17 10:39:25 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 408
17/10/17 10:39:25 INFO zookeeper.ClientCnxn: Opening socket connection to server hadoap01plk019/10.20.224.19:2181. Will not attempt to authenticate using SASL (unknown error)
17/10/17 10:39:25 ERROR executor.Executor: Exception in task 352.1 in stage 0.0 (TID 406)
java.io.IOException: Cannot create a record reader because of a previous error. Please look at the previous logs lines from the task's full log for more details.
at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.createRecordReader(TableInputFormatBase.java:174)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:166)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:133)
at org.apache.hadoop.hbase.spark.NewHBaseRDD.compute(NewHBaseRDD.scala:34)
at org.apache.hadoop.hbase.spark.NewHBaseRDD.compute(NewHBaseRDD.scala:25)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: The input format instance has not been properly initialized. Ensure you call initializeTable either in your constructor or initialize method
at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getTable(TableInputFormatBase.java:588)
at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.createRecordReader(TableInputFormatBase.java:169)
... 15 more
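
The frames above (org.apache.hadoop.hbase.spark.NewHBaseRDD, mapreduce.TableInputFormat) suggest the hbase-spark connector's HBaseContext read path is in use. The original code is not included in the post, so the following is only a minimal sketch of that pattern; the table name and ZooKeeper quorum are placeholders:

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.Scan
    import org.apache.hadoop.hbase.spark.HBaseContext
    import org.apache.spark.{SparkConf, SparkContext}

    object HBaseReadSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("hbase-read-sketch"))

        // The ZooKeeper quorum is normally picked up from hbase-site.xml on
        // the classpath; the hosts below are placeholders.
        val hbaseConf = HBaseConfiguration.create()
        hbaseConf.set("hbase.zookeeper.quorum", "zk1:2181,zk2:2181,zk3:2181")

        // HBaseContext ships the HBase configuration to the executors and
        // builds a NewHBaseRDD over the table scan.
        val hbaseContext = new HBaseContext(sc, hbaseConf)
        val rdd = hbaseContext.hbaseRDD(TableName.valueOf("my_table"), new Scan())

        // Each element is a (row key, Result) pair.
        println(rdd.count())
        sc.stop()
      }
    }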

The spark-submit command I used is as follows:

I made a point of passing the jar containing the class reported missing above on the command line.
Even so, when I run the same Spark program against test tables (each holding only a single row), it completes normally.
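
The actual command is not reproduced in the post; the following is only a hypothetical sketch of such an invocation, where every path, jar version, and class name is a placeholder (the executor counts match the figures mentioned in the reply below):

    spark-submit \
      --master yarn \
      --num-executors 50 \
      --executor-cores 2 \
      --jars /path/to/htrace-core-3.1.0-incubating.jar,/path/to/hbase-client.jar,/path/to/hbase-common.jar,/path/to/hbase-server.jar \
      --class com.example.HBaseReadSketch \
      hbase-read-job.jar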

What could be causing this?
2 replies
西红小柿 2017-10-19
I found that the problem is triggered by the resource configuration. When I configure 50 executors with 2 executor-cores each, the program runs fine; but as soon as I raise the number of executors, the error above appears. What is going on here?
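
One commonly tried mitigation for an executor-side NoClassDefFoundError is to point every executor (and the driver) at a copy of the missing jar on each node's local filesystem, rather than relying only on --jars distribution. A sketch of the extra flags, with placeholder paths; nothing in this thread confirms it resolves this particular case:

    # Appended to the same spark-submit invocation; the jar must exist at
    # this path on every node in the cluster.
    --conf spark.executor.extraClassPath=/opt/hbase/lib/htrace-core-3.1.0-incubating.jar
    --conf spark.driver.extraClassPath=/opt/hbase/lib/htrace-core-3.1.0-incubating.jar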
西红小柿 2017-10-18
Bumping this question.
