Spark cannot read data from HBase

大数据中的大叔 2017-11-03 02:34:48
I recently started learning Spark, and at the stage of reading HBase data from Spark I hit an exception that I could not resolve after a lot of searching. My IDE is Eclipse; Scala is 2.11.6, HBase is 1.2.6, Spark is 2.1.0, and Hadoop is 2.7.3 (for testing). I am not using HBase's bundled ZooKeeper but a standalone 3.4.6. Create/read/update/delete through hbase-shell all work fine. I also added the relevant jars from HBase to the Spark project. My machine is a single computer; both Hadoop and Spark run in pseudo-distributed mode. The code is as follows:
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

object Test {
  def main(args: Array[String]): Unit = {
    // Point the HBase client at the standalone ZooKeeper
    val conf = HBaseConfiguration.create()
    conf.set("hbase.zookeeper.quorum", "192.168.0.102")
    conf.set("hbase.zookeeper.property.clientPort", "2181")
    conf.set(TableInputFormat.INPUT_TABLE, "student")

    val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("Hbase-test"))

    val stuRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
      classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
      classOf[org.apache.hadoop.hbase.client.Result])

    stuRDD.cache() // cache before the first action so the table is scanned only once
    val count = stuRDD.count()
    println("Students RDD Count:" + count)

    stuRDD.foreach { case (_, result) =>
      val key = Bytes.toString(result.getRow)
      val name = Bytes.toString(result.getValue("info".getBytes, "name".getBytes))
      val gender = Bytes.toString(result.getValue("info".getBytes, "gender".getBytes))
      val age = Bytes.toString(result.getValue("info".getBytes, "age".getBytes))
      println("Row Key:" + key + " Name:" + name + " Gender:" + gender + " Age:" + age)
    }
  }
}

The relevant exception is shown below:

Some irrelevant output is omitted. At first I suspected my ZooKeeper configuration, so I tried HBase's bundled ZooKeeper instead, but the error was the same.


17/11/03 14:30:29 INFO ZooKeeper: Initiating client connection, connectString=192.168.0.102:2181 sessionTimeout=90000 watcher=hconnection-0x5c089b2f0x0, quorum=192.168.0.102:2181, baseZNode=/hbase
17/11/03 14:30:29 INFO ClientCnxn: Opening socket connection to server bigdata3/192.168.0.102:2181. Will not attempt to authenticate using SASL (unknown error)
17/11/03 14:30:29 INFO ClientCnxn: Socket connection established to bigdata3/192.168.0.102:2181, initiating session
17/11/03 14:30:29 INFO ClientCnxn: Session establishment complete on server bigdata3/192.168.0.102:2181, sessionid = 0x15f807724d60007, negotiated timeout = 40000
17/11/03 14:30:29 INFO RegionSizeCalculator: Calculating region sizes for table "student".
17/11/03 14:31:07 INFO RpcRetryingCaller: Call exception, tries=10, retries=35, started=38458 ms ago, cancelled=false, msg=row 'student,,00000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=bigdata3,16201,1509688846997, seqNum=0
17/11/03 14:31:17 INFO RpcRetryingCaller: Call exception, tries=11, retries=35, started=48531 ms ago, cancelled=false, msg=row 'student,,00000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=bigdata3,16201,1509688846997, seqNum=0
17/11/03 14:31:17 INFO ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x15f807724d60007
17/11/03 14:31:17 INFO ZooKeeper: Session: 0x15f807724d60007 closed
17/11/03 14:31:17 INFO ClientCnxn: EventThread shut down
Exception in thread "main" org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
Fri Nov 03 14:31:17 CST 2017, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=68712: row 'student,,00000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=bigdata3,16201,1509688846997, seqNum=0

at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:276)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:210)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:210)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:327)
at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:302)
at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:167)
at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:162)
at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:797)
at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:193)
at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:89)
at org.apache.hadoop.hbase.client.MetaScanner.allTableRegions(MetaScanner.java:324)
at org.apache.hadoop.hbase.client.HRegionLocator.getAllRegionLocations(HRegionLocator.java:89)
at org.apache.hadoop.hbase.util.RegionSizeCalculator.init(RegionSizeCalculator.java:94)
at org.apache.hadoop.hbase.util.RegionSizeCalculator.<init>(RegionSizeCalculator.java:81)
at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:256)
at org.apache.hadoop.hbase.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:239)
at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:125)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1958)
at org.apache.spark.rdd.RDD.count(RDD.scala:1157)
at MySecond.package1.Test$.main(Test.scala:22)
at MySecond.package1.Test.main(Test.scala)
Caused by: java.net.SocketTimeoutException: callTimeout=60000, callDuration=68712: row 'student,,00000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=bigdata3,16201,1509688846997, seqNum=0
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:169)
at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:65)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: com.google.protobuf.ServiceException: java.lang.NoClassDefFoundError: com/yammer/metrics/core/Gauge
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:332)
at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:408)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:204)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:65)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:210)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:364)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:338)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:136)
... 4 more
Caused by: com.google.protobuf.ServiceException: java.lang.NoClassDefFoundError: com/yammer/metrics/core/Gauge
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:240)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:336)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:34094)
at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:400)
... 10 more
Caused by: java.lang.NoClassDefFoundError: com/yammer/metrics/core/Gauge
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:225)
... 13 more
Caused by: java.lang.ClassNotFoundException: com.yammer.metrics.core.Gauge
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 14 more
17/11/03 14:31:17 INFO SparkContext: Invoking stop() from shutdown hook
17/11/03 14:31:17 INFO SparkUI: Stopped Spark web UI at http://192.168.0.102:4040
17/11/03 14:31:17 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/11/03 14:31:17 INFO MemoryStore: MemoryStore cleared
17/11/03 14:31:17 INFO BlockManager: BlockManager stopped
17/11/03 14:31:17 INFO BlockManagerMaster: BlockManagerMaster stopped
17/11/03 14:31:17 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/11/03 14:31:17 INFO SparkContext: Successfully stopped SparkContext
17/11/03 14:31:17 INFO ShutdownHookManager: Shutdown hook called
17/11/03 14:31:17 INFO ShutdownHookManager: Deleting directory /tmp/spark-a7296cf1-3122-4cfa-9438-1fbb775c3a48
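[Editor's note] Reading the "Caused by" chain bottom-up, the deepest cause is ClassNotFoundException: com.yammer.metrics.core.Gauge, i.e. the Yammer metrics-core jar (a dependency of the HBase 1.x client) is not on the Spark driver classpath. A hedged sketch of how one might check for and supply it — the jar file name and paths below are assumptions for an HBase 1.2.x install, not confirmed by the original post:

```shell
# Look for the metrics jar shipped with HBase (name is an assumption;
# check your own $HBASE_HOME/lib directory):
ls "$HBASE_HOME"/lib | grep -i metrics-core

# In Eclipse, add that jar to the project build path alongside the other
# HBase jars. With spark-submit it could instead be passed explicitly
# (hbase-test.jar and the class name are placeholders):
spark-submit --master local \
  --jars "$HBASE_HOME"/lib/metrics-core-2.2.0.jar \
  --class Test hbase-test.jar
```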
2 replies
Zzreal 2017-12-04
Try raising ZooKeeper's maximum client connection count, maxClientCnxns, from the default 60 to 300. If that still doesn't work, also increase ZK's maximum connection wait time; that should do it. The problem is probably a read timeout.
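[Editor's note] A sketch of what that change looks like in ZooKeeper's zoo.cfg — maxClientCnxns is a standard ZooKeeper server property (per-client-IP limit, default 60); restart ZooKeeper after editing:

```properties
# zoo.cfg -- connection limit per client IP, raised from the default 60
maxClientCnxns=300
```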
gofunink 2017-11-21
I'm getting the same error. Did you ever solve it?
