spark on yarn 跨集群访问带有Kerberos的HDFS数据报错

白玉剑 2017-07-12 09:09:48
各位大神好,最近尝试使用spark on yarn 的模式访问另一个启用了kerberos的hadoop集群上的数据,在程序执行的集群上是有一个用户的票证的,local模式下执行程序是能够访问的,但是指定了--master yarn 之后,不管是client模式还是cluster模式都报下面的错误,在网上苦寻无果,只好前来求助:
WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, stbdd001, executor 1): java.io.IOException: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "stbddxx1/xxx.xx.xxx.xxx"; destination host is: "dmbxx2":8020;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
at org.apache.hadoop.ipc.Client.call(Client.java:1476)
at org.apache.hadoop.ipc.Client.call(Client.java:1409)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at com.sun.proxy.$Proxy23.getBlockLocations(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:256)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy24.getBlockLocations(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1279)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1266)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1254)
at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:305)
at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:271)
at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:263)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1585)
at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:326)
at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:322)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:322)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:783)
at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:109)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:240)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:211)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:688)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:651)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:739)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:376)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1525)
at org.apache.hadoop.ipc.Client.call(Client.java:1448)
... 39 more
...全文
73790 7 打赏 收藏 转发到动态 举报
写回复
用AI写文章
7 条回复
切换为时间正序
请发表友善的回复…
发表回复
holyzing 2020-11-22
  • 打赏
  • 举报
回复
同样的问题,但我是在 standalong 模式下跑的, 苦苦寻找看到一个帖子 说 standalong 模式不支持 访问受 keberos 认证的 hdfs集群, https://blog.csdn.net/jsky_studio/article/details/46900689?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-2.control&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-2.control 但是这个帖子没有太大的参考价值,且时代久远(2015年), 不知道各位大神有没有什么解决办法,还是一些重要的信息,还望赐教
冰上浮云 2020-10-29
  • 打赏
  • 举报
回复 2
也遇到了同样的问题,方案寻找中
2020-10-29 20:20:12,809 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:finance (auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
2020-10-29 20:20:12,810 WARN [main] org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
2020-10-29 20:20:12,810 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:finance (auth:SIMPLE) cause:java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
2020-10-29 20:20:12,818 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:finance (auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
2020-10-29 20:20:12,819 WARN [main] org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
ffbinhcj 2017-11-17
  • 打赏
  • 举报
回复
@baiyujian082,我也遇到了和你同样的问题,local模式可以,yarn上跑不行,你最后是咋解决的呀?求指导
eiffel_0311 2017-07-31
  • 打赏
  • 举报
回复
查一下yarn 的日志/var/log/yarn-----
qq_32806467 2017-07-13
  • 打赏
  • 举报
回复 1
1.6版本加 sparkconf .set("spark.yarn.access.namenodes","hdfs://namenode") 2.x版本加 sparkconf .set("spark.yarn.access.hadoopFileSystems","hdfs://namenode")
白玉剑 2017-07-12
  • 打赏
  • 举报
回复 1
@zhong930
野男孩 2017-07-12
  • 打赏
  • 举报
回复
只能帮顶了。。。不用KERBEROS的

1,260

社区成员

发帖
与我相关
我的任务
社区描述
Spark由Scala写成,是UC Berkeley AMP lab所开源的类Hadoop MapReduce的通用的并行计算框架,Spark基于MapReduce算法实现的分布式计算。
社区管理员
  • Spark
  • shiter
加入社区
  • 近7日
  • 近30日
  • 至今
社区公告
暂无公告

试试用AI创作助手写篇文章吧