Spark program developed in IDEA keeps failing on the cluster with ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy

bigdata-sb 2018-01-20 01:51:08
Scala version: 2.1.6
Spark version: spark-2.2.1-bin-hadoop2.7.tgz
Hadoop version: hadoop-2.7.3.tar.gz


package spark

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("spark://192.168.1.110:7077")
      .setAppName("WordCount")
      .setJars(Seq("out/artifacts/WordCountExample_jar/WordCountExample.jar"))

    val sc = new SparkContext(conf)
    // Read the file from HDFS, split each line on tabs, count the words, sort by count descending
    val inputRdd = sc.textFile("hdfs://192.168.1.110:8020/data/words.log")
    val wordCountRDD = inputRdd.flatMap(_.split("\t")).map((_, 1)).reduceByKey(_ + _).sortBy(_._2, false)
    val result = wordCountRDD.collect()
    for (pair <- result) {
      println(pair)
    }

    sc.stop()
  }
}
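
Since the failure only appears once tasks are deserialized on the executors, it is worth confirming, before submitting, which Scala runtime the driver is actually built with and whether setJars points at a freshly built artifact. Below is a minimal diagnostic sketch; the master URL is the one from the code above, while the object name and the rest are illustrative and not part of the original project:

package spark

import org.apache.spark.{SparkConf, SparkContext}

object VersionCheck {
  def main(args: Array[String]): Unit = {
    // Scala runtime on the driver; the prebuilt spark-2.2.1-bin-hadoop2.7
    // distribution is compiled against Scala 2.11.x
    println(s"Driver Scala runtime: ${scala.util.Properties.versionString}")

    // Let Spark locate the jar that contains this class instead of hard-coding a path;
    // None here means the code has not actually been packaged into a jar yet
    val appJar = SparkContext.jarOfObject(this)
    println(s"Jar shipped to executors: $appJar")

    val conf = new SparkConf()
      .setMaster("spark://192.168.1.110:7077")
      .setAppName("VersionCheck")
      .setJars(appJar.toSeq)

    val sc = new SparkContext(conf)
    // A trivial job whose tasks must be deserialized on the executors
    println(sc.parallelize(1 to 10).map(_ * 2).count())
    sc.stop()
  }
}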




Whether I build the project with Maven or as a plain Scala project, the error is exactly the same.


The error log is below:

18/01/20 01:33:06 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 192.168.1.113, executor 1): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233)
at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2288)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2206)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2064)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2282)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2206)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2064)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:428)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:85)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)


.....


18/01/20 01:33:08 INFO DAGScheduler: ShuffleMapStage 0 (map at WordCount.scala:14) failed in 47.488 s due to Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 4, 192.168.1.113, executor 1): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
    ... (stack trace identical to the one in the first failure above)
...
4 replies
s0611163 2021-03-23
I'm running into the same problem.
LinkSe7en 2020-05-07
Scala version 2.1.6? Are you sure that isn't 2.11.6? Spark only supports two major Scala lines, 2.11.x and 2.12.x, and the version you compile with has to match your Spark build. This looks like a Scala version mismatch between your build and the cluster.
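
Whatever the build tool, the fix this points at is pinning the build to the same Scala line the cluster's Spark was compiled with; the prebuilt spark-2.2.1-bin-hadoop2.7 distribution is built against Scala 2.11. Below is a minimal sketch of that pin, written as an sbt build for brevity even though the thread uses Maven/IDEA artifacts; the project name and the exact patch version 2.11.8 are assumptions:

// build.sbt -- minimal sketch; project name and Scala patch version are assumptions
name := "WordCountExample"

// Must match the Scala line of the cluster's Spark distribution (2.11.x for Spark 2.2.1)
scalaVersion := "2.11.8"

// %% appends the Scala binary version (_2.11); "provided" keeps the cluster's own
// Spark jars from being bundled into, and clashing with, the application jar
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.1" % "provided"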
Felix不会飞 2020-05-07
I'm hitting the same problem. The cluster itself is fine, and I can read a Hive table through Spark SQL without any issue; the error only shows up when I run RDD operations.
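
A plausible reading of that pattern, hedged rather than confirmed anywhere in this thread: DataFrame/SQL operations are planned into built-in expressions on the driver, so executors do not have to deserialize any user-written closure, whereas RDD lambdas are classes inside the application jar that every executor must load and deserialize, which is where a Scala/Spark mismatch between the build and the cluster surfaces. A minimal sketch of the two paths; the table and column names are illustrative:

import org.apache.spark.sql.SparkSession

object SqlVsRdd {
  def main(args: Array[String]): Unit = {
    // Assumes the jar is submitted with spark-submit, so master/jars come from the command line
    val spark = SparkSession.builder()
      .appName("SqlVsRdd")
      .enableHiveSupport()
      .getOrCreate()

    val df = spark.table("some_hive_table")

    // DataFrame path: built-in expressions only, no user closure shipped to executors
    df.groupBy("word").count().show()

    // RDD path: the lambda below lives in the application jar and must be
    // deserialized on every executor
    val counts = df.rdd
      .map(row => (row.getString(0), 1))
      .reduceByKey(_ + _)
      .collect()
    counts.foreach(println)

    spark.stop()
  }
}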
bigdata-sb 2018-01-20
Both the Hadoop cluster and the Spark cluster are running fine...
