Spark wordcount fails with the error below, looking for a fix

solene1314 2017-08-03 01:24:11
17/08/03 13:20:23 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.ClassCastException: java.util.Arrays$ArrayList cannot be cast to java.util.Iterator
at Spark.SparkApp.core.WordCount$1.call(WordCount.java:28)
at Spark.SparkApp.core.WordCount$1.call(WordCount.java:1)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:125)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:125)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:191)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
What is the problem here? My code is below.
4 replies
卡奥斯道 2017-08-26
Your code:

    public Iterator<String> call(String t) throws Exception {
        // TODO Auto-generated method stub
        return (Iterator<String>) Arrays.asList(t.split(" "));
    }

The normal form is:

    public Iterable<String> call(String s) throws Exception {
        return Arrays.asList(s.split(" "));
    }

Your method's return type is simply wrong. Change the return type to Iterable. OK?
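[Editor's note: the two fixes offered in this thread differ because the FlatMapFunction contract changed between major Spark versions. The stack trace shows the executor casting the result to java.util.Iterator, which is the Spark 2.x contract, so the Iterable form above only compiles against Spark 1.x. The two declarations, for comparison, paraphrased from the Spark Java API:]

    // Spark 1.x: call() is declared to return an Iterable.
    // public interface FlatMapFunction<T, R> extends Serializable {
    //     Iterable<R> call(T t) throws Exception;
    // }

    // Spark 2.x: call() is declared to return an Iterator.
    // public interface FlatMapFunction<T, R> extends Serializable {
    //     Iterator<R> call(T t) throws Exception;
    // }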
LinkSe7en 2017-08-09
    public Iterator<String> call(String t) throws Exception {
        // TODO Auto-generated method stub
        return (Iterator<String>) Arrays.asList(t.split(" "));
    }

Change the return statement to: return Arrays.asList(t.split(" ")).iterator()
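[Editor's note: put together, the corrected operator would look like this, a minimal sketch assuming Spark 2.x, where FlatMapFunction.call must return an Iterator<String>:]

    JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
        public Iterator<String> call(String t) throws Exception {
            // Arrays.asList produces a List; asking it for its iterator()
            // yields the java.util.Iterator that Spark 2.x expects,
            // with no cast needed.
            return Arrays.asList(t.split(" ")).iterator();
        }
    });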
廖某 2017-08-08
From the error log:

    java.lang.ClassCastException: java.util.Arrays$ArrayList cannot be cast to java.util.Iterator

you can tell this is a type-cast problem, most likely caused by this statement of yours:

    return (Iterator<String>) Arrays.asList(t.split(" "));
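[Editor's note: the cast compiles but fails at runtime. List and Iterator are unrelated interfaces, so the compiler allows the cross-cast, but java.util.Arrays$ArrayList implements only List. A standalone sketch reproducing the same exception without Spark; the class name CastDemo is hypothetical, for illustration only:]

    import java.util.Arrays;
    import java.util.Iterator;
    import java.util.List;

    public class CastDemo {
        public static void main(String[] args) {
            List<String> list = Arrays.asList("a b c".split(" "));
            try {
                // Allowed by the compiler (a class could in principle
                // implement both interfaces), but Arrays$ArrayList does
                // not implement Iterator, so this throws at runtime.
                Iterator<String> bad = (Iterator<String>) list;
                bad.next();
            } catch (ClassCastException e) {
                System.out.println("Same error as the Spark task: " + e.getMessage());
            }
            // The fix: ask the List for an Iterator instead of casting.
            Iterator<String> ok = list.iterator();
            while (ok.hasNext()) {
                System.out.println(ok.next());
            }
        }
    }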
solene1314 2017-08-03
    package Spark.SparkApp.core;

    import java.util.Arrays;
    import java.util.Iterator;
    import java.util.List;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.FlatMapFunction;
    import org.apache.spark.api.java.function.Function2;
    import org.apache.spark.api.java.function.PairFunction;
    import org.apache.spark.api.java.function.VoidFunction;

    import scala.Tuple2;

    public class WordCount {
        public static void main(String[] args) {
            System.setProperty("hadoop.home.dir", "G:/hadoop-2.8.0/hadoop-2.8.0");
            SparkConf conf = new SparkConf().setAppName("Spark WordCount").setMaster("local");
            JavaSparkContext sc = new JavaSparkContext(conf);

            JavaRDD<String> lines = sc.textFile("G:/CodeProject/SparkApp/pom.xml");

            JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
                public Iterator<String> call(String t) throws Exception {
                    // TODO Auto-generated method stub
                    return (Iterator<String>) Arrays.asList(t.split(" "));
                }
            });

            JavaPairRDD<String, Integer> pairs = words.mapToPair(new PairFunction<String, String, Integer>() {
                public Tuple2<String, Integer> call(String word) throws Exception {
                    // TODO Auto-generated method stub
                    return new Tuple2<String, Integer>(word, 1);
                }
            });

            JavaPairRDD<String, Integer> wordsCount = pairs.reduceByKey(new Function2<Integer, Integer, Integer>() {
                public Integer call(Integer v1, Integer v2) throws Exception {
                    // TODO Auto-generated method stub
                    return v1 + v2;
                }
            });

            wordsCount.foreach(new VoidFunction<Tuple2<String, Integer>>() {
                public void call(Tuple2<String, Integer> pairs) throws Exception {
                    // TODO Auto-generated method stub
                    System.out.println(pairs._1 + ":" + pairs._2);
                }
            });

            sc.stop();
        }
    }
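[Editor's note: for readers on Java 8+, the whole pipeline can also be written with lambdas, which avoids the anonymous-class boilerplate; the flatMap returns an Iterator, matching the Spark 2.x contract. A sketch, not tested against the poster's exact setup:]

    JavaRDD<String> lines = sc.textFile("G:/CodeProject/SparkApp/pom.xml");
    lines.flatMap(t -> Arrays.asList(t.split(" ")).iterator())
         .mapToPair(word -> new Tuple2<>(word, 1))
         .reduceByKey((v1, v2) -> v1 + v2)
         .foreach(pair -> System.out.println(pair._1 + ":" + pair._2));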
