Writing RDD data to HDFS
Here is the code:
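import java.text.SimpleDateFormat
import java.util.{Calendar, Date}
import com.alibaba.fastjson.JSON // assumption: JSON here is Alibaba fastjson
import org.apache.spark.rdd.RDD

def processRdd(rdd: RDD[(String, String)]): Unit = {
  val dateformat = new SimpleDateFormat("yyyyMMdd")
  val cal = Calendar.getInstance
  cal.add(Calendar.DATE, -1)
  val today = dateformat.format(new Date())
  val yesterday = dateformat.format(cal.getTime)
  val lines = rdd.map(_._2)                // Kafka message values
  val words = lines.flatMap(_.split("\n")) // one JSON record per line
  words.foreach(word => {
    val EndTime = JSON.parseObject(word).getJSONArray("SHEET").getJSONObject(0).getJSONObject("HEADER").getLong("ENDTIME").toString.substring(0, 8)
    // Problem: words is itself an RDD. Calling words.saveAsTextFile inside
    // words.foreach (which runs on the executors) is the nested-RDD call Spark
    // rejects; even if allowed, it would save the whole RDD, not this one record.
    if (EndTime != today && EndTime != yesterday) {
      //println(EndTime)
      words.saveAsTextFile("hdfs://tmp/" + EndTime + "/error")
    }
    else {
      words.saveAsTextFile("hdfs://tmp/" + EndTime)
      //println(EndTime)
    }
  })
}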
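The date comparison in this code can be pulled out into plain functions that hold no RDD at all, which is what makes it safe to call inside a map running on the executors. A minimal stdlib-only sketch, assuming the same yyyyMMdd format; it swaps SimpleDateFormat/Calendar for java.time (SimpleDateFormat is not thread-safe, DateTimeFormatter is), and `EndTimeRouting`, `todayAndYesterday`, `targetDir` and `base` are hypothetical names:

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object EndTimeRouting {
  // Immutable and thread-safe, unlike SimpleDateFormat.
  private val fmt = DateTimeFormatter.ofPattern("yyyyMMdd")

  // Returns (today, yesterday) as yyyyMMdd strings relative to `now`.
  def todayAndYesterday(now: LocalDate): (String, String) =
    (now.format(fmt), now.minusDays(1).format(fmt))

  // Chooses the output directory for one record from its ENDTIME date prefix.
  // `base` is a hypothetical root directory, e.g. "/tmp".
  def targetDir(endTime: String, today: String, yesterday: String, base: String): String =
    if (endTime != today && endTime != yesterday) s"$base/$endTime/error"
    else s"$base/$endTime"
}
```

Because `targetDir` only works on strings, the driver can compute `today` and `yesterday` once and let each record's closure capture them as plain values.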
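What it does: consume JSON data from Kafka, extract the ENDTIME date from each record, compare it with today's and yesterday's date, and write the record to the matching HDFS directory. At runtime Spark complains that an RDD cannot be used inside another RDD operation. I'm new to this and don't know how to write it correctly.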
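One common way around the nested-RDD error, sketched here under assumptions rather than as a definitive fix: do the per-record work (parse ENDTIME, pick a directory) inside a `map`, which runs on the executors and touches no other RDD, then go back to the driver and save one filtered subset per directory. `SplitWriter`, `saveByDir` and `dirFor` are hypothetical names; `dirFor` stands in for the ENDTIME comparison from the question.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object SplitWriter {
  // Writes each line of `lines` under the directory returned by `dirFor`.
  // `dirFor` runs inside map on the executors, so it may only use plain
  // values (strings, dates), never another RDD.
  def saveByDir(lines: RDD[String], dirFor: String => String): Unit = {
    // Tag every line with its target directory: an ordinary transformation.
    val keyed = lines.map(line => (dirFor(line), line))
    // keyed is scanned once per directory below, so cache it instead of recomputing.
    keyed.persist(StorageLevel.MEMORY_AND_DISK)
    try {
      // Only the (small) set of distinct directory names comes back to the driver.
      val dirs = keyed.keys.distinct().collect()
      // Each saveAsTextFile is now a driver-side action, one per directory.
      dirs.foreach { dir =>
        keyed.filter { case (d, _) => d == dir }.values.saveAsTextFile(dir)
      }
    } finally {
      keyed.unpersist()
    }
  }
}
```

In `processRdd` you would pass `rdd.map(_._2).flatMap(_.split("\n"))` as `lines`, and a `dirFor` that runs the same fastjson chain to get the 8-character ENDTIME and compares it with `today`/`yesterday` (plain strings computed on the driver, so capturing them is fine). Two caveats: `saveAsTextFile` refuses to write into a directory that already exists, so if this runs once per streaming batch you would likely need a per-batch subdirectory; and in `"hdfs://tmp/"` the `tmp` part is parsed as the NameNode host name, so `"hdfs:///tmp/"` is probably what was meant. If one job per directory is too slow, `saveAsHadoopFile` with a subclass of Hadoop's `MultipleTextOutputFormat` can write all directories in a single pass.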