SparkSession fields are null when referenced inside a Spark Scala operator
很懒的耗子 2017-06-05 06:24:28
The goal of this code is to validate data inside a DataFrame filter operator. Depending on the record, the operator needs to load a model from a different HDFS path, and initializing the HDFS FileSystem requires the hadoopConfiguration object. Here is the problem: when I reference SparkSession fields inside the operator, only sparkSession itself has a value, while sparkSession.sparkContext and sparkSession.sparkContext.hadoopConfiguration are both null. Outside the operator everything is fine. I also tried creating the HDFS FileSystem outside the operator and referencing it inside, but that throws a not-serializable exception for FileSystem. Test code and log are below (the log has been trimmed for readability). Searching Baidu and Google turned up almost nothing on this. Thanks in advance, everyone.
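For context on why `spark` itself survives but `spark.sparkContext` comes back null: Spark Java-serializes the captured SparkSession into the task closure, and SparkSession's `sparkContext` field is declared `@transient`, so the copy rebuilt on the executor has a null context. A minimal plain-Scala illustration of the mechanism (no Spark needed; `Holder` and `TransientDemo` are made-up names for this sketch):

```scala
import java.io._

// Holder stands in for SparkSession: serializable, but with a @transient field
// (like SparkSession.sparkContext) that is skipped during serialization.
class Holder(@transient val resource: String) extends Serializable

object TransientDemo {
  def roundTrip[T <: Serializable](obj: T): T = {
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bos)
    oos.writeObject(obj)
    oos.close()
    new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray))
      .readObject().asInstanceOf[T]
  }

  def main(args: Array[String]): Unit = {
    val copy = roundTrip(new Holder("ctx"))
    // The object itself survives the round trip, but the transient field
    // deserializes as null -- the same shape as "sparkContext filter in: null".
    println(copy != null)
    println(copy.resource == null)
  }
}
```

This matches the log below: the SparkSession inside the operator is a different instance (`@43defe41` vs `@72f3f14c` outside), i.e. a deserialized copy whose transient fields were dropped.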
Code:
def main(args: Array[String]): Unit = {
  val spark = SparkSession.builder()
    .master("spark://192.168.3.30:7077")
    .appName("RunKMeansByUser")
    .getOrCreate()
  spark.sparkContext.addJar("D:\\ftpdata\\大数据\\testJar\\testByUser.jar")
  // obtain the test data
  val results = 。。。。。
  results.cache()
  logger.error("------------------------test record count: " + results.count() + "---------------------------")
  val kmeansByUser = new RunKMeansByUser()
  val resultsSchema = results.schema
  logger.error("------------------------sparkParam filter out---------------------------" + spark)
  logger.error("------------------------sparkContext filter out---------------------------" + spark.sparkContext)
  logger.error("------------------------hadoopConfiguration filter out---------------------------" + spark.sparkContext.hadoopConfiguration)
  val anomalies = results.filter(line => {
    logger.error("------------------------sparkParam filter in---------------------------" + spark)
    logger.error("------------------------sparkContext filter in---------------------------" + spark.sparkContext)
    logger.error("------------------------hadoopConfiguration filter in---------------------------" + spark.sparkContext.hadoopConfiguration)
    // val hdfs = FileSystem.get(new URI("hdfs://192.168.3.30:9000"), spark.sparkContext.hadoopConfiguration)
    // data check
    // kmeansByUser.buildTrustmoAnomalyDetector(line, spark, resultsSchema)
    true
  })
  logger.error("------------------------anomaly record count: " + anomalies.count() + "---------------------------")
  anomalies.show()
  results.unpersist()
}
Log:
17/06/05 17:39:09 ERROR RunKMeansByUser$: ------------------------sparkParam filter out---------------------------org.apache.spark.sql.SparkSession@72f3f14c
17/06/05 17:39:09 ERROR RunKMeansByUser$: ------------------------sparkContext filter out---------------------------org.apache.spark.SparkContext@4cb4c4cc
17/06/05 17:39:09 ERROR RunKMeansByUser$: ------------------------hadoopConfiguration filter out---------------------------Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml
17/06/06 01:39:34 ERROR RunKMeansByUser$: ------------------------sparkParam filter in---------------------------org.apache.spark.sql.SparkSession@43defe41
17/06/06 01:39:34 ERROR RunKMeansByUser$: ------------------------sparkContext filter in---------------------------null
17/06/06 01:39:34 ERROR Executor: Exception in task 3.0 in stage 4.0 (TID 17)
java.lang.NullPointerException
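For reference, a common workaround sketch (untested against this exact job): don't capture the SparkSession in the closure at all. Capture only plain serializable values, such as the namenode URI as a String, and build the Configuration/FileSystem on the executor side inside the operator. `checkRecord` is a placeholder name for the per-record model check described above:

```scala
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.spark.sql.Row

// Placeholder for the model lookup; the real check would read model files via fs.
def checkRecord(line: Row, fs: FileSystem): Boolean = true

// A plain String serializes into the closure without trouble.
val hdfsUri = "hdfs://192.168.3.30:9000"

val anomalies = results.filter { line =>
  // Constructed on the executor instead of captured from the driver.
  // FileSystem.get caches instances per scheme/authority, so this does
  // not open a fresh connection for every row.
  val fs = FileSystem.get(new URI(hdfsUri), new Configuration())
  checkRecord(line, fs)
}
```

If custom Hadoop settings from the driver's hadoopConfiguration are needed on the executors, one option is to copy them into a `Map[String, String]` on the driver and re-apply them to the `new Configuration()` inside the closure, since the Configuration object itself is not Java-serializable.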