How do I fix an error when importing pyspark in Python?

X_s_yu彧 2020-01-27 06:43:17
I have recently been studying link prediction. After reading two papers by Alireza Hajibagheri et al., "A Holistic Approach for Link Prediction in Multiplex Networks" and "Extracting Information from Negative Interactions in Multiplex Networks using Mutual Information", I was fortunate to obtain the algorithm code and experimental datasets used in the papers. However, the code fails to run. My initial investigation shows that the error occurs when importing pyspark. The relevant part of the code is:
import sys
import os
from Configurations import osName,directory_supervised,dataset_name

# Spark directory for windows. Alter addresses based on the location
# of spark on your machine. Not necessary for other operating systems
if osName == "WINDOWS":
    os.environ['SPARK_HOME'] = "C:/Mine/Spark/spark-1.4.1-bin-hadoop2.6"
    sys.path.append("C:/Mine/Spark/spark-1.4.1-bin-hadoop2.6/python")
    sys.path.append('C:/Mine/Spark/spark-1.4.1-bin-hadoop2.6/python/pyspark')
    os.environ['HADOOP_HOME'] = "C:/Mine/Spark/hadoop-2.6.0"
    sys.path.append("C:/Mine/Spark/hadoop-2.6.0/bin")

from pyspark import SparkContext
from pyspark.mllib.regression import LabeledPoint
from pyspark.sql import SQLContext
from pyspark.sql.types import *
sc = SparkContext()
sqlContext = SQLContext(sc)
from pyspark.mllib.tree import RandomForest
from pyspark.mllib.classification import SVMWithSGD, SVMModel
from pyspark.mllib.util import MLUtils
from pyspark.mllib.evaluation import BinaryClassificationMetrics,MulticlassMetrics
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
import matplotlib.pyplot as plt
from pylab import title,gcf

The error message is as follows:
Traceback (most recent call last):
  File "G:\所谓科研\Link Prediction\second time\LinkPredictionPackage\RPM.py", line 21, in <module>
    from pyspark import SparkContext
  File "D:\Programs\Python\Python38\lib\site-packages\pyspark\__init__.py", line 51, in <module>
    from pyspark.context import SparkContext
  File "D:\Programs\Python\Python38\lib\site-packages\pyspark\context.py", line 31, in <module>
    from pyspark import accumulators
  File "D:\Programs\Python\Python38\lib\site-packages\pyspark\accumulators.py", line 97, in <module>
    from pyspark.serializers import read_int, PickleSerializer
  File "D:\Programs\Python\Python38\lib\site-packages\pyspark\serializers.py", line 71, in <module>
    from pyspark import cloudpickle
  File "D:\Programs\Python\Python38\lib\site-packages\pyspark\cloudpickle.py", line 145, in <module>
    _cell_set_template_code = _make_cell_set_template_code()
  File "D:\Programs\Python\Python38\lib\site-packages\pyspark\cloudpickle.py", line 126, in _make_cell_set_template_code
    return types.CodeType(
TypeError: an integer is required (got type bytes)
So I tried importing pyspark from the IDLE shell:
1.
import pyspark

The error message is as follows:
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    import pyspark
  ... (identical to the traceback above) ...
TypeError: an integer is required (got type bytes)
2.
from pyspark import SparkContext

The error message is as follows:
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    from pyspark import SparkContext
  ... (identical to the traceback above) ...
TypeError: an integer is required (got type bytes)
I'm a complete Python beginner and really can't figure this out. Could an expert please help?
2 replies
X_s_yu彧 2020-04-02
Quoting Rmain's reply (#1):
This is a version-compatibility problem. Spark does not currently support Python 3.8; Python 3.7 should work. https://stackoverflow.com/questions/58700384/how-to-fix-typeerror-an-integer-is-required-got-type-bytes-error-when-tryin?noredirect=1
But I am already using Python 3.7!
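(Worth noting: the traceback paths point at a `Python38` install directory, so one way to settle this is to check which interpreter is actually executing. A minimal check, run from the same IDLE shell or script that produces the error:)

```python
import sys

# Print which interpreter is actually running. The traceback above shows
# D:\Programs\Python\Python38\..., so IDLE may be launching a different
# Python than the 3.7 install you expect.
print(sys.executable)        # path of the running interpreter
print(sys.version_info[:3])  # e.g. (3, 7, 9) or (3, 8, 1)
```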
Rmain 2020-03-30
This is a version-compatibility problem. Spark does not currently support Python 3.8; Python 3.7 should work. https://stackoverflow.com/questions/58700384/how-to-fix-typeerror-an-integer-is-required-got-type-bytes-error-when-tryin?noredirect=1
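(For reference, the underlying cause discussed in the linked Stack Overflow thread: Python 3.8 inserted a new leading `posonlyargcount` parameter into the `types.CodeType` constructor, so the old cloudpickle bundled inside pyspark passes its arguments one position off, and a `bytes` object lands where an `int` is now expected. A minimal guard one could run before importing such an old pyspark build, as a sketch:)

```python
import sys

# Old pyspark releases bundle a cloudpickle that constructs code objects
# with the pre-3.8 types.CodeType signature. Python 3.8 added a
# 'posonlyargcount' parameter, shifting every later argument by one and
# producing "TypeError: an integer is required (got type bytes)".
compatible = sys.version_info < (3, 8)
if not compatible:
    print("This pyspark build needs Python 3.7 or older; "
          "running %d.%d" % sys.version_info[:2])
```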
