I have recently been studying link prediction and read two papers by Alireza Hajibagheri et al.: "A Holistic Approach for Link Prediction in Multiplex Networks" and "Extracting Information from Negative Interactions in Multiplex Networks using Mutual Information". I managed to obtain the algorithm code and experimental datasets used in the papers, but the code fails to run. My initial investigation shows the failure happens when importing pyspark. The relevant part of the code is:
import sys
import os
from Configurations import osName,directory_supervised,dataset_name
# Spark directory for windows. Alter addresses based on the location
# of spark on your machine. Not necessary for other operating systems
if osName == "WINDOWS":
    os.environ['SPARK_HOME'] = "C:/Mine/Spark/spark-1.4.1-bin-hadoop2.6"
    sys.path.append("C:/Mine/Spark/spark-1.4.1-bin-hadoop2.6/python")
    sys.path.append('C:/Mine/Spark/spark-1.4.1-bin-hadoop2.6/python/pyspark')
    os.environ['HADOOP_HOME'] = "C:/Mine/Spark/hadoop-2.6.0"
    sys.path.append("C:/Mine/Spark/hadoop-2.6.0/bin")
from pyspark import SparkContext
from pyspark.mllib.regression import LabeledPoint
from pyspark.sql import SQLContext
from pyspark.sql.types import *
sc = SparkContext()
sqlContext = SQLContext(sc)
from pyspark.mllib.tree import RandomForest
from pyspark.mllib.classification import SVMWithSGD, SVMModel
from pyspark.mllib.util import MLUtils
from pyspark.mllib.evaluation import BinaryClassificationMetrics,MulticlassMetrics
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
import matplotlib.pyplot as plt
from pylab import title,gcf
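As an aside, the hardcoded Windows block above can be written a bit more defensively so that a typo in the Spark or Hadoop location surfaces as a warning instead of a silently bad sys.path entry. The paths are the same ones the script hardcodes; add_spark_paths is just an illustrative helper I made up, not part of the original code:

```python
import os
import sys

SPARK_HOME = "C:/Mine/Spark/spark-1.4.1-bin-hadoop2.6"
HADOOP_HOME = "C:/Mine/Spark/hadoop-2.6.0"

def add_spark_paths(spark_home, hadoop_home):
    """Point the current process at a local Spark/Hadoop install.

    Warns when a directory is missing, which makes path typos visible
    before the pyspark import itself fails.
    """
    for var, home in (("SPARK_HOME", spark_home), ("HADOOP_HOME", hadoop_home)):
        if not os.path.isdir(home):
            print(f"warning: {var} directory not found: {home}")
        os.environ[var] = home
    # Same three entries the original script appends, built portably.
    sys.path.append(os.path.join(spark_home, "python"))
    sys.path.append(os.path.join(spark_home, "python", "pyspark"))
    sys.path.append(os.path.join(hadoop_home, "bin"))

add_spark_paths(SPARK_HOME, HADOOP_HOME)
```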
The error message is:
Traceback (most recent call last):
  File "G:\所谓科研\Link Prediction\second time\LinkPredictionPackage\RPM.py", line 21, in <module>
    from pyspark import SparkContext
  File "D:\Programs\Python\Python38\lib\site-packages\pyspark\__init__.py", line 51, in <module>
    from pyspark.context import SparkContext
  File "D:\Programs\Python\Python38\lib\site-packages\pyspark\context.py", line 31, in <module>
    from pyspark import accumulators
  File "D:\Programs\Python\Python38\lib\site-packages\pyspark\accumulators.py", line 97, in <module>
    from pyspark.serializers import read_int, PickleSerializer
  File "D:\Programs\Python\Python38\lib\site-packages\pyspark\serializers.py", line 71, in <module>
    from pyspark import cloudpickle
  File "D:\Programs\Python\Python38\lib\site-packages\pyspark\cloudpickle.py", line 145, in <module>
    _cell_set_template_code = _make_cell_set_template_code()
  File "D:\Programs\Python\Python38\lib\site-packages\pyspark\cloudpickle.py", line 126, in _make_cell_set_template_code
    return types.CodeType(
TypeError: an integer is required (got type bytes)
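Reading the last frame, I suspect (but am not sure) this is a Python-version mismatch rather than a bug in the paper's code: Python 3.8 added a new posonlyargcount parameter to types.CodeType, while the cloudpickle.py bundled with this old pyspark still calls the constructor positionally with the 3.7-era argument order, so a bytes argument lands in a slot that now expects an integer. A toy sketch of that kind of shift (code_type_38 is a made-up stand-in, not the real types.CodeType):

```python
# Toy model of the failure: when a function gains a new second
# parameter, positional callers written against the old signature
# shift every later argument one slot to the left.

def code_type_38(argcount, posonlyargcount, flags, codestring, name=""):
    # Mimic the C-level type check that produces the observed message.
    for value in (argcount, posonlyargcount, flags):
        if not isinstance(value, int):
            raise TypeError(
                f"an integer is required (got type {type(value).__name__})"
            )
    return (argcount, posonlyargcount, flags, codestring, name)

# A call written for the old (argcount, flags, codestring, name) order:
# the bytes object meant for codestring lands in the integer flags slot.
try:
    code_type_38(1, 64, b"\x00", "f")
except TypeError as exc:
    print(exc)  # an integer is required (got type bytes)
```

If that reading is right, the code would presumably need either a newer pyspark (whose cloudpickle handles Python 3.8) or an older Python; I have not verified this.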
I then tried importing pyspark directly from the IDLE shell.

1.
import pyspark
The error message is:
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    import pyspark
  ... (then the same pyspark frames as above) ...
TypeError: an integer is required (got type bytes)
2.
from pyspark import SparkContext
The error message is:
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    from pyspark import SparkContext
  ... (then the same pyspark frames as above) ...
TypeError: an integer is required (got type bytes)
I am a complete Python beginner and really cannot figure this out. Any help would be much appreciated!