MemoryError when calling RandomForestClassifier in sklearn
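I'm currently taking part in a big-data competition and running the data locally on a win32 system, in a Python environment, using the sklearn machine-learning package.
Calling sklearn's RandomForestClassifier raises a MemoryError, and the Windows resource monitor confirms that memory usage shoots up sharply. Training itself finishes (in about 27 seconds, judging by the timestamps in the output below); the error is raised inside clf.predict. How should I deal with this?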
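Since the system is described as win32, one thing worth checking first is whether the Python interpreter itself is a 32-bit build: a 32-bit process on Windows can by default address only about 2 GB of user memory, no matter how much RAM is installed. A minimal check using only the standard library (the helper name python_bits is illustrative):

```python
import struct
import sys


def python_bits():
    """Return the pointer width of the running interpreter in bits (32 or 64)."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    # sys.maxsize > 2**32 is another common way to detect a 64-bit build
    print("Python %d-bit (sys.maxsize > 2**32: %s)" % (python_bits(), sys.maxsize > 2 ** 32))
```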
Here is the code:
from __future__ import print_function
import time
import numpy as np
from sklearn.ensemble import RandomForestClassifier

ISOTIMEFORMAT = '%Y-%m-%d %X'
print('Begin:')
print(time.strftime(ISOTIMEFORMAT, time.localtime()))

# Prefix shared by the 7 test files: resultfeaturesday1.txt ... resultfeaturesday7.txt
resutlfeaturespath = r"C:\Users\robbert\Desktop\bigDataCompetition\tianyi\data\resultfeaturesday"
featuresdatapath = r"C:\Users\robbert\Desktop\bigDataCompetition\tianyi\data\week6and7features.txt"

# Columns 0-5 are the features, column 6 is the integer label
traindata = np.loadtxt(featuresdatapath, delimiter=',', dtype=int)
trainfeatures = traindata[:, :6]
trainlabel = traindata[:, 6]

clf = RandomForestClassifier(n_estimators=100)
clf.fit(trainfeatures, trainlabel)
print('model constructed:')
print(time.strftime(ISOTIMEFORMAT, time.localtime()))

for i in range(7):
    resultfp = resutlfeaturespath + str(i + 1) + ".txt"
    testdata = np.loadtxt(resultfp, delimiter=',', dtype=int)
    testfeatures = testdata[:, :6]
    testlabel = clf.predict(testfeatures)  # the MemoryError is raised here
    resultpath = resutlfeaturespath + str(i + 1) + "result.txt"
    # np.save() takes (file, array) and has no delimiter argument;
    # np.savetxt() writes the predicted labels as comma-free plain text, one per line
    np.savetxt(resultpath, testlabel, fmt='%d', delimiter=',')

print('Done:')
print(time.strftime(ISOTIMEFORMAT, time.localtime()))
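A rough, hedged explanation of why predict can need far more memory than fit: in sklearn versions of this era, RandomForestClassifier.predict_proba appears to collect one float64 probability array of shape (n_samples, n_classes) per tree before summing them, so peak memory would grow with n_samples × n_classes × n_estimators. Because the labels are loaded as integers, a label column with many distinct values makes n_classes large. The sketch below is back-of-the-envelope arithmetic under that assumption; estimate_proba_bytes and count_classes are hypothetical helpers, not sklearn functions, and the numbers in the demo are made up.

```python
import numpy as np


def estimate_proba_bytes(n_samples, n_classes, n_estimators, itemsize=8):
    """Size in bytes of n_estimators arrays of shape (n_samples, n_classes).

    Assumes (hypothetically) that every per-tree probability array is alive at once.
    """
    return n_samples * n_classes * n_estimators * itemsize


def count_classes(labels):
    """Number of distinct values in an integer label column."""
    return np.unique(labels).size


if __name__ == "__main__":
    labels = np.array([0, 3, 3, 7, 12, 7])  # hypothetical label column
    n_classes = count_classes(labels)       # 4 distinct values
    gib = estimate_proba_bytes(1000000, n_classes, 100) / 2.0 ** 30
    print("approx. %.2f GiB for 1,000,000 test rows" % gib)
```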
Output:
Begin:
2016-01-18 20:36:41
model constructed:
2016-01-18 20:37:08
Traceback (most recent call last):
  File "C:\Users\robbert\Desktop\bigDataCompetition\tianyi\pythonproject2\train_randomForest\train_randomForest.py", line 32, in <module>
    testlabel = clf.predict(testfeatures)
  File "C:\Python27\lib\site-packages\sklearn\ensemble\forest.py", line 498, in predict
    proba = self.predict_proba(X)
  File "C:\Python27\lib\site-packages\sklearn\ensemble\forest.py", line 547, in predict_proba
    for e in self.estimators_)
  File "C:\Python27\lib\site-packages\sklearn\externals\joblib\parallel.py", line 804, in __call__
    while self.dispatch_one_batch(iterator):
  File "C:\Python27\lib\site-packages\sklearn\externals\joblib\parallel.py", line 662, in dispatch_one_batch
    self._dispatch(tasks)
  File "C:\Python27\lib\site-packages\sklearn\externals\joblib\parallel.py", line 570, in _dispatch
    job = ImmediateComputeBatch(batch)
  File "C:\Python27\lib\site-packages\sklearn\externals\joblib\parallel.py", line 183, in __init__
    self.results = batch()
  File "C:\Python27\lib\site-packages\sklearn\externals\joblib\parallel.py", line 72, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "C:\Python27\lib\site-packages\sklearn\ensemble\forest.py", line 125, in _parallel_helper
    return getattr(obj, methodname)(*args, **kwargs)
  File "C:\Python27\lib\site-packages\sklearn\tree\tree.py", line 673, in predict_proba
    proba = self.tree_.predict(X)
  File "sklearn/tree/_tree.pyx", line 736, in sklearn.tree._tree.Tree.predict (sklearn\tree\_tree.c:8449)
  File "sklearn/tree/_tree.pyx", line 738, in sklearn.tree._tree.Tree.predict (sklearn\tree\_tree.c:8321)
MemoryError
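If the per-tree probability arrays are indeed what exhausts memory, one common workaround (a sketch, not a confirmed fix for this dataset) is to call predict on row slices of the test matrix, so that only chunk_size rows' worth of probabilities exist at a time. Other levers, not tested here, are fewer trees (n_estimators), smaller trees (max_depth, min_samples_leaf), or a 64-bit Python build. The helper name predict_in_chunks and the default chunk_size of 10000 are illustrative choices.

```python
import numpy as np


def predict_in_chunks(predict, X, chunk_size=10000):
    """Apply a predict function to X in row slices and concatenate the results.

    `predict` is any callable mapping a 2-D array to a 1-D array of labels,
    e.g. clf.predict from a fitted RandomForestClassifier. Peak memory inside
    `predict` then scales with chunk_size rather than with len(X).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    parts = [predict(X[start:start + chunk_size])
             for start in range(0, X.shape[0], chunk_size)]
    if not parts:  # empty X: nothing to predict
        return np.empty(0, dtype=int)
    return np.concatenate(parts)
```

In the loop above, the call would become `testlabel = predict_in_chunks(clf.predict, testfeatures)`; because each row is predicted independently, the result should match a single `clf.predict(testfeatures)` call.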