How large a dataset can k-means clustering in R handle? I have about 2 million samples with seven variables, and whenever I try to determine k, I keep getting an error that the data is too large. How do I fix this?

Watch_dou 2017-07-12 10:54:53
# Determine the best k graphically (elbow method)
wssplot <- function(data, nc = 15, seed = 1234) {
  # wss[1]: total within-group sum of squares when all points form one cluster
  wss <- (nrow(data) - 1) * sum(apply(data, 2, var))
  for (i in 2:nc) {
    set.seed(seed)
    wss[i] <- sum(kmeans(data, centers = i)$withinss)
  }
  plot(1:nc, wss, type = 'b', xlab = 'Number of Clusters',
       ylab = 'Within groups sum of squares')
}
wssplot(norm_data)
The result is always:
> wssplot(norm_data)
Error: cannot allocate vector of size 132.3 Mb
Called from: aperm.default(X, c(s.call, s.ans))

How can I solve this?

3 replies
wung888888 2017-08-13
Consider whether you really need all this data explicitly, or whether the matrix can be sparse. There is good support in R for sparse matrices (see the Matrix package, for example).

Keep all other processes and objects in R to a minimum when you need to make objects of this size. Use gc() to clear now-unused memory, or better, only create the object you need in one session.

If the above cannot help, get a 64-bit machine with as much RAM as you can afford, and install 64-bit R. If you cannot do that, there are many online services for remote computing. Failing that, memory-mapping tools like the ff package (or bigmemory, as Sascha mentions) will help you build a new solution. In my limited experience ff is the more advanced package, but you should read the High Performance Computing topic on CRAN Task Views.

Source: https://stackoverflow.com/questions/5171593/r-memory-management-cannot-allocate-vector-of-size-n-mb
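For the elbow plot that actually triggered the error, a common practical workaround is to estimate k on a random subsample and only fit the final model on the full data. A minimal sketch, assuming norm_data is the numeric matrix/data frame from the question (the 50000-row sample size and k = 4 are illustrative choices, not from this thread):

# Estimate k on a random subsample; with only 7 variables, a few
# tens of thousands of rows usually show the same elbow as all 2M.
set.seed(1234)
sub_data <- norm_data[sample(nrow(norm_data), 50000), ]
wssplot(sub_data)                 # elbow plot now fits in memory

# Once k is chosen (k = 4 here is a placeholder), fit once on the
# full data; tot.withinss equals sum(kmeans(...)$withinss) above.
fit <- kmeans(norm_data, centers = 4, iter.max = 50)
gc()                              # release memory held by temporaries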
zara 2017-07-12
I haven't used R, but judging by that message, it's usually that there isn't enough available physical memory.
Only 132.3 MB? Even a 15-year-old computer should handle that.
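Note that the 132.3 Mb in the error message is only the size of the single vector R failed to allocate at that moment, not the session's total footprint; R may already be holding most of the available RAM when that last allocation is attempted. A quick way to check, in base R:

# gc() triggers garbage collection and reports the memory currently
# held by R (the "(Mb)" columns); compare with the machine's free RAM.
gc()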
