Accessing data inside Spark partitions

geyuan5263 2017-02-17 04:00:27

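data.mapPartitionsWithIndex { (index, points) =>
  points // placeholder body
}
Inside the braces, how can I access the data in partition index + i? I'm new to this, any help is appreciated!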

3 replies

LinkSe7en 2017-02-22

The previous reply has already explained this clearly. If you need to access data across rows, use a self join.
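
One way to read the self join suggestion, sketched under assumptions: give every element a global position with zipWithIndex, then join the RDD with a copy of itself whose keys are shifted by an offset. The helper name pairWithOffset, the sample data, and the local SparkSession are all hypothetical; the original question does not say what the neighboring data is needed for.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object SelfJoinExample {
  // Hypothetical helper: pairs the element at position k with the element at
  // position k + offset by joining the RDD with a key-shifted copy of itself.
  def pairWithOffset(data: RDD[Double], offset: Long): RDD[(Long, (Double, Double))] = {
    // (position, value) for every element, in RDD order.
    val indexed = data.zipWithIndex().map { case (value, pos) => (pos, value) }
    // Same values, but keyed by the position they should be paired with.
    val shifted = indexed.map { case (pos, value) => (pos - offset, value) }
    // Key k -> (value at k, value at k + offset); unmatched keys are dropped.
    indexed.join(shifted)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")                 // assumption: local mode for the demo
      .appName("self-join-example")
      .getOrCreate()

    // Assumed sample data, split into 2 partitions.
    val data = spark.sparkContext.parallelize(Seq(10.0, 20.0, 30.0, 40.0), numSlices = 2)

    pairWithOffset(data, offset = 1L).sortByKey().collect().foreach(println)
    spark.stop()
  }
}
```

The join shuffles records between partitions, which is the cross-partition movement that mapPartitionsWithIndex on its own does not provide.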

java8964 2017-02-21

Why do you want to access another partition's data? mapPartitions(func) and mapPartitionsWithIndex(func) exist for performance optimization: they let your function run once PER partition instead of once per element, which is why the function type must be Iterator<T> => Iterator<U> (with the partition index as an extra first argument for mapPartitionsWithIndex). You can access the whole partition's data through that one iterator, but you cannot, and should not, access other partitions' data.
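
To illustrate the point about the function type, here is a minimal sketch, assuming a local SparkSession and a made-up RDD of doubles split into 3 partitions (neither comes from the original question). The helper name summarize is hypothetical; it only ever sees the index and the iterator of its own partition.

```scala
import org.apache.spark.sql.SparkSession

object PartitionLocalExample {
  // Hypothetical helper: receives one partition's index and an iterator over
  // that partition's elements only, and returns a new iterator.
  def summarize(index: Int, points: Iterator[Double]): Iterator[(Int, Int, Double)] = {
    val local = points.toVector           // materializes this partition only
    Iterator((index, local.size, local.sum))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")                 // assumption: local mode for the demo
      .appName("partition-local-example")
      .getOrCreate()
    val sc = spark.sparkContext

    // Assumed sample data, explicitly split into 3 partitions.
    val data = sc.parallelize(Seq(1.0, 2.0, 3.0, 4.0, 5.0, 6.0), numSlices = 3)

    // summarize is invoked once per partition, not once per element.
    val summaries = data.mapPartitionsWithIndex(summarize).collect()
    summaries.foreach { case (idx, n, total) =>
      println(s"partition $idx: $n elements, sum = $total")
    }

    spark.stop()
  }
}
```

Nothing inside summarize can reach the iterator of partition index + i; any logic that needs data from several partitions has to bring it together first, for example with a join or by collecting small per-partition results on the driver.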

只要开始永远不晚 2017-02-21

Quoting java8964's reply, with a translation: Why do you want to access another partition's data? mapPartitions(func) and mapPartitionsWithIndex(func) are used for optimization. These operations let you visit each partition in turn, which is why the function is handed an Iterator: you can traverse all the data in the partition through that iterator, but one partition's iterator cannot reach the data of other partitions.