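A total-sort problem with TotalOrderPartitioner in Hadoop
I have recently been trying to get a total sort working in Hadoop, and I based my code on this expert's post:
https://www.iteblog.com/archives/2147.html#TotalOrderPartitioner-2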
The input data was generated with this script:
#!/bin/sh
for i in {1..100000}; do
    echo $RANDOM
done
run as
sh iteblog.sh > data1
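The job does use TotalOrderPartitioner, and the split points it generates are stored in a SequenceFile. However, the files produced by my 3 reducers are each sorted only internally; taken together they are not in total order. Most of the blog posts I have read say that TotalOrderPartitioner does produce a totally ordered result. My code is below.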
Hadoop version: 2.7.3
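To make the symptom concrete, here is a minimal sketch of the property that fails. It assumes each reducer output has already been read into an int array in file order; the class TotalOrderCheck and its method are hypothetical helpers for illustration, not part of Hadoop. Output is totally ordered only if the concatenation of part-r-00000, part-r-00001, part-r-00002 never decreases.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop: checks the total-order property
// across reducer outputs given in partition order.
class TotalOrderCheck {
    // parts.get(i) holds the values of the i-th reducer output, in file order.
    static boolean isTotallyOrdered(List<int[]> parts) {
        long previous = Long.MIN_VALUE;
        for (int[] part : parts) {
            for (int v : part) {
                // Any drop, inside a file or across a file boundary, breaks total order.
                if (v < previous) return false;
                previous = v;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Each file sorted, and the ranges do not overlap: totally ordered.
        List<int[]> good = Arrays.asList(new int[]{1, 5, 9}, new int[]{10, 20}, new int[]{21, 300});
        // Each file sorted, but the ranges overlap: the situation described above.
        List<int[]> bad = Arrays.asList(new int[]{1, 10, 300}, new int[]{5, 20}, new int[]{9, 21});
        System.out.println(isTotallyOrdered(good));
        System.out.println(isTotallyOrdered(bad));
    }
}
```

The second list matches what I am seeing: every file is sorted on its own, yet the value ranges of the files overlap.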
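Mapper:
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SimpleMapper extends Mapper<Text, Text, Text, IntWritable> {
    @Override
    protected void map(Text key, Text value, Context context) throws IOException, InterruptedException {
        // The input line has no tab, so KeyValueTextInputFormat delivers the whole number as the key.
        IntWritable intWritable = new IntWritable(Integer.parseInt(key.toString()));
        context.write(key, intWritable);
    }
}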
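Reducer:
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SimpleReducer extends Reducer<Text, IntWritable, IntWritable, NullWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        for (IntWritable value : values)
            context.write(value, NullWritable.get());
    }
}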
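Driver:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.InputSampler;
import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;

public class SimpleDriver {
    public static void main(String[] args) throws Exception {
        // args[0]: input path, args[1]: output path, args[2]: partition file path
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Total Order Sorting");
        job.setJarByClass(SimpleDriver.class);
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        job.setSortComparatorClass(KeyComparator.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setNumReduceTasks(3);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(NullWritable.class);

        // Tell the partitioner where the split points live, then sample the input to write them.
        TotalOrderPartitioner.setPartitionFile(job.getConfiguration(), new Path(args[2]));
        // Sampling frequency 0.01, at most 1000 samples, taken from at most 100 input splits.
        InputSampler.Sampler<Text, Text> sampler = new InputSampler.RandomSampler<>(0.01, 1000, 100);
        InputSampler.writePartitionFile(job, sampler);

        job.setPartitionerClass(TotalOrderPartitioner.class);
        job.setMapperClass(SimpleMapper.class);
        job.setReducerClass(SimpleReducer.class);
        job.setJobName("iteblog");
        // Report job failure through the process exit code.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}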
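KeyComparator:
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

public class KeyComparator extends WritableComparator {
    protected KeyComparator() {
        // true: create Text instances so that the object-level compare below is used.
        super(Text.class, true);
    }

    @Override
    public int compare(WritableComparable w1, WritableComparable w2) {
        // Compare the Text keys by their numeric value, not as strings.
        int v1 = Integer.parseInt(w1.toString());
        int v2 = Integer.parseInt(w2.toString());
        // Integer.compare avoids the overflow that v1 - v2 can hit for large values.
        return Integer.compare(v1, v2);
    }
}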
If anyone can help solve this, I would be very grateful.
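One possible cause worth checking, offered as an assumption and not a confirmed diagnosis: in the Hadoop 2.x source, TotalOrderPartitioner appears to look up BinaryComparable keys such as Text in a byte-order trie while the property mapreduce.totalorderpartitioner.naturalorder (the constant TotalOrderPartitioner.NATURAL_ORDER) is true, which is its default, without consulting the job's sort comparator. Split points sorted numerically by KeyComparator would then be applied with a byte-wise ordering. The sketch below only illustrates that mismatch: plain Java strings stand in for Text keys, the split points are made up, and partitionOf is a hypothetical stand-in for a range lookup, not Hadoop's implementation.

```java
import java.util.Comparator;

// Illustration only: plain Java strings stand in for Hadoop Text keys, and
// partitionOf is a hypothetical stand-in for a range partitioner's lookup.
class OrderMismatchDemo {
    // Numeric ordering, the one KeyComparator is meant to provide.
    static final Comparator<String> NUMERIC =
            (a, b) -> Integer.compare(Integer.parseInt(a), Integer.parseInt(b));
    // Character-by-character ordering, equivalent to byte order for ASCII digits.
    static final Comparator<String> LEXICOGRAPHIC = Comparator.naturalOrder();

    // Counts the split points that are <= key under the given ordering;
    // that count is the index of the range the key falls into.
    static int partitionOf(String key, String[] splitPoints, Comparator<String> order) {
        int index = 0;
        for (String split : splitPoints) {
            if (order.compare(split, key) <= 0) index++;
        }
        return index;
    }

    public static void main(String[] args) {
        String[] splitPoints = {"10923", "21845"}; // made-up split points for 3 reducers
        for (String key : new String[]{"3", "100", "9", "25000"}) {
            System.out.println(key + " numeric=" + partitionOf(key, splitPoints, NUMERIC)
                    + " lexicographic=" + partitionOf(key, splitPoints, LEXICOGRAPHIC));
        }
    }
}
```

In this sketch the keys "3" and "9" belong to partition 0 numerically but land in partition 2 lexicographically, which would give exactly the overlapping reducer files described above. If this is what happens in the real job, calling job.getConfiguration().setBoolean(TotalOrderPartitioner.NATURAL_ORDER, false) before submitting it may be worth trying, so that the partitioner falls back to the configured comparator.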