A sorting problem in MapReduce

cky864 2014-11-20 04:11:38
Below is code I adapted from the WordCount example source so that it sorts numbers instead of counting words.

The input files are:
file1
2
32
654
32
15
756
65223

file2
5956
22
650

file3
26
54
6
The desired output file:
2 1
6 2
15 3
22 4
26 5
32 6
32 7
54 8
650 9
654 10
756 11
5956 12
65223 13
If I use
    word.set(Integer.parseInt(value.toString()));
    context.write(word, one);
in place of
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
        word.set(Integer.parseInt(itr.nextToken()));
        context.write(word, one);
    }
then the desired result is not produced.
What I get instead:
The output folder is created, but it contains no files.
There is no console output.
The reducer stage, set by
    job.setReducerClass(IntSumReducer.class);
does not execute.

If I remove the comment from // job.setCombinerClass(IntSumReducer.class);, the Combiner does execute, and the console produces the following output:
1 FFFF 2HHH2
2 FFFF 15HHH15
3 FFFF 324 FFFF 32HHH32
5 FFFF 654HHH654
6 FFFF 756HHH756
7 FFFF 65223HHH65223
8 FFFF 22HHH22
9 FFFF 650HHH650
10 FFFF 5956HHH5956
11 FFFF 6HHH6
12 FFFF 26HHH26
13 FFFF 54HHH54
So the Combiner executed, but the reducer did not; the output folder was created, yet it contains no files.
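
A side note on the Combiner contract: Hadoop may run a combiner zero, one, or several times during the map-side spill, so a class is only safe to reuse as a combiner when its operation is commutative and associative, as the summing in the stock WordCount is. A ranking reducer that carries a running counter across calls, like the IntSumReducer in the full source below, is not. A minimal sketch of a contract-respecting combiner (the name SumCombiner is hypothetical):

public static class SumCombiner
        extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
    private final IntWritable out = new IntWritable();

    public void reduce(IntWritable key, Iterable<IntWritable> values,
                       Context context) throws IOException, InterruptedException {
        int count = 0;
        for (IntWritable val : values) {
            // Addition is associative and commutative, so pre-merging
            // on the map side cannot change the final reduce-side sums.
            count += val.get();
        }
        out.set(count);
        context.write(key, out);
    }
}

Even this would change this particular job's output, though: the two 32s sit in the same input file, so a combiner would hand the reducer a single (32, 2) record instead of two (32, 1) records, and the ranking loop emits one rank per record.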


import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    public static class TokenizerMapper
            extends Mapper<Object, Text, IntWritable, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private IntWritable word = new IntWritable();

        public void map(Object key, Text value, Context context
                        ) throws IOException, InterruptedException {
            // If the StringTokenizer loop below is swapped for the 2
            // commented-out lines, no output file is generated.
            /*
            word.set(Integer.parseInt(value.toString()));
            context.write(word, one);
            */
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(Integer.parseInt(itr.nextToken()));
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
        private IntWritable result = new IntWritable();
        private static int sum = 1;   // running rank across all reduce calls

        public void reduce(IntWritable key, Iterable<IntWritable> values,
                           Context context
                           ) throws IOException, InterruptedException {
            for (IntWritable val : values) {
                // System.out.print(sum);
                System.out.print(sum + " FFFF " + key);   // debug trace
                result.set(sum);                          // rank in sorted order
                context.write(key, result);
                sum++;
            }
            System.out.println("HHH" + key);              // debug trace
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: wordcount <in> [<in>...] <out>");
            System.exit(2);
        }
        @SuppressWarnings("deprecation")
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        // job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(IntWritable.class);
        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job,
                new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Why do these two versions behave inconsistently?
    word.set(Integer.parseInt(value.toString()));
    context.write(word, one);
versus
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
        word.set(Integer.parseInt(itr.nextToken()));
        context.write(word, one);
    }
How do the results they produce differ?
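
For what it is worth, one plausible explanation (an assumption, since no exception trace is shown here) is stray whitespace or a blank line in the input: Integer.parseInt(value.toString()) throws NumberFormatException on anything that is not a bare integer, which fails the map task, while StringTokenizer silently skips whitespace and empty lines. A hypothetical drop-in replacement for map() in TokenizerMapper that keeps the direct parse but tolerates the same inputs the tokenizer loop tolerates:

public void map(Object key, Text value, Context context
                ) throws IOException, InterruptedException {
    // Strip surrounding spaces, tabs and any stray '\r' first;
    // Integer.parseInt fails on all of them.
    String line = value.toString().trim();
    if (line.isEmpty()) {
        return;   // skip blank lines -- parseInt("") also throws
    }
    word.set(Integer.parseInt(line));   // safe for a bare integer line
    context.write(word, one);
}

If this version produces output where the original two lines did not, the input contained something other than bare integers.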
7 replies
tchqiq 2014-12-12
I think I ran into the same problem the first time I wrote an MR job: the string splitting just would not work, which was strange. Later it stopped happening; I don't know whether that was because of a Hadoop upgrade.
cky864 2014-11-24
Quoting reply #2 by baifanwudi:
"It looks like your configuration is missing the key/value classes for the map phase:
job.setMapOutputKeyClass(IntWritable.class);
job.setMapOutputValueClass(IntWritable.class);"
I added those 2 lines and the result did not change: only the folder is produced.
cky864 2014-11-24
Quoting reply #4 by baifanwudi:
"I tried your program; it goes through the StringTokenizer path with no problem. With the setCombiner line enabled the result is
2 14
6 15
15 16
22 17
26 18
32 19
32 20
54 21
650 22
654 23
756 24
5956 25
65223 26
With the combiner line commented out, the result is
2 1
6 2
15 3
22 4
26 5
32 6
32 7
54 8
650 9
654 10
756 11
5956 12
65223 13
Just keep experimenting."
The StringTokenizer version can produce results, but
word.set(Integer.parseInt(value.toString()));
context.write(word, one);
produces no results at all.
小白鸽 2014-11-24
I don't know why yours produces no results; look carefully at 3 places:
14/11/24 14:59:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/11/24 14:59:11 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/11/24 14:59:11 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
14/11/24 14:59:11 INFO input.FileInputFormat: Total input paths to process : 3
14/11/24 14:59:11 WARN snappy.LoadSnappy: Snappy native library not loaded
14/11/24 14:59:11 INFO mapred.JobClient: Running job: job_local_0001
14/11/24 14:59:11 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/11/24 14:59:11 INFO mapred.MapTask: io.sort.mb = 100
14/11/24 14:59:11 INFO mapred.MapTask: data buffer = 79691776/99614720
14/11/24 14:59:11 INFO mapred.MapTask: record buffer = 262144/327680
14/11/24 14:59:11 INFO mapred.MapTask: Starting flush of map output
14/11/24 14:59:11 INFO mapred.MapTask: Finished spill 0
14/11/24 14:59:11 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
14/11/24 14:59:12 INFO mapred.JobClient: map 0% reduce 0%
14/11/24 14:59:14 INFO mapred.LocalJobRunner:
14/11/24 14:59:14 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
14/11/24 14:59:14 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/11/24 14:59:14 INFO mapred.MapTask: io.sort.mb = 100
14/11/24 14:59:14 INFO mapred.MapTask: data buffer = 79691776/99614720
14/11/24 14:59:14 INFO mapred.MapTask: record buffer = 262144/327680
14/11/24 14:59:14 INFO mapred.MapTask: Starting flush of map output
14/11/24 14:59:14 INFO mapred.MapTask: Finished spill 0
14/11/24 14:59:14 INFO mapred.Task: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
14/11/24 14:59:15 INFO mapred.JobClient: map 100% reduce 0%
14/11/24 14:59:17 INFO mapred.LocalJobRunner:
14/11/24 14:59:17 INFO mapred.Task: Task 'attempt_local_0001_m_000001_0' done.
14/11/24 14:59:17 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/11/24 14:59:17 INFO mapred.MapTask: io.sort.mb = 100
14/11/24 14:59:17 INFO mapred.MapTask: data buffer = 79691776/99614720
14/11/24 14:59:17 INFO mapred.MapTask: record buffer = 262144/327680
14/11/24 14:59:17 INFO mapred.MapTask: Starting flush of map output
14/11/24 14:59:17 INFO mapred.MapTask: Finished spill 0
14/11/24 14:59:17 INFO mapred.Task: Task:attempt_local_0001_m_000002_0 is done. And is in the process of commiting
14/11/24 14:59:20 INFO mapred.LocalJobRunner:
14/11/24 14:59:20 INFO mapred.Task: Task 'attempt_local_0001_m_000002_0' done.
14/11/24 14:59:20 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/11/24 14:59:20 INFO mapred.LocalJobRunner:
14/11/24 14:59:20 INFO mapred.Merger: Merging 3 sorted segments
14/11/24 14:59:20 INFO mapred.Merger: Down to the last merge-pass, with 3 segments left of total size: 136 bytes
14/11/24 14:59:20 INFO mapred.LocalJobRunner:
1 FFFF 2HHH2
2 FFFF 6HHH6
3 FFFF 15HHH15
4 FFFF 22HHH22
5 FFFF 26HHH26
6 FFFF 327 FFFF 32HHH32
8 FFFF 54HHH54
9 FFFF 650HHH650
10 FFFF 654HHH654
11 FFFF 756HHH756
12 FFFF 5956HHH5956
13 FFFF 65223HHH65223
14/11/24 14:59:20 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
14/11/24 14:59:20 INFO mapred.LocalJobRunner:
14/11/24 14:59:20 INFO mapred.Task: Task attempt_local_0001_r_000000_0 is allowed to commit now
14/11/24 14:59:20 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://192.168.205.128:9000/tmp/out
14/11/24 14:59:23 INFO mapred.LocalJobRunner: reduce > reduce
14/11/24 14:59:23 INFO mapred.Task: Task 'attempt_local_0001_r_000000_0' done.
14/11/24 14:59:24 INFO mapred.JobClient: map 100% reduce 100%
14/11/24 14:59:24 INFO mapred.JobClient: Job complete: job_local_0001
14/11/24 14:59:24 INFO mapred.JobClient: Counters: 19
14/11/24 14:59:24 INFO mapred.JobClient: File Output Format Counters
14/11/24 14:59:24 INFO mapred.JobClient: Bytes Written=75
14/11/24 14:59:24 INFO mapred.JobClient: FileSystemCounters
14/11/24 14:59:24 INFO mapred.JobClient: FILE_BYTES_READ=3420
14/11/24 14:59:24 INFO mapred.JobClient: HDFS_BYTES_READ=177
14/11/24 14:59:24 INFO mapred.JobClient: FILE_BYTES_WRITTEN=165780
14/11/24 14:59:24 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=75
14/11/24 14:59:24 INFO mapred.JobClient: File Input Format Counters
14/11/24 14:59:24 INFO mapred.JobClient: Bytes Read=52
14/11/24 14:59:24 INFO mapred.JobClient: Map-Reduce Framework
14/11/24 14:59:24 INFO mapred.JobClient: Reduce input groups=12
14/11/24 14:59:24 INFO mapred.JobClient: Map output materialized bytes=148
14/11/24 14:59:24 INFO mapred.JobClient: Combine output records=0
14/11/24 14:59:24 INFO mapred.JobClient: Map input records=13
14/11/24 14:59:24 INFO mapred.JobClient: Reduce shuffle bytes=0
14/11/24 14:59:24 INFO mapred.JobClient: Reduce output records=13
14/11/24 14:59:24 INFO mapred.JobClient: Spilled Records=26
14/11/24 14:59:24 INFO mapred.JobClient: Map output bytes=104
14/11/24 14:59:24 INFO mapred.JobClient: Total committed heap usage (bytes)=1266155520
14/11/24 14:59:24 INFO mapred.JobClient: Combine input records=0
14/11/24 14:59:24 INFO mapred.JobClient: Map output records=13
14/11/24 14:59:24 INFO mapred.JobClient: SPLIT_RAW_BYTES=321
14/11/24 14:59:24 INFO mapred.JobClient: Reduce input records=13
小白鸽 2014-11-24
I tried your program; it goes through the StringTokenizer path with no problem. With the setCombiner line enabled the result is
2 14
6 15
15 16
22 17
26 18
32 19
32 20
54 21
650 22
654 23
756 24
5956 25
65223 26
With the combiner line commented out, the result is
2 1
6 2
15 3
22 4
26 5
32 6
32 7
54 8
650 9
654 10
756 11
5956 12
65223 13
Just keep experimenting.
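
One hedged reading of those rank numbers: sum is declared static, and in local mode the combiner and the reducer run inside the same JVM, so the combiner pass consumes the 13 map records first (advancing sum from 1 to 14) and the reducer then numbers from 14 to 26. A sketch of a per-task counter that avoids the leak, assuming the combiner stays disabled for this job (the class name RankReducer is hypothetical):

public static class RankReducer
        extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
    private final IntWritable result = new IntWritable();
    private int rank;   // per task attempt, not shared across the JVM

    @Override
    protected void setup(Context context) {
        rank = 1;   // fresh counter for every task attempt (and retry)
    }

    public void reduce(IntWritable key, Iterable<IntWritable> values,
                       Context context) throws IOException, InterruptedException {
        for (IntWritable val : values) {
            result.set(rank++);           // position in the sorted order
            context.write(key, result);   // one output line per occurrence
        }
    }
}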
小白鸽 2014-11-21
It looks like your configuration is missing the key/value classes for the map phase:
job.setMapOutputKeyClass(IntWritable.class);
job.setMapOutputValueClass(IntWritable.class);
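
For context, the map output key/value classes default to whatever setOutputKeyClass/setOutputValueClass declare, so these two calls only change anything when the mapper emits different types than the job's final output. Both phases here emit IntWritable pairs, which is consistent with the result not changing when they were added. A hypothetical driver fragment showing the case where they are required:

// Hypothetical variant: the reducer emits Text values, so the map-side
// types no longer match the job-wide defaults and must be declared.
job.setMapOutputKeyClass(IntWritable.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(IntWritable.class);
job.setOutputValueClass(Text.class);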
少主无翼 2014-11-20
Something feels off. Shouldn't the Mapper's input key be LongWritable? The default TextInputFormat produces LongWritable keys, I think, though I don't know whether Object causes any problem.
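
For the record, TextInputFormat (the default) does supply LongWritable keys, the byte offset of each line; the stock WordCount declares the key as Object and still works because map() never reads it. The explicit equivalent of the mapper above would be:

import org.apache.hadoop.io.LongWritable;

public static class TokenizerMapper
        extends Mapper<LongWritable, Text, IntWritable, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private IntWritable word = new IntWritable();

    public void map(LongWritable key, Text value, Context context
                    ) throws IOException, InterruptedException {
        // key is the byte offset of this line within the input split;
        // it is ignored here, which is why Object also compiles and runs.
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(Integer.parseInt(itr.nextToken()));
            context.write(word, one);
        }
    }
}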
