Help with a Hadoop question

zz_soliya 2020-04-13 09:15:36

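I'm learning Hadoop today, but I'm confused about input and output. In the code below, input and output are handled through FileInputFormat/FileOutputFormat, so why does the code use both args and otherArgs? And if I want to use my own HDFS paths, how should I change it?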

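import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

// FileTotal, FileCountMapper, FileCountReducer, ModelTrain, ModelTrainMapper
// and ModelTrainReducer are the poster's own classes (not shown).
public class TrainWork {

	public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
		Configuration conf = new Configuration();
		// Strip Hadoop's generic options (-D, -fs, -conf, ...) into conf; only <in> <out> remain
		String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
		if (otherArgs.length != 2) {
			System.err.println("Usage: SplitWord <in> <out>");
			System.exit(2);
		}
		// Read from otherArgs, not args: args[1] is wrong whenever generic options are passed
		conf.set("output", otherArgs[1]);

		// Job 1: count the documents per category and the total number of documents
		Job job = Job.getInstance(conf, "FileCount"); // user-defined job name
		job.setJarByClass(FileTotal.class);
		job.setMapperClass(FileCountMapper.class);   // Mapper class for the job
		job.setReducerClass(FileCountReducer.class); // Reducer class for the job
		job.setOutputKeyClass(Text.class);           // key class of the job output
		job.setOutputValueClass(Text.class);         // value class of the job output
		job.setInputFormatClass(TextInputFormat.class);
		FileInputFormat.addInputPath(job, new Path(otherArgs[0]));                   // input path
		FileOutputFormat.setOutputPath(job, new Path(otherArgs[1] + "/fileCount")); // output path
		// job.setNumReduceTasks(10);
		if (!job.waitForCompletion(true)) {
			System.exit(1); // do not start job 2 if job 1 failed
		}

		// Job 2: training set
		Job job1 = Job.getInstance(conf, "splitword"); // user-defined job name
		job1.setJarByClass(ModelTrain.class);
		job1.setMapperClass(ModelTrainMapper.class);
		// job1.setCombinerClass(ModelTrainCombiner.class);
		job1.setReducerClass(ModelTrainReducer.class);
		job1.setOutputKeyClass(Text.class);
		job1.setOutputValueClass(Text.class);
		job1.setInputFormatClass(TextInputFormat.class);
		FileInputFormat.addInputPath(job1, new Path(otherArgs[0]));                   // input path
		FileOutputFormat.setOutputPath(job1, new Path(otherArgs[1] + "/splitWord")); // output path
		// job1.setNumReduceTasks(10);
		System.exit(job1.waitForCompletion(true) ? 0 : 1);
	}
}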
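To see why args and otherArgs can differ, here is a simplified, hypothetical model of what GenericOptionsParser does. It is not the Hadoop class: it only recognizes the space-separated forms `-D key=value` and `-fs <uri>` (the real parser also handles `-conf`, `-files`, `-libjars` and others) and returns the leftover arguments, which is what getRemainingArgs() hands back. With a command such as `hadoop jar train.jar TrainWork -D mapreduce.job.reduces=2 /input /output`, args[1] is `mapreduce.job.reduces=2` while otherArgs[1] is `/output`, which is why the paths should be read from otherArgs. Since the driver already runs GenericOptionsParser, passing `-fs hdfs://master:9000` before `<in> <out>` should also set fs.defaultFS without editing the code (host and port are example values).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of GenericOptionsParser (NOT the Hadoop class).
// Only "-D key=value" and "-fs <uri>" are recognized; everything else is kept.
public class GenericArgsDemo {

    // Generic options go into conf; the leftover arguments are returned,
    // mirroring what getRemainingArgs() gives back to the driver.
    static String[] parse(Map<String, String> conf, String[] args) {
        List<String> remaining = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("-D") && i + 1 < args.length) {
                String[] kv = args[++i].split("=", 2);
                conf.put(kv[0], kv.length > 1 ? kv[1] : "");
            } else if (args[i].equals("-fs") && i + 1 < args.length) {
                // In Hadoop, -fs overrides the fs.defaultFS property.
                conf.put("fs.defaultFS", args[++i]);
            } else {
                remaining.add(args[i]);
            }
        }
        return remaining.toArray(new String[0]);
    }

    public static void main(String[] argv) {
        String[] args = {"-D", "mapreduce.job.reduces=2", "/input", "/output"};
        Map<String, String> conf = new HashMap<>();
        String[] otherArgs = parse(conf, args);
        System.out.println("args[1]      = " + args[1]);      // mapreduce.job.reduces=2
        System.out.println("otherArgs[1] = " + otherArgs[1]); // /output
        System.out.println(Arrays.toString(otherArgs));       // [/input, /output]
    }
}
```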

4 replies

DanielMaster 2020-04-14

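Prefix the paths you pass as otherArgs with hdfs://<NameNode host>:<port> and they point directly at HDFS. Without the prefix, the path is resolved against fs.defaultFS, which is the local file system unless the cluster's configuration (core-site.xml) is on the classpath. No code changes are needed.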
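The resolution rule in this reply can be illustrated with a small, hypothetical sketch that uses java.net.URI instead of Hadoop's own Path qualification: a path that already carries a scheme such as hdfs:// is used unchanged, while an absolute path without a scheme takes the scheme and authority of the default file system. The address hdfs://master:9000 is only an example value taken from another reply in this thread, and relative paths (which Hadoop resolves against the user's working directory) are deliberately rejected.

```java
import java.net.URI;

// Hypothetical illustration with java.net.URI, not Hadoop's Path/FileSystem code.
public class QualifyPathDemo {

    // Returns the fully qualified location of an absolute path or a full URI.
    static String qualify(String defaultFs, String path) {
        URI p = URI.create(path);
        if (p.getScheme() != null) {
            return p.toString(); // scheme given explicitly (e.g. hdfs://...): used as-is
        }
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("sketch only handles absolute paths: " + path);
        }
        // No scheme: borrow scheme and authority from the default file system.
        return URI.create(defaultFs).resolve(p).toString();
    }

    public static void main(String[] args) {
        String cluster = "hdfs://master:9000";
        System.out.println(qualify(cluster, "/user/input"));            // hdfs://master:9000/user/input
        System.out.println(qualify(cluster, "hdfs://other:8020/data")); // hdfs://other:8020/data
    }
}
```

If the default file system is the local one, the same scheme-less path ends up on local disk, which matches the reply's point.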

shuai7boy 2020-04-13

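Quoting reply #2 by zz_soliya:

[quote=Quoting reply #1 by shuai7boy:] First, otherArgs is derived from args; it is the result of processing the command-line arguments. Second, to use your own path, set it with conf.set("fs.defaultFS", "hdfs://master:9000");

I took another look and found that I can set the input and output paths in the Run As configuration. Does that mean I don't have to modify the code? But after running it, I could not find a part-r-00000 file in the output directory on HDFS. What could be the reason? Thank you very much for any help. [/quote] You can also pass them as arguments, but then conf has to be set up to receive those arguments, so in the end you still have to modify the code.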
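One likely reason part-r-00000 does not appear directly under the output directory is visible in the driver itself: each job writes to a subdirectory of otherArgs[1], namely /fileCount and /splitWord. With the default of one reduce task, each subdirectory should hold a single part-r-00000 (one part-r-NNNNN file per reducer), plus a _SUCCESS marker once the job succeeds. Separately, if the program was launched from the IDE without the cluster's configuration on the classpath, scheme-less paths may have been written to the local file system rather than HDFS. The helper below is hypothetical and only computes the paths worth checking; it does not access HDFS.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists where TrainWork's two jobs should leave their reducer output.
public class ExpectedOutputDemo {

    // Reducer output files are named part-r-00000, part-r-00001, ... (one per reduce task).
    static List<String> expectedFiles(String outArg, int numReduceTasks) {
        List<String> files = new ArrayList<>();
        for (String sub : new String[] {"fileCount", "splitWord"}) { // subdirectories used by the driver
            for (int i = 0; i < numReduceTasks; i++) {
                files.add(outArg + "/" + sub + "/" + String.format("part-r-%05d", i));
            }
        }
        return files;
    }

    public static void main(String[] args) {
        // One reducer (the default): expect one part file per job.
        for (String f : expectedFiles("/output", 1)) {
            System.out.println(f);
        }
    }
}
```

On the cluster, `hdfs dfs -ls /output/fileCount` (with your actual output path) shows whether the file is there.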

zz_soliya 2020-04-13