MapReduce Code Design and Analysis -- Shen Yan (Second Report Summary)

weixin_38120312 2011-12-23 02:39:03

MapReduce program design

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
```

The org.apache.hadoop.mapreduce.lib.* packages replace the old org.apache.hadoop.mapred.* ones. This change makes programs easier to modify and saves a lot of boilerplate compared with the old API.

The old API:

```java
public static class MapClass extends MapReduceBase
        implements Mapper<K1, V1, K2, V2> {
    public void map(K1 key, V1 value,
                    OutputCollector<K2, V2> output,
                    Reporter reporter) throws IOException { }
}

public static class Reduce extends MapReduceBase
        implements Reducer<K2, V2, K3, V3> {
    public void reduce(K2 key, Iterator<V2> values,
                       OutputCollector<K3, V3> output,
                       Reporter reporter) throws IOException { }
}
```

The new API:

```java
public static class MapClass extends Mapper<K1, V1, K2, V2> {
    public void map(K1 key, V1 value, Context context)
            throws IOException, InterruptedException { }
}

public static class Reduce extends Reducer<K2, V2, K3, V3> {
    public void reduce(K2 key, Iterable<V2> values, Context context)
            throws IOException, InterruptedException { }
}
```

The full program:

```java
public class tt extends Configured implements Tool {

    public static class MapClass
            extends Mapper<LongWritable, Text, Text, Text> {
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // split() assigns the comma-separated fields of the input
            // line to the citation string array
            String[] citation = value.toString().split(",");
            // the new API replaces the old collect() call;
            // key and value are swapped here
            context.write(new Text(citation[1]), new Text(citation[0]));
        }
    }

    // the first two type parameters are the input types,
    // the last two are the output types
    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            String csv = "";
            // Text is Hadoop's String-like text type; it differs from
            // String in how it handles encoding because it is built
            // for in-memory serialization
            for (Text val : values) {
                if (csv.length() > 0) csv += ",";
                csv += val.toString();
            }
            context.write(key, new Text(csv));
        }
    }

    public int run(String[] args) throws Exception {  // invoked by Hadoop itself
        Configuration conf = getConf();
        Job job = new Job(conf, "tt");  // Job replaces the old JobClient
        job.setJarByClass(tt.class);
        Path in = new Path(args[0]);
        Path out = new Path(args[1]);
        FileInputFormat.setInputPaths(job, in);
        FileOutputFormat.setOutputPath(job, out);
        job.setMapperClass(MapClass.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        // without the six set* calls above the job throws an exception;
        // also remember that the old and new APIs must not be mixed
        job.setOutputValueClass(Text.class);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner takes care of the tedious configuration details
        int res = ToolRunner.run(new Configuration(), new tt(), args);
        System.exit(res);
    }
}
```

The code above runs in Eclipse. The input file is cite75_99.txt from Hadoop in Action, formatted like this:

```
[root@asus input]# head -n 5 cite75_99.txt
"CITING","CITED"
3858241,956203
3858241,1324234
3858241,3398406
3858241,3557384
```

When I first wrote this example it failed with:

```
org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text
```

After switching to the new API as in the program above, the key type is correctly set to Text. The six job.set* calls in run() must be written exactly like that: setting Text only takes effect when it is configured for map, reduce, and the job configuration together. My email is shenyanxxxy@qq.com; any Hadoop enthusiasts are welcome to contact me so we can discuss together.
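To see what the job computes without a cluster, the per-record logic can be sketched in plain Java. This is my illustration, not part of the original program: `CitationDemo` is a hypothetical class, the `Map` simulates the shuffle's grouping by key, and no Hadoop types are involved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CitationDemo {
    // Simulates the map step (split "citing,cited" and swap the fields)
    // plus the shuffle: group every citing patent under its cited patent.
    static Map<String, List<String>> mapAndShuffle(List<String> lines) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String line : lines) {
            String[] citation = line.split(",");
            grouped.computeIfAbsent(citation[1], k -> new ArrayList<>())
                   .add(citation[0]);
        }
        return grouped;
    }

    // Simulates the reduce step: join the grouped values into one CSV string.
    static String reduce(List<String> values) {
        return String.join(",", values);
    }

    public static void main(String[] args) {
        List<String> lines = List.of("3858241,956203",
                                     "3858242,956203",
                                     "3858241,1324234");
        for (Map.Entry<String, List<String>> e : mapAndShuffle(lines).entrySet()) {
            // the cited patent is now the key, with all citing patents
            // joined by commas, matching the job's output format
            System.out.println(e.getKey() + "\t" + reduce(e.getValue()));
        }
    }
}
```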
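The error reported above is the standard Java ClassCastException; Hadoop hits it when the types the job declares do not match what map() actually emits. A minimal stand-in with plain java.lang types (Long standing in for LongWritable, String for Text — an illustration of the failure mode, not Hadoop code):

```java
public class CastDemo {
    public static void main(String[] args) {
        // stands in for a key the framework treats as LongWritable
        Object key = Long.valueOf(3858241L);
        try {
            // stands in for the framework's cast to the declared Text type
            String s = (String) key;
            System.out.println(s);
        } catch (ClassCastException e) {
            // the runtime type (Long) does not match the declared type (String)
            System.out.println("cast failed: " + e.getMessage());
        }
    }
}
```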
