About a MapReduce simulation

niuliwan 2013-05-18 10:27:47
I have a MapReduce simulation program written in C++. It simulates the MapReduce workflow, and I would like to know how to run it. (My Hadoop cluster consists of one physical machine running Ubuntu, with two Ubuntu virtual machines installed on it; the physical machine acts as namenode and jobtracker, and the two VMs act as datanodes and tasktrackers.)
Thanks in advance, everyone. Looking forward to the discussion!
15 replies
Miaoer1 2013-06-01
The posters above are true masters!
撸大湿 2013-06-01
Quoting reply #10 by w454694219:
[quote from reply #9 by tntzbzc:] [quote from reply #8 by chenxs_03:] This program probably uses Hadoop streaming, with plain stdin/stdout I/O; your C++ binary plays the role of /bin/cat in -mapper or of /bin/wc in -reducer: $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar -input myInputDirs -output myOutputDir -mapper /bin/cat -reducer /bin/wc (see http://hadoop.apache.org/docs/stable/streaming.html)
The OP's MR is not a standard Hadoop one. Are you sure the OP's C++ MR can be run as a Hadoop MR job through streaming? I tend to agree with reply #7. If anyone gets it running on Hadoop, please share. Thanks.
Quoting reply #7 by dickens88:
Looks like the OP got scammed.
[/quote] This program has nothing to do with the Hadoop system; it probably runs directly under Linux and merely simulates how MapReduce processes data. [/quote] Right, and it can only run on a single machine; it cannot do distributed, coordinated processing.
静夜思555 2013-06-01
Quoting reply #9 by tntzbzc:
[quote from reply #8 by chenxs_03:] This program probably uses Hadoop streaming, with plain stdin/stdout I/O; your C++ binary plays the role of /bin/cat in -mapper or of /bin/wc in -reducer: $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar -input myInputDirs -output myOutputDir -mapper /bin/cat -reducer /bin/wc (see http://hadoop.apache.org/docs/stable/streaming.html)
The OP's MR is not a standard Hadoop one. Are you sure the OP's C++ MR can be run as a Hadoop MR job through streaming? I tend to agree with reply #7. If anyone gets it running on Hadoop, please share. Thanks.
Quoting reply #7 by dickens88:
Looks like the OP got scammed.
[/quote] This program has nothing to do with the Hadoop system; it probably runs directly under Linux and merely simulates how MapReduce processes data.
静夜思555 2013-06-01
You would also need to write a simulated version of an MR job to go with it.
静夜思555 2013-06-01
The simulation is done with multiple threads.
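As a minimal sketch of the work-splitting such a multi-threaded simulation can use: input files dealt out round-robin to worker threads. This mirrors what the program posted further down this thread does in its MapReduce() driver; the function name here is ours, not from the posted code.

```cpp
#include <string>
#include <vector>

// Deal input files out to nthr worker buckets round-robin, the
// same splitting scheme the simulator's driver applies before
// spawning one mapper thread per bucket.
std::vector<std::vector<std::string> >
partition_files(const std::vector<std::string> &files, int nthr) {
    std::vector<std::vector<std::string> > buckets(nthr);
    for (size_t i = 0; i < files.size(); ++i)
        buckets[i % nthr].push_back(files[i]);
    return buckets;
}
```

Each bucket is then handed to one thread, so files are balanced across threads to within one file.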
chenxs_03 2013-05-26
This program probably uses Hadoop streaming, with plain stdin/stdout I/O. Your C++ binary plays the role of /bin/cat in -mapper or of /bin/wc in -reducer below:

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
    -input myInputDirs \
    -output myOutputDir \
    -mapper /bin/cat \
    -reducer /bin/wc

See http://hadoop.apache.org/docs/stable/streaming.html
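For reference, a streaming mapper is just a stdin-to-stdout filter that writes tab-separated key/value records. A sketch of the map step of a hypothetical word-count mapper (our example, not the OP's program):

```cpp
#include <sstream>
#include <string>
#include <vector>

// One map step of a streaming word-count mapper: turn an input
// line into "word\t1" records, the tab-separated format Hadoop
// streaming expects on the mapper's stdout.
std::vector<std::string> map_record(const std::string &line) {
    std::vector<std::string> records;
    std::istringstream iss(line);
    std::string w;
    while (iss >> w)
        records.push_back(w + "\t1");
    return records;
}

// In a real streaming mapper, main() would just loop:
//   std::string line;
//   while (std::getline(std::cin, line))
//       for (size_t i = 0; i < map_record(line).size(); ++i) ...
```

Hadoop streaming then sorts these records by key and feeds the grouped lines to the -reducer command on its stdin.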
撸大湿 2013-05-26
Quoting reply #8 by chenxs_03:

This program probably uses Hadoop streaming, with plain stdin/stdout I/O. Your C++ binary plays the role of /bin/cat in -mapper or of /bin/wc in -reducer:

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
    -input myInputDirs \
    -output myOutputDir \
    -mapper /bin/cat \
    -reducer /bin/wc

See http://hadoop.apache.org/docs/stable/streaming.html

The OP's MR is not a standard Hadoop one. Are you sure the OP's C++ MR can be run as a Hadoop MR job through streaming?
I tend to agree with reply #7.
If anyone gets it running on Hadoop, please share. Thanks.
Quoting reply #7 by dickens88:
Looks like the OP got scammed.
dickens88 2013-05-24
Looks like the OP got scammed.
niuliwan 2013-05-20
Yes, it is a MapReduce simulation program that simulates the MapReduce process. I do not really understand it either; I have only just started.
撸大湿 2013-05-19
OP, is this code Hadoop's C++ MapReduce?
There is not a single comment in it. Maybe my eyes deceive me, but which flavor of MapReduce is this supposed to be?
niuliwan 2013-05-18
Sorry, I have only just started with Hadoop and do not understand much yet. I would appreciate your help.
niuliwan 2013-05-18

// mapreduce.hpp: interface of the MapReduce simulator
#include <iostream>
#include <vector>
#include <map>
#include <string>
#include <list>
#include <cstdio>      // FILE (missing in the original)
#include <pthread.h>   // pthread_t (missing in the original)

// #define DBUG_PRINT(MSG) std::cerr<<MSG<<std::endl;
#define DBUG_PRINT(MSG)

namespace mapreduce {

class Mapper;
class Reducer;

class MapReduceInput {
    std::string filebase;
public:
    void set_filebase(std::string dirname);
    std::string get_filebase();
};

class MapReduceOutput {
    std::string filebase;
public:
    void set_filebase(std::string dirname);
    std::string get_filebase();
};

class MapReduceSpecification {
public:
    std::vector<MapReduceInput*> ilist;
    MapReduceOutput mr_out;
    Mapper *mapper;
    Reducer *reducer;
    int num_thr;
    std::vector<pthread_t> threads;
public:
    MapReduceSpecification() : mapper(NULL), reducer(NULL) {
        this->set_threads(1);
    }
    MapReduceInput * add_input();
    MapReduceOutput * output();
    void set_mapper(Mapper *pm) { this->mapper = pm; }
    void set_reducer(Reducer *pr) { this->reducer = pr; }
    void set_threads(int _nthr) {
        this->num_thr = _nthr;
        this->threads.resize(this->num_thr);
    }
};

class MapInput {
public:
    std::string _line;
    std::string _file;
    int _lno;
    FILE *_pf;   // input file
    FILE *_pif;  // intermediate output file
    bool get_line();
    MapInput(std::string _filename, FILE *_pfile, FILE *_pintermediate)
        : _file(_filename), _lno(0), _pf(_pfile), _pif(_pintermediate) { }
public:
    std::string const& value() const { return this->_line; }
    std::string const& file() const { return this->_file; }
};

class Mapper {
public:
    MapInput *pmi;
    virtual void Map(const MapInput &input) = 0;
    bool Emit(std::string const &key, std::string const &value);
    virtual ~Mapper() { }
};

typedef std::pair<std::string, std::string> string_pair_t;

struct intermediate_sorter {
    bool operator()(string_pair_t const &lhs, string_pair_t const &rhs) {
        return lhs.first < rhs.first;
    }
};

class ReduceInput {
public:
    FILE *_pf;   // final output file
    FILE *_pif;  // intermediate input file
    std::string _key;
    std::vector<string_pair_t> _intermediate;
    std::vector<string_pair_t>::iterator f, l;
    bool all_done;
public:
    ReduceInput(FILE *_pfile, FILE *_pintermediate);
    void read_intermediate();
    void sort_intermediate();
    bool set_next_range();
    std::string key() const { return this->_key; }
    std::string value() { return f->second; }
    void NextValue() { ++this->f; }
    bool done() { return this->f == this->l; }
};

class Reducer {
public:
    ReduceInput *pri;
    virtual void Reduce(ReduceInput *input) = 0;
    bool Emit(std::string const &str);
    virtual ~Reducer() { }
};

class MapReduceResult { };

struct worker_data {
    MapReduceSpecification *pspec;
    std::list<std::string> *pfiles;
    MapReduceResult *pres;
    int thr_no;
    worker_data() : pspec(0), pfiles(0), pres(0), thr_no(0) { }
};

void * worker_proc(void *vptr);
bool MapReduce(MapReduceSpecification &spec, MapReduceResult &res);
int StringToInt(std::string str);
std::string IntToString(int i);

} // namespace mapreduce
niuliwan 2013-05-18

// mapreduce.cpp: implementation of the MapReduce simulator
#include "mapreduce.hpp"
#include <sys/types.h>
#include <dirent.h>
#include <pthread.h>
#include <algorithm>   // std::sort (missing in the original)
#include <istream>
#include <ostream>
#include <sstream>

namespace mapreduce {

/* Convert a string to an integer. */
int StringToInt(std::string str)
{
    std::istringstream sin(str);
    int ret;
    sin >> ret;
    return ret;
}

/* Convert an integer to a string. */
std::string IntToString(int i)
{
    std::ostringstream sout;
    sout << i;
    return sout.str();
}

/* Split a string into fields on the delimiter character. */
std::vector<std::string> explode(char delim, std::string const &str)
{
    std::vector<std::string> ret;
    std::string tmp;
    for (size_t i = 0; i < str.size(); ++i) {
        if (str[i] == delim) {
            ret.push_back(tmp);  // delimiter hit: flush the current field
            tmp = "";
        } else {
            tmp += str[i];
        }
    }
    if (!tmp.empty()) {
        ret.push_back(tmp);
    }
    return ret;
}

/* Read one full line from the open file *pf. */
bool get_line(std::string &ret, FILE *pf)
{
    std::string tmp;
    int ch = fgetc(pf);
    while (ch != EOF && ch != '\n') {
        tmp += (char)ch;
        ch = fgetc(pf);
    }
    ret.swap(tmp);
    if (ch == EOF && ret.empty())
        return false;
    return true;
}

/* List the files in a directory. */
void scandir(std::string dirname, std::vector<std::string> &files)
{
    DIR *dh = opendir(dirname.c_str());
    if (!dh)
        return;
    struct dirent *pde = NULL;
    while ((pde = readdir(dh)) != NULL) {
        std::string fname(pde->d_name);
        if (fname == "." || fname == "..")
            continue;
        files.push_back(fname);
    }
    closedir(dh);
}

/* Set the base directory for input files; this is the input interface. */
void MapReduceInput::set_filebase(std::string dirname)
{
    if (dirname.empty())
        dirname = "/";
    if (dirname[dirname.size() - 1] != '/')
        dirname += "/";
    DBUG_PRINT("setting filebase to: "<<dirname);
    this->filebase = dirname;
}

/* Get the base directory for input files. */
std::string MapReduceInput::get_filebase()
{
    if (this->filebase.empty()) {
        return "/";
    }
    return this->filebase;
}

void MapReduceOutput::set_filebase(std::string dirname)
{
    if (dirname.empty())
        dirname = "/";
    if (dirname[dirname.size() - 1] != '/')
        dirname += "/";
    this->filebase = dirname;
}

std::string MapReduceOutput::get_filebase()
{
    if (this->filebase.empty()) {
        return "/";
    }
    return this->filebase;
}

MapReduceInput * MapReduceSpecification::add_input()
{
    MapReduceInput *pin = new MapReduceInput;
    this->ilist.push_back(pin);
    return pin;
}

MapReduceOutput * MapReduceSpecification::output()
{
    return &(this->mr_out);
}

bool Mapper::Emit(std::string const &key, std::string const &value)
{
    std::string data = key + "\t" + value + "\n";
    size_t ewt = fwrite(data.c_str(), data.size(), 1, this->pmi->_pif);
    if (ewt != 1)
        return false;
    return true;
}

void ReduceInput::read_intermediate()
{
    std::string line;
    while (mapreduce::get_line(line, this->_pif)) {
        std::vector<std::string> data = explode('\t', line);
        line = "";
        this->_intermediate.push_back(make_pair(data[0], data[1]));
    }
}

void ReduceInput::sort_intermediate()
{
    std::sort(this->_intermediate.begin(), this->_intermediate.end(),
              intermediate_sorter());
}

ReduceInput::ReduceInput(FILE *_pfile, FILE *_pintermediate)
    : _pf(_pfile), _pif(_pintermediate), all_done(false)
{
    this->read_intermediate();
    this->sort_intermediate();
    this->l = this->f = this->_intermediate.begin();
    if (!this->_intermediate.empty()) {
        this->_key = this->_intermediate[0].first;
    } else {
        this->all_done = true;
    }
    while (this->l != this->_intermediate.end() && this->l->first == this->_key) {
        ++this->l;
    }
}

bool ReduceInput::set_next_range()
{
    if (this->l == this->_intermediate.end()) {
        this->all_done = true;
        return false;
    }
    this->f = this->l;
    this->_key = this->f->first;
    while (this->l != this->_intermediate.end() && this->l->first == this->_key) {
        ++this->l;
    }
    return true;
}

bool Reducer::Emit(std::string const &str)
{
    std::string data = this->pri->_key + "\t" + str + "\n";
    size_t ewt = fwrite(data.c_str(), data.size(), 1, this->pri->_pf);
    if (ewt != 1)
        return false;
    return true;
}

bool MapInput::get_line()
{
    bool gl_ret = mapreduce::get_line(this->_line, this->_pf);
    if (gl_ret)
        ++this->_lno;
    return gl_ret;
}

void * worker_proc(void *vptr)
{
    worker_data *pdata = (worker_data*)(vptr);

    /* Open the intermediate file in write mode. */
    std::string ifile_name = "/tmp/mapreduce_intermediate."
        + IntToString(pdata->thr_no) + ".dat";
    FILE *pif = fopen(ifile_name.c_str(), "w");
    if (!pif) {
        std::cerr<<"Error opening intermediate output file: "<<ifile_name<<" for writing.\n";
        return NULL; /* was `return false;`, invalid for a void* function */
    }

    /* For each file in the list of files. */
    for (std::list<std::string>::iterator i = pdata->pfiles->begin();
         i != pdata->pfiles->end(); ++i) {
        /* Open it and pass the opened file handle to the get_line()
         * function till there are lines to be read and also call the
         * Mapping function. */
        std::cerr<<"Processing file: "<<*i<<std::endl;
        FILE *pf = fopen(i->c_str(), "r");
        if (!pf) {
            std::cerr<<"Could not open input file: "<<*i<<"\n";
            continue;
        }
        MapInput mi(*i, pf, pif);
        pdata->pspec->mapper->pmi = &mi;
        while (mi.get_line()) {
            DBUG_PRINT("Processing line: "<<mi._line);
            pdata->pspec->mapper->Map(mi);
        }
        fclose(pf);
    }
    fclose(pif);
    return 0;
}

bool MapReduce(MapReduceSpecification &spec, MapReduceResult &res)
{
    std::vector<std::list<std::string> > files;
    files.resize(spec.num_thr);
    int thr_no = 0;

    /* For each directory containing the input file(s). */
    for (size_t i = 0; i < spec.ilist.size(); ++i) {
        /* Get the directory listing. */
        std::vector<std::string> file_list;
        mapreduce::scandir(spec.ilist[i]->get_filebase(), file_list);
        /* For each file in this directory. */
        for (size_t j = 0; j < file_list.size(); ++j) {
            /* Add the complete path to the list of files,
             * assigning files to threads round-robin. */
            std::string fname = spec.ilist[i]->get_filebase() + file_list[j];
            files[thr_no].push_back(fname);
            ++thr_no;
            thr_no %= spec.num_thr;
        }
    }

    std::vector<worker_data> wdata;
    wdata.resize(spec.num_thr);

    /* For each thread. */
    for (int i = 0; i < spec.num_thr; ++i) {
        /* Start a new thread for processing, and pass to it the files
         * to process. */
        wdata[i].pspec = &spec;
        wdata[i].pfiles = &files[i];
        wdata[i].pres = &res;
        wdata[i].thr_no = i;
        int pthr_ret = pthread_create(&spec.threads[i], NULL, worker_proc, &wdata[i]);
        std::cerr<<"Return code for thread("<<i<<"): "<<pthr_ret<<std::endl;
        if (pthr_ret) {
            std::cerr<<"Error creating thread no: "<<i<<std::endl;
        }
    }

    /* Wait for each parallel mapper to complete. */
    for (int i = 0; i < spec.num_thr; ++i) {
        pthread_join(spec.threads[i], NULL);
    }

    /* Ideally, we would want to perform combining while doing the
     * mapping, then also the reduction in parallel and finally a
     * parallel merge, but since this is a simulation only, we just do
     * the mapping in parallel. */

    /* Aggregate all the individual intermediate files into one. */
    for (int i = 0; i < spec.num_thr; ++i) {
        /* Open the intermediate file in read mode. */
        std::string ifile_name = "/tmp/mapreduce_intermediate." + IntToString(i) + ".dat";
        FILE *pif = fopen(ifile_name.c_str(), "r");
        if (!pif) {
            std::cerr<<"Error opening intermediate output file: "<<ifile_name<<" for reading.\n";
            return false;
        }
        FILE *piof = fopen("/tmp/mapreduce_intermediate.dat", (i == 0 ? "w" : "a"));
        if (!piof) {
            std::cerr<<"Error opening intermediate output file for writing.\n";
            return false;
        }
        /* Copy each line of the per-thread intermediate file into the
         * aggregated intermediate file. */
        std::string tmp;
        while (get_line(tmp, pif)) {
            tmp += '\n';
            fwrite(tmp.c_str(), tmp.size(), 1, piof);
        }
        fclose(piof);
        fclose(pif);
    }

    FILE *pif = fopen("/tmp/mapreduce_intermediate.dat", "r");
    if (!pif) {
        std::cerr<<"Error opening intermediate file for reading.\n";
        return false;
    }
    std::string ofname = spec.mr_out.get_filebase() + "out.dat";
    FILE *pf = fopen(ofname.c_str(), "w");
    if (!pf) {
        std::cerr<<"Error opening output file: "<<ofname<<"\n";
        return false;
    }

    DBUG_PRINT("Now performing reduction.");

    /* Now perform reduction on the intermediate file. */
    ReduceInput ri(pf, pif);
    spec.reducer->pri = &ri;
    while (!ri.all_done) {
        spec.reducer->Reduce(&ri);
        ri.set_next_range();
    }
    fclose(pf);
    fclose(pif);
    return true;
}

} // namespace mapreduce
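Stripped of files and threads, the flow this program implements (map each record to key/value pairs, sort by key as the shuffle, then reduce runs of equal keys) can be condensed into a self-contained in-memory sketch. The names here are ours, not from the posted code:

```cpp
#include <algorithm>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, std::string> kv_t;

// In-memory condensation of the simulator's map/shuffle/reduce
// flow, specialized to word count.
std::map<std::string, int>
simulate_wordcount(const std::vector<std::string> &lines) {
    // Map phase: emit ("word", "1") for every word of every line.
    std::vector<kv_t> intermediate;
    for (size_t i = 0; i < lines.size(); ++i) {
        std::istringstream iss(lines[i]);
        std::string w;
        while (iss >> w)
            intermediate.push_back(std::make_pair(w, "1"));
    }
    // Shuffle phase: sort by key, as ReduceInput::sort_intermediate does.
    std::sort(intermediate.begin(), intermediate.end());
    // Reduce phase: fold each run of equal keys into a count.
    std::map<std::string, int> out;
    for (size_t i = 0; i < intermediate.size(); ++i)
        out[intermediate[i].first] += 1;
    return out;
}
```

The posted program does the same thing, except the intermediate pairs live in /tmp files and the map phase runs on several pthreads.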
头疼 2013-05-18
If you do not know how to run the program you wrote yourself, how is anyone else supposed to know? You have not even shared the source. It is like asking people to guess how many coins are in your pocket.