Today I set up a Hadoop cluster for the first time. I'm writing this post to record the process for future reference, and I welcome pointers from more experienced users.
Environment:
3 Linux hosts (221.10.38.1, 221.10.38.2, 221.10.38.3); Hadoop version: hadoop-0.20.2-cdh3u0
Goal:
Build a Hadoop cluster on the 3 hosts: one namenode (221.10.38.1) and two datanodes (221.10.38.2, 221.10.38.3).
Prerequisites:
hadoop-0.20.2-cdh3u0 is already installed on all 3 hosts, under the same user and the same directory, and SSH between the namenode and the 2 datanodes has been set up. (A question I still have: must passwordless SSH work in both directions between the namenode and each datanode? And do the 2 datanodes need passwordless SSH to each other?)
I only configured mutual passwordless SSH between the namenode and each of the 2 datanodes.
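For reference, the usual passwordless-SSH setup, as a sketch (assumes a hadoop user on every host; run on the namenode, then repeat from each datanode for the reverse direction):
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
ssh-copy-id hadoop@hadooptest2
ssh-copy-id hadoop@hadooptest3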
Step 1:
Configure /etc/hosts on each host.
The hosts file on 221.10.38.1, the namenode, looks roughly like this:
127.0.0.1 hadooptest1 localhost
221.10.38.1 hadooptest1 hadooptest1
221.10.38.2 hadooptest2 hadooptest2
221.10.38.3 hadooptest3 hadooptest3
The hosts file on 221.10.38.2, a datanode, looks roughly like this:
127.0.0.1 hadooptest2 localhost
221.10.38.1 hadooptest1 hadooptest1
221.10.38.2 hadooptest2 hadooptest2
221.10.38.3 hadooptest3 hadooptest3
And likewise on 221.10.38.3:
127.0.0.1 hadooptest3 localhost
221.10.38.1 hadooptest1 hadooptest1
221.10.38.2 hadooptest2 hadooptest2
221.10.38.3 hadooptest3 hadooptest3
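A quick sanity check of the name resolution, run from each of the three hosts:
for h in hadooptest1 hadooptest2 hadooptest3; do ping -c 1 $h; done
One caveat worth knowing: because the first line maps the machine's own hostname to 127.0.0.1, each host resolves its own name to loopback, which can make daemons bind to 127.0.0.1 only. If the datanodes can't reach the namenode later, that line is worth revisiting.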
Step 2:
Edit the Hadoop configuration files (example contents are sketched after this list):
1. core-site.xml
2. mapred-site.xml
3. hdfs-site.xml
4. masters
5. slaves
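I didn't paste the file contents here, so the following is only a minimal sketch consistent with this layout. The namenode port 9000 and the /home/hadoop/dfs paths are taken from the logs further down; the jobtracker port 9001 is an assumption.

core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadooptest1:9000</value>
  </property>
</configuration>

mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hadooptest1:9001</value>
  </property>
</configuration>

hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>

masters (despite the name, this file lists the secondary namenode host):
hadooptest1

slaves:
hadooptest2
hadooptest3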
Step 3:
Copy the configuration files to each datanode:
scp core-site.xml hadooptest2:/home/hadoop/hadoop-0.20.2-cdh3u0/conf
scp core-site.xml hadooptest3:/home/hadoop/hadoop-0.20.2-cdh3u0/conf
scp mapred-site.xml hadooptest2:/home/hadoop/hadoop-0.20.2-cdh3u0/conf
scp mapred-site.xml hadooptest3:/home/hadoop/hadoop-0.20.2-cdh3u0/conf
...
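Rather than one scp per file per host, a small loop does the same thing (a sketch; run from the conf directory on hadooptest1):
for host in hadooptest2 hadooptest3; do
  scp core-site.xml mapred-site.xml hdfs-site.xml masters slaves \
    $host:/home/hadoop/hadoop-0.20.2-cdh3u0/conf/
done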
Step 4:
Format the namenode, then start everything:
[hadoop@hadooptest1 bin]$ ./hadoop namenode -format
[hadoop@hadooptest1 bin]$ ./start-all.sh
If nothing goes wrong, that's all it takes.
In most cases, though, it won't come up cleanly on the first try.
The first problem I hit: the namenode and the jobtracker both started, but neither datanode did.
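jps on each host shows which daemons actually came up (PIDs below are illustrative):
[hadoop@hadooptest1 ~]$ jps
20312 NameNode
20439 SecondaryNameNode
20521 JobTracker
20617 Jps
On a healthy datanode the list would include DataNode (and TaskTracker); here it didn't.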
On the datanode I checked the log hadoop-hadoop-datanode-hadooptest3.log:
2012-01-12 11:52:36,381 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = hadooptest3/10.10.36.218
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.20.2-cdh3u0
STARTUP_MSG: build = -r 81256ad0f2e4ab2bd34b04f53d25a6c23686dd14; compiled by 'hudson' on Fri Mar 25 19:56:23 PDT 2011
************************************************************/
2012-01-12 11:52:36,922 INFO org.apache.hadoop.security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
2012-01-12 11:52:38,033 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.10.38.29:9000. Already tried 0 time(s).
2012-01-12 11:52:39,038 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.10.38.29:9000. Already tried 1 time(s).
2012-01-12 11:52:40,043 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.10.38.29:9000. Already tried 2 time(s).
2012-01-12 11:52:41,048 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.10.38.29:9000. Already tried 3 time(s).
2012-01-12 11:52:42,052 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.10.38.29:9000. Already tried 4 time(s).
2012-01-12 11:52:43,057 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.10.38.29:9000. Already tried 5 time(s).
2012-01-12 11:52:44,060 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.10.38.29:9000. Already tried 6 time(s).
2012-01-12 11:52:45,064 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.10.38.29:9000. Already tried 7 time(s).
2012-01-12 11:52:46,068 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.10.38.29:9000. Already tried 8 time(s).
2012-01-12 11:52:47,073 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.10.38.29:9000. Already tried 9 time(s).
2012-01-12 11:52:47,078 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to /10.10.38.29:9000 failed on local exception: java.net.NoRouteToHostException: No route to host
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1139)
at org.apache.hadoop.ipc.Client.call(Client.java:1107)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
at $Proxy4.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:398)
at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:342)
at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:317)
at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:297)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:344)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:280)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1533)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1473)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1491)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1616)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1626)
Caused by: java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:425)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:532)
at org.apache.hadoop.ipc.Client$Connection.access$2300(Client.java:210)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1244)
at org.apache.hadoop.ipc.Client.call(Client.java:1075)
... 13 more
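In hindsight, "No route to host" points at the network layer rather than at Hadoop itself; a quick test from the datanode against the namenode's RPC port would have narrowed it down immediately:
[hadoop@hadooptest3 ~]$ telnet hadooptest1 9000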
Not knowing the cause, I checked the config files and hosts files several times and they all looked correct. A web search suggested the firewall was to blame, so on each machine I stopped it:
service iptables stop
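Note that service iptables stop only lasts until the next reboot; on RHEL/CentOS-style systems you can keep it disabled (or, better, open just the Hadoop ports) with:
chkconfig iptables off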
Started again!
This time the error changed to the following:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = hadooptest3/10.10.36.218
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.20.2-cdh3u0
STARTUP_MSG: build = -r 81256ad0f2e4ab2bd34b04f53d25a6c23686dd14; compiled by 'hudson' on Fri Mar 25 19:56:23 PDT 2011
************************************************************/
2012-01-12 15:38:34,378 INFO org.apache.hadoop.security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
2012-01-12 15:38:34,647 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /home/hadoop/dfs/data: namenode namespaceID = 908563396; datanode namespaceID = 2085669283
at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:233)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:148)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:373)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:280)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1533)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1473)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1491)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1616)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1626)
A bit of searching showed this is a classic error: the namenode's and the datanodes' namespaceIDs don't match. Every run of namenode -format generates a new namespaceID; in my case the datanodes had previously served another namenode, so the namespaceID saved on their disks no longer matched the freshly formatted one.
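The stored ID lives in the VERSION file under the datanode's data directory (the path comes from the error message above):
[hadoop@hadooptest3 ~]$ grep namespaceID /home/hadoop/dfs/data/current/VERSION
namespaceID=2085669283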
Solution (credit for this goes to http://forum.hadoop.tw/viewtopic.php?f=4&t=43):
Option 1:
Change the datanodes' namespaceID.
On each datanode, edit the VERSION file under dfs.data.dir (the linked post uses /tmp/hadoop/hadoop-root/dfs/data/current/VERSION; in my setup it is /home/hadoop/dfs/data/current/VERSION) and change
namespaceID=2085669283
to
namespaceID=908563396
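As a sketch, the same edit can be pushed out with sed over SSH from the namenode (assumes the path above and the passwordless SSH set up earlier):
for host in hadooptest2 hadooptest3; do
  ssh $host "sed -i 's/^namespaceID=.*/namespaceID=908563396/' /home/hadoop/dfs/data/current/VERSION"
done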
Option 2:
Change the namenode's namespaceID instead. Edit the VERSION file under dfs.name.dir (for the linked post, /tmp/hadoop/hadoop-root/dfs/name/current/VERSION) and change
namespaceID=908563396
to
namespaceID=2085669283
I tried option 2 and found the namenode's namespaceID simply couldn't be changed.
After applying option 1 instead:
I restarted Hadoop, and the restart succeeded!
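To confirm both datanodes actually registered this time, dfsadmin reports the live nodes; it should list both datanodes:
[hadoop@hadooptest1 bin]$ ./hadoop dfsadmin -report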
Note: this was written in a hurry; I'll add more details later. Comments and corrections are welcome.