Building a Cross-Host Hadoop Cluster with Docker

qq_40487325 2017-10-18 09:43:56
Abstract: This article describes how to build a distributed, cross-host Hadoop 2.7.2 cluster on Docker 17.09.0. The cluster consists of one master and two slaves and uses three physical hosts in total (two or more physical machines will work); the cluster network is built with Docker Swarm.

Note: there is very little material on this topic in the Chinese-language community; for details, consult the official documentation directly.

Please credit the source when reposting: http://blog.arthurpapa.cn/articles/2017/10/17/1508232763465.html

Environment

Requirements
1. Operating system: CentOS 7
2. Physical machines (or VMs): two or more; this article uses three
3. Network: the physical machines must be on the same subnet (ideally with static IPs assigned); verify by checking that they can all ping one another (see the check commands after this list)
4. Docker 17.x installed on every machine (Swarm support is required)
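Before continuing it is worth a quick sanity check that the hosts can reach each other and that Docker is recent enough. A minimal check, assuming the IPs from the table in the next subsection (adjust to your own network):

## On master (repeat the pings in the other direction from slave1 and slave2)
[root@master ~]# ping -c 3 192.168.0.151
[root@master ~]# ping -c 3 192.168.0.152
[root@master ~]# docker version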
Environment used in this article
1. Physical hosts
hostname ip
master 192.168.0.150
slave1 192.168.0.151
slave2 192.168.0.152
The detailed setup steps follow.

1. Pull the Hadoop image

Pull the image on master. The slave machines do not need to pull it manually; they will pull it automatically when Swarm schedules service tasks onto them later.
The Hadoop image used in this article is kiwenlau/hadoop:1.0; see that project's documentation for details.
[root@master ~]# docker pull kiwenlau/hadoop:1.0
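Optionally, confirm that the image is now available locally:

[root@master ~]# docker images kiwenlau/hadoop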
2. Initialize the Swarm

Run the following command on master to create a swarm with master as the manager node. Note: change the IP to the IP of master on your own network.
[root@master ~]# docker swarm init --advertise-addr 192.168.1.150
The command prints output like the following:

Swarm initialized: current node (ytxpz5xrzujbkmil0geiacepr) is now a manager.

To add a worker to this swarm, run the following command:

docker swarm join --token SWMTKN-1-3fbqppvsbpkg461lkeddkwjtng8ee0ufnuvfq1h8zzmhv3v54x-d4hdzdb997g6ljv4lga0r3xro 192.168.1.150:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
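If you later lose this join command, it can be printed again at any time on the manager node:

[root@master ~]# docker swarm join-token worker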
3. Join the worker nodes to the Swarm

On slave1, run the following command:

[root@slave1 ~]# docker swarm join --token SWMTKN-1-3fbqppvsbpkg461lkeddkwjtng8ee0ufnuvfq1h8zzmhv3v54x-d4hdzdb997g6ljv4lga0r3xro 192.168.1.150:2377
Likewise, run the same command on slave2:

[root@slave2 ~]# docker swarm join --token SWMTKN-1-3fbqppvsbpkg461lkeddkwjtng8ee0ufnuvfq1h8zzmhv3v54x-d4hdzdb997g6ljv4lga0r3xro 192.168.1.150:2377
Now slave1 and slave2 have joined master's swarm. The command run here is exactly the join command printed in step 2.
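You can confirm the cluster membership from the manager; all three nodes should be listed with STATUS Ready:

[root@master ~]# docker node ls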

4. Create a dedicated overlay network

## Run the following command to list the existing networks. There is only one swarm-scoped network, ingress; it is the default network and is normally not used directly.
[root@master ~]# docker network ls
NETWORK ID NAME DRIVER SCOPE
8b78fc47d7fb bridge bridge local
9570aad7f0e0 docker_gwbridge bridge local
cf0d62f00408 host host local
xdelmzi55ifr ingress overlay swarm
90c51fe9392a none null local

## Create a dedicated overlay network
[root@master ~]# docker network create --opt encrypted --driver overlay --attachable hadoop

## After it is created, a new hadoop network shows up in the list
[root@master ~]# docker network ls
NETWORK ID NAME DRIVER SCOPE
8b78fc47d7fb bridge bridge local
9570aad7f0e0 docker_gwbridge bridge local
iv1irku3ohf7 hadoop overlay swarm
cf0d62f00408 host host local
xdelmzi55ifr ingress overlay swarm
90c51fe9392a none null local
5. Start the container services

[root@master ~]# docker service create -t --name hadoop-master --hostname hadoop-master --network hadoop --detach=false --replicas 1 --publish mode=host,target=8088,published=8088,protocol=tcp --publish mode=host,target=50070,published=50070,protocol=tcp kiwenlau/hadoop:1.0
[root@master ~]# docker service create -t --name hadoop-slave1 --hostname hadoop-slave1 --network hadoop --detach=false --replicas 1 kiwenlau/hadoop:1.0
[root@master ~]# docker service create -t --name hadoop-slave2 --hostname hadoop-slave2 --network hadoop --detach=false --replicas 1 kiwenlau/hadoop:1.0
## After the services start, check the result

[root@master ~]# docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
p79vetbllisg hadoop-master replicated 1/1 kiwenlau/hadoop:1.0
8d9ll46cf7p2 hadoop-slave1 replicated 1/1 kiwenlau/hadoop:1.0
ghp3ffhnlbf3 hadoop-slave2 replicated 1/1 kiwenlau/hadoop:1.0

## Inspect the service tasks. The three tasks have been scheduled onto different hosts, which means the containers started successfully.
[root@master ~]# docker service ps p79vetbllisg 8d9ll46cf7p2 ghp3ffhnlbf3
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
oml7l4xt0ay7 hadoop-slave2.1 kiwenlau/hadoop:1.0 slave2 Running Running 1 minutes ago
gva3i0ufqund hadoop-slave1.1 kiwenlau/hadoop:1.0 slave1 Running Running 1 minutes ago
1rn1f8jwv8gp hadoop-master.1 kiwenlau/hadoop:1.0 master Running Running 1 minutes ago *:8088->8088/tcp,*:50070->50070/tcp
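In this run Swarm happened to schedule hadoop-master onto the master host. If you want to guarantee that placement rather than rely on the scheduler, a placement constraint can be added to the service-create command shown above; a minimal sketch of the variant (all other flags unchanged):

[root@master ~]# docker service create -t --name hadoop-master --hostname hadoop-master --network hadoop --detach=false --replicas 1 --constraint node.hostname==master --publish mode=host,target=8088,published=8088,protocol=tcp --publish mode=host,target=50070,published=50070,protocol=tcp kiwenlau/hadoop:1.0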
6. Start Hadoop

Looking at the previous step, we can see that the hadoop-master container is running on the master host, so we log in to master.

[root@master ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
8282e646d559 kiwenlau/hadoop:1.0 "sh -c 'service ss..." 2 hours ago Up 2 hours 0.0.0.0:8088->8088/tcp, 0.0.0.0:50070->50070/tcp hadoop-master.1.1rn1f8jwv8gpmrb5i67ijysff
########## Enter the container
[root@master ~]# docker exec -it 8282e646d559 bash
########## Inside the container, the working directory already contains the following scripts
root@hadoop-master:~# ls
hdfs input run-wordcount.sh start-hadoop.sh
########## Ping the two slave containers to check connectivity
root@hadoop-master:~# ping hadoop-slave1
root@hadoop-master:~# ping hadoop-slave2

########## Test SSH to each of the two slave containers; being able to log in over SSH means everything is fine. If you get a connection time out, see the troubleshooting section at the end of this article.
root@hadoop-master:~# ssh hadoop-slave1
root@hadoop-slave1:~# exit
root@hadoop-master:~# ssh hadoop-slave2
root@hadoop-slave2:~# exit

########## Start Hadoop
root@hadoop-master:~# ./start-hadoop.sh
Starting namenodes on [hadoop-master]
hadoop-master: Warning: Permanently added 'hadoop-master,10.0.0.3' (ECDSA) to the list of known hosts.
hadoop-master: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-hadoop-master.out
hadoop-slave2: Warning: Permanently added 'hadoop-slave2,10.0.0.6' (ECDSA) to the list of known hosts.
hadoop-slave1: Warning: Permanently added 'hadoop-slave1,10.0.0.4' (ECDSA) to the list of known hosts.
hadoop-slave1: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-hadoop-slave1.out
hadoop-slave2: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-hadoop-slave2.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-root-secondarynamenode-hadoop-master.out

starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn--resourcemanager-hadoop-master.out
hadoop-slave1: Warning: Permanently added 'hadoop-slave1,10.0.0.4' (ECDSA) to the list of known hosts.
hadoop-slave2: Warning: Permanently added 'hadoop-slave2,10.0.0.6' (ECDSA) to the list of known hosts.
hadoop-slave1: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-hadoop-slave1.out
hadoop-slave2: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-hadoop-slave2.out
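
Once the daemons are up, two quick sanity checks are useful: the NameNode should report two live DataNodes, and the web UIs are reachable through the ports published earlier on whichever host runs hadoop-master. A minimal sketch, assuming the Hadoop binaries are on the PATH inside this image:

root@hadoop-master:~# hdfs dfsadmin -report | grep 'Live datanodes'
########## In a browser on the same network: http://<master-host-ip>:50070 (HDFS NameNode UI) and http://<master-host-ip>:8088 (YARN ResourceManager UI)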

########## Run the word count example as a test
root@hadoop-master:~# ./run-wordcount.sh
17/10/17 04:17:37 INFO client.RMProxy: Connecting to ResourceManager at hadoop-master/10.0.0.3:8032

....

input file1.txt:
Hello Hadoop

input file2.txt:
Hello Docker

wordcount output:
Docker 1
Hadoop 1
Hello 2
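
To run word count on your own data rather than the bundled sample, put files into HDFS and submit the example job directly. A sketch, assuming the standard Hadoop 2.7.2 layout under /usr/local/hadoop; the HDFS paths below are placeholders, and the local input used here is the sample input directory visible in the container's home directory:

root@hadoop-master:~# hdfs dfs -mkdir -p /user/root/my-input
root@hadoop-master:~# hdfs dfs -put ./input/* /user/root/my-input
root@hadoop-master:~# hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /user/root/my-input /user/root/my-output
root@hadoop-master:~# hdfs dfs -cat /user/root/my-output/part-r-00000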
With that, the cross-host Hadoop cluster on Docker is complete.

Reference: Docker swarm mode overlay network security model
Troubleshooting

1. Containers on the swarm network can ping each other but cannot SSH to each other.

Symptom: I ran into this problem while setting up the cluster. The containers hadoop-master, hadoop-slave1 and hadoop-slave2 were all on the same swarm network and could ping one another, but SSH logins failed with a connection time out; after a long wait the connection simply timed out, with no other error reported. The cause turned out to be that the firewalls on the physical hosts slave1 and slave2 had not been turned off and were blocking SSH access into the containers.

Solution:

###### On both physical hosts slave1 and slave2, run the following two commands
# Stop the firewall
[root@slave1 ~]# systemctl stop firewalld.service
# Keep the firewall from starting at boot
[root@slave1 ~]# systemctl disable firewalld.service
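
Stopping firewalld entirely is the quick fix. A narrower alternative is to leave the firewall running and open only the ports that Swarm needs, as described in the overlay network security model referenced above; a sketch for firewalld on CentOS 7, to be run on every host:

# Cluster management traffic (only strictly needed on the manager)
[root@slave1 ~]# firewall-cmd --permanent --add-port=2377/tcp
# Node-to-node communication
[root@slave1 ~]# firewall-cmd --permanent --add-port=7946/tcp --add-port=7946/udp
# Overlay (VXLAN) traffic
[root@slave1 ~]# firewall-cmd --permanent --add-port=4789/udp
[root@slave1 ~]# firewall-cmd --reload
# Because the hadoop network was created with --opt encrypted, ESP traffic (IP protocol 50) must also be allowed between the hosts.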
2 Replies
qq_26097713 2018-04-18
I got all the way to the last step, but running run-wordcount.sh fails with the errors below. What is going wrong?

mkdir: Call From gshy/10.0.0.3 to hadoop-master:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
put: Call From gshy/10.0.0.3 to hadoop-master:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
18/04/18 08:37:21 INFO client.RMProxy: Connecting to ResourceManager at hadoop-master/10.0.0.2:8032
Exception in thread "main" java.net.ConnectException: Call From gshy/10.0.0.3 to hadoop-master:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at org.apache.hadoop.ipc.Client.call(Client.java:1479)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
    at org.apache.hadoop.examples.WordCount.main(WordCount.java:87)
    ...
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.ipc.Client.call(Client.java:1451)
    ... 34 more
input file1.txt:
cat: Call From gshy/10.0.0.3 to hadoop-master:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
input file2.txt:
cat: Call From gshy/10.0.0.3 to hadoop-master:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
wordcount output:
cat: Call From gshy/10.0.0.3 to hadoop-master:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
小9 2017-10-29
What is going on here?
