Building a Spark Cluster with Docker: Have You Learned It?
A small tip: fully configure one node first, then (on the host machine) copy the Spark directory to the other slaves with scp -r.
1. Install and configure the base Spark
[On the test-cluster-hap-master-01 virtual machine]
Copy the downloaded Spark archive (spark-3.1.1-bin-hadoop-3.2.2-lbx-jszt.tgz) into the /opt directory of the virtual machine with a tool such as XFtp.
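If you prefer the command line over XFtp, the archive can also be pushed with scp. A minimal sketch; the local path is hypothetical and the target address reuses the 10.8.46.35 host that appears later in this article, so adjust both to your environment:
# run on the machine that holds the downloaded archive
scp /path/to/spark-3.1.1-bin-hadoop-3.2.2-lbx-jszt.tgz root@10.8.46.35:/opt/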
2. Start the containers with a script
cd /opt/script/setup/spark
test-cluster-spk-master-01
#!/bin/bash 
# Author: 千羽的編程時(shí)光
cname="test-cluster-spk-master-01"
#port1="8080"
#port2="7077"
log="/opt/data/"${cname}
images="10.249.0.137:80/base/jdk-1.8:20210202"
mkdir -p ${log}
mkdir ${log}/logs
mkdir ${log}/work
mkdir ${log}/data
mkdir ${log}/jars
# docker run -d --net=overlay-net --ip ${ip} -p ${port1}:${port1} -p ${port2}:${port2} --name ${cname} --hostname ${cname} --privileged=true --restart=always 
docker run -d --net=host --name ${cname} --hostname ${cname} --privileged=true --restart=always \
-v ${log}/logs:/usr/local/spark-3.1.1/logs \
-v ${log}/work:/usr/local/spark-3.1.1/work \
-v ${log}/jars:/usr/local/spark-3.1.1/jars \
-v ${log}/data:/opt/data \
${images} \
/usr/sbin/init
test-cluster-spk-master-02
#!/bin/bash 
cname="test-cluster-spk-master-02"
#port1="8080"
#port2="7077"
log="/opt/data/"${cname}
images="10.249.0.137:80/base/jdk-1.8:20210202"
mkdir -p ${log}
mkdir ${log}/logs
mkdir ${log}/work
mkdir ${log}/data
mkdir ${log}/jars
#docker run -d --net=overlay-net --ip ${ip} -p ${port1}:${port1} -p ${port2}:${port2} --name ${cname} --hostname ${cname} --privileged=true --restart=always 
docker run -d --net=host --name ${cname} --hostname ${cname} --privileged=true --restart=always \
-v ${log}/logs:/usr/local/spark-3.1.1/logs \
-v ${log}/work:/usr/local/spark-3.1.1/work \
-v ${log}/jars:/usr/local/spark-3.1.1/jars \
-v ${log}/data:/opt/data \
${images} \
/usr/sbin/init
test-cluster-spk-slave-01
#!/bin/bash 
cname="test-cluster-spk-slave-01"
#port1="8080"
#port2="7077"
log="/opt/data/"${cname}
images="10.249.0.137:80/base/jdk-1.8:20210202"
mkdir -p ${log}
mkdir ${log}/logs
mkdir ${log}/work
mkdir ${log}/data
mkdir ${log}/jars
#docker run -d --net=overlay-net --ip ${ip} -p ${port1}:${port1} -p ${port2}:${port2} --name ${cname} --hostname ${cname} --privileged=true --restart=always 
docker run -d --net=host --name ${cname} --hostname ${cname} --privileged=true --restart=always \
-v ${log}/logs:/usr/local/spark-3.1.1/logs \
-v ${log}/work:/usr/local/spark-3.1.1/work \
-v ${log}/jars:/usr/local/spark-3.1.1/jars \
-v ${log}/data:/opt/data \
${images} \
/usr/sbin/init
[root@zookeeper-03-test spark]# ll
total 4
-rw-r--r--. 1 root root 1166 Jul 28 17:44 install.sh
[root@zookeeper-03-test spark]# chmod +x install.sh
[root@zookeeper-03-test spark]# ll
total 4
-rwxr-xr-x. 1 root root 1166 Jul 28 17:44 install.sh
[root@zookeeper-03-test spark]#
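The transcript above only shows making install.sh executable; presumably the script is then run once on each node to create its container. A minimal sketch of running it and checking the result (the docker ps filter is my own addition, not from the original):
./install.sh
# the container created by the script should now be listed
docker ps --filter "name=test-cluster-spk"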
3. Upload the Spark installation package
In the container's mapped directory: /opt/data/test-cluster-spk-slave-01/data
[root@hadoop-01 data]# pwd
/opt/data
Upload the packages with Xftp.

Two packages need to be uploaded here; the one actually installed is spark-3.1.1-bin-without-hadoop.tgz,
but the jars under spark-3.1.1-bin-hadoop-3.2.2-lbx-jszt have to be moved into /usr/local/spark-3.1.1/jars (a sketch of this swap follows).
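A minimal sketch of that jars swap, assuming both archives have already been extracted; the extraction path of the lbx-jszt package is an assumption, so point it at wherever you unpacked it:
# overwrite the jars shipped with the without-hadoop build with the vendor package's jars
cp -rf /opt/data/spark-3.1.1-bin-hadoop-3.2.2-lbx-jszt/jars/* /usr/local/spark-3.1.1/jars/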
4. Extract the installation package
mkdir -p /usr/local/spark-3.1.1
cd /opt/data
tar -zxvf spark-3.1.1-bin-without-hadoop.tgz -C /usr/local/spark-3.1.1
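Note that extracting this way leaves everything inside a nested spark-3.1.1-bin-without-hadoop directory, while the later steps reference /usr/local/spark-3.1.1/conf directly. A variant (not from the original) that strips the top-level directory so the files land straight under /usr/local/spark-3.1.1:
tar -zxvf spark-3.1.1-bin-without-hadoop.tgz -C /usr/local/spark-3.1.1 --strip-components=1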


Edit the global environment variables:
vim /etc/profile
Add the following global variables:
export SPARK_HOME=/usr/local/spark-3.1.1   
export PATH=$PATH:$SPARK_HOME/bin
Apply the changes immediately:
source /etc/profile
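A quick check (not part of the original steps) that the variables took effect:
echo $SPARK_HOME        # should print /usr/local/spark-3.1.1
which spark-submit      # should resolve to a path under $SPARK_HOME/bin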
5. Configure spark-env.sh
cd /usr/local/spark-3.1.1/conf
cp spark-env.sh.template spark-env.sh
vim spark-env.sh
export SPARK_MASTER_IP=test-cluster-spk-master-01
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=800m
#export SPARK_DRIVER_MEMORY=4g
export SPARK_EXECUTOR_INSTANCES=2
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export SPARK_LOCAL_DIRS=/home/hadoop/tmp/spark/tmp
# Clean up worker files periodically, once a day
export SPARK_WORKER_OPTS="  
-Dspark.worker.cleanup.enabled=true  
-Dspark.worker.cleanup.interval=86400 
-Dspark.worker.cleanup.appDataTtl=86400"
export JAVA_HOME=/usr/local/jdk1.8
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SCALA_HOME=/usr/local/scala
export PATH=${SCALA_HOME}/bin:$PATH
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=zookeeper-01-test:2181,zookeeper-02-test:2181,zookeeper-03-test:2181 -Dspark.deploy.zookeeper.dir=/usr/local/spark"
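One extra note from Spark's documentation on the Hadoop-free build: the spark-*-bin-without-hadoop package normally also needs SPARK_DIST_CLASSPATH pointing at the Hadoop jars, which may well be related to the startup problems described in section 10. A hedged addition to spark-env.sh, reusing the HADOOP_HOME configured above:
# only relevant when running the *-bin-without-hadoop build against an existing Hadoop install
export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)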
Configure workers:
cp workers.template workers
vim workers
# add
test-cluster-spk-slave-001
6. Configure log4j.properties
cp log4j.properties.template log4j.properties
vim log4j.properties
log4j.rootCategory=WARN, console
7. Copy to the other slaves
(On the host machine) copy Spark to the other slave nodes with scp -r:
scp -r /usr/local/spark/spark-2.1.0-bin-hadoop2.7 root@slave-001-spark-dev:/usr/local/spark/
scp -r /usr/local/spark/spark-2.1.0-bin-hadoop2.7 root@slave-002-spark-dev:/usr/local/spark/
scp -r /usr/local/spark/spark-2.1.0-bin-hadoop2.7 root@slave-003-spark-dev:/usr/local/spark/
If the command fails, first run mkdir -p /usr/local/spark on the corresponding slave node. (Adjust the path and hostnames to your own environment; in this walkthrough the Spark directory is /usr/local/spark-3.1.1.)
After copying to master-02, start master-02 with start-master.sh.
8. Start Spark
- Start the two masters first, then start the slave node.
- Master node 1 started:
 
[root@test-cluster-spk-master-01 sbin]# ./start-master.sh 
starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark-3.1.1/logs/spark-root-org.apache.spark.deploy.master.Master-1-test-cluster-spk-master-01.out
[root@test-cluster-spk-master-01 sbin]# jps
548 Jps
492 Master
[root@test-cluster-spk-master-01 sbin]# pwd
/usr/local/spark-3.1.1/sbin
[root@test-cluster-spk-master-01 sbin]#
- Master node 2 started:
 
[root@test-cluster-spk-master-02 sbin]# ./start-master.sh 
starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark-3.1.1/logs/spark-root-org.apache.spark.deploy.master.Master-1-test-cluster-spk-master-02.out
[root@test-cluster-spk-master-02 sbin]# pwd
/usr/local/spark-3.1.1/sbin
[root@test-cluster-spk-master-02 sbin]# jps
274 Jps
218 Master
[root@test-cluster-spk-master-02 sbin]#
- Slave node started:
 
/usr/local/spark-3.1.1/sbin/start-slave.sh spark://test-cluster-hap-master-01:7077,test-cluster-hap-master-02:7077
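To confirm the worker really came up and registered with the alive master, one could check on the slave node (a sanity check, not shown in the original):
# a Worker process should appear alongside Jps
jps
# the worker log should mention a successful registration with a master
tail -n 20 /usr/local/spark-3.1.1/logs/spark-*-org.apache.spark.deploy.worker.Worker-*.out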
 
9. Verification
Normally visiting http://10.8.46.35:8080 would be enough, but an extra 8080 was already taken when the image was configured, so that port is not reachable here. The logs show the master UI has moved to 8081, so use http://10.8.46.35:8081/ instead.
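The same check can be done from the command line (assuming the UI really is on 8081, as the logs indicated):
# the master web UI page reports the master's status (ALIVE or STANDBY)
curl -s http://10.8.46.35:8081/ | grep -i -E "alive|standby"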
Failover test (screenshots omitted):
- Master node 1: stop the master
- Master node 2: the standby master becomes ALIVE
- Slave node 1
10. Pitfalls encountered
Incompatible packages
Quite a few problems came up here. The first was package incompatibility, which made the build fail twice.

I then switched to the official spark-3.1.1-bin-without-hadoop package, but startup still failed.
It finally worked after replacing the jars (using the jars from spark-3.1.1-bin-hadoop-3.2.2-lbx-jszt).
Ctrl + p + q detaches from the container back to the host.
Done~