Building a Deep Learning Cloud Platform on ZStack

Author: Zhu Tianshun (朱天顺), 2019-02-14 14:44:48


Preface

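Deep learning is a popular branch of machine learning and artificial intelligence research, and one of today's most active research trends. Deep learning methods have brought revolutionary advances to computer vision and machine learning, and new techniques keep emerging. Because the field moves so quickly, newcomers find it hard to keep up. The major public cloud vendors in China all offer deep learning products, but these are not very practical for beginners. This article shows how to use ZStack, a productized cloud platform, to build a deep learning cloud platform that is beginner-friendly and easy to operate and use.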

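Because ZStack is lightweight, a single ordinary PC is enough to deploy the cloud platform and build the deep learning platform on top of it. Readers can use this article as a starting point and scale out to a larger, more fully featured deep learning cloud platform.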

0x1 Introduction to ZStack

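ZStack is next-generation open-source IaaS (Infrastructure as a Service) cloud computing software. It targets the intelligent data center of the future and manages data center resources, including compute, storage, and networking, through flexible and comprehensive APIs. Users can use ZStack to quickly build their own intelligent cloud data center, or build flexible cloud application scenarios on top of a stable ZStack.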

[Figure: ZStack functional architecture]

ZStack product advantages:

ZStack is next-generation IaaS cloud platform software designed around the 4S standard for private clouds: Simple, Strong, Scalable, and Smart.

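1. Simple

• Simple installation and deployment: installation files are available for download over the network, and going from bare metal to a running cloud platform takes 30 minutes.

• Simple cloud platform setup: supports batch operations on cloud hosts (create, delete, and so on), with list views and slide-out detail panels.

• Simple, practical operation: a detailed user manual, ample help information, a good community, and standard APIs.

• Friendly UI: a well-designed professional interface that delivers powerful functionality with few steps.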

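2. Strong

• Stable and efficient system architecture: a fully asynchronous backend, in-process microservices, a lock-free architecture, stateless services, and a consistent hash ring keep the system efficient and stable. Achieved so far: a single management node manages tens of thousands of physical hosts and hundreds of thousands of cloud hosts, while a cluster of management nodes sharing one database and one message bus can manage a hundred thousand physical hosts and millions of cloud hosts and process tens of thousands of API calls concurrently.

• High-concurrency API handling: a single ZStack management node can easily handle tens of thousands of concurrent API calls per second.

• Strict HA support: when a network or node fails, business cloud hosts automatically switch to other healthy nodes. The management node is virtualized to make a single management node highly available, and it can be migrated dynamically on failure.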

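3. Scalable

• No limit on supported scale: a single management node can manage from one to tens of thousands of physical hosts and hundreds of thousands of cloud hosts.

• Everything delivered through APIs: ZStack provides a full set of IaaS APIs, which users can use to set up new cross-region availability zones, change network configuration, and upgrade physical servers.

• On-demand resource allocation: important resources such as cloud hosts and cloud storage can be scaled up or down as needed. ZStack supports changing a cloud host's CPU and memory online, and dynamically adjusting its network bandwidth and disk bandwidth.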

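4. Smart

• Automated operations: in a ZStack environment everything is managed through APIs. ZStack uses the Ansible library for fully automated deployment and upgrade, and it detects and reconnects automatically, so nodes are reconnected after network jitter or a physical host reboot. Scheduled tasks support recurring operations such as powering cloud hosts on and off and taking cloud host snapshots at set times.

• Seamless online upgrade: a one-click upgrade takes 5 minutes, and users only need to upgrade the management node. Compute, storage, and network nodes upgrade automatically once the management software starts.

• Intelligent UI: resources are calculated in real time to prevent user mistakes.

• Real-time global monitoring: track the cloud platform's current system resource consumption in real time and use that monitoring for intelligent allocation, saving IT hardware and software resources.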

0x2 Building the Deep Learning Platform

2.1 Components to Deploy

TensorFlow

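TensorFlow is an open-source software library for high-performance numerical computation. Its flexible architecture lets users deploy computation easily across a variety of platforms (CPU, GPU, TPU) and devices (desktops, server clusters, mobile devices, edge devices, and so on). TensorFlow was originally developed by researchers and engineers on the Google Brain team. It provides strong support for machine learning and deep learning, and its flexible numerical computation core is widely used in many other scientific fields.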

cuDNN

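The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations of standard routines such as forward and backward convolution, pooling, normalization, and activation layers. cuDNN is part of the NVIDIA Deep Learning SDK.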

TensorBoard

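TensorBoard is a visualization tool that displays the computation graph of a running TensorFlow program, how various metrics change over time, and information about the data used in training.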

Jupyter

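Jupyter is an interactive notebook that makes it easy to create and share literate-programming documents, with support for live code, mathematical equations, visualizations, and Markdown. It is commonly used for data cleaning and transformation, numerical simulation, statistical modeling, machine learning, and more.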

2.2 Preparing the Cloud Platform Environment

Environment

The deep learning platform is built with the following configuration:

Physical server:    Intel(R) i5-3470, 24 GB DDR3
GPU model:          NVIDIA Quadro P2000
Cloud host spec:    8 vCPU, 16 GB memory
Cloud host OS:      CentOS 7.4
IP address:         192.168.66.6
Hostname:           GPU-TF

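An ordinary PC is used to deploy the ZStack cloud platform. The platform's GPU passthrough feature passes one NVIDIA Quadro P2000 card through to a CentOS 7.4 virtual machine, on which the deep learning platform is built.

For detailed ZStack deployment steps, see the official documentation: https://www.zstack.io/help/product_manuals/user_guide/3.html#c3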

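2.2.1 Creating a Cloud Host

Select "Cloud Resource Pool", click "Cloud Host", then click the "Create Cloud Host" button to open the creation page.

Steps for creating a cloud host:

1. Choose how to add: the platform supports creating a single cloud host or several at once; choose according to your needs.

2. Set the cloud host name: naming it after the business system is recommended, which simplifies management and operations.

3. Choose a compute offering: pick the predefined offering that best fits the workload.

4. Choose an image template: pick the image that matches the workload.

5. Choose an L3 network: in newer versions the platform's L3 networks support both IPv4 and IPv6, so choose according to your needs. NIC properties can also be set while creating the cloud host.

6. After confirming the configuration is correct, click "OK" to start creation.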

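2.2.2 Passing Through the GPU

Click the cloud host's name, then click "Configuration".

Find the GPU Devices tab, click "Actions" and choose "Load", then select the GPU device to hand it to the cloud host for direct use.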
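Once the GPU has been attached, it can be worth confirming from inside the cloud host that the device is visible on the PCI bus before installing any driver. The following is a minimal sketch, assuming the guest has lspci (from pciutils) installed. The helper names find_nvidia_devices and read_lspci and the sample lines are illustrative and not part of the original setup.

```python
import subprocess


def find_nvidia_devices(lspci_output):
    """Return the lspci lines that mention an NVIDIA device."""
    return [
        line.strip()
        for line in lspci_output.splitlines()
        if "nvidia" in line.lower()
    ]


def read_lspci():
    """Run lspci and return its text output (requires pciutils)."""
    return subprocess.run(
        ["lspci"], stdout=subprocess.PIPE, universal_newlines=True, check=True
    ).stdout


# Illustrative sample only; real output depends on the host and the guest.
sample = (
    "00:02.0 VGA compatible controller: Red Hat, Inc. QXL paravirtual graphic card\n"
    "00:0b.0 VGA compatible controller: NVIDIA Corporation GP106GL [Quadro P2000]\n"
)
print(find_nvidia_devices(sample))
```

On the cloud host itself, find_nvidia_devices(read_lspci()) should return at least one line if the passthrough took effect.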

0x3 Deployment

3.1 Preparing the Runtime Environment

Install pip
# curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
# python get-pip.py
# pip --version
pip 18.1 from /usr/lib/python2.7/site-packages/pip (python 2.7)
# python --version
Python 2.7.5
Install GCC and G++
# yum install gcc gcc-c++
# gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36)
Install some required packages
# yum -y install zlib*
# yum install openssl-devel -y
# yum install sqlite* -y
Upgrade CentOS's default Python 2.7.5 to 3.6.5
Download the Python source package
# wget -c https://www.python.org/ftp/python/3.6.5/Python-3.6.5.tgz
Extract the source package
# tar -zvxf Python-3.6.5.tgz
Enter the source directory
# cd Python-3.6.5/
# ./configure --with-ssl
Compile and install
# make && make install
Check where the newly installed python3 files are
# ll /usr/local/bin/python*

Set the default python version to 3.x
# mv /usr/bin/python /usr/bin/python.bak
# ln -s /usr/local/bin/python3 /usr/bin/python
Check where the 2.x files are
# ll /usr/bin/python*

For yum to keep working, the interpreter it is configured with must still point to 2.x:
 
# vim /usr/bin/yum
 
# vim /usr/libexec/urlgrabber-ext-down
 
Change the first line (the shebang) of both files back to the old version:
#!/usr/bin/python --> #!/usr/bin/python2.7
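The same shebang change can be scripted instead of made by hand in vim. The sketch below is only an illustration: the function name pin_python2_shebang is hypothetical, and it works on the file's text so that it can be tried safely before touching /usr/bin/yum or /usr/libexec/urlgrabber-ext-down.

```python
import re

# Matches a shebang at the very start of the text that points at the
# unversioned interpreter, with or without a space after "#!".
_SHEBANG = re.compile(r"\A#![ \t]*/usr/bin/python(?=\s|\Z)")


def pin_python2_shebang(script_text):
    """Rewrite a leading '#!/usr/bin/python' to '#!/usr/bin/python2.7'.

    Text whose shebang already names a versioned interpreter
    (python2.7, python3, ...) is returned unchanged.
    """
    return _SHEBANG.sub("#!/usr/bin/python2.7", script_text, count=1)


print(pin_python2_shebang("#!/usr/bin/python\nimport sys\n"))
```

To apply it, read each file, pass the text through the function, and write the result back, keeping a backup copy of the original first.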
 
Install python-devel and python-pip
 
# yum install python-devel python-pip -y
 
Disable the built-in Nouveau driver
 
Check whether Nouveau is in use
 
# lsmod | grep nouveau 
 
nouveau 1662531 0 
 
mxm_wmi 13021 1 nouveau 
 
wmi 19086 2 mxm_wmi,nouveau 
 
video 24538 1 nouveau 
 
i2c_algo_bit 13413 1 nouveau 
 
drm_kms_helper 176920 2 qxl,nouveau 
 
ttm 99555 2 qxl,nouveau 
 
drm 397988 5 qxl,ttm,drm_kms_helper,nouveau 
 
i2c_core 63151 5 drm,i2c_piix4,drm_kms_helper,i2c_algo_bit,nouveau 
 
#vim /usr/lib/modprobe.d/dist-blacklist.conf 
 
# nouveau 
 
blacklist nouveau 
 
options nouveau modeset=0 
 
:wq to save and exit
 
# mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak    # back up the initramfs image
 
# dracut /boot/initramfs-$(uname -r).img $(uname -r)    # rebuild the initramfs image
 
# reboot
 
# lsmod | grep nouveau    # verify again; no output means Nouveau is disabled
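The check can also be done programmatically by looking at the first column of the lsmod output, which holds the module name. This is a small sketch under that assumption. The function name module_loaded is illustrative, and the sample text is abbreviated from the listing shown earlier in this section.

```python
def module_loaded(lsmod_output, module):
    """Return True if `module` appears as a module name (first column)."""
    for line in lsmod_output.splitlines():
        fields = line.split()
        if fields and fields[0] == module:
            return True
    return False


# Abbreviated from the lsmod listing taken before Nouveau was disabled.
before = (
    "nouveau 1662531 0\n"
    "mxm_wmi 13021 1 nouveau\n"
    "drm 397988 5 qxl,ttm,drm_kms_helper,nouveau\n"
)
print(module_loaded(before, "nouveau"))  # True while the driver is loaded
print(module_loaded("", "nouveau"))      # False once lsmod no longer lists it
```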

3.2 安裝CUDA

Upgrade the kernel:
 
# rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
  
# rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm  
  
# yum -y --enablerepo=elrepo-kernel install kernel-ml.x86_64 kernel-ml-devel.x86_64  
  
List the installed kernels in default boot order:
  
awk -F\' '$1=="menuentry " {print $2}' /etc/grub2.cfg  
  
CentOS Linux (4.20.0-1.el7.elrepo.x86_64) 7 (Core)  
  
CentOS Linux (3.10.0-862.el7.x86_64) 7 (Core)  
  
CentOS Linux (0-rescue-c4581dac5b734c11a1881c8eb10d6b09) 7 (Core)  
  
#vim /etc/default/grub  
  
Change GRUB_DEFAULT=saved to GRUB_DEFAULT=0 so that the first entry in the list above (the new 4.20 kernel) boots by default.
 
Run grub2-mkconfig to regenerate the boot configuration:
  
# grub2-mkconfig -o /boot/grub2/grub.cfg  
  
#reboot  
  
# uname -r    # verify the kernel version after the reboot
  
4.20.0-1.el7.elrepo.x86_64  
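A numeric GRUB_DEFAULT value is a zero-based index into the menu entries, in the order the awk command prints them. The sketch below mirrors that awk one-liner in Python so the index of each kernel is explicit. The function name menu_entries and the shortened sample configuration are illustrative assumptions.

```python
import re


def menu_entries(grub_cfg_text):
    """Return top-level menuentry titles in file order.

    Like the awk one-liner, this only looks at lines that begin with
    "menuentry '" and takes the text up to the next single quote.
    """
    return re.findall(r"^menuentry '([^']*)'", grub_cfg_text, flags=re.MULTILINE)


# Shortened, illustrative grub.cfg fragment.
sample_cfg = (
    "menuentry 'CentOS Linux (4.20.0-1.el7.elrepo.x86_64) 7 (Core)' --class centos {\n"
    "}\n"
    "menuentry 'CentOS Linux (3.10.0-862.el7.x86_64) 7 (Core)' --class centos {\n"
    "}\n"
)
for index, title in enumerate(menu_entries(sample_cfg)):
    print(index, title)
```

Index 0 here is the 4.20 kernel, which is why a default of 0 selects it.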
  
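The CUDA Toolkit can be installed in two ways:
 
Package installation (RPM and Deb packages)
Runfile installation
This article uses the Runfile method.
 
Installer download: https://developer.nvidia.com/compute/cuda/10.0/Prod/local_installers/cuda_10.0.130_410.48_linux
 
Choose the installer that matches your operating system. You can copy the download link and fetch it directly with wget -c: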
  
# wget -c https://developer.nvidia.com/compute/cuda/10.0/Prod/local_installers/cuda_10.0.130_410.48_linux  
  
#chmod +x cuda_10.0.130_410.48_linux  
  
#./cuda_10.0.130_410.48_linux  
  
Do you accept the previously read EULA?  
  
accept/decline/quit: accept  
  
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 410.48?  
  
(y)es/(n)o/(q)uit: y  
  
Install the CUDA 10.0 Toolkit?  
  
(y)es/(n)o/(q)uit: y  
  
Enter Toolkit Location  
  
[ default is /usr/local/cuda-10.0 ]:  
  
Do you want to install a symbolic link at /usr/local/cuda?  
  
(y)es/(n)o/(q)uit: y  
  
Install the CUDA 10.0 Samples?  
  
(y)es/(n)o/(q)uit: y  
  
Enter CUDA Samples Location  
  
[ default is /root ]:  
  
Configure the CUDA runtime environment variables:
  
# vim /etc/profile  
  
# CUDA  
  
export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}  
  
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}  
  
# source /etc/profile  
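The ${PATH:+:${PATH}} form in the two export lines appends ":" plus the old value only when the variable is already set and non-empty. That avoids a dangling colon, which the dynamic loader would treat as an empty entry meaning the current directory. The Python sketch below imitates that expansion to make the behavior visible. The function name prepend_path is an illustrative assumption.

```python
def prepend_path(new_dir, current):
    """Imitate NEW${VAR:+:${VAR}} from the shell.

    The ':' separator and the old value are added only when the old
    value is non-empty, so no empty path component is created.
    """
    return new_dir + (":" + current if current else "")


print(prepend_path("/usr/local/cuda-10.0/bin", "/usr/bin:/bin"))
print(prepend_path("/usr/local/cuda-10.0/lib64", ""))
```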
  
Check the version
  
# nvcc --version  
  
nvcc: NVIDIA (R) Cuda compiler driver  
  
Copyright (c) 2005-2018 NVIDIA Corporation  
  
Built on Sat_Aug_25_21:08:01_CDT_2018  
  
Cuda compilation tools, release 10.0, V10.0.130  
  
Use a bundled sample to verify that CUDA works:
  
#cd /root/NVIDIA_CUDA-10.0_Samples/1_Utilities/deviceQuery  
  
# make  
  
"/usr/local/cuda-10.0"/bin/nvcc -ccbin g++ -I../../common/inc -m64 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o deviceQuery.o -c deviceQuery.cpp  
  
"/usr/local/cuda-10.0"/bin/nvcc -ccbin g++ -m64 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o deviceQuery deviceQuery.o  
  
mkdir -p ../../bin/x86_64/linux/release  
  
cp deviceQuery ../../bin/x86_64/linux/release  
  
# cd ../../bin/x86_64/linux/release/  
  
# ./deviceQuery  
  
./deviceQuery Starting...
  
CUDA Device Query (Runtime API) version (CUDART static linking)  
  
Detected 1 CUDA Capable device(s)  
  
Device 0: "Quadro P2000"  
  
CUDA Driver Version / Runtime Version 10.0 / 10.0  
  
CUDA Capability Major/Minor version number: 6.1  
  
Total amount of global memory: 5059 MBytes (5304745984 bytes)  
  
( 8) Multiprocessors, (128) CUDA Cores/MP: 1024 CUDA Cores  
  
GPU Max Clock rate: 1481 MHz (1.48 GHz)  
  
Memory Clock rate: 3504 Mhz  
  
Memory Bus Width: 160-bit  
  
L2 Cache Size: 1310720 bytes  
  
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)  
  
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers  
  
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers  
  
Total amount of constant memory: 65536 bytes  
  
Total amount of shared memory per block: 49152 bytes  
  
Total number of registers available per block: 65536  
  
Warp size: 32  
  
Maximum number of threads per multiprocessor: 2048  
  
Maximum number of threads per block: 1024  
  
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)  
  
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)  
  
Maximum memory pitch: 2147483647 bytes  
  
Texture alignment: 512 bytes  
  
Concurrent copy and kernel execution: Yes with 2 copy engine(s)  
  
Run time limit on kernels: No  
  
Integrated GPU sharing Host Memory: No  
  
Support host page-locked memory mapping: Yes  
  
Alignment requirement for Surfaces: Yes  
  
Device has ECC support: Disabled  
  
Device supports Unified Addressing (UVA): Yes  
  
Device supports Compute Preemption: Yes  
  
Supports Cooperative Kernel Launch: Yes  
  
Supports MultiDevice Co-op Kernel Launch: Yes  
  
Device PCI Domain ID / Bus ID / location ID: 0 / 0 / 11  
  
Compute Mode:  
  
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >  
  
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.0, CUDA Runtime Version = 10.0, NumDevs = 1  
  
Result = PASS  
  
If Result = PASS appears and no errors were reported during the run, the test has passed.
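For repeated checks, for example across several cloud hosts, the key facts can be pulled out of the deviceQuery text automatically. The sketch below assumes the output format shown above. The function name summarize_device_query and the returned keys are illustrative choices.

```python
import re

_CORES = re.compile(
    r"\(\s*(\d+)\) Multiprocessors, \(\s*(\d+)\) CUDA Cores/MP:\s+(\d+) CUDA Cores"
)


def summarize_device_query(output):
    """Extract pass/fail and the core counts from deviceQuery output."""
    summary = {"passed": "Result = PASS" in output}
    match = _CORES.search(output)
    if match:
        sms, per_sm, total = (int(group) for group in match.groups())
        summary.update(sms=sms, cores_per_sm=per_sm, total_cores=total)
    return summary


# Two lines copied from the run above.
sample_output = (
    "( 8) Multiprocessors, (128) CUDA Cores/MP: 1024 CUDA Cores\n"
    "Result = PASS\n"
)
print(summarize_device_query(sample_output))
```

The reported total is simply the product of the two counts: 8 multiprocessors times 128 cores each gives 1024 CUDA cores.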

3.3 Installing cuDNN

cuDNN, in full the NVIDIA CUDA® Deep Neural Network library, is a GPU-accelerated library that NVIDIA designed specifically for the basic operations in deep neural networks. It provides highly optimized implementations of the standard routines used in deep neural networks.

Download: https://developer.nvidia.com/rdp/cudnn-download
 
Note: you must register for the NVIDIA Developer Program before you can download.

Pick the version that matches your environment. Because the download requires authentication, fetch it in a browser and then upload it to the cloud host.
 
Install:
  
#rpm -ivh libcudnn7-7.4.2.24-1.cuda10.0.x86_64.rpm libcudnn7-devel-7.4.2.24-1.cuda10.0.x86_64.rpm libcudnn7-doc-7.4.2.24-1.cuda10.0.x86_64.rpm  
  
Preparing... ################################# [100%]
 
Updating / installing...
  
1:libcudnn7-7.4.2.24-1.cuda10.0 ################################# [ 33%]  
  
2:libcudnn7-devel-7.4.2.24-1.cuda10################################# [ 67%]  
  
3:libcudnn7-doc-7.4.2.24-1.cuda10.0################################# [100%]  
  
Verify cuDNN:
  
# cp -r /usr/src/cudnn_samples_v7/ $HOME  
  
# cd $HOME/cudnn_samples_v7/mnistCUDNN  
  
# make clean && make  
  
rm -rf *o  
  
rm -rf mnistCUDNN  
  
/usr/local/cuda/bin/nvcc -ccbin g++ -I/usr/local/cuda/include -IFreeImage/include -m64 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_53,code=compute_53 -o fp16_dev.o -c fp16_dev.cu  
  
g++ -I/usr/local/cuda/include -IFreeImage/include -o fp16_emu.o -c fp16_emu.cpp  
  
g++ -I/usr/local/cuda/include -IFreeImage/include -o mnistCUDNN.o -c mnistCUDNN.cpp  
  
/usr/local/cuda/bin/nvcc -ccbin g++ -m64 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_53,code=compute_53 -o mnistCUDNN fp16_dev.o fp16_emu.o mnistCUDNN.o -I/usr/local/cuda/include -IFreeImage/include -LFreeImage/lib/linux/x86_64 -LFreeImage/lib/linux -lcudart -lcublas -lcudnn -lfreeimage -lstdc++ -lm  
  
# ./mnistCUDNN  
  
cudnnGetVersion() : 7402 , CUDNN_VERSION from cudnn.h : 7402 (7.4.2)  
  
Host compiler version : GCC 4.8.5  
  
There are 1 CUDA capable devices on your machine :  
  
device 0 : sms 8 Capabilities 6.1, SmClock 1480.5 Mhz, MemSize (Mb) 5059, MemClock 3504.0 Mhz, Ecc=0, boardGroupID=0  
  
Using device 0  
  
Testing single precision  
  
Loading image data/one_28x28.pgm  
  
Performing forward propagation ...  
  
Testing cudnnGetConvolutionForwardAlgorithm ...  
  
Fastest algorithm is Algo 1  
  
Testing cudnnFindConvolutionForwardAlgorithm ...  
  
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.036864 time requiring 0 memory  
  
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.044032 time requiring 3464 memory  
  
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.053248 time requiring 57600 memory  
  
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.116544 time requiring 207360 memory  
  
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.181248 time requiring 2057744 memory  
  
Resulting weights from Softmax:  
  
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000  
  
Loading image data/three_28x28.pgm  
  
Performing forward propagation ...  
  
Resulting weights from Softmax:  
  
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000  
  
Loading image data/five_28x28.pgm  
  
Performing forward propagation ...  
  
Resulting weights from Softmax:  
  
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006  
  
Result of classification: 1 3 5  
  
Test passed!  
  
Testing half precision (math in single precision)  
  
Loading image data/one_28x28.pgm  
  
Performing forward propagation ...  
  
Testing cudnnGetConvolutionForwardAlgorithm ...  
  
Fastest algorithm is Algo 1  
  
Testing cudnnFindConvolutionForwardAlgorithm ...  
  
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.032896 time requiring 0 memory  
  
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.036448 time requiring 3464 memory  
  
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.044000 time requiring 28800 memory  
  
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.115488 time requiring 207360 memory  
  
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.180224 time requiring 2057744 memory  
  
Resulting weights from Softmax:  
  
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001  
  
Loading image data/three_28x28.pgm  
  
Performing forward propagation ...  
  
Resulting weights from Softmax:  
  
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000  
  
Loading image data/five_28x28.pgm  
  
Performing forward propagation ...  
  
Resulting weights from Softmax:  
  
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006  
  
Result of classification: 1 3 5  
  
Test passed!  
  
If Test passed! appears and no errors were reported during the run, the test has passed.
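The first line of the mnistCUDNN output reports the version as the integer 7402 next to the readable form 7.4.2. For cuDNN 7.x, the integer is understood to be encoded as major * 1000 + minor * 100 + patch. The sketch below decodes it under that assumption, and the function name decode_cudnn_version is illustrative.

```python
def decode_cudnn_version(code):
    """Split a cuDNN 7.x version integer into (major, minor, patch).

    Assumes the encoding major * 1000 + minor * 100 + patch.
    """
    major, rest = divmod(code, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


print(decode_cudnn_version(7402))
```

Comparing the decoded tuple with the installed RPM version (7.4.2.24 here) is a quick way to confirm that the sample linked against the intended library.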

3.4 Installing TensorFlow

# pip3 install --upgrade setuptools==30.1.0 
 
# pip3 install tf-nightly-gpu 
 
Verification:
 
Enter the following short program in an interactive Python shell:
 
# python  
import tensorflow as tf  
hello = tf.constant('Hello, TensorFlow!')  
sess = tf.Session()  
print(sess.run(hello)) 
 
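If the system prints the following, you are ready to start writing TensorFlow programs (under Python 3 the value is shown as a bytes literal, b'Hello, TensorFlow!'):
 
Hello, TensorFlow!
 
At the same time, the nvidia-smi command shows the tasks currently running on the GPU.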
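To watch the GPU from a script instead of reading the nvidia-smi table, its CSV query mode can be parsed. This is a minimal sketch that assumes nvidia-smi is on the PATH. The function names and the sample values are illustrative and are not measurements from this setup.

```python
import csv
import io
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=name,utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_csv(text):
    """Parse 'name, utilization, memory' CSV rows into dictionaries."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:
            continue  # skip blank lines
        name, util, mem = (field.strip() for field in record)
        rows.append(
            {"name": name, "utilization_pct": int(util), "memory_used_mib": int(mem)}
        )
    return rows


def query_gpus():
    """Run nvidia-smi in CSV mode and return the parsed rows."""
    out = subprocess.run(
        QUERY, stdout=subprocess.PIPE, universal_newlines=True, check=True
    ).stdout
    return parse_gpu_csv(out)


# Illustrative values only.
print(parse_gpu_csv("Quadro P2000, 3, 4711\n"))
```

While a TensorFlow session is open, memory_used_mib reported by query_gpus() would be expected to rise, since TensorFlow reserves GPU memory when the session starts.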

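3.5 Installing the TensorBoard Visualization Tool

TensorBoard can display the TensorFlow graph, plot quantitative metrics about the graph's execution, and show additional data such as the images that pass through it. TensorBoard is installed automatically when TensorFlow is installed via pip.

Check the version: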
# pip3 show tensorboard
Name: tensorboard
Version: 1.12.2
Summary: TensorBoard lets you watch Tensors Flow
Home-page: https://github.com/tensorflow/tensorboard
Author: Google Inc.
Author-email: opensource@google.com
License: Apache 2.0
Location: /usr/lib/python2.7/site-packages
Requires: protobuf, numpy, futures, grpcio, wheel, markdown, werkzeug, six
Required-by:
Start the service:
# tensorboard --logdir /var/log/tensorboard.log
TensorBoard 1.13.0a20190107 at http://GPU-TF:6006 (Press CTRL+C to quit)
As prompted, open http://<server IP>:6006 in a browser.

3.6 Installing Jupyter

Install:
# sudo pip3 install jupyter
Generate the configuration file:
# jupyter notebook --generate-config
Writing default config to: /root/.jupyter/jupyter_notebook_config.py
Generate a Jupyter password:
# python
Python 3.6.5 (default, Jan 15 2019, 02:51:51)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from notebook.auth import passwd;
>>> passwd()
Enter password:
Verify password:
'sha1:6067bcf7350b:8407670bb3f94487c9404ed3c20c1ebf7ddee32e'
>>>
Write the generated hash string into the Jupyter configuration file:
# vim /root/.jupyter/jupyter_notebook_config.py
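In the classic Notebook configuration file, the hash normally goes on a line of the form c.NotebookApp.password = 'sha1:...'. The string has three colon-separated parts: algorithm, salt, and digest. The sketch below uses only the standard library and assumes the digest is the hash of the passphrase followed by the salt, which is how the author of this sketch understands notebook.auth.passwd to work. The function names, passphrase, and salt are illustrative.

```python
import hashlib
import hmac


def make_hash(passphrase, salt, algorithm="sha1"):
    """Build an 'algorithm:salt:digest' string (assumed Notebook format)."""
    h = hashlib.new(algorithm)
    h.update(passphrase.encode("utf-8") + salt.encode("ascii"))
    return ":".join((algorithm, salt, h.hexdigest()))


def check_password(hashed, passphrase):
    """Return True if `passphrase` matches an 'algorithm:salt:digest' string."""
    parts = hashed.split(":", 2)
    if len(parts) != 3:
        return False
    algorithm, salt, digest = parts
    expected = make_hash(passphrase, salt, algorithm).split(":", 2)[2]
    return hmac.compare_digest(expected, digest)


# Illustrative passphrase and salt, not the ones used in this article.
example = make_hash("example-pass", "0123456789ab")
print(example)
print(check_password(example, "example-pass"))

# The corresponding configuration line would then look like:
#   c.NotebookApp.password = '<the algorithm:salt:digest string>'
```

Checking a hash this way before restarting the service helps catch a mistyped password or a truncated paste.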

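Start the service (--ip must be an address configured on the cloud host itself):
 
# jupyter notebook --allow-root --ip='192.168.66.11'
 
Log in from a browser.

After you enter the password, you are logged in and can use the notebook normally.
 
Run a test task:
 
Run the TensorFlow demo example.
 
Create a new HelloWorld example in Jupyter with the following code: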
 
import tensorflow as tf 
 
# Simple hello world using TensorFlow 
 
# Create a Constant op 
 
# The op is added as a node to the default graph. 
 
# 
 
# The value returned by the constructor represents the output 
 
# of the Constant op. 
 
hello = tf.constant('Hello, TensorFlow!') 
 
# Start tf session 
 
sess = tf.Session() 
 
# Run the op 
 
print(sess.run(hello))

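0x4 Summary

With the ZStack cloud platform, a deep learning platform can be built quickly. The cloud platform itself needs little complex configuration, and installing the drivers and deep learning components works the same as on a physical machine. Performance tests run after the driver installation showed results on par with a physical machine of the same configuration, with no performance loss on the GPU side.

Once the software environment above is ready, you can configure multiple GPUs, create multiple cloud hosts from a template, and pass one GPU through to each. Combined with ZStack's built-in multi-tenancy, this lets many people develop or run applications in the same environment without interfering with one another, making it a true deep learning "cloud" platform.

Editor: Zhang Yanni. Source: ZStack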