【深度學(xué)習(xí)系列】用PaddlePaddle和Tensorflow進(jìn)行圖像分類
上個月發(fā)布了四篇文章,主要講了深度學(xué)習(xí)中的“hello world”----mnist圖像識別,以及卷積神經(jīng)網(wǎng)絡(luò)的原理詳解,包括基本原理、自己手寫CNN和paddlepaddle的源碼解析。這篇主要跟大家講講如何用PaddlePaddle和Tensorflow做圖像分類。所有程序都在我的github里,可以自行下載訓(xùn)練。
在卷積神經(jīng)網(wǎng)絡(luò)中,有五大經(jīng)典模型,分別是:LeNet-5,AlexNet,GoogleNet,Vgg和ResNet。本文首先自己設(shè)計一個小型CNN網(wǎng)絡(luò)結(jié)構(gòu)來對圖像進(jìn)行分類,再了解一下LeNet-5網(wǎng)絡(luò)結(jié)構(gòu)對圖像做分類,并用比較流行的Tensorflow框架和百度的PaddlePaddle實現(xiàn)LeNet-5網(wǎng)絡(luò)結(jié)構(gòu),并對結(jié)果對比。
什么是圖像分類
圖像分類是根據(jù)圖像的語義信息將不同類別圖像區(qū)分開來,是計算機(jī)視覺中重要的基本問題,也是圖像檢測、圖像分割、物體跟蹤、行為分析等其他高層視覺任務(wù)的基礎(chǔ)。圖像分類在很多領(lǐng)域有廣泛應(yīng)用,包括安防領(lǐng)域的人臉識別和智能視頻分析等,交通領(lǐng)域的交通場景識別,互聯(lián)網(wǎng)領(lǐng)域基于內(nèi)容的圖像檢索和相冊自動歸類,醫(yī)學(xué)領(lǐng)域的圖像識別等(引用自官網(wǎng))
cifar-10數(shù)據(jù)集
CIFAR-10分類問題是機(jī)器學(xué)習(xí)領(lǐng)域的一個通用基準(zhǔn),由60000張32*32的RGB彩色圖片構(gòu)成,共10個分類。50000張用于訓(xùn)練集,10000張用于測試集。其問題是將32X32像素的RGB圖像分類成10種類別:飛機(jī),手機(jī),鳥,貓,鹿,狗,青蛙,馬,船和卡車。更多信息可以參考CIFAR-10和Alex Krizhevsky的演講報告。常見的還有cifar-100,分類物體達(dá)到100類,以及ILSVRC比賽的100類。
自己設(shè)計CNN
了解CNN的基本網(wǎng)絡(luò)結(jié)構(gòu)后,首先自己設(shè)計一個簡單的CNN網(wǎng)絡(luò)結(jié)構(gòu)對cifar-10數(shù)據(jù)進(jìn)行分類。
網(wǎng)絡(luò)結(jié)構(gòu)

代碼實現(xiàn)
1. 網(wǎng)絡(luò)結(jié)構(gòu):simple_cnn.py
1 #coding:utf-8
2 '''
3 Created by huxiaoman 2017.11.27
4 simple_cnn.py:自己設(shè)計的一個簡單的cnn網(wǎng)絡(luò)結(jié)構(gòu)
5 '''
6
7 import os
8 from PIL import Image
9 import numpy as np
10 import paddle.v2 as paddle
11 from paddle.trainer_config_helpers import *
12
13 with_gpu = os.getenv('WITH_GPU', '0') != '1'
14
15 def simple_cnn(img):
16 conv_pool_1 = paddle.networks.simple_img_conv_pool(
17 input=img,
18 filter_size=5,
19 num_filters=20,
20 num_channel=3,
21 pool_size=2,
22 pool_stride=2,
23 act=paddle.activation.Relu())
24 conv_pool_2 = paddle.networks.simple_img_conv_pool(
25 input=conv_pool_1,
26 filter_size=5,
27 num_filters=50,
28 num_channel=20,
29 pool_size=2,
30 pool_stride=2,
31 act=paddle.activation.Relu())
32 fc = paddle.layer.fc(
33 input=conv_pool_2, size=512, act=paddle.activation.Softmax())
2. 訓(xùn)練程序:train_simple_cnn.py
1 #coding:utf-8
2 '''
3 Created by huxiaoman 2017.11.27
4 train_simple—_cnn.py:訓(xùn)練simple_cnn對cifar10數(shù)據(jù)集進(jìn)行分類
5 '''
6 import sys, os
7
8 import paddle.v2 as paddle
9 from simple_cnn import simple_cnn
10
11 with_gpu = os.getenv('WITH_GPU', '0') != '1'
12
13
14 def main():
15 datadim = 3 * 32 * 32
16 classdim = 10
17
18 # PaddlePaddle init
19 paddle.init(use_gpu=with_gpu, trainer_count=7)
20
21 image = paddle.layer.data(
22 name="image", type=paddle.data_type.dense_vector(datadim))
23
24 # Add neural network config
25 # option 1. resnet
26 # net = resnet_cifar10(image, depth=32)
27 # option 2. vgg
28 net = simple_cnn(image)
29
30 out = paddle.layer.fc(
31 input=net, size=classdim, act=paddle.activation.Softmax())
32
33 lbl = paddle.layer.data(
34 name="label", type=paddle.data_type.integer_value(classdim))
35 cost = paddle.layer.classification_cost(input=out, label=lbl)
36
37 # Create parameters
38 parameters = paddle.parameters.create(cost)
39
40 # Create optimizer
41 momentum_optimizer = paddle.optimizer.Momentum(
42 momentum=0.9,
43 regularization=paddle.optimizer.L2Regularization(rate=0.0002 * 128),
44 learning_rate=0.1 / 128.0,
45 learning_rate_decay_a=0.1,
46 learning_rate_decay_b=50000 * 100,
47 learning_rate_schedule='discexp')
48
49 # End batch and end pass event handler
50 def event_handler(event):
51 if isinstance(event, paddle.event.EndIteration):
52 if event.batch_id % 100 == 0:
53 print "\nPass %d, Batch %d, Cost %f, %s" % (
54 event.pass_id, event.batch_id, event.cost, event.metrics)
55 else:
56 sys.stdout.write('.')
57 sys.stdout.flush()
58 if isinstance(event, paddle.event.EndPass):
59 # save parameters
60 with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
61 parameters.to_tar(f)
62
63 result = trainer.test(
64 reader=paddle.batch(
65 paddle.dataset.cifar.test10(), batch_size=128),
66 feeding={'image': 0,
67 'label': 1})
68 print "\nTest with Pass %d, %s" % (event.pass_id, result.metrics)
69
70 # Create trainer
71 trainer = paddle.trainer.SGD(
72 cost=cost, parameters=parameters, update_equation=momentum_optimizer)
73
74 # Save the inference topology to protobuf.
75 inference_topology = paddle.topology.Topology(layers=out)
76 with open("inference_topology.pkl", 'wb') as f:
77 inference_topology.serialize_for_inference(f)
78
79 trainer.train(
80 reader=paddle.batch(
81 paddle.reader.shuffle(
82 paddle.dataset.cifar.train10(), buf_size=50000),
83 batch_size=128),
84 num_passes=200,
85 event_handler=event_handler,
86 feeding={'image': 0,
87 'label': 1})
88
89 # inference
90 from PIL import Image
91 import numpy as np
92 import os
93
94 def load_image(file):
95 im = Image.open(file)
96 im = im.resize((32, 32), Image.ANTIALIAS)
97 im = np.array(im).astype(np.float32)
98 # The storage order of the loaded image is W(widht),
99 # H(height), C(channel). PaddlePaddle requires
100 # the CHW order, so transpose them.
101 im = im.transpose((2, 0, 1)) # CHW
102 # In the training phase, the channel order of CIFAR
103 # image is B(Blue), G(green), R(Red). But PIL open
104 # image in RGB mode. It must swap the channel order.
105 im = im[(2, 1, 0), :, :] # BGR
106 im = im.flatten()
107 im = im / 255.0
108 return im
109
110 test_data = []
111 cur_dir = os.path.dirname(os.path.realpath(__file__))
112 test_data.append((load_image(cur_dir + '/image/dog.png'), ))
113
114 # users can remove the comments and change the model name
115 # with open('params_pass_50.tar', 'r') as f:
116 # parameters = paddle.parameters.Parameters.from_tar(f)
117
118 probs = paddle.infer(
119 output_layer=out, parameters=parameters, input=test_data)
120 lab = np.argsort(-probs) # probs and lab are the results of one batch data
121 print "Label of image/dog.png is: %d" % lab[0][0]
122
123
124 if __name__ == '__main__':
125 main()
3. 結(jié)果輸出
1 I1128 21:44:30.218085 14733 Util.cpp:166] commandline: --use_gpu=True --trainer_count=7
2 [INFO 2017-11-28 21:44:35,874 layers.py:2539] output for __conv_pool_0___conv: c = 20, h = 28, w = 28, size = 15680
3 [INFO 2017-11-28 21:44:35,874 layers.py:2667] output for __conv_pool_0___pool: c = 20, h = 14, w = 14, size = 3920
4 [INFO 2017-11-28 21:44:35,875 layers.py:2539] output for __conv_pool_1___conv: c = 50, h = 10, w = 10, size = 5000
5 [INFO 2017-11-28 21:44:35,876 layers.py:2667] output for __conv_pool_1___pool: c = 50, h = 5, w = 5, size = 1250
6 I1128 21:44:35.881502 14733 MultiGradientMachine.cpp:99] numLogicalDevices=1 numThreads=7 numDevices=8
7 I1128 21:44:35.928449 14733 GradientMachine.cpp:85] Initing parameters..
8 I1128 21:44:36.056259 14733 GradientMachine.cpp:92] Init parameters done.
9
10 Pass 0, Batch 0, Cost 2.302628, {'classification_error_evaluator': 0.9296875}
11 ................................................................................
12 ```
13 Pass 199, Batch 200, Cost 0.869726, {'classification_error_evaluator': 0.3671875}
14 ...................................................................................................
15 Pass 199, Batch 300, Cost 0.801396, {'classification_error_evaluator': 0.3046875}
16 ..........................................................................................I1128 23:21:39.443141 14733 MultiGradientMachine.cpp:99] numLogicalDevices=1 numThreads=7 numDevices=8
17
18 Test with Pass 199, {'classification_error_evaluator': 0.5248000025749207}
19 Label of image/dog.png is: 9
我開了7個線程,用了8個Tesla K80 GPU訓(xùn)練,batch_size = 128,迭代次數(shù)200次,耗時1h37min,錯誤分類率為0.5248,這個結(jié)果,emm,不算很高,我們可以把它作為一個baseline,后面對其進(jìn)行調(diào)優(yōu)。
LeNet-5網(wǎng)絡(luò)結(jié)構(gòu)
Lenet-5網(wǎng)絡(luò)結(jié)構(gòu)來源于Yan LeCun提出的,原文為《Gradient-based learning applied to document recognition》,論文里使用的是mnist手寫數(shù)字作為輸入數(shù)據(jù)(32 * 32)進(jìn)行驗證。我們來看一下網(wǎng)絡(luò)結(jié)構(gòu)。

LeNet-5一共有8層: 1個輸入層+3個卷積層(C1、C3、C5)+2個下采樣層(S2、S4)+1個全連接層(F6)+1個輸出層,每層有多個feature map(自動提取的多組特征)。
Input輸入層
cifar10 數(shù)據(jù)集,每一張圖片尺寸:32 * 32
C1 卷積層
- 6個feature_map,卷積核大小 5 * 5 ,feature_map尺寸:28 * 28
- 每個卷積神經(jīng)元的參數(shù)數(shù)目:5 * 5 = 25個和一個bias參數(shù)
- 連接數(shù)目:(5*5+1)* 6 *(28*28) = 122,304
- 參數(shù)共享:每個feature_map內(nèi)共享參數(shù),∴∴共(5*5+1)*6 = 156個參數(shù)
S2 下采樣層(池化層)
- 6個14*14的feature_map,pooling大小 2* 2
- 每個單元與上一層的feature_map中的一個2*2的滑動窗口連接,不重疊,因此S2每個feature_map大小是C1中feature_map大小的1/4
- 連接數(shù):(2*2+1)*1*14*14*6 = 5880個
- 參數(shù)共享:每個feature_map內(nèi)共享參數(shù),有2 * 6 = 12個訓(xùn)練參數(shù)
C3 卷積層
這層略微復(fù)雜,S2神經(jīng)元與C3是多對多的關(guān)系,比如最簡單方式:用S2的所有feature map與C3的所有feature map做全連接(也可以對S2抽樣幾個feature map出來與C3某個feature map連接),這種全連接方式下:6個S2的feature map使用6個獨立的5×5卷積核得到C3中1個feature map(生成每個feature map時對應(yīng)一個bias),C3中共有16個feature map,所以該層需要學(xué)習(xí)的參數(shù)個數(shù)為:(5×5×6+1)×16=2416個,神經(jīng)元連接數(shù)為:2416×8×8=154624個。
S4 下采樣層
同S2,如果采用Max Pooling/Mean Pooling,則該層需要學(xué)習(xí)的參數(shù)個數(shù)為0個,神經(jīng)元連接數(shù)為:(2×2+1)×16×4×4=1280個。
C5卷積層
類似C3,用S4的所有feature map與C5的所有feature map做全連接,這種全連接方式下:16個S4的feature map使用16個獨立的1×1卷積核得到C5中1個feature map(生成每個feature map時對應(yīng)一個bias),C5中共有120個feature map,所以該層需要學(xué)習(xí)的參數(shù)個數(shù)為:(1×1×16+1)×120=2040個,神經(jīng)元連接數(shù)為:2040個。
F6 全連接層
將C5層展開得到4×4×120=1920個節(jié)點,并接一個全連接層,考慮bias,該層需要學(xué)習(xí)的參數(shù)和連接個數(shù)為:(1920+1)*84=161364個。
輸出層
該問題是個10分類問題,所以有10個輸出單元,通過softmax做概率歸一化,每個分類的輸出單元對應(yīng)84個輸入。
LeNet-5的PaddlePaddle實現(xiàn)
1. 網(wǎng)絡(luò)結(jié)構(gòu) lenet.py
1 #coding:utf-8
2 '''
3 Created by huxiaoman 2017.11.27
4 lenet.py:LeNet-5
5 '''
6
7 import os
8 from PIL import Image
9 import numpy as np
10 import paddle.v2 as paddle
11 from paddle.trainer_config_helpers import *
12
13 with_gpu = os.getenv('WITH_GPU', '0') != '1'
14
15 def lenet(img):
16 conv_pool_1 = paddle.networks.simple_img_conv_pool(
17 input=img,
18 filter_size=5,
19 num_filters=6,
20 num_channel=3,
21 pool_size=2,
22 pool_stride=2,
23 act=paddle.activation.Relu())
24 conv_pool_2 = paddle.networks.simple_img_conv_pool(
25 input=conv_pool_1,
26 filter_size=5,
27 num_filters=16,
28 pool_size=2,
29 pool_stride=2,
30 act=paddle.activation.Relu())
31 conv_3 = img_conv_layer(
32 input = conv_pool_2,
33 filter_size = 1,
34 num_filters = 120,
35 stride = 1)
36 fc = paddle.layer.fc(
37 input=conv_3, size=84, act=paddle.activation.Sigmoid())
38 return fc
2. 訓(xùn)練代碼 train_lenet.py
1 #coding:utf-8
2 '''
3 Created by huxiaoman 2017.11.27
4 train_lenet.py:訓(xùn)練LeNet-5對cifar10數(shù)據(jù)集進(jìn)行分類
5 '''
6
7 import sys, os
8
9 import paddle.v2 as paddle
10 from lenet import lenet
11
12 with_gpu = os.getenv('WITH_GPU', '0') != '1'
13
14
15 def main():
16 datadim = 3 * 32 * 32
17 classdim = 10
18
19 # PaddlePaddle init
20 paddle.init(use_gpu=with_gpu, trainer_count=7)
21
22 image = paddle.layer.data(
23 name="image", type=paddle.data_type.dense_vector(datadim))
24
25 # Add neural network config
26 # option 1. resnet
27 # net = resnet_cifar10(image, depth=32)
28 # option 2. vgg
29 net = lenet(image)
30
31 out = paddle.layer.fc(
32 input=net, size=classdim, act=paddle.activation.Softmax())
33
34 lbl = paddle.layer.data(
35 name="label", type=paddle.data_type.integer_value(classdim))
36 cost = paddle.layer.classification_cost(input=out, label=lbl)
37
38 # Create parameters
39 parameters = paddle.parameters.create(cost)
40
41 # Create optimizer
42 momentum_optimizer = paddle.optimizer.Momentum(
43 momentum=0.9,
44 regularization=paddle.optimizer.L2Regularization(rate=0.0002 * 128),
45 learning_rate=0.1 / 128.0,
46 learning_rate_decay_a=0.1,
47 learning_rate_decay_b=50000 * 100,
48 learning_rate_schedule='discexp')
49
50 # End batch and end pass event handler
51 def event_handler(event):
52 if isinstance(event, paddle.event.EndIteration):
53 if event.batch_id % 100 == 0:
54 print "\nPass %d, Batch %d, Cost %f, %s" % (
55 event.pass_id, event.batch_id, event.cost, event.metrics)
56 else:
57 sys.stdout.write('.')
58 sys.stdout.flush()
59 if isinstance(event, paddle.event.EndPass):
60 # save parameters
61 with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
62 parameters.to_tar(f)
63
64 result = trainer.test(
65 reader=paddle.batch(
66 paddle.dataset.cifar.test10(), batch_size=128),
67 feeding={'image': 0,
68 'label': 1})
69 print "\nTest with Pass %d, %s" % (event.pass_id, result.metrics)
70
71 # Create trainer
72 trainer = paddle.trainer.SGD(
73 cost=cost, parameters=parameters, update_equation=momentum_optimizer)
74
75 # Save the inference topology to protobuf.
76 inference_topology = paddle.topology.Topology(layers=out)
77 with open("inference_topology.pkl", 'wb') as f:
78 inference_topology.serialize_for_inference(f)
79
80 trainer.train(
81 reader=paddle.batch(
82 paddle.reader.shuffle(
83 paddle.dataset.cifar.train10(), buf_size=50000),
84 batch_size=128),
85 num_passes=200,
86 event_handler=event_handler,
87 feeding={'image': 0,
88 'label': 1})
89
90 # inference
91 from PIL import Image
92 import numpy as np
93 import os
94
95 def load_image(file):
96 im = Image.open(file)
97 im = im.resize((32, 32), Image.ANTIALIAS)
98 im = np.array(im).astype(np.float32)
99 # The storage order of the loaded image is W(widht),
100 # H(height), C(channel). PaddlePaddle requires
101 # the CHW order, so transpose them.
102 im = im.transpose((2, 0, 1)) # CHW
103 # In the training phase, the channel order of CIFAR
104 # image is B(Blue), G(green), R(Red). But PIL open
105 # image in RGB mode. It must swap the channel order.
106 im = im[(2, 1, 0), :, :] # BGR
107 im = im.flatten()
108 im = im / 255.0
109 return im
110
111 test_data = []
112 cur_dir = os.path.dirname(os.path.realpath(__file__))
113 test_data.append((load_image(cur_dir + '/image/dog.png'), ))
114
115 # users can remove the comments and change the model name
116 # with open('params_pass_50.tar', 'r') as f:
117 # parameters = paddle.parameters.Parameters.from_tar(f)
118
119 probs = paddle.infer(
120 output_layer=out, parameters=parameters, input=test_data)
121 lab = np.argsort(-probs) # probs and lab are the results of one batch data
122 print "Label of image/dog.png is: %d" % lab[0][0]
123
124
125 if __name__ == '__main__':
126 main()
3. 結(jié)果輸出
1 I1129 14:52:44.314946 15153 Util.cpp:166] commandline: --use_gpu=True --trainer_count=7
2 [INFO 2017-11-29 14:52:50,490 layers.py:2539] output for __conv_pool_0___conv: c = 6, h = 28, w = 28, size = 4704
3 [INFO 2017-11-29 14:52:50,491 layers.py:2667] output for __conv_pool_0___pool: c = 6, h = 14, w = 14, size = 1176
4 [INFO 2017-11-29 14:52:50,491 layers.py:2539] output for __conv_pool_1___conv: c = 16, h = 10, w = 10, size = 1600
5 [INFO 2017-11-29 14:52:50,492 layers.py:2667] output for __conv_pool_1___pool: c = 16, h = 5, w = 5, size = 400
6 [INFO 2017-11-29 14:52:50,493 layers.py:2539] output for __conv_0__: c = 120, h = 5, w = 5, size = 3000
7 I1129 14:52:50.498749 15153 MultiGradientMachine.cpp:99] numLogicalDevices=1 numThreads=7 numDevices=8
8 I1129 14:52:50.545882 15153 GradientMachine.cpp:85] Initing parameters..
9 I1129 14:52:50.651103 15153 GradientMachine.cpp:92] Init parameters done.
10
11 Pass 0, Batch 0, Cost 2.331898, {'classification_error_evaluator': 0.9609375}
12 ```
13 ......
14 Pass 199, Batch 300, Cost 0.004373, {'classification_error_evaluator': 0.0}
15 ..........................................................................................I1129 16:17:08.678097 15153 MultiGradientMachine.cpp:99] numLogicalDevices=1 numThreads=7 numDevices=8
16
17 Test with Pass 199, {'classification_error_evaluator': 0.39579999446868896}
18 Label of image/dog.png is: 7
同樣是7個線程,8個Tesla K80 GPU,batch_size = 128,迭代次數(shù)200次,耗時1h25min,錯誤分類率為0.3957,相比與simple_cnn的0.5248提高了12.91%。當(dāng)然,這個結(jié)果也并不是很好,如果輸出詳細(xì)的日志,可以看到在訓(xùn)練的過程中l(wèi)oss先降后升,說明有一定程度的過擬合,對于如何防止過擬合,我們在后面會詳細(xì)講解。
有一個可視化CNN的網(wǎng)站可以對mnist和cifar10分類的網(wǎng)絡(luò)結(jié)構(gòu)進(jìn)行可視化,這是cifar-10 BaseCNN的網(wǎng)絡(luò)結(jié)構(gòu):

LeNet-5的Tensorflow實現(xiàn)
tensorflow版本的LeNet-5版本的可以參照models/tutorials/image/cifar10/(https://github.com/tensorflow/models/tree/master/tutorials/image/cifar10)的步驟來訓(xùn)練,不過這里面的代碼包含了很多數(shù)據(jù)處理、權(quán)重衰減以及正則化的一些方法防止過擬合。按照官方寫的,batch_size=128時在Tesla K40上迭代10w次需要4小時,準(zhǔn)確率能達(dá)到86%。不過如果不對數(shù)據(jù)做處理,直接跑的話,效果應(yīng)該沒有這么好。不過可以仔細(xì)借鑒cifar10_inputs.py里的distorted_inouts函數(shù)對數(shù)據(jù)預(yù)處理增大數(shù)據(jù)集的思想,以及cifar10.py里對于權(quán)重和偏置的衰減設(shè)置等。目前迭代到1w次左右,cost是0.98,acc是78.4%
對于未進(jìn)行數(shù)據(jù)處理的cifar10我準(zhǔn)備也跑一次,看看效果如何,與paddle的結(jié)果對比一下。不過得等到周末再補(bǔ)上了 = =
總結(jié)
本節(jié)用常規(guī)的cifar-10數(shù)據(jù)集做圖像分類,用了三種實現(xiàn)方式,第一種是自己設(shè)計的一個簡單的cnn,第二種是LeNet-5,第三種是Tensorflow實現(xiàn)的LeNet-5,對比速度可以見一下表格:

可以看到LeNet-5相比于原始的simple_cnn在準(zhǔn)確率和速度方面都有一定的的提升,等tensorflow版本跑完后可以把結(jié)果加上去再對比一下。不過用Lenet-5網(wǎng)絡(luò)結(jié)構(gòu)后,結(jié)果雖然有一定的提升,但是還是不夠理想,在日志里看到loss的信息基本可以推斷出是過擬合,對于神經(jīng)網(wǎng)絡(luò)訓(xùn)練過程中出現(xiàn)的過擬合情況我們應(yīng)該如何避免,下期我們講著重講解。此外在下一節(jié)將介紹AlexNet,并對分類做一個實驗,對比其效果。
參考文獻(xiàn)
1.LeNet-5論文:《Gradient-based learning applied to document recognition》































