TF Learn: A Deep Learning Power Tool Built on Scikit-learn and TensorFlow
[Original article from 51CTO.com] Anyone familiar with the overseas data science market knows that the three most widely used technologies there in 2017 were Spark, Python, and MongoDB. Speaking of Python, nobody working in big data is a stranger to Scikit-learn and Pandas.
Scikit-learn is the most widely used Python machine learning framework: algorithm engineers at the major internet companies all rely on it to some degree when implementing single-machine versions of algorithms. TensorFlow is even better known; it is hard to find anyone doing deep learning who has not heard of it.
Let's first look at a sample that implements logistic regression, a traditional machine learning algorithm:
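The original sample is not preserved here, so the following is a minimal sketch of what such a Scikit-learn snippet typically looks like. The dataset (Iris) and the `max_iter` value are illustrative choices, not from the original article; the three essential lines are creating the model, fitting it, and scoring it.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Load a small sample dataset (150 flowers, 4 features, 3 classes)
X, y = load_iris(return_X_y=True)

clf = LogisticRegression(max_iter=200)  # create the model
clf.fit(X, y)                           # train it
print(clf.score(X, y))                  # training accuracy
```

The entire workflow, from model construction to evaluation, fits in three statements.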
As you can see, the sample needs only three lines of code to cover the core of logistic regression. Now let's see how many lines it takes to implement the same thing in TensorFlow. The following code comes from GitHub:
'''
A logistic regression learning algorithm example using TensorFlow library.
This example is using the MNIST database of handwritten digits
(http://yann.lecun.com/exdb/mnist/)
Author: Aymeric Damien
Project: https://github.com/aymericdamien/TensorFlow-Examples/
'''
from __future__ import print_function

import tensorflow as tf

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

# Parameters
learning_rate = 0.01
training_epochs = 25
batch_size = 100
display_step = 1

# tf Graph Input
x = tf.placeholder(tf.float32, [None, 784])  # mnist data image of shape 28*28=784
y = tf.placeholder(tf.float32, [None, 10])   # 0-9 digits recognition => 10 classes

# Set model weights
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

# Construct model
pred = tf.nn.softmax(tf.matmul(x, W) + b)  # Softmax

# Minimize error using cross entropy
cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(pred), reduction_indices=1))
# Gradient Descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()

# Start training
with tf.Session() as sess:
    # Run the initializer
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([optimizer, cost], feed_dict={x: batch_xs,
                                                          y: batch_ys})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if (epoch+1) % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(avg_cost))

    print("Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    print("Accuracy:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))
A relatively simple machine learning algorithm takes this much code to implement in TensorFlow. On the other hand, Scikit-learn itself lacks TensorFlow's rich deep learning functionality. Is there a way to keep Scikit-learn's simplicity and ease of use while giving it TensorFlow-style deep learning support? There is: the open-source Scikit Flow project, which was later merged into the TensorFlow codebase and became today's TF Learn module.
Let's look at a TF Learn sample that implements linear regression:
- """ Linear Regression Example """
 - from __future__ import absolute_import, division, print_function
 - import tflearn
 - # Regression data
 - X = [3.3,4.4,5.5,6.71,6.93,4.168,9.779,6.182,7.59,2.167,7.042,10.791,5.313,7.997,5.654,9.27,3.1]
 - Y = [1.7,2.76,2.09,3.19,1.694,1.573,3.366,2.596,2.53,1.221,2.827,3.465,1.65,2.904,2.42,2.94,1.3]
 - # Linear Regression graph
 - input_ = tflearn.input_data(shape=[None])
 - linear = tflearn.single_unit(input_)
 - regression = tflearn.regression(linear, optimizer='sgd', loss='mean_square',
 - metric='R2', learning_rate=0.01)
 - m = tflearn.DNN(regression)
 - m.fit(X, Y, n_epoch=1000, show_metric=True, snapshot_epoch=False)
 - print("\nRegression result:")
 - print("Y = " + str(m.get_weights(linear.W)) +
 - "*X + " + str(m.get_weights(linear.b)))
 - print("\nTest prediction for x = 3.2, 3.3, 3.4:")
 - print(m.predict([3.2, 3.3, 3.4]))
 
As we can see, TF Learn inherits Scikit-learn's concise programming style and is very convenient for traditional machine learning methods. Next, let's look at a TF Learn sample that implements a CNN on the MNIST dataset:
- """ Convolutional Neural Network for MNIST dataset classification task.
 - References:
 - Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. "Gradient-based
 - learning applied to document recognition." Proceedings of the IEEE,
 - 86(11):2278-2324, November 1998.
 - Links:
 - [MNIST Dataset] http://yann.lecun.com/exdb/mnist/
 - """
 - from __future__ import division, print_function, absolute_import
 - import tflearn
 - from tflearn.layers.core import input_data, dropout, fully_connected
 - from tflearn.layers.conv import conv_2d, max_pool_2d
 - from tflearn.layers.normalization import local_response_normalization
 - from tflearn.layers.estimator import regression
 - # Data loading and preprocessing
 - import tflearn.datasets.mnist as mnist
 - X, Y, testX, testY = mnist.load_data(one_hot=True)
 - X = X.reshape([-1, 28, 28, 1])
 - testX = testX.reshape([-1, 28, 28, 1])
 - # Building convolutional network
 - network = input_data(shape=[None, 28, 28, 1], name='input')
 - network = conv_2d(network, 32, 3, activation='relu', regularizer="L2")
 - network = max_pool_2d(network, 2)
 - network = local_response_normalization(network)
 - network = conv_2d(network, 64, 3, activation='relu', regularizer="L2")
 - network = max_pool_2d(network, 2)
 - network = local_response_normalization(network)
 - network = fully_connected(network, 128, activation='tanh')
 - network = dropout(network, 0.8)
 - network = fully_connected(network, 256, activation='tanh')
 - network = dropout(network, 0.8)
 - network = fully_connected(network, 10, activation='softmax')
 - network = regression(network, optimizer='adam', learning_rate=0.01,
 - loss='categorical_crossentropy', name='target')
 - # Training
 - model = tflearn.DNN(network, tensorboard_verbose=0)
 - model.fit({'input': X}, {'target': Y}, n_epoch=20,
 - validation_set=({'input': testX}, {'target': testY}),
 - snapshot_step=100, show_metric=True, run_id='convnet_mnist')
 
As the sample shows, deep learning code written with TF Learn is also very concise.
TF Learn is a high-level, Scikit-learn-style wrapper around TensorFlow, offering an alternative to both native TensorFlow and plain Scikit-learn. For users who are comfortable with Scikit-learn and tired of TensorFlow's verbose code, it is a real blessing, and it is well worth studying and mastering for machine learning and data mining practitioners.
Wang Hao is head of the Big Data Department and a senior architect at Hengchang Litong. He holds a B.S. and M.S. from the University of Utah and is pursuing a part-time MBA at the University of International Business and Economics. He has years of R&D and technical management experience at Baidu, Sina, NetEase, and Douban, and specializes in machine learning, big data, recommender systems, and social network analysis. He has published eight papers at international conferences and in journals including TVCG and ASONAM, and his undergraduate thesis won the *** paper award at the international conference IEEE SMI 2008.
[Original 51CTO article. Partner sites republishing this piece must credit the original author and cite 51CTO.com as the source.]