Illustrated Machine Learning: Text Classification with Neural Networks and TensorFlow

Author: Anonymous, 2017-08-04 14:23:04

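Developers often say that if you want to get started with machine learning, you should first learn how the algorithms work. My experience shows otherwise.

I say you should first understand how the application works. Once you understand that, it becomes much easier to dive into the inner workings of the algorithms.

So how do you develop this intuition and reach the goal of understanding machine learning? A good way is to create machine learning models.

Assuming you still don't know how to create all these algorithms from scratch, you can use a library that already implements them for you. That library is TensorFlow.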

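In this article we'll create a machine learning model to classify texts into categories. We'll cover the following topics:

How TensorFlow works
What a machine learning model is
What a neural network is
How a neural network learns
How to manipulate data and pass it to a neural network
How to run the model and get the prediction results

You'll probably learn a lot of new things, so let's get started!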

TensorFlow

TensorFlow is an open-source machine learning library created by Google. The name of the library helps us understand how we work with it: tensors are multidimensional arrays that flow through the nodes of a graph.

tf.Graph

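In TensorFlow every computation is represented as a data flow graph. This graph has two kinds of elements:

a set of tf.Operation objects, which represent units of computation
a set of tf.Tensor objects, which represent units of data

To see how this works, you'll create this data flow graph: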

(A graph that computes x+y)

You'll define x = [1,3,6] and y = [1,1,1]. Since the graph uses tf.Tensor to represent units of data, you need to create constant Tensors:

import tensorflow as tf

x = tf.constant([1,3,6])
y = tf.constant([1,1,1])

Now you'll define the operation:

import tensorflow as tf

x = tf.constant([1,3,6])
y = tf.constant([1,1,1])

op = tf.add(x,y)

You have all the graph elements. Now you need to build the graph:

import tensorflow as tf

my_graph = tf.Graph()

with my_graph.as_default():
    x = tf.constant([1,3,6])
    y = tf.constant([1,1,1])
    op = tf.add(x,y)

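This is how the TensorFlow workflow works: you first create a graph, and only then can you do the computations (actually "running" the graph nodes with operations). To run the graph you need to create a tf.Session.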
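The "build first, run later" idea can be illustrated with a toy graph in plain Python. This is only a sketch of the concept, not how TensorFlow is implemented: `Node`, `constant`, `add` and `run` are hypothetical names. Creating `op` computes nothing; the addition happens only when `run()` walks the graph, which is roughly the job a tf.Session does.

```python
# Toy illustration of a data flow graph with deferred execution.
# Hypothetical names; this is NOT TensorFlow's implementation.

class Node:
    """A graph node: either a constant or an operation over input nodes."""
    def __init__(self, op, inputs=(), value=None):
        self.op = op          # "const" or "add"
        self.inputs = inputs  # upstream nodes this node depends on
        self.value = value    # only used by constants

def constant(value):
    return Node("const", value=value)

def add(a, b):
    # Building the node does no arithmetic; it only records the dependency.
    return Node("add", inputs=(a, b))

def run(node):
    """Evaluate a node by recursively evaluating the nodes it depends on."""
    if node.op == "const":
        return node.value
    if node.op == "add":
        left, right = (run(n) for n in node.inputs)
        return [l + r for l, r in zip(left, right)]
    raise ValueError(f"unknown op {node.op!r}")

# Build phase: nothing is computed yet.
x = constant([1, 3, 6])
y = constant([1, 1, 1])
op = add(x, y)

# Run phase: only now is the addition performed.
print(run(op))  # [2, 4, 7]
```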

tf.Session

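A tf.Session object encapsulates the environment in which Operation objects are executed and Tensor objects are evaluated (from the documentation). To do that, we need to tell the Session which graph it will use: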

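import tensorflow as tf

my_graph = tf.Graph()

# Entering the session context also makes my_graph the default graph,
# so the ops below are added to my_graph
with tf.Session(graph=my_graph) as sess:
    x = tf.constant([1,3,6])
    y = tf.constant([1,1,1])
    op = tf.add(x,y)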

To execute the operations you use the method tf.Session.run(). It runs one "step" of TensorFlow computation, running the graph fragment needed to execute every Operation and evaluate every Tensor passed in the fetches argument:

import tensorflow as tf

my_graph = tf.Graph()

with tf.Session(graph=my_graph) as sess:
    x = tf.constant([1,3,6])
    y = tf.constant([1,1,1])
    op = tf.add(x,y)
    result = sess.run(fetches=op)
    print(result)

>>> [2 4 7]

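Predictive models

Now that you know how TensorFlow works, you need to learn how to create a predictive model. In short:

machine learning algorithm + data = predictive model

This is the process of building a model:

(The process of building a predictive model)

As you can see, the model consists of a machine learning algorithm that has been "trained" with data. Once you have the model, you get results like this:

(The prediction workflow)

The goal of the model you'll create is to classify texts into categories, so we define: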

input: text, result: category

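We have a training dataset of labeled texts (each text carries a label saying which category it belongs to). In machine learning, this type of task is called supervised learning.

“We know the correct answers. The algorithm iteratively makes predictions on the training data and is corrected by the teacher.” (Jason Brownlee)

Since you'll sort the data into classes, this is also a classification task.

To create this model we'll use a neural network.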

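Neural networks

A neural network is a computational model (a way of describing a system using mathematical language and mathematical concepts). These systems learn by themselves and are trained, rather than explicitly programmed.

Neural networks are also inspired by our central nervous system: they have connected nodes that are similar to our neurons.

The perceptron was the first neural network algorithm. There is a good article explaining the inner workings of the perceptron (its "Inside an artificial neuron" animation is great).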
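For intuition, here is a minimal perceptron sketch, assuming NumPy: a weighted sum of the inputs plus a bias, passed through a step function, with the weights adjusted by the classic perceptron learning rule. The toy dataset (the logical AND function) and the names `step`, `train_perceptron` and `predict` are illustrative assumptions, not taken from the article.

```python
import numpy as np

def step(z):
    """Step activation: output 1 if the weighted sum is >= 0, else 0."""
    return np.where(z >= 0, 1, 0)

def train_perceptron(X, y, epochs=10, lr=1.0):
    """Classic perceptron learning rule (minimal sketch)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            error = target - step(np.dot(xi, w) + b)  # -1, 0 or +1
            w = w + lr * error * xi                   # nudge weights toward the target
            b = b + lr * error
    return w, b

def predict(X, w, b):
    return step(X @ w + b)

# Toy data: logical AND (1 only when both inputs are 1).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])

w, b = train_perceptron(X, y)
print(predict(X, w, b))  # [0 0 0 1]
```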

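To understand how neural networks work, we'll build a neural network architecture with TensorFlow. This architecture was used by Aymeric Damien.

Neural network architecture

The neural network will have two hidden layers (choosing how many hidden layers the network has is part of the architecture design). The job of each hidden layer is to transform its inputs into something the output layer can use.

Hidden layer 1

(The input layer and the first hidden layer)

You also need to define how many nodes the first hidden layer will have. These nodes are also called features or neurons; in the image above each one is represented by a circle.

Each node of the input layer corresponds to one word of the dataset (we'll see how this works later).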

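Each node (neuron) is multiplied by a weight. Every node has a weight value, and during the training phase the neural network adjusts these values to produce the correct output (we'll learn more about this in a moment).

Besides multiplying the inputs by the weights, the network also adds a bias (the bias lets a node shift its output independently of the input values).

In your architecture, after multiplying the inputs by the weights and adding the bias, the data also passes through an activation function. This activation function defines the final output of each node. As an analogy, imagine each node is a lamp: the activation function decides whether the lamp lights up.

There are many types of activation functions. You'll use the Rectified Linear Unit (ReLU), defined as:

f(x) = max(0, x)   [outputs whichever of x and 0 (zero) is larger]

For example: if x = -1, then f(x) = 0 (zero); if x = 0.7, then f(x) = 0.7.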
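The sketch below, assuming NumPy, applies ReLU to the two values from the example and then shows one hidden node end to end: multiply the inputs by the weights, add the bias, and pass the sum through the activation. The input, weight and bias values are made up for illustration, and `relu`/`node_output` are hypothetical helper names.

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: f(x) = max(0, x), applied element-wise."""
    return np.maximum(0, x)

def node_output(inputs, weights, bias):
    """One hidden node: weighted sum of the inputs, plus bias, through ReLU."""
    return relu(np.dot(inputs, weights) + bias)

print(relu(-1.0))  # 0.0
print(relu(0.7))   # 0.7

# Made-up values: three inputs (e.g. word present / absent) and their weights.
x = np.array([1.0, 0.0, 1.0])
w = np.array([0.5, -1.0, 0.2])
print(node_output(x, w, bias=-0.1))   # positive sum, so it passes through unchanged
print(node_output(x, -w, bias=-0.1))  # negative sum, so the "lamp" stays off (0.0)
```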

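Hidden layer 2

The second hidden layer does exactly what the first hidden layer does, but now its input is the output of the first layer.

(The first and second hidden layers)

Output layer

Now we finally reach the last layer, the output layer. You'll use one-hot encoding to get the results of this layer. In this encoding only one bit has the value 1 and all the others are 0. For example, to encode three categories (sports, space and computer graphics), each category gets its own three-bit code with the 1 in a different position: 100, 010 and 001.

So the number of output nodes equals the number of classes in the input dataset.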
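A small sketch of one-hot encoding for the three example categories, assuming NumPy. The order of the `categories` list is an arbitrary choice made here (the text does not fix which category gets which position), and `one_hot`/`decode` are hypothetical helpers. The vector length equals the number of categories, which is why the output layer needs one node per class.

```python
import numpy as np

# Arbitrary order chosen for this sketch.
categories = ["sports", "space", "computer graphics"]

def one_hot(category, categories):
    """Vector with a single 1 at the category's position, 0 elsewhere."""
    vec = np.zeros(len(categories))
    vec[categories.index(category)] = 1.0
    return vec

def decode(vec, categories):
    """Map a one-hot (or probability) vector back to its category name."""
    return categories[int(np.argmax(vec))]

for name in categories:
    print(name, one_hot(name, categories))
# sports [1. 0. 0.]
# space [0. 1. 0.]
# computer graphics [0. 0. 1.]
```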

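The output layer values are also multiplied by the weights and we also add the bias, but now the activation function is different.

You want to label each text with a category, and the categories are mutually exclusive (a text cannot belong to two categories at the same time). With that in mind, instead of the ReLU activation function you'll use the Softmax function. It transforms each output into a value between 0 and 1 and makes sure that all the values sum to 1. This way, the output tells us the probability of each text belonging to each category.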

| 1.2 |                   | 0.46 |
| 0.9 |  -> [softmax] ->  | 0.34 |
| 0.4 |                   | 0.21 |
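The softmax values above can be reproduced with a few lines of NumPy. This is a sketch; `softmax` is a hypothetical helper (TensorFlow provides its own tf.nn.softmax). Subtracting the maximum before exponentiating is a common numerical-stability trick and does not change the result. Rounded to two decimals the outputs are 0.46, 0.34 and 0.21 (so the third value rounds to 0.21), and before rounding they sum to 1.

```python
import numpy as np

def softmax(logits):
    """Map raw scores to probabilities in (0, 1) that sum to 1."""
    shifted = logits - np.max(logits)   # for numerical stability only
    exps = np.exp(shifted)
    return exps / exps.sum()

probs = softmax(np.array([1.2, 0.9, 0.4]))
print(np.round(probs, 2))  # [0.46 0.34 0.21]
print(probs.sum())
```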

Now we have the data flow graph of the neural network. Translating everything we've seen into code, the result is:

import tensorflow as tf

# Network Parameters
n_hidden_1 = 10        # 1st layer number of features
n_hidden_2 = 5         # 2nd layer number of features
n_input = total_words  # Words in vocab (total_words comes from the data manipulation step)
n_classes = 3          # Categories: graphics, space and baseball

def multilayer_perceptron(input_tensor, weights, biases):
    # Hidden layer 1 with RELU activation
    layer_1_multiplication = tf.matmul(input_tensor, weights['h1'])
    layer_1_addition = tf.add(layer_1_multiplication, biases['b1'])
    layer_1_activation = tf.nn.relu(layer_1_addition)

    # Hidden layer 2 with RELU activation
    layer_2_multiplication = tf.matmul(layer_1_activation, weights['h2'])
    layer_2_addition = tf.add(layer_2_multiplication, biases['b2'])
    layer_2_activation = tf.nn.relu(layer_2_addition)

    # Output layer with linear activation
    out_layer_multiplication = tf.matmul(layer_2_activation, weights['out'])
    out_layer_addition = out_layer_multiplication + biases['out']
    return out_layer_addition

(We'll discuss the activation function of the output layer later.)
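To check the shapes without TensorFlow, here is a NumPy mirror of `multilayer_perceptron`. It is a sketch under stated assumptions: a tiny hypothetical 6-word vocabulary stands in for `total_words`, `multilayer_perceptron_np` is a hypothetical name, and the weights and biases are drawn from a standard normal distribution, similar in spirit to the tf.random_normal initialization used in the next section. As in the TensorFlow code, the output layer stays linear.

```python
import numpy as np

rng = np.random.default_rng(0)

n_input = 6                     # hypothetical vocabulary size (the article uses total_words)
n_hidden_1, n_hidden_2, n_classes = 10, 5, 3

# Weights and biases drawn from a standard normal distribution.
weights = {
    "h1": rng.standard_normal((n_input, n_hidden_1)),
    "h2": rng.standard_normal((n_hidden_1, n_hidden_2)),
    "out": rng.standard_normal((n_hidden_2, n_classes)),
}
biases = {
    "b1": rng.standard_normal(n_hidden_1),
    "b2": rng.standard_normal(n_hidden_2),
    "out": rng.standard_normal(n_classes),
}

def relu(x):
    return np.maximum(0, x)

def multilayer_perceptron_np(input_batch, weights, biases):
    """NumPy mirror of multilayer_perceptron: two ReLU layers, linear output."""
    layer_1 = relu(input_batch @ weights["h1"] + biases["b1"])   # (batch, 10)
    layer_2 = relu(layer_1 @ weights["h2"] + biases["b2"])       # (batch, 5)
    return layer_2 @ weights["out"] + biases["out"]              # (batch, 3) logits

# Four fake bag-of-words rows (0/1 per vocabulary word).
batch = rng.integers(0, 2, size=(4, n_input)).astype(float)
logits = multilayer_perceptron_np(batch, weights, biases)
print(logits.shape)  # (4, 3)
```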

How a neural network learns

As we saw earlier, the weight values are updated while the network is trained. Now we'll see how this happens in the TensorFlow environment.

tf.Variable

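The weights and biases are stored in variables (tf.Variable). These variables maintain their state in the graph across calls to run(). In machine learning we usually initialize the weights and biases with values drawn from a normal distribution.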

weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
}

biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

When we run the neural network for the first time (that is, with the weights drawn from the normal distribution), we have:

input values: x
weights: w
bias: b
output values: z
expected values: expected

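To know whether the network is learning, you need to compare the output values (z) with the expected values (expected). How do we compute this difference (the loss)? There are many ways to do it. Because we are working on a classification task, the best way to measure the loss is the cross-entropy error.

James D. McCaffrey wrote a brilliant explanation of why this is the best method for this kind of task.

With TensorFlow you'll compute the cross-entropy error with the tf.nn.softmax_cross_entropy_with_logits() method (this is where the softmax activation function comes in) and then compute the mean error with tf.reduce_mean().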

# Construct model
prediction = multilayer_perceptron(input_tensor, weights, biases)

# Define loss
entropy_loss = tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=output_tensor)
loss = tf.reduce_mean(entropy_loss)
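To see what this loss computes, here is a NumPy sketch of softmax cross-entropy on logits followed by the mean. `softmax_cross_entropy_np` is a hypothetical name; it follows the documented math of tf.nn.softmax_cross_entropy_with_logits plus tf.reduce_mean, but it is not TensorFlow's implementation, and the logits and labels are made-up values. Note that the more probability the network gives the correct class, the smaller the loss.

```python
import numpy as np

def softmax_cross_entropy_np(logits, labels):
    """Per-example cross-entropy between one-hot labels and softmax(logits).

    Uses the log-sum-exp trick so large logits don't overflow.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)

# Made-up logits for two texts and their one-hot labels.
logits = np.array([[1.2, 0.9, 0.4],     # correct class 0 gets about 46% probability
                   [0.1, 2.0, -1.0]])   # correct class 1 gets about 83% probability
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

entropy_loss = softmax_cross_entropy_np(logits, labels)  # one value per text
loss = entropy_loss.mean()                               # like tf.reduce_mean
print(entropy_loss, loss)
```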

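You want to find the best values for the weights and biases in order to minimize the output error (the difference between the value we get and the correct value). To do that you'll use gradient descent; more specifically, stochastic gradient descent.

(Gradient descent. Source: https://sebastianraschka.com/faq/docs/closed-form-vs-gd.html)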
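A minimal sketch of stochastic gradient descent, assuming NumPy and a made-up one-parameter problem: learn w in y = w·x from noise-free data generated with w = 2. "Stochastic" here means the weight is updated after each randomly chosen example rather than after the whole dataset; the learning rate, seed and epoch count are arbitrary choices for this sketch.

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up, noise-free data generated with the "correct" weight w = 2.
x = rng.uniform(-1, 1, size=100)
y = 2.0 * x

w = 0.0               # initial guess
learning_rate = 0.1

for epoch in range(20):
    for i in rng.permutation(len(x)):     # visit examples in random order
        error = w * x[i] - y[i]           # prediction minus correct value
        grad = 2 * error * x[i]           # derivative of error**2 with respect to w
        w -= learning_rate * grad         # step downhill, against the gradient

print(w)  # very close to 2.0
```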

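There are also many algorithms for performing gradient descent; you'll use Adaptive Moment Estimation (Adam). To use this algorithm in TensorFlow you pass a learning_rate value, which determines the size of the incremental steps taken to find the best weight values.

The method tf.train.AdamOptimizer(learning_rate).minimize(loss) is syntactic sugar that does two things:

compute_gradients(loss, <list of variables>)
apply_gradients(<list of (gradient, variable) pairs>)

Because minimize() by default updates all the trainable tf.Variables with their new values, we don't need to pass a list of variables. Now you have the code to train the network: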

learning_rate = 0.001

# Construct model
prediction = multilayer_perceptron(input_tensor, weights, biases)

# Define loss
entropy_loss = tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=output_tensor)
loss = tf.reduce_mean(entropy_loss)

optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)
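To demystify what the optimizer does on each step, here is a NumPy sketch of the Adam update rule as described in the original Adam paper, applied to a made-up one-parameter problem: minimizing (theta - 3)^2. The beta and epsilon values match tf.train.AdamOptimizer's defaults, but the larger learning rate, the step count and the name `adam_minimize` are choices made for this sketch, and TensorFlow's own implementation differs in small details. The two commented steps correspond loosely to compute_gradients and apply_gradients.

```python
import numpy as np

def adam_minimize(grad_fn, theta0, learning_rate=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=1000):
    """Minimal Adam loop (sketch of the published update rule)."""
    theta = np.asarray(theta0, dtype=float)
    m = np.zeros_like(theta)   # running mean of gradients (1st moment)
    v = np.zeros_like(theta)   # running mean of squared gradients (2nd moment)
    for t in range(1, steps + 1):
        g = grad_fn(theta)                              # step 1: compute the gradient
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)                    # bias correction for zero init
        v_hat = v / (1 - beta2 ** t)
        theta = theta - learning_rate * m_hat / (np.sqrt(v_hat) + eps)  # step 2: apply it
    return theta

# Example: minimize f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
theta = adam_minimize(lambda th: 2 * (th - 3.0), theta0=0.0)
print(theta)
```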

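Data manipulation

The dataset you'll use contains many English texts, and we need to manipulate this data before passing it to the neural network. To do that, you'll do two things:

Create an index for each word
Create a matrix for each text, where a value is 1 if the word is in the text and 0 otherwise (strictly speaking, the code below counts occurrences, so a word that appears twice gets a 2)

Let's look at the code to understand this process: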

import numpy as np    # numpy is a package for scientific computing
from collections import Counter

vocab = Counter()

text = "Hi from Brazil"

# Get all words
for word in text.split(' '):
    vocab[word] += 1

# Convert words to indexes
def get_word_2_index(vocab):
    word2index = {}
    for i, word in enumerate(vocab):
        word2index[word] = i
    return word2index

# Now we have an index
word2index = get_word_2_index(vocab)

total_words = len(vocab)

# This is how we create a numpy array (our matrix)
matrix = np.zeros((total_words), dtype=float)

# Now we fill the values
for word in text.split():
    matrix[word2index[word]] += 1

print(matrix)

>>> [ 1.  1.  1.]

In the example above the text was "Hi from Brazil" and the matrix was [ 1. 1. 1.]. What if the text were just "Hi"?

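matrix = np.zeros((total_words), dtype=float)

text = "Hi"
for word in text.split():
    # word2index keeps the original case ("Hi"), so look the word up as-is;
    # word.lower() would give "hi", which is not in the index (KeyError)
    matrix[word2index[word]] += 1

print(matrix)

>>> [ 1.  0.  0.]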
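A slightly more general sketch, assuming NumPy: it builds one index from several texts, lowercases words both when building the index and when looking them up (so "Hi" and "hi" share a position), and skips words that aren't in the index, which will happen with texts the index was not built from. `build_word2index` and `text_to_vector` are hypothetical names.

```python
import numpy as np
from collections import Counter

def build_word2index(texts):
    """Assign one column index to every distinct lowercase word."""
    vocab = Counter()
    for text in texts:
        for word in text.lower().split():
            vocab[word] += 1
    return {word: i for i, word in enumerate(vocab)}

def text_to_vector(text, word2index):
    """Count how often each known word occurs in the text."""
    vector = np.zeros(len(word2index), dtype=float)
    for word in text.lower().split():
        if word in word2index:          # unknown words are ignored
            vector[word2index[word]] += 1
    return vector

word2index = build_word2index(["Hi from Brazil", "Hi from Rio"])
print(word2index)                                     # {'hi': 0, 'from': 1, 'brazil': 2, 'rio': 3}
print(text_to_vector("hi hi Rio London", word2index))  # [2. 0. 0. 1.]
```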

We'll do the same for the labels (the categories of the texts), but this time using one-hot encoding:

# category is the integer label of the text (0, 1 or 2)
y = np.zeros((3), dtype=float)
if category == 0:
    y[0] = 1.  # [ 1.  0.  0.]
elif category == 1:
    y[1] = 1.  # [ 0.  1.  0.]
else:
    y[2] = 1.  # [ 0.  0.  1.]

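Running the graph and getting the result

Now comes the best part: getting results from the model. First, let's take a closer look at the input dataset.

The dataset

You'll use 20 Newsgroups, a dataset with about 18,000 posts on 20 topics. To load it you'll use the scikit-learn library. We'll use only 3 categories: comp.graphics, sci.space and rec.sport.baseball. scikit-learn provides two subsets: one for training and one for testing. The recommendation is that you don't look at the test data, because that could interfere with your choices while creating the model. You don't want a model that predicts this specific test data; you want a model with good generalization performance.

Here is the code to load the dataset: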

from sklearn.datasets import fetch_20newsgroups

categories = ["comp.graphics", "sci.space", "rec.sport.baseball"]

newsgroups_train = fetch_20newsgroups(subset='train', categories=categories)
newsgroups_test = fetch_20newsgroups(subset='test', categories=categories)

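Training the model

In neural network terminology, one epoch = one forward pass (getting the output values) and one backward pass (updating the weights) over all the training examples.

Remember the tf.Session.run() method? Let's take a closer look at it: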

tf.Session.run(fetches, feed_dict=None, options=None, run_metadata=None)

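In the data flow graph at the beginning of this article you ran the sum operation (op), but you can also pass a list of things to run. In this neural network run you'll pass two things: the loss calculation and the optimization step.

The feed_dict parameter is where we pass the input data for each run step. To pass this data we need to define tf.placeholders (which feed_dict fills in).

As the TensorFlow documentation says:

“A placeholder exists solely to serve as the target of feeds. It is not initialized and contains no data.” (Source: TensorFlow documentation)

So you'll define the placeholders like this: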

n_input = total_words  # Words in vocab
n_classes = 3          # Categories: graphics, sci.space and baseball

input_tensor = tf.placeholder(tf.float32, [None, n_input], name="input")
output_tensor = tf.placeholder(tf.float32, [None, n_classes], name="output")

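You'll also feed the training data in batches:

“If you use placeholders for feeding input, you can specify a variable batch dimension by creating the placeholder with tf.placeholder(…, shape=[None, …]). The None element of the shape corresponds to a variable-sized dimension.” (Source: TensorFlow documentation)

When testing the model we'll feed the dictionary with a bigger batch, which is why you need to define a variable batch dimension.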
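The training loop calls a get_batch() helper that the article doesn't show (its final code is in the notebook linked at the end). Here is a hypothetical sketch, assuming NumPy and a dataset object shaped like scikit-learn's 20 Newsgroups result (texts in `.data`, integer labels in `.target`). The name `get_batch_sketch` and its extra `word2index`/`n_classes` parameters are assumptions of this sketch, not the notebook's signature. Asking for a batch of 2 or for the whole dataset gives the same number of columns and a different number of rows, which is exactly what the None batch dimension allows.

```python
import numpy as np
from types import SimpleNamespace

def get_batch_sketch(df, i, batch_size, word2index, n_classes=3):
    """Return the i-th batch as (bag-of-words matrix, one-hot label matrix).

    Hypothetical helper: df is assumed to have .data (list of texts) and
    .target (integer category per text), like scikit-learn's Bunch objects.
    """
    texts = df.data[i * batch_size:(i + 1) * batch_size]
    targets = np.asarray(df.target[i * batch_size:(i + 1) * batch_size], dtype=int)

    # One row per text, one column per vocabulary word (word counts).
    batch_x = np.zeros((len(texts), len(word2index)), dtype=float)
    for row, text in enumerate(texts):
        for word in text.lower().split():
            if word in word2index:            # ignore out-of-vocabulary words
                batch_x[row, word2index[word]] += 1

    # One row per text, with a 1 in the column of its category.
    batch_y = np.zeros((len(targets), n_classes), dtype=float)
    batch_y[np.arange(len(targets)), targets] = 1.0
    return batch_x, batch_y

# Tiny fake dataset standing in for newsgroups_train.
fake = SimpleNamespace(data=["hi from brazil", "hi", "from rio"], target=[0, 2, 1])
w2i = {"hi": 0, "from": 1, "brazil": 2, "rio": 3}

bx, by = get_batch_sketch(fake, 0, 2, w2i)                       # a batch of 2 texts
all_x, all_y = get_batch_sketch(fake, 0, len(fake.data), w2i)    # the whole set
print(bx.shape, all_x.shape)  # (2, 4) (3, 4)
```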

The get_batch() function gives us a batch of batch_size texts (with their labels) at a time. Now we can run the model:

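training_epochs = 10
batch_size = 150  # hypothetical value; the article does not state the batch size
init = tf.global_variables_initializer()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)  # inits the variables (normal distribution, remember?)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(len(newsgroups_train.data) / batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = get_batch(newsgroups_train, i, batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            c, _ = sess.run([loss, optimizer], feed_dict={input_tensor: batch_x, output_tensor: batch_y})
            # Accumulate the average loss over the epoch
            avg_cost += c / total_batch
        print("Epoch:", '%04d' % (epoch + 1), "loss=", "{:.9f}".format(avg_cost))
    print("Optimization Finished!")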

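Now you have a trained model. To test it, you also need to create graph elements. We'll measure the model's accuracy, so you need to get the index of the predicted value and the index of the correct value (because we're using one-hot encoding), check whether they are equal, and compute the mean over the whole test dataset: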

    # Test model (this code runs inside the "with tf.Session() as sess:" block above)
    index_prediction = tf.argmax(prediction, 1)
    index_correct = tf.argmax(output_tensor, 1)
    correct_prediction = tf.equal(index_prediction, index_correct)

    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    total_test_data = len(newsgroups_test.target)
    batch_x_test, batch_y_test = get_batch(newsgroups_test, 0, total_test_data)
    print("Accuracy:", accuracy.eval({input_tensor: batch_x_test, output_tensor: batch_y_test}))

Epoch: 0001 loss= 1133.908114347
Epoch: 0002 loss= 329.093700409
Epoch: 0003 loss= 111.876660109
Epoch: 0004 loss= 72.552971845
Epoch: 0005 loss= 16.673050320
Epoch: 0006 loss= 16.481995190
Epoch: 0007 loss= 4.848220565
Epoch: 0008 loss= 0.759822878
Epoch: 0009 loss= 0.000000000
Epoch: 0010 loss= 0.079848485
Optimization Finished!
Accuracy: 0.75
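The accuracy computation can be reproduced in NumPy to see what each graph element does. This sketch uses made-up logits and labels for four texts. Taking argmax of the raw logits gives the same index as taking argmax of the softmax probabilities, because softmax preserves the ordering of the scores, so no softmax is needed here.

```python
import numpy as np

# Made-up logits for 4 test texts and their one-hot labels.
prediction = np.array([[2.0, 0.1, -1.0],
                       [0.3, 0.2, 1.5],
                       [0.0, 1.0, 0.5],
                       [1.0, 0.9, 0.8]])
labels = np.array([[1, 0, 0],
                   [0, 0, 1],
                   [1, 0, 0],
                   [0, 1, 0]])

index_prediction = np.argmax(prediction, axis=1)   # like tf.argmax(prediction, 1)
index_correct = np.argmax(labels, axis=1)          # like tf.argmax(output_tensor, 1)
correct_prediction = index_prediction == index_correct
accuracy = correct_prediction.astype(float).mean() # like tf.reduce_mean(tf.cast(...))
print(accuracy)  # 0.5
```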

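That's it! You used a neural network to create a model that classifies texts into categories. Congratulations!

You can see the notebook with the final code here: https://github.com/dmesquita/understanding_tensorflow_nn

Tip: modify the values we defined to see how the changes affect training time and model accuracy.

Any questions or suggestions? Leave them in the comments. Thanks for reading!

Editor: Pang Guiyu (龐桂玉). Source: Python開發者 (Python Developers)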
