Generating Image Captions with Machine Learning

Author: deephub 2021-04-25 16:21:32


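Image captioning is the process of producing a suitable textual description for an image. For humans this seems easy, and even a five-year-old can do it effortlessly, but how do we write a computer program that takes an image as input and produces a caption as output?

Before the recent advances in deep neural networks, this problem was out of reach even for the best researchers in the field. With deep neural networks, and given a suitable dataset, it has become entirely feasible.

For example, for the example image the network can generate any of several relevant captions, such as "A white dog in a grassy area", "white dog with brown spots", or even "A dog on grass and some pink flowers".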

Dataset

We chose the "Flickr 8k" dataset because it is easy to access and small enough to train on an ordinary PC, yet large enough to train a network that generates reasonable captions. The data is split into three sets: a training set of 6k images, a development set of 1k images, and a test set of 1k images. Each image comes with 5 captions. One example:


A child in a pink dress is climbing up a set of stairs in an entryway.

A girl going into a wooden building.

A little girl climbing into a wooden playhouse.

A little girl climbing the stairs to her playhouse.

A little girl in a pink dress going into a wooden cabin.

Data Cleaning

The first and most important step of any machine learning project is to clean the data and discard anything that is not needed. For the caption text we apply basic cleaning steps: convert all letters to lowercase (to a computer, "Hey" and "hey" are two completely different words), remove special tokens and punctuation (such as *, (, £, $, %), and drop every word that contains a digit.
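As a small illustration of these cleaning rules, the sketch below applies them to a single caption. The helper name `clean_caption` is hypothetical (it is not part of the article's code); it mirrors the per-word rules of the article's `clean_descriptions`, including the extra rule there that drops one-character words such as "a".

```python
import string

def clean_caption(caption):
    """Lowercase, strip punctuation, drop 1-char tokens and tokens with non-letters."""
    table = str.maketrans('', '', string.punctuation)
    words = [w.lower().translate(table) for w in caption.split()]
    return ' '.join(w for w in words if len(w) > 1 and w.isalpha())

print(clean_caption("A little girl climbing the stairs to her playhouse."))
# → little girl climbing the stairs to her playhouse
print(clean_caption("2 dogs run at 5pm!"))
# → dogs run at
```

Note that the leading "A" disappears because of the one-character filter, not because of the lowercase or punctuation steps.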

We first build a vocabulary of all unique words across the dataset's 8000 (images) * 5 (captions per image) = 40000 captions, which comes to 8763 words. Most of these words occur only once or twice, and we do not want them in the model because they would make it less robust to outliers. We therefore keep only words that occur at least 10 times, which leaves 1652 unique words.
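The frequency threshold can be sketched on a toy corpus as follows. The three captions and the threshold of 2 are illustrative assumptions only; the article applies a threshold of 10 to the full caption set.

```python
from collections import Counter

# Toy corpus (assumed for illustration); the real corpus has tens of thousands of captions
captions = [
    "startseq little girl climbing stairs endseq",
    "startseq girl going into wooden building endseq",
    "startseq little girl in pink dress endseq",
]
threshold = 2  # the article uses 10

# Count every word, then keep only those seen at least `threshold` times
counts = Counter(w for c in captions for w in c.split())
vocab = sorted(w for w, n in counts.items() if n >= threshold)

print(len(counts), "->", len(vocab))
print(vocab)
```

Words below the threshold are simply left out of the vocabulary, so they are skipped when captions are later converted to index sequences.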

We also add two tokens to every description to mark where the caption starts and ends: "startseq" and "endseq".

First, import all the required libraries:

import numpy as np
from numpy import array
import pandas as pd
import matplotlib.pyplot as plt
import string
import os
from PIL import Image
import glob
import pickle
from time import time
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import LSTM, Embedding, Dense, Flatten, Reshape, concatenate, Dropout
from keras.optimizers import Adam
from keras.layers import add
from keras.applications.inception_v3 import InceptionV3
from keras.preprocessing import image
from keras.models import Model
from keras import Input, layers
from keras.applications.inception_v3 import preprocess_input
from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical

Let's define some helper functions:

# read a whole text file into a string
def load_doc(filename):
    with open(filename, 'r') as file:
        return file.read()


# map each image id to the list of its raw descriptions
def load_descriptions(doc):
    mapping = dict()
    for line in doc.split('\n'):
        tokens = line.split()
        # skip empty lines and lines without a description
        if len(tokens) < 2:
            continue
        image_id, image_desc = tokens[0], tokens[1:]
        # keep only the part of the file name before the first '.'
        image_id = image_id.split('.')[0]
        image_desc = ' '.join(image_desc)
        if image_id not in mapping:
            mapping[image_id] = list()
        mapping[image_id].append(image_desc)
    return mapping


# lowercase, strip punctuation, drop one-character and non-alphabetic words
def clean_descriptions(descriptions):
    table = str.maketrans('', '', string.punctuation)
    for key, desc_list in descriptions.items():
        for i in range(len(desc_list)):
            desc = desc_list[i].split()
            desc = [word.lower() for word in desc]
            desc = [w.translate(table) for w in desc]
            desc = [word for word in desc if len(word) > 1]
            desc = [word for word in desc if word.isalpha()]
            desc_list[i] = ' '.join(desc)
    return descriptions


# save descriptions to file, one per line
def save_descriptions(descriptions, filename):
    lines = list()
    for key, desc_list in descriptions.items():
        for desc in desc_list:
            lines.append(key + ' ' + desc)
    with open(filename, 'w') as file:
        file.write('\n'.join(lines))


# load clean descriptions into memory, keeping only the images in `dataset`
def load_clean_descriptions(filename, dataset):
    doc = load_doc(filename)
    descriptions = dict()
    for line in doc.split('\n'):
        tokens = line.split()
        # skip empty lines
        if len(tokens) < 1:
            continue
        image_id, image_desc = tokens[0], tokens[1:]
        if image_id in dataset:
            if image_id not in descriptions:
                descriptions[image_id] = list()
            # wrap every caption in the start and end tokens
            desc = 'startseq ' + ' '.join(image_desc) + ' endseq'
            descriptions[image_id].append(desc)
    return descriptions


# load the set of image identifiers listed in a split file
def load_set(filename):
    doc = load_doc(filename)
    dataset = list()
    for line in doc.split('\n'):
        if len(line) < 1:
            continue
        identifier = line.split('.')[0]
        dataset.append(identifier)
    return set(dataset)


# load training dataset
filename = "dataset/Flickr8k_text/Flickr8k.token.txt"
doc = load_doc(filename)
descriptions = load_descriptions(doc)
descriptions = clean_descriptions(descriptions)
save_descriptions(descriptions, 'descriptions.txt')
filename = 'dataset/Flickr8k_text/Flickr_8k.trainImages.txt'
train = load_set(filename)
train_descriptions = load_clean_descriptions('descriptions.txt', train)

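Let's go through them one by one:

load_doc: takes the path of a file and returns the contents of that file

load_descriptions: takes the contents of the descriptions file and builds a dictionary with image ids as keys and lists of descriptions as values

clean_descriptions: cleans the descriptions by lowercasing all letters and dropping punctuation, words that contain digits, and words of only one character

save_descriptions: saves the description dictionary to disk as a text file

load_set: loads all unique image identifiers from a text file

load_clean_descriptions: loads the cleaned descriptions for the unique identifiers extracted above, wrapping each one in "startseq" and "endseq"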
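To make the parsing step concrete, here is a minimal sketch of what `load_descriptions` does to two lines of input. The image name `example_image` and the `name.jpg#index<TAB>caption` line layout are assumptions made for illustration, inferred from how the function splits each line; they are not taken from the dataset files.

```python
# Two lines in the assumed "name.jpg#index<TAB>caption" layout (hypothetical sample)
sample_doc = (
    "example_image.jpg#0\tA girl going into a wooden building .\n"
    "example_image.jpg#1\tA little girl climbing into a wooden playhouse .\n"
)

mapping = {}
for line in sample_doc.split('\n'):
    tokens = line.split()
    if len(tokens) < 2:          # skip blank or caption-less lines
        continue
    image_id = tokens[0].split('.')[0]   # "example_image.jpg#0" -> "example_image"
    mapping.setdefault(image_id, []).append(' '.join(tokens[1:]))

print(mapping)
```

Both captions end up under the same key, which is what lets the later steps treat the five captions of one image as a group.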

Data Preprocessing

Next we preprocess the images and the captions. The images are essentially our feature vectors, i.e. the input to the network, so we need to convert them to fixed-size vectors before passing them to the neural network. For this we use transfer learning with the Inception V3 model (a convolutional neural network) created by Google Research [3]. The model was trained on the ImageNet dataset [4] to classify images into 1000 classes. Since classification is not our goal, we remove the final softmax layer and extract a fixed 2048-dimensional vector for each image, as shown in the figure below:

[Figure: Inception V3 with the final softmax layer removed, producing a 2048-dimensional feature vector per image]

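The caption text is the output of our model, i.e. what we have to predict. The prediction does not happen all at once; the caption is predicted word by word. To do this we need to encode each word as a fixed-size vector (done in the next section). We first create two dictionaries: "word to index", which maps each word to an index (1 to 1652 in our case), and "index to word", which maps each index back to its word. Finally, we compute the length of the longest description in the dataset so that all other descriptions can be padded to that fixed length. In our case this length is 34.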
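The article does not show how the maximum length is computed, so the following is only a sketch of one way to do it, together with the two dictionaries. The toy `train_descriptions` is an assumption for illustration; the later code refers to the resulting value as `max_length` or `max_length1`.

```python
# Toy stand-in for the real training descriptions (assumed for illustration)
train_descriptions = {
    "img1": ["startseq girl going into wooden building endseq",
             "startseq little girl climbing into wooden playhouse endseq"],
    "img2": ["startseq white dog in grassy area endseq"],
}

all_captions = [c for caps in train_descriptions.values() for c in caps]

# Length (in words) of the longest caption, used later as the padding length
max_length = max(len(c.split()) for c in all_captions)

# Index 0 is reserved for padding, so word indices start at 1
vocab = sorted({w for c in all_captions for w in c.split()})
wordtoix = {w: i for i, w in enumerate(vocab, start=1)}
ixtoword = {i: w for w, i in wordtoix.items()}

print(max_length)
```

Starting the indices at 1 keeps 0 free to act as the padding value, which is why the article's code later sets `vocab_size` to the number of words plus one.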

Word Embeddings

As mentioned, we map each word to a fixed-size vector (of size 200) using a pre-trained GloVe model. We then build an embedding matrix for all 1652 words in the vocabulary, containing one fixed-size vector per word.

# Create a list of all the training captions
all_train_captions = []
for key, val in train_descriptions.items():
    for cap in val:
        all_train_captions.append(cap)


# Consider only words which occur at least 10 times in the corpus
word_count_threshold = 10
word_counts = {}
nsents = 0
for sent in all_train_captions:
    nsents += 1
    for w in sent.split(' '):
        word_counts[w] = word_counts.get(w, 0) + 1

vocab = [w for w in word_counts if word_counts[w] >= word_count_threshold]
print('Preprocessed words {} -> {}'.format(len(word_counts), len(vocab)))


ixtoword = {}
wordtoix = {}

ix = 1
for w in vocab:
    wordtoix[w] = ix
    ixtoword[ix] = w
    ix += 1

vocab_size = len(ixtoword) + 1 # one for appended 0's

# Load Glove vectors
glove_dir = 'glove.6B'
embeddings_index = {}
f = open(os.path.join(glove_dir, 'glove.6B.200d.txt'), encoding="utf-8")

for line in f:
    values = line.split()
    word = values[0]
    coefs = np.asarray(values[1:], dtype='float32')
    embeddings_index[word] = coefs
f.close()

embedding_dim = 200

# Get 200-dim dense vector for each of the words in our vocabulary
embedding_matrix = np.zeros((vocab_size, embedding_dim))

for word, i in wordtoix.items():
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector

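Let's walk through this code:

Lines 1-5: collect all descriptions of all training images into one list

Lines 9-18: keep only the words that occur at least 10 times in the corpus

Lines 21-30: build the word-to-index and index-to-word dictionaries

Lines 33-42: load the GloVe embeddings into a dictionary with words as keys and embedding vectors as values

Lines 44-52: build the embedding matrix for the words in the vocabulary from the embeddings loaded above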
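The last step can be sketched on toy data to show one consequence of the lookup: any vocabulary word that is absent from the embeddings dictionary keeps an all-zero row. The 3-dimensional vectors and the three-word vocabulary below are made-up values for illustration, not GloVe data.

```python
import numpy as np

embedding_dim = 3  # 200 in the article; shortened here for readability

# Made-up "pre-trained" vectors (assumption for illustration)
embeddings_index = {
    "girl": np.array([0.1, 0.2, 0.3], dtype="float32"),
    "dog": np.array([0.4, 0.5, 0.6], dtype="float32"),
}
wordtoix = {"girl": 1, "dog": 2, "startseq": 3}
vocab_size = len(wordtoix) + 1  # row 0 is the padding index

embedding_matrix = np.zeros((vocab_size, embedding_dim))
for word, i in wordtoix.items():
    vec = embeddings_index.get(word)
    if vec is not None:
        embedding_matrix[i] = vec

# Words without a pre-trained vector stay at zero
missing = [w for w in wordtoix if w not in embeddings_index]
print(embedding_matrix.shape, missing)
```

Checking the `missing` list on the real vocabulary is a cheap way to see how many words the frozen embedding layer cannot represent.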

Data Preparation

This is one of the most important parts of the project. For the images, we need to convert them to fixed-size vectors with the Inception V3 model, as described earlier.

# Below path contains all the images
all_images_path = 'dataset/Flickr8k_Dataset/Flicker8k_Dataset/'
# Create a list of all image names in the directory
all_images = glob.glob(all_images_path + '*.jpg')

# Create a list of all the training and testing images with their full path names
def create_list_of_images(file_path):
    image_names = set(open(file_path, 'r').read().strip().split('\n'))
    images = []

    for image_path in all_images:
        if os.path.basename(image_path) in image_names:
            images.append(image_path)

    return images


train_images_path = 'dataset/Flickr8k_text/Flickr_8k.trainImages.txt'
test_images_path = 'dataset/Flickr8k_text/Flickr_8k.testImages.txt'

train_images = create_list_of_images(train_images_path)
test_images = create_list_of_images(test_images_path)

# Preprocess an image into the input format expected by InceptionV3
def preprocess(image_path):
    img = image.load_img(image_path, target_size=(299, 299))
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input(x)
    return x

# Load the inception v3 model
model = InceptionV3(weights='imagenet')

# Create a new model, by removing the last layer (output layer) from the inception v3
model_new = Model(model.input, model.layers[-2].output)

# Encoding a given image into a vector of size (2048, )
def encode(image_path):
    x = preprocess(image_path)
    fea_vec = model_new.predict(x)
    fea_vec = np.reshape(fea_vec, fea_vec.shape[1])
    return fea_vec


encoding_train = {}
for img in train_images:
    encoding_train[os.path.basename(img)] = encode(img)


encoding_test = {}
for img in test_images:
    encoding_test[os.path.basename(img)] = encode(img)

# Save the bottleneck features to disk
with open("encoded_files/encoded_train_images.pkl", "wb") as encoded_pickle:
    pickle.dump(encoding_train, encoded_pickle)

with open("encoded_files/encoded_test_images.pkl", "wb") as encoded_pickle:
    pickle.dump(encoding_test, encoded_pickle)


train_features = pickle.load(open("encoded_files/encoded_train_images.pkl", "rb"))

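Lines 1-22: load the paths of the training and test images into separate lists
Lines 25-53: loop over every image in the training and test sets, load it at a fixed size, preprocess it, extract features with the InceptionV3 model, and reshape the result
Lines 56-63: save the extracted features to disk and load the training features back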

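Now, we do not predict the whole caption at once; we do not simply hand the image to the computer and ask it to generate text. Instead, we give it the image's feature vector together with the first word of the caption and let it predict the second word. Then we give it the first two words and let it predict the third, and so on. Consider the image from the Dataset section and its caption "A girl going into a wooden building". After adding the "startseq" and "endseq" tokens, our inputs (Xi) and outputs (Yi) are as follows.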

[Figure: table of input (Xi) and output (Yi) pairs for the example caption]
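The expansion of one caption into input/output pairs can be sketched as below. The caption is written in its cleaned form (the cleaning step drops the one-character word "a"), which is an assumption about what the pipeline would produce for this example; the image feature vector is represented only by the word "image" in the printed output.

```python
# Cleaned example caption with the start and end tokens added
caption = "startseq girl going into wooden building endseq"
words = caption.split()

# Each prefix of the caption is an input; the word that follows it is the target
pairs = [(' '.join(words[:i]), words[i]) for i in range(1, len(words))]

for partial, next_word in pairs:
    print(f"image + {partial!r} -> {next_word!r}")
```

A caption of n tokens therefore yields n - 1 training samples, all of which share the same image vector.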

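We then use the "word to index" dictionary to replace each word in the inputs and outputs with its index. For batching, all sequences must have the same length, which is why each sequence is padded with zeros up to the maximum length (34, as computed above). As you can see, this produces a large amount of data that cannot feasibly be loaded into memory all at once, so we use a data generator that loads it in small chunks to reduce memory usage.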
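The padding and output encoding can be sketched without Keras as follows. The helpers `pad_pre` and `one_hot` are hypothetical stand-ins written for this illustration: `pad_pre` imitates the default behaviour of Keras `pad_sequences` (zeros are added at the front, and over-long sequences keep their last elements), and `one_hot` imitates `to_categorical` for a single index.

```python
import numpy as np

def pad_pre(seq, maxlen):
    """Left-pad with zeros, like pad_sequences with its default padding='pre'."""
    seq = seq[-maxlen:]                      # keep the last maxlen items if too long
    return [0] * (maxlen - len(seq)) + seq

def one_hot(index, num_classes):
    """One-hot vector for a single word index."""
    vec = np.zeros(num_classes, dtype="float32")
    vec[index] = 1.0
    return vec

seq = [3, 1, 2]                  # a caption already converted to word indices
max_length, vocab_size = 6, 5    # toy sizes (34 and 1653 in the article)

in_seq, out_word = seq[:2], seq[2]
print(pad_pre(in_seq, max_length))    # → [0, 0, 0, 0, 3, 1]
print(one_hot(out_word, vocab_size))  # → [0. 0. 1. 0. 0.]
```

Because index 0 never corresponds to a real word, the padded positions can be ignored by the embedding layer when `mask_zero=True` is set.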

# data generator, intended to be used in a call to model.fit_generator()
# note: relies on the global vocab_size defined earlier
def data_generator(descriptions, photos, wordtoix, max_length, num_photos_per_batch):
    X1, X2, y = list(), list(), list()
    n = 0
    # loop forever over images
    while 1:
        for key, desc_list in descriptions.items():
            n += 1
            # retrieve the photo feature
            photo = photos[key+'.jpg']
            for desc in desc_list:
                # encode the sequence
                seq = [wordtoix[word] for word in desc.split(' ') if word in wordtoix]
                # split one sequence into multiple X, y pairs
                for i in range(1, len(seq)):
                    # split into input and output pair
                    in_seq, out_seq = seq[:i], seq[i]
                    # pad input sequence
                    in_seq = pad_sequences([in_seq], maxlen=max_length)[0]
                    # encode output sequence
                    out_seq = to_categorical([out_seq], num_classes=vocab_size)[0]
                    # store
                    X1.append(photo)
                    X2.append(in_seq)
                    y.append(out_seq)
            # yield the batch data once enough photos have been collected
            if n == num_photos_per_batch:
                yield ([array(X1), array(X2)], array(y))
                X1, X2, y = list(), list(), list()
                n = 0

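The code above iterates over all images and descriptions and produces the data items shown in the table. Because of yield, the function resumes from the same line the next time it is called, which lets us load the data in batches.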
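The resume-where-it-left-off behaviour of `yield` is easier to see on a stripped-down generator. The function `batch_generator` below is a hypothetical toy written for this illustration, not part of the article's code; it batches plain strings instead of image features.

```python
def batch_generator(items, batch_size):
    """Yield lists of `batch_size` items, looping over `items` forever."""
    batch = []
    while True:
        for item in items:
            batch.append(item)
            if len(batch) == batch_size:
                yield batch      # execution pauses here until the next request
                batch = []

gen = batch_generator(["a", "b", "c", "d", "e"], 2)
first, second, third = next(gen), next(gen), next(gen)
print(first, second, third)
# → ['a', 'b'] ['c', 'd'] ['e', 'a']
```

The third batch wraps around to the start of the list, which mirrors how the article's generator keeps cycling over the training set for as many epochs as needed.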

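Model Architecture and Training

As mentioned, our model takes two inputs at every step: the image feature vector and the partial caption. We first apply a Dropout of 0.5 to the image vector and then connect it to a layer of 256 neurons. The partial caption is first fed to an embedding layer whose weights come from the pre-trained GloVe embedding matrix described above, followed by a Dropout of 0.5 and an LSTM (Long Short-Term Memory) layer. Finally, we combine the two branches and connect them to a layer of 256 neurons, followed by a softmax layer that predicts the probability of each word in our vocabulary. The high-level architecture can be summarized with the following figure: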

[Figure: high-level model architecture]

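The following hyperparameters were chosen for training: the loss is categorical cross-entropy and the optimizer is Adam. The model was trained for 30 epochs in total: the first 20 epochs used a learning rate of 0.001 and a batch size of 3 images, and the remaining 10 epochs used a learning rate of 0.0001 and a batch size of 6 images.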

inputs1 = Input(shape=(2048,))
fe1 = Dropout(0.5)(inputs1)
fe2 = Dense(256, activation='relu')(fe1)
inputs2 = Input(shape=(max_length1,))  # max_length1: length of the longest caption (34 here)
se1 = Embedding(vocab_size, embedding_dim, mask_zero=True)(inputs2)
se2 = Dropout(0.5)(se1)
se3 = LSTM(256)(se2)
decoder1 = add([fe2, se3])
decoder2 = Dense(256, activation='relu')(decoder1)
outputs = Dense(vocab_size, activation='softmax')(decoder2)
model = Model(inputs=[inputs1, inputs2], outputs=outputs)

model.layers[2].set_weights([embedding_matrix])  # layers[2] is expected to be the Embedding layer
model.layers[2].trainable = False

model.compile(loss='categorical_crossentropy', optimizer='adam')

epochs = 20
number_pics_per_batch = 3
steps = len(train_descriptions)//number_pics_per_batch

generator = data_generator(train_descriptions, train_features, wordtoix, max_length1, number_pics_per_batch)
history = model.fit_generator(generator, epochs=epochs, steps_per_epoch=steps, verbose=1)


model.optimizer.lr = 0.0001
epochs = 10
number_pics_per_batch = 6
steps = len(train_descriptions)//number_pics_per_batch

generator = data_generator(train_descriptions, train_features, wordtoix, max_length1, number_pics_per_batch)
history1 = model.fit_generator(generator, epochs=epochs, steps_per_epoch=steps, verbose=1)
model.save('saved_model/model_' + str(30) + '.h5')

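Let's explain the code:

Lines 1-11: define the model architecture

Lines 13-14: set the weights of the embedding layer to the embedding matrix created above, and set trainable = False so that this layer is not trained any further

Lines 16-33: train the model in two separate phases with the hyperparameters described above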

Inference

The training loss over the first 20 epochs, followed by the next 10 epochs, is shown below:

[Figure: training loss over the first 20 epochs, followed by the next 10 epochs]

For inference, we wrote a function that predicts the next word as the one with the highest probability under our model (i.e. greedy search).
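The greedy rule can be looked at in isolation with a toy stand-in for the trained network. In the sketch below, `toy_next_word_probs` is a hypothetical lookup table invented for illustration, and it conditions only on the previous word; the real model conditions on the image vector and the whole partial caption.

```python
# Hypothetical next-word probabilities, keyed by the previous word only
toy_next_word_probs = {
    "startseq": {"girl": 0.6, "dog": 0.4},
    "girl": {"climbing": 0.7, "endseq": 0.3},
    "climbing": {"stairs": 0.8, "endseq": 0.2},
    "stairs": {"endseq": 1.0},
}

def greedy_decode(probs, max_length=10):
    """Repeatedly append the single most probable next word until endseq."""
    words = ["startseq"]
    for _ in range(max_length):
        candidates = probs[words[-1]]
        next_word = max(candidates, key=candidates.get)  # argmax over the vocabulary
        if next_word == "endseq":
            break
        words.append(next_word)
    return ' '.join(words[1:])   # drop the startseq marker

print(greedy_decode(toy_next_word_probs))  # → girl climbing stairs
```

Greedy search commits to the best word at each step and never revisits that choice, which keeps it simple and fast at the cost of possibly missing a caption with a higher overall probability.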

def greedySearch(photo):
    in_text = 'startseq'
    for i in range(max_length1):
        # encode the partial caption and pad it to the fixed input length
        seq = [wordtoix[w] for w in in_text.split() if w in wordtoix]
        seq = pad_sequences([seq], maxlen=max_length1)
        # pick the most probable next word
        yhat = model.predict([photo, seq], verbose=0)
        yhat = np.argmax(yhat)
        word = ixtoword[yhat]
        in_text += ' ' + word
        if word == 'endseq':
            break
    # drop the startseq marker and, if present, the endseq marker
    words = in_text.split()[1:]
    if words and words[-1] == 'endseq':
        words = words[:-1]
    return ' '.join(words)


# caption one image from the test set
pic = list(encoding_test.keys())[999]
photo = encoding_test[pic].reshape((1, 2048))
x = plt.imread(all_images_path + pic)
plt.imshow(x)
plt.show()
print("Greedy:", greedySearch(photo))


The result is quite good.

Editor: 華軒 Source: 今日頭條
