
Finally Making Sense of the Seq2Seq Algorithm!


Seq2Seq (Sequence-to-Sequence) is a neural network architecture for processing sequential data. It is widely used in natural language processing (NLP) tasks such as machine translation, text generation, and dialogue systems.

Through an encoder-decoder architecture, it maps an input sequence (such as a sentence) to an output sequence (another sentence or sequence).


模型結(jié)構(gòu)

A Seq2Seq model consists of two main components.

Encoder

The encoder is a recurrent neural network (RNN), or one of its variants such as an LSTM or GRU, that reads the input sequence and compresses it into a fixed-size context vector.

The encoder processes the input sequence one time step at a time, continually updating its hidden-state representation of the input until it reaches the end of the sequence.

The final hidden state of this process is usually treated as a summary of the entire input sequence and is passed on to the decoder.


class Encoder(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_size, num_layers, dropout):
        super(Encoder, self).__init__()
        # record hidden size and num layers so the Seq2Seq wrapper can check them
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        # dropout applied to the embeddings
        self.dropout = nn.Dropout(dropout)
        # embedding layer converts input token ids into dense vectors
        self.embedding = nn.Embedding(input_dim, embedding_dim)
        # bidirectional LSTM layer
        self.lstm = nn.LSTM(embedding_dim, hidden_size, num_layers=num_layers,
                            bidirectional=True, dropout=dropout)

    def forward(self, src):
        # src = [src length, batch size]
        embedded = self.dropout(self.embedding(src))
        out, (hidden, cell) = self.lstm(embedded)
        # hidden, cell = [num layers * 2, batch size, hidden size]
        return hidden, cell
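
Because the LSTM here is bidirectional, the hidden and cell states returned by the encoder have shape [num_layers * 2, batch size, hidden size]. A quick shape check illustrates this; the dimensions below are toy values chosen only for illustration:

# sanity-check the encoder's output shapes with toy dimensions
enc = Encoder(input_dim=100, embedding_dim=32, hidden_size=64, num_layers=2, dropout=0.3)
dummy_src = torch.randint(0, 100, (15, 4))  # [src length = 15, batch size = 4]
h, c = enc(dummy_src)
print(h.shape, c.shape)  # both: torch.Size([4, 4, 64]) = [num_layers * 2, batch, hidden]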

Decoder

The decoder is also an RNN; it takes the context vector produced by the encoder and generates the target sequence.

At each step the decoder emits one output token and feeds the previous step's output back in as the next step's input, until a special end-of-sequence token is produced.

The decoder's initial state comes from the encoder's final hidden state, so the decoder can be seen as predicting the output sequence from the global information gathered by the encoder.


class Decoder(nn.Module):
    def __init__(self, output_dim, embedding_dim, hidden_size, num_layers, dropout):
        super(Decoder, self).__init__()
        self.output_dim = output_dim
        # record hidden size and num layers so the Seq2Seq wrapper can check them
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.dropout = nn.Dropout(dropout)
        # embedding over the target vocabulary
        self.embedding = nn.Embedding(output_dim, embedding_dim)
        # bidirectional LSTM, mirroring the encoder so the states line up
        self.lstm = nn.LSTM(embedding_dim, hidden_size, num_layers=num_layers,
                            bidirectional=True, dropout=dropout)
        # project the (2 * hidden size) LSTM output onto the target vocabulary
        self.fc = nn.Linear(hidden_size * 2, output_dim)

    def forward(self, input_token, hidden, cell):
        # input_token = [batch size] -> [1, batch size]
        input_token = input_token.unsqueeze(0)
        emb = self.dropout(self.embedding(input_token))
        # keep hidden and cell so they can be fed back in at the next step
        out, (hidden, cell) = self.lstm(emb, (hidden, cell))
        out = out.squeeze(0)
        pred = self.fc(out)
        return pred, hidden, cell
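
A single decoding step can be checked in the same way: it takes one token id per batch element plus the encoder states, and returns logits over the target vocabulary. Continuing the toy example from the encoder sketch above:

# one decoding step, reusing the toy encoder states h, c from the sketch above
dec = Decoder(output_dim=120, embedding_dim=32, hidden_size=64, num_layers=2, dropout=0.3)
first_token = torch.zeros(4, dtype=torch.long)  # e.g. the <sos> id for a batch of 4
pred, h, c = dec(first_token, h, c)
print(pred.shape)  # torch.Size([4, 120]) = [batch size, target vocab size]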

Workflow

The basic workflow of a Seq2Seq model is as follows:

  1. Input processing
    The input sequence (e.g., a source-language sentence) is fed step by step into the encoder's RNN layers; the hidden state of the encoder's last layer retains the contextual information of the input sequence.
  2. Context vector generation
    The hidden state produced by the encoder (usually the final one) is called the context vector and encapsulates the information of the input sequence.
  3. Decoding
    The decoder takes the context vector as its initial state and generates the target sequence step by step with its own RNN.
    At each step the decoder produces one output token and uses it as the input for the next step, until an end token is generated.
  4. Sequence generation
    The sequence produced by the decoder is the model's final output (a greedy decoding loop implementing these steps is sketched below).
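
Putting the four steps together: at inference time the model encodes the source once and then feeds the decoder its own previous prediction until an end token appears. Below is a minimal greedy-decoding sketch of that loop; `sos_id`, `eos_id`, and `max_len` are placeholder arguments, and `encoder`/`decoder` are instances of the classes defined above.

# greedy inference sketch: encode once, then decode one token at a time
def greedy_decode(encoder, decoder, src, sos_id, eos_id, max_len=30):
    hidden, cell = encoder(src)                           # steps 1-2: build the context
    input_token = torch.full((src.shape[1],), sos_id,
                             dtype=torch.long, device=src.device)
    generated = [input_token]
    for _ in range(max_len):                              # step 3: decode step by step
        pred, hidden, cell = decoder(input_token, hidden, cell)
        input_token = pred.argmax(1)                      # previous output becomes next input
        generated.append(input_token)
        if (input_token == eos_id).all():                 # stop once every sequence has ended
            break
    return torch.stack(generated)                         # step 4: the generated sequence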

Pros and Cons

Pros

  • Versatility
    Seq2Seq models handle variable-length input and output sequences, which makes them suitable for many tasks such as machine translation, text summarization, dialogue generation, and speech recognition.
    The encoder-decoder structure means the input and output do not have to be the same length, giving the model a high degree of flexibility.
  • Suited to complex sequence tasks
    By separating the encoder from the decoder, a Seq2Seq model can learn sequence-to-sequence mappings more effectively.
    The encoder is responsible for capturing the information in the input sequence, while the decoder generates content that matches the characteristics of the output sequence.

Cons

  • Information loss from compression
    A traditional Seq2Seq model represents the entire input sequence with the encoder's last hidden state. When the input is long, this single context vector cannot fully capture the content, so information is lost and the model underperforms on long-sequence tasks.
  • Sensitivity to long sequences
    Without an attention mechanism, a Seq2Seq model struggles with long sequences, because the decoder depends on a fixed-size vector from the encoder that may not cover all the details of a long input.
  • Training difficulty
    Seq2Seq models suffer from vanishing and exploding gradients during training, especially on long sequences.

Worked Example

Below is an example of using a Seq2Seq model for machine translation.

First, we load the dataset from Hugging Face and split it into training, validation, and test sets:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader,Dataset
import tqdm,datasets
from torchtext.vocab import build_vocab_from_iterator
from torch.nn.utils.rnn import pad_sequence
import spacy

dataset = datasets.load_dataset('bentrevett/multi30k')
train_data,val_data,test_data = dataset['train'],dataset['validation'],dataset['test']
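
Each sample in multi30k is a dictionary holding an English sentence under "en" and its German counterpart under "de", which you can confirm by printing one example:

# each sample is a dict with parallel "en" and "de" sentences
print(train_data[0])
# e.g. {'en': 'Two young, White males are outside near many bushes.',
#       'de': 'Zwei junge weiße Männer sind im Freien in der Nähe vieler Büsche.'}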

Next, load the spaCy models for the source and target languages.

spaCy is a powerful, production-ready library for advanced natural language processing in Python.

Unlike many other NLP libraries, spaCy is designed for real-world use rather than research experiments.

It excels at efficient text processing with pre-trained models, handling tasks such as tokenization, part-of-speech tagging, named entity recognition, and dependency parsing.

en_nlp = spacy.load('en_core_web_sm')
de_nlp = spacy.load('de_core_news_sm')
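
A quick call to the German tokenizer shows the kind of output spaCy produces (the example sentence is arbitrary):

# spaCy splits punctuation off into its own tokens
print([tok.text for tok in de_nlp.tokenizer("Zwei Männer stehen am Herd.")])
# ['Zwei', 'Männer', 'stehen', 'am', 'Herd', '.']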

#tokenizer
def sample_tokenizer(sample, en_nlp, de_nlp, lower, max_length, sos_token, eos_token):
    en_tokens = [token.text for token in en_nlp.tokenizer(sample["en"])][:max_length]
    de_tokens = [token.text for token in de_nlp.tokenizer(sample["de"])][:max_length]
    if lower:
        en_tokens = [token.lower() for token in en_tokens]
        de_tokens = [token.lower() for token in de_tokens]
    en_tokens = [sos_token] + en_tokens + [eos_token]
    de_tokens = [sos_token] + de_tokens + [eos_token]
    return {"en_tokens": en_tokens, "de_tokens": de_tokens}

fn_kwargs = {
    "en_nlp":en_nlp,
    "de_nlp":de_nlp,
    "lower":True,
    "max_length":1000,
    "sos_token":'<sos>',
    "eos_token":'<eos>'
}
train_data = train_data.map(sample_tokenizer,fn_kwargs=fn_kwargs)
val_data = val_data.map(sample_tokenizer,fn_kwargs=fn_kwargs)
test_data = test_data.map(sample_tokenizer,fn_kwargs=fn_kwargs)

min_freq = 2
specials = ['<unk>','<pad>','<sos>','<eos>']
en_vocab = build_vocab_from_iterator(train_data['en_tokens'],specials=specials,min_freq=min_freq)
de_vocab = build_vocab_from_iterator(train_data['de_tokens'],specials=specials,min_freq=min_freq)

assert en_vocab['<unk>'] == de_vocab['<unk>']
assert en_vocab['<pad>'] == de_vocab['<pad>']

unk_index = en_vocab['<unk>']
pad_index = en_vocab['<pad>']
en_vocab.set_default_index(unk_index)
de_vocab.set_default_index(unk_index)
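
With `set_default_index` in place, any token that was filtered out by `min_freq` (or never seen at all) silently maps to the `<unk>` id instead of raising a KeyError:

# unseen or rare tokens fall back to the <unk> index
print(en_vocab['a'], en_vocab['dog'])   # ids of frequent, in-vocabulary tokens
print(en_vocab['qwertyuiop'])           # out-of-vocabulary token -> unk_index
assert en_vocab['qwertyuiop'] == unk_index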

def sample_num(sample,en_vocab,de_vocab):
    en_ids = en_vocab.lookup_indices(sample["en_tokens"])
    de_ids = de_vocab.lookup_indices(sample["de_tokens"])
    return {"en_ids":en_ids,"de_ids":de_ids}
    
fn_kwargs = {"en_vocab":en_vocab,"de_vocab":de_vocab}
train_data = train_data.map(sample_num,fn_kwargs=fn_kwargs)
val_data = val_data.map(sample_num,fn_kwargs=fn_kwargs)
test_data = test_data.map(sample_num,fn_kwargs=fn_kwargs)

train_data = train_data.with_format(type="torch",columns=['en_ids','de_ids'],output_all_columns=True)
val_data = val_data.with_format(type="torch",columns=['en_ids','de_ids'],output_all_columns=True)
test_data = test_data.with_format(type="torch",columns=['en_ids','de_ids'],output_all_columns=True)

def get_collate_fn(pad_index):
    def collate_fn(batch):
        batch_en_ids = [sample["en_ids"] for sample in batch]
        batch_de_ids = [sample["de_ids"] for sample in batch]
        batch_en_ids = pad_sequence(batch_en_ids,padding_value=pad_index)
        batch_de_ids = pad_sequence(batch_de_ids,padding_value=pad_index)
        batch = {"en_ids":batch_en_ids,"de_ids":batch_de_ids}
        return batch
    return collate_fn
def get_dataloader(dataset,batch_size,shuffle,pad_index):
    collate_fn = get_collate_fn(pad_index)
    dataloader = DataLoader(dataset=dataset,
                            batch_size=batch_size,
                            shuffle=shuffle,
                           collate_fn=collate_fn)
    return dataloader

train_loader = get_dataloader(train_data,batch_size=512,shuffle=True,pad_index=pad_index)
val_loader = get_dataloader(val_data,batch_size=512,shuffle=True,pad_index=pad_index)
test_loader = get_dataloader(test_data,batch_size=512,shuffle=True,pad_index=pad_index)
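
Note that `pad_sequence` is used with its default `batch_first=False`, so each batch is time-major; a quick inspection confirms the `[sequence length, batch size]` layout that the model code expects (the exact lengths vary from batch to batch):

# batches are time-major: [sequence length, batch size]
batch = next(iter(train_loader))
print(batch["de_ids"].shape, batch["en_ids"].shape)
# e.g. torch.Size([27, 512]) torch.Size([24, 512])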

Next, we build the Seq2Seq model.

class Seq2Seq(nn.Module):
    def __init__(self, encoder, decoder, device):
        super(Seq2Seq, self).__init__()
        self.encoder = encoder
        self.decoder = decoder
        self.device = device
        # encoder and decoder states must be shape-compatible
        assert encoder.num_layers == decoder.num_layers
        assert encoder.hidden_size == decoder.hidden_size

    def forward(self, src, trg, teacher_forcing_ratio):
        # extract dimensions for the output tensor
        trg_len = trg.shape[0]
        batch_size = trg.shape[1]
        vocab_size = self.decoder.output_dim
        outputs = torch.zeros(trg_len, batch_size, vocab_size).to(self.device)
        # first decoder input is the <sos> token; encoder states initialise the decoder
        input_token = trg[0, :]
        hidden, cell = self.encoder(src)
        for t in range(1, trg_len):
            out, hidden, cell = self.decoder(input_token, hidden, cell)
            outputs[t] = out
            # with probability teacher_forcing_ratio, feed the ground-truth token;
            # otherwise feed the model's own prediction
            top1 = out.argmax(1)
            teacher_force = np.random.random() < teacher_forcing_ratio
            input_token = trg[t] if teacher_force else top1
        return outputs

input_dim = len(de_vocab)
output_dim = len(en_vocab)
encoder_embedding_dim = 256
decoder_embedding_dim = 256
hidden_size = 512
num_layers = 3
encoder_dropout = 0.5
decoder_dropout = 0.5
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

encoder = Encoder(
    input_dim,
    encoder_embedding_dim,
    hidden_size=hidden_size,
    num_layers=num_layers,
    dropout=encoder_dropout,
)

decoder = Decoder(
    output_dim,
    decoder_embedding_dim,
    hidden_size=hidden_size,
    num_layers=num_layers,
    dropout=decoder_dropout,
)

model = Seq2Seq(encoder, decoder, device).to(device)
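
Before training, it is worth checking how large the model is; summing over `model.parameters()` gives the count of trainable weights:

# count the trainable parameters
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"The model has {num_params:,} trainable parameters")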

Model Training

optimizer = torch.optim.Adam(model.parameters())
criterion = nn.CrossEntropyLoss(ignore_index=pad_index)

def train_fn(
    model, data_loader, optimizer, criterion, clip, teacher_forcing_ratio, device):
    model.train()
    epoch_loss = 0
    for i, batch in enumerate(data_loader):
        src = batch["de_ids"].to(device)
        trg = batch["en_ids"].to(device)
        # src = [src length, batch size]
        # trg = [trg length, batch size]
        optimizer.zero_grad()
        output = model(src, trg, teacher_forcing_ratio)
        # output = [trg length, batch size, trg vocab size]
        output_dim = output.shape[-1]
        output = output[1:].view(-1, output_dim)
        # output = [(trg length - 1) * batch size, trg vocab size]
        trg = trg[1:].view(-1)
        # trg = [(trg length - 1) * batch size]
        loss = criterion(output, trg)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
        optimizer.step()
        epoch_loss += loss.item()
    return epoch_loss / len(data_loader)
    
def evaluate_fn(model, data_loader, criterion, device):
    model.eval()
    epoch_loss = 0
    with torch.no_grad():
        for i, batch in enumerate(data_loader):
            src = batch["de_ids"].to(device)
            trg = batch["en_ids"].to(device)
            # src = [src length, batch size]
            # trg = [trg length, batch size]
            output = model(src, trg, 0)  # turn off teacher forcing
            # output = [trg length, batch size, trg vocab size]
            output_dim = output.shape[-1]
            output = output[1:].view(-1, output_dim)
            # output = [(trg length - 1) * batch size, trg vocab size]
            trg = trg[1:].view(-1)
            # trg = [(trg length - 1) * batch size]
            loss = criterion(output, trg)
            epoch_loss += loss.item()
    return epoch_loss / len(data_loader)

n_epochs = 10
clip = 1.0
teacher_forcing_ratio = 1

best_valid_loss = float("inf")

for epoch in tqdm.tqdm(range(n_epochs)):
    train_loss = train_fn(
        model,
        train_loader,
        optimizer,
        criterion,
        clip,
        teacher_forcing_ratio,
        device,
    )
    valid_loss = evaluate_fn(
        model,
        val_loader,
        criterion,
        device,
    )
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), "tut1-model.pt")
    print(f"\tTrain Loss: {train_loss:7.3f} | Train PPL: {np.exp(train_loss):7.3f}")
    print(f"\tValid Loss: {valid_loss:7.3f} | Valid PPL: {np.exp(valid_loss):7.3f}")

model.load_state_dict(torch.load("tut1-model.pt"))

test_loss = evaluate_fn(model, test_loader, criterion, device)

print(f"| Test Loss: {test_loss:.3f} | Test PPL: {np.exp(test_loss):7.3f} |")

Finally, let's look at the results:

def translate_sentence(
    sentence,
    model,
    en_nlp,
    de_nlp,
    en_vocab,
    de_vocab,
    lower,
    sos_token,
    eos_token,
    device,
    max_output_length=25,
):
    model.eval()
    with torch.no_grad():
        if isinstance(sentence, str):
            tokens = [token.text for token in de_nlp.tokenizer(sentence)]
        else:
            tokens = [token for token in sentence]
        if lower:
            tokens = [token.lower() for token in tokens]
        tokens = [sos_token] + tokens + [eos_token]
        ids = de_vocab.lookup_indices(tokens)
        tensor = torch.LongTensor(ids).unsqueeze(-1).to(device)
        hidden, cell = model.encoder(tensor)
        inputs = en_vocab.lookup_indices([sos_token])
        for _ in range(max_output_length):
            inputs_tensor = torch.LongTensor([inputs[-1]]).to(device)
            output, hidden, cell = model.decoder(inputs_tensor, hidden, cell)
            predicted_token = output.argmax(-1).item()
            inputs.append(predicted_token)
            if predicted_token == en_vocab[eos_token]:
                break
        tokens = en_vocab.lookup_tokens(inputs)
    return tokens

sentence ='Der Mann ist am Weisheitsspross'
sos_token='<sos>'
eos_token='<eos>'
lower=True
translation = translate_sentence(
    sentence,
    model,
    en_nlp,
    de_nlp,
    en_vocab,
    de_vocab,
    lower,
    sos_token,
    eos_token,
    device,
)
print(translation)
#['<sos>', 'the', 'woman', 'is', 'looking', 'at', 'the', 'camera', '.', '<eos>']
