
Finally Understood the Attention Mechanism in Transformer!!!

Author: 程序員小寒 2024-10-31 10:00:39


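Hello everyone, I'm 小寒.

The attention mechanism is a widely used technique in deep learning, especially in natural language processing and computer vision tasks. It lets a model selectively focus on specific parts of the input data, which improves the model's performance.

Imagine reading the sentence “The cat sat on the mat”. A human immediately understands the relationships between the words and knows that “sat” is more closely related to “cat” than to “mat”.

The attention mechanism lets machines capture similar relationships, helping them focus on specific parts of the input data.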

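The attention mechanism in Transformer

In the Transformer model, the attention mechanism is the core component. It lets the model focus on the most important information while processing the input sequence, which greatly improves the model's performance on long sequences.

Self-attention

In self-attention, each input vector can “attend to” the other vectors in the same sequence, which lets the model flexibly focus on different parts of the whole sequence.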
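For reference, self-attention is usually written as scaled dot-product attention. The notation below is an addition to the original text and follows the standard formulation: X is the matrix of input vectors (one row per token), W_Q, W_K and W_V are projection matrices, d_k is the dimension of the key vectors, and the softmax is applied to each row.

```latex
Q = X W_Q, \qquad K = X W_K, \qquad V = X W_V

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```

Row i of the softmax output holds the weights that token i assigns to every token in the sequence, so each output row is a weighted average of the rows of V.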


Let's look at how to implement self-attention in code.

import numpy as np

word_embeddings = {
    'she':    np.array([0.2, 0.9, 0.1, 0.5]),
    'likes':  np.array([0.8, 0.3, 0.7, 0.2]),
    'coffee': np.array([0.4, 0.6, 0.3, 0.9])
}

# Stack the embeddings into the input matrix X, shape (3, 4): one row per word
X = np.vstack([word_embeddings['she'],
               word_embeddings['likes'],
               word_embeddings['coffee']])

# Projection matrices for queries, keys, and values.
# Fixed toy values here; in a trained model they are learned.
W_q = np.array([[0.9, 0.1, 0.1, 0.1],
                [0.1, 0.9, 0.1, 0.1],
                [0.1, 0.1, 0.9, 0.1],
                [0.1, 0.1, 0.1, 0.9]])

W_k = np.array([[0.9, 0.1, 0.1, 0.1],
                [0.1, 0.9, 0.1, 0.1],
                [0.1, 0.1, 0.9, 0.1],
                [0.1, 0.1, 0.1, 0.9]])

W_v = np.array([[0.8, 0.2, 0.1, 0.1],
                [0.2, 0.8, 0.2, 0.1],
                [0.1, 0.2, 0.8, 0.1],
                [0.1, 0.1, 0.1, 0.9]])

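# Project the inputs into queries, keys, and values
Q = np.dot(X, W_q)
K = np.dot(X, W_k)
V = np.dot(X, W_v)

# Raw attention scores: dot product of every query with every key
scores = np.dot(Q, K.T)

# Scale by sqrt(d_k) so the scores stay in a range where softmax does not saturate
d_k = K.shape[1]
scaled_scores = scores / np.sqrt(d_k)

# Row-wise softmax: each row of attention weights sums to 1
exp_scores = np.exp(scaled_scores)
attention_weights = exp_scores / exp_scores.sum(axis=1, keepdims=True)

# Each output row is a weighted average of the value vectors
output = np.dot(attention_weights, V)

print(output)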
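The division by the square root of d_k in the code above is easy to overlook. The following is a minimal sketch, not part of the original article, of why it matters: with random vectors and a larger key dimension, unscaled dot products grow large and the softmax puts almost all of its weight on a single key. The dimension d_k = 64, the sequence length of 5, and the random seed are arbitrary assumptions chosen for illustration.

```python
import numpy as np

def softmax(x):
    # Row-wise softmax; subtracting the row max avoids overflow in np.exp
    shifted = x - x.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)   # fixed seed, chosen arbitrarily
d_k = 64                         # hypothetical key dimension
Q = rng.normal(size=(5, d_k))    # 5 random query vectors
K = rng.normal(size=(5, d_k))    # 5 random key vectors

scores = Q @ K.T
unscaled = softmax(scores)                 # softmax on raw dot products
scaled = softmax(scores / np.sqrt(d_k))    # softmax on scaled dot products

# Largest weight in each row: values near 1 mean one key takes almost all the attention
print("unscaled:", unscaled.max(axis=1).round(3))
print("scaled:  ", scaled.max(axis=1).round(3))
```

Dividing the scores by a constant greater than 1 always flattens the softmax, so the scaled weights are spread more evenly across the keys than the unscaled ones.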

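Multi-Head Attention

Multi-head attention further extends the expressive power of self-attention.

It sets up multiple attention heads, each of which extracts information from a different subspace. The outputs of all heads are then concatenated and passed through a linear transformation.

This lets the model better capture dependencies along multiple dimensions, so it performs better on complex tasks.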
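The split, attend, concatenate, and project steps described above can be sketched in NumPy for a single sequence. This is an illustrative sketch under stated assumptions rather than the article's implementation: the function name multi_head_attention, the random weights, and the sizes (3 tokens, d_model = 8, 2 heads) are all hypothetical.

```python
import numpy as np

def softmax(x):
    # Softmax over the last axis, shifted by the max for numerical stability
    shifted = x - x.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, W_q, W_k, W_v, W_o, num_heads):
    seq_len, d_model = X.shape
    d_k = d_model // num_heads  # size of each head's subspace

    def split(M):
        # (seq_len, d_model) -> (num_heads, seq_len, d_k)
        return M.reshape(seq_len, num_heads, d_k).transpose(1, 0, 2)

    Q, K, V = split(X @ W_q), split(X @ W_k), split(X @ W_v)

    # Scaled dot-product attention, computed independently for every head
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)   # (num_heads, seq_len, seq_len)
    weights = softmax(scores)
    heads = weights @ V                                 # (num_heads, seq_len, d_k)

    # Concatenate the heads back to (seq_len, d_model), then apply the output projection
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ W_o, weights

rng = np.random.default_rng(0)            # arbitrary seed
seq_len, d_model, num_heads = 3, 8, 2     # hypothetical sizes
X = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v, W_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))

out, weights = multi_head_attention(X, W_q, W_k, W_v, W_o, num_heads)
print(out.shape)      # (3, 8)
print(weights.shape)  # (2, 3, 3)
```

Each head produces its own 3 x 3 weight matrix, which is what allows different heads to attend to different token relationships in the same sequence.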


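Computation flow of multi-head attention

Multi-head attention makes the model more flexible, letting it learn the relationships between the words in a sequence from different perspectives.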

import math

import torch
import torch.nn as nn


class MultiHeadAttention(nn.Module):
    
    def __init__(self, d_model, num_heads):
        super(MultiHeadAttention, self).__init__()
        # Ensure that the model dimension (d_model) is divisible by the number of heads
        assert d_model % num_heads == 0 
        
        # Initialize dimensions
        self.d_model = d_model # Model's dimension
        self.num_heads = num_heads # Number of attention heads
        self.d_k = d_model // num_heads # Dimension of each head's key, query, and value
        
        # Linear layers for transforming inputs
        self.W_q = nn.Linear(d_model, d_model) # Query transformation
        self.W_k = nn.Linear(d_model, d_model) # Key transformation
        self.W_v = nn.Linear(d_model, d_model) # Value transformation
        self.W_o = nn.Linear(d_model, d_model) # Output transformation
    
    # Scaled dot-product attention
    def scaled_dot_product_attention(self, Q, K, V, mask=None):
        # Calculate attention scores
        attn_scores = torch.matmul(Q, K.transpose(-2, -1)) / math.sqrt(self.d_k)
        
        # Apply mask if provided (useful for preventing attention to certain parts like padding)
        if mask is not None:
            attn_scores = attn_scores.masked_fill(mask == 0, -1e9)
        
        # Softmax is applied to obtain attention probabilities
        attn_probs = torch.softmax(attn_scores, dim=-1)
        
        # Multiply by values to obtain the final output
        output = torch.matmul(attn_probs, V)
        return output
        
    def split_heads(self, x):
        # Reshape the input to have num_heads for multi-head attention
        batch_size, seq_length, d_model = x.size()
        return x.view(batch_size, seq_length, self.num_heads, self.d_k).transpose(1, 2)
        
    def combine_heads(self, x):
        # Combine the multiple heads back to original shape
        batch_size, _, seq_length, d_k = x.size()
        return x.transpose(1, 2).contiguous().view(batch_size, seq_length, self.d_model)
        
    def forward(self, Q, K, V, mask=None):
        # Apply linear transformations and split heads
        Q = self.split_heads(self.W_q(Q))
        K = self.split_heads(self.W_k(K))
        V = self.split_heads(self.W_v(V))
        
        # Perform scaled dot-product attention
        attn_output = self.scaled_dot_product_attention(Q, K, V, mask)
        
        # Combine heads and apply output transformation
        output = self.W_o(self.combine_heads(attn_output))
        return output
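The mask argument in scaled_dot_product_attention above replaces scores with -1e9 wherever the mask is 0. As a minimal sketch of what that does, the NumPy example below applies the same convention with a causal (lower-triangular) mask, so that each position can only attend to itself and earlier positions. The random scores, the sequence length of 4, and the choice of a causal mask are assumptions for illustration; the article itself only mentions padding as a use case.

```python
import numpy as np

seq_len = 4                                    # hypothetical sequence length
rng = np.random.default_rng(0)                 # arbitrary seed
scores = rng.normal(size=(seq_len, seq_len))   # stand-in for scaled attention scores

# Lower-triangular mask: 1 = may attend, 0 = blocked (future positions)
mask = np.tril(np.ones((seq_len, seq_len), dtype=int))

# Same convention as masked_fill(mask == 0, -1e9): blocked scores become very negative
masked_scores = np.where(mask == 0, -1e9, scores)

# Row-wise softmax; blocked positions end up with weight 0
shifted = masked_scores - masked_scores.max(axis=-1, keepdims=True)
weights = np.exp(shifted)
weights /= weights.sum(axis=-1, keepdims=True)

print(weights.round(3))
```

In the PyTorch class, the attention scores have shape (batch_size, num_heads, seq_length, seq_length), so a mask passed to forward needs a shape that broadcasts against that, for example (1, 1, seq_length, seq_length).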

Editor: 武曉燕 Source: 程序員學長
