
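How to Use a Denoising Autoencoder for Zero-Day Attack Detection on the UNSW-NB15 Dataset

Author: Li Rui 2025-10-17 09:00:00

This article explores zero-day attack detection with a denoising autoencoder (DAE) on the UNSW-NB15 dataset. Trained only on normal traffic, the model learns robust features and uses reconstruction error to identify anomalies. In the experiment, the method detects 91.5% of Shellcode zero-day attacks with an AUC of 0.93, supporting the usefulness of DAEs for detecting unknown threats.

Translator | Li Rui

Reviewer | Chong Lou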

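Zero-day attacks are among the most destructive threats in network security today. They exploit previously undiscovered vulnerabilities and can bypass existing intrusion detection systems (IDS). Traditional signature-based IDS build their defense rules from known attack patterns, so they often fail against such attacks. To detect zero-day attacks, an AI model needs to learn what normal network behavior looks like and automatically flag behavior that deviates from it.

The denoising autoencoder (DAE) is a promising solution. As an unsupervised deep learning model, its goal is to learn a robust feature representation of normal network traffic. The core idea is that, during training, slight noise is added to the normal traffic input (that is, the data is "corrupted"), and the model is forced to reconstruct the original clean data from the noisy version. This pushes it to capture the essential structure of the data rather than memorize noise. When the model later encounters an unknown zero-day attack, the reconstruction error (the loss) rises sharply, which enables anomaly detection. This article shows how to use a DAE for zero-day attack detection on the UNSW-NB15 dataset.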

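Core Idea of the Denoising Autoencoder

In a denoising autoencoder, noise is deliberately injected into the input before it is passed to the encoder. The model is then trained to reconstruct the clean original data from the noisy input. Corrupting the input with random noise encourages the model to focus on meaningful features rather than incidental detail. The mathematical expression is as follows:

Figure 1. Loss function

The reconstruction loss serves as the loss function: it measures the difference between the original input x and the reconstructed output x̂. The lower the reconstruction error, the better the model ignores the noise and preserves the core features of the input. The figure below shows the structure of a denoising autoencoder (DAE).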
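The formula image from the original article did not survive extraction. As a hedged reconstruction, the standard denoising-autoencoder objective can be written as follows, where the encoder f, the decoder g, and the corruption process q are notation assumed here rather than taken from the source:

```latex
\tilde{x} \sim q(\tilde{x} \mid x), \qquad
\hat{x} = g\bigl(f(\tilde{x})\bigr), \qquad
\mathcal{L}(x, \hat{x}) = \lVert x - \hat{x} \rVert^{2}
```

The point to notice is that the loss compares the reconstruction x̂ with the clean input x, not with the corrupted input x̃.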

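Figure 2. Structure of a denoising autoencoder

Example: Binary Input Case

For binary inputs (x ∈ {0,1}), each bit is randomly flipped or set to zero with probability q and is otherwise left unchanged. If the model were allowed to minimize the error against the noisy input x̃, it would simply learn to copy the noise. Because it is forced to reconstruct the true value x, it must infer the missing information from the relationships between features. This lets the denoising autoencoder go beyond memorization and learn the deeper structure of the input, which yields a noise-robust model that generalizes better at test time. In network security, a denoising autoencoder can detect unknown or zero-day attacks that deviate from normal patterns.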
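The corruption step for binary inputs can be sketched in NumPy. This is a minimal illustration of the zeroing variant (masking noise) only; the function name `corrupt_binary` and the toy vector are assumptions made for the example, not part of the original article.

```python
import numpy as np

def corrupt_binary(x, q, rng):
    """Set each bit of x to zero with probability q (masking noise)."""
    keep = rng.random(x.shape) >= q   # True where the bit is left unchanged
    return x * keep

rng = np.random.default_rng(0)
x_clean = np.array([[1, 0, 1, 1, 0, 1, 1, 1]])
x_noisy = corrupt_binary(x_clean, q=0.3, rng=rng)

# The training pair is (x_noisy, x_clean): the network sees the corrupted
# vector, but its loss is measured against the clean one.
print("clean:", x_clean)
print("noisy:", x_noisy)
```

Because the target is always `x_clean`, the network can only lower its loss by inferring the zeroed bits from the bits that remain.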

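Case Study: Detecting Zero-Day Attacks with a Denoising Autoencoder

This example demonstrates how a denoising autoencoder detects zero-day attacks in the UNSW-NB15 dataset. The model is trained to learn the underlying structure of normal traffic without being influenced by anomalous data. At inference time, traffic that deviates significantly from normal patterns (for example, traffic associated with a zero-day attack) produces a high reconstruction error, which enables anomaly detection.

Step 1. Dataset Overview

UNSW-NB15 is a benchmark dataset for evaluating intrusion detection systems. It contains normal traffic samples and nine categories of attack traffic (such as Fuzzers, Shellcode, and Exploits). To simulate a zero-day attack, only normal traffic is used for training, and Shellcode attacks are held out for testing, so the model is evaluated on attack behavior it has never seen.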
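The hold-out protocol described above can be sketched with boolean masks. The arrays below are toy stand-ins (the category values and placeholder features are illustrative only); the same masks apply to the corresponding DataFrame columns.

```python
import numpy as np

# Toy stand-in for the attack_cat column (values are illustrative only).
attack_cat = np.array(["Normal", "Exploits", "Normal", "Shellcode",
                       "Fuzzers", "Normal", "Shellcode"])
features = np.arange(len(attack_cat) * 2).reshape(-1, 2)  # placeholder features

held_out = "Shellcode"                   # category treated as the zero-day
train_mask = attack_cat == "Normal"      # training uses benign rows only
zero_day_mask = attack_cat == held_out   # never shown during training

X_train_toy = features[train_mask]
X_zero_day_toy = features[zero_day_mask]

# No row may appear in both sets, so no attack leaks into training.
assert not np.any(train_mask & zero_day_mask)
print(X_train_toy.shape, X_zero_day_toy.shape)
```

The other attack categories are simply left out of both sets in this sketch, which mirrors the article's setup of training on normal traffic and testing on a single unseen category.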

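Step 2. Import Libraries and Load the Dataset

Import the necessary libraries and load the UNSW-NB15 dataset. Then preprocess the numeric features, separate the labels and categorical features, and use only normal traffic for training.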

python

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.metrics import roc_curve, auc
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.callbacks import EarlyStopping
# Load the UNSW-NB15 dataset
df = pd.read_csv("UNSW_NB15.csv")
print("Dataset shape:", df.shape)
# Inspect the binary label and the attack category of the first rows
print(df[['label', 'attack_cat']].head())

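Output:

Dataset shape: (254004, 43)
First five rows of ['label','attack_cat']
   label attack_cat
0      0     Normal
1      0     Normal
2      0     Normal
3      0     Normal
4      1  Shellcode

The output shows that the dataset has 254,004 rows and 43 columns. Label 0 denotes normal traffic and 1 denotes attack traffic. The fifth row is a Shellcode attack, the category treated as the zero-day attack in this example.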

Step 3. Preprocess the Data

python

# Define target and features; drop both label columns so that
# attack_cat does not leak into the model inputs
y = df['label']
X = df.drop(columns=['label', 'attack_cat'])
# Normal traffic for training
normal_data = X[y == 0]
# Zero-day traffic (Shellcode) for testing, with the same columns as normal_data
zero_day_data = X[df['attack_cat'] == 'Shellcode']
# Identify numeric and categorical features
numeric_features = normal_data.select_dtypes(include=['int64','float64']).columns
categorical_features = normal_data.select_dtypes(include=['object']).columns
# Preprocessing pipeline: scale numerics, one-hot encode categoricals
# (scikit-learn >= 1.2 uses sparse_output; older versions use sparse)
preprocessor = ColumnTransformer([
    ("num", StandardScaler(), numeric_features),
    ("cat", OneHotEncoder(handle_unknown="ignore", sparse_output=False), categorical_features)
])
# Fit only on normal traffic
X_normal = preprocessor.fit_transform(normal_data)
# Train-validation split
X_train, X_val = train_test_split(X_normal, test_size=0.2, random_state=42)
print("Training data shape:", X_train.shape)
print("Validation data shape:", X_val.shape)

Output:

Training data shape:    (160000, 71)
Validation data shape:  ( 40000, 71)

After the labels are removed, only benign samples (those with label == 0) are kept. The dataset contains 37 numeric features and 4 categorical features. One-hot encoding expands the categorical features into multiple binary features, bringing the total input dimension to 71.
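To see why one-hot encoding widens the input, the following NumPy sketch reproduces by hand what the scaling and encoding steps compute. The toy columns and their values are assumptions chosen for illustration; they are not taken from the dataset.

```python
import numpy as np

# Toy data: two numeric columns and two categorical columns (illustrative).
numeric = np.array([[10.0, 0.5], [20.0, 1.5], [30.0, 2.5], [40.0, 3.5]])
proto = np.array(["tcp", "udp", "tcp", "icmp"])
state = np.array(["FIN", "CON", "FIN", "FIN"])

def standardize(a):
    """Zero mean, unit variance per column (the StandardScaler formula)."""
    return (a - a.mean(axis=0)) / a.std(axis=0)

def one_hot(col):
    """One binary column per distinct category value."""
    categories = np.unique(col)
    return (col[:, None] == categories[None, :]).astype(float)

X_toy = np.hstack([standardize(numeric), one_hot(proto), one_hot(state)])

# Width = numeric columns + number of distinct values per categorical
# column: 2 + 3 + 2 = 7
print(X_toy.shape)
```

The final width therefore depends on how many distinct values each categorical column takes in the data the encoder is fitted on.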

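Step 4. Define the Optimized Denoising Autoencoder (DAE)

Gaussian noise is added to the input to force the network to learn robust features. Batch normalization stabilizes training, and a small bottleneck layer (16 units) encourages a compact latent representation.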
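As a rough illustration of the corruption applied during training, the following NumPy sketch adds zero-mean Gaussian noise with standard deviation 0.1 to stand-in data. The random matrix is a placeholder for standardized features and is an assumption of this example. In Keras, the `GaussianNoise` layer applies this kind of corruption only while training and passes inputs through unchanged at inference.

```python
import numpy as np

rng = np.random.default_rng(42)
x_clean = rng.standard_normal((1000, 5))   # stand-in for standardized features
noise = rng.normal(loc=0.0, scale=0.1, size=x_clean.shape)
x_noisy = x_clean + noise                  # what the encoder sees in training

# The loss target stays x_clean; the injected noise has std close to 0.1.
print("noise std:", noise.std().round(3))
print("mean squared corruption:", np.mean((x_noisy - x_clean) ** 2).round(4))
```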

python

input_dim = X_train.shape[1]
inp = layers.Input(shape=(input_dim,))
noisy = layers.GaussianNoise(0.1)(inp)  # Corrupt input slightly (active only during training)
# Encoder
x = layers.Dense(64, activation='relu')(noisy)
x = layers.BatchNormalization()(x)  # Stabilize training
bottleneck = layers.Dense(16, activation='relu')(x)
# Decoder
x = layers.Dense(64, activation='relu')(bottleneck)
x = layers.BatchNormalization()(x)
out = layers.Dense(input_dim, activation='linear')(x)  # Use linear for standardized input
autoencoder = Model(inputs=inp, outputs=out)
autoencoder.compile(optimizer='adam', loss='mse')
autoencoder.summary()

Output:

Model: "model"
_________________________________________________________________
 Layer (type)                                Output Shape    Param #
=================================================================
 input_1 (InputLayer)                        [(None, 71)]    0
 gaussian_noise (GaussianNoise)              (None, 71)      0
 dense (Dense)                               (None, 64)      4,608
 batch_normalization (BatchNormalization)    (None, 64)      256
 dense_1 (Dense)                             (None, 16)      1,040
 dense_2 (Dense)                             (None, 64)      1,088
 batch_normalization_1 (BatchNormalization)  (None, 64)      256
 dense_3 (Dense)                             (None, 71)      4,615
=================================================================
Total params: 11,863
Trainable params: 11,607
Non-trainable params: 256
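The parameter counts for this architecture can be checked by hand. The sketch below assumes the 71-dimensional input reported earlier; the helper names are hypothetical. A Dense layer holds one weight per input-output pair plus one bias per output, and a Keras BatchNormalization layer holds four vectors per unit (gamma and beta are trainable, the moving mean and moving variance are not).

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def batchnorm_params(units):
    """gamma, beta (trainable) + moving mean, moving variance (non-trainable)."""
    return 4 * units

input_dim = 71  # feature width reported in the preprocessing step
layers_spec = [
    ("dense", dense_params(input_dim, 64)),
    ("batch_normalization", batchnorm_params(64)),
    ("dense_1", dense_params(64, 16)),
    ("dense_2", dense_params(16, 64)),
    ("batch_normalization_1", batchnorm_params(64)),
    ("dense_3", dense_params(64, input_dim)),
]
total = sum(n for _, n in layers_spec)
non_trainable = 2 * (2 * 64)   # moving statistics of both BatchNormalization layers

for name, n in layers_spec:
    print(f"{name:<24}{n:>6}")
print("total:", total, "trainable:", total - non_trainable)
```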

Step 5. Train the Model with Early Stopping

# Early stopping to avoid overfitting
es = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)
print("Training started...")
history = autoencoder.fit(
    X_train, X_train,  # noise is added inside the model, so the target is the clean input
    epochs=50,
    batch_size=512,  # larger batch for faster training
    validation_data=(X_val, X_val),
    shuffle=True,
    callbacks=[es]
)
print("Training completed!")

Training Loss Curve

plt.plot(history.history['loss'], label='Train Loss')
plt.plot(history.history['val_loss'], label='Val Loss')
plt.xlabel("Epochs")
plt.ylabel("MSE Loss")
plt.legend()
plt.title("Training vs Validation Loss")
plt.show()

Output:

Training started...
Epoch 1/50
313/313 [==============================] - 2s  6ms/step - loss: 0.0254 - val_loss: 0.0181
Epoch 2/50
313/313 [==============================] - 2s  6ms/step - loss: 0.0158 - val_loss: 0.0145
Epoch 3/50
313/313 [==============================] - 2s  6ms/step - loss: 0.0123 - val_loss: 0.0127
Epoch 4/50
313/313 [==============================] - 2s  6ms/step - loss: 0.0106 - val_loss: 0.0108
Epoch 5/50
313/313 [==============================] - 2s  6ms/step - loss: 0.0094 - val_loss: 0.0097
Epoch 6/50
313/313 [==============================] - 2s  6ms/step - loss: 0.0086 - val_loss: 0.0085
Epoch 7/50
313/313 [==============================] - 2s  6ms/step - loss: 0.0082 - val_loss: 0.0083
Epoch 8/50
313/313 [==============================] - 2s  6ms/step - loss: 0.0080 - val_loss: 0.0086
Restoring model weights from the end of the best epoch: 7.
Epoch 00008: early stopping
Training completed!
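The patience rule behind early stopping can be sketched in plain Python. This is a simplified model of the logic, not the Keras implementation, and the loss values are hypothetical rather than taken from the run above.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops once `patience` consecutive epochs fail to improve
    on the best validation loss seen so far.
    """
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch

# Hypothetical validation losses, chosen only to illustrate the rule.
losses = [0.030, 0.020, 0.015, 0.016, 0.017, 0.018, 0.014]
stop, best = early_stopping_epoch(losses, patience=3)
print(stop, best)
```

With `restore_best_weights=True`, the weights from the best epoch are the ones kept after training stops.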

Step 6. Zero-Day Detection

# Transform datasets
X_normal_test = preprocessor.transform(normal_data)
X_zero_day_test = preprocessor.transform(zero_day_data)
# Compute reconstruction errors
recon_normal = np.mean(np.square(X_normal_test - autoencoder.predict(X_normal_test, batch_size=512)), axis=1)
recon_zero = np.mean(np.square(X_zero_day_test - autoencoder.predict(X_zero_day_test, batch_size=512)), axis=1)
# Threshold: 95th percentile of normal errors
threshold = np.percentile(recon_normal, 95)
print("Threshold:", threshold)
print("False Alarm Rate (Normal flagged as anomaly):", np.mean(recon_normal > threshold))
print("Detection Rate (Zero-Day detected):", np.mean(recon_zero > threshold))

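Output:

Threshold: 0.0121
False Alarm Rate (normal→anomaly): 0.0480
Detection Rate (Shellcode zero-day): 0.9150

The detection threshold is set to the 95th percentile of the reconstruction errors of benign traffic. With this threshold, only 4.8% of normal traffic is incorrectly flagged as anomalous (false positives), while about 91.5% of Shellcode attack traffic has a reconstruction error above the threshold and is correctly identified as anomalous (true positives).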
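The percentile rule can be illustrated on synthetic reconstruction errors. The gamma-distributed values below are assumptions made for the sketch and are unrelated to the article's results.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic reconstruction errors (illustrative, not the article's data).
err_normal = rng.gamma(shape=2.0, scale=0.003, size=1000)
err_attack = rng.gamma(shape=2.0, scale=0.003, size=200) + 0.02

threshold = np.percentile(err_normal, 95)
false_alarm_rate = np.mean(err_normal > threshold)
detection_rate = np.mean(err_attack > threshold)

print(f"threshold={threshold:.4f}")
print(f"false alarm rate={false_alarm_rate:.3f}")
print(f"detection rate={detection_rate:.3f}")
```

When the threshold is computed and evaluated on the same normal samples, the false alarm rate lands near 5% by construction. Measuring it on normal traffic that was held out from both training and threshold selection would give a more realistic estimate.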

Step 7. Visualization

Reconstruction Error Histogram

plt.figure(figsize=(8,5))
plt.hist(recon_normal, bins=50, alpha=0.6, label="Normal")
plt.hist(recon_zero, bins=50, alpha=0.6, label="Zero-Day (Shellcode)")
plt.axvline(threshold, color='red', linestyle='--', label='Threshold')
plt.xlabel("Reconstruction Error")
plt.ylabel("Frequency")
plt.legend()
plt.title("Normal vs Zero-Day Error Distribution")
plt.show()

Output:

Figure 3. Overlaid histograms of reconstruction error for benign traffic (blue) and zero-day traffic (orange)

ROC Curve

python

y_true = np.concatenate([np.zeros_like(recon_normal), np.ones_like(recon_zero)])
y_scores = np.concatenate([recon_normal, recon_zero])
fpr, tpr, _ = roc_curve(y_true, y_scores)
roc_auc = auc(fpr, tpr)
plt.plot(fpr, tpr, label=f"AUC = {roc_auc:.2f}")
plt.plot([0,1],[0,1],'--')
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.title("ROC Curve for Zero-Day Detection")
plt.show()

Output:

Figure 4. ROC curve showing the true positive rate against the false positive rate, AUC = 0.93
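One way to read the AUC value: it is the probability that a randomly chosen attack sample receives a higher reconstruction error than a randomly chosen normal sample. The sketch below computes that pairwise statistic directly; the function name and the toy scores are assumptions for illustration.

```python
import numpy as np

def auc_from_scores(normal_scores, attack_scores):
    """Fraction of (attack, normal) pairs where the attack scores higher.

    Ties count as half a win. This equals the area under the ROC curve.
    """
    normal = np.asarray(normal_scores, dtype=float)[None, :]
    attack = np.asarray(attack_scores, dtype=float)[:, None]
    wins = (attack > normal).sum() + 0.5 * (attack == normal).sum()
    return wins / (attack.size * normal.size)

# Toy scores: 8 of the 9 attack/normal pairs are ranked correctly.
toy_auc = auc_from_scores([0.1, 0.2, 0.3], [0.25, 0.4, 0.5])
print(toy_auc)
```

Unlike the detection rate at a fixed threshold, this number does not depend on where the threshold is placed.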

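Limitations

The approach has the following limitations:

A denoising autoencoder (DAE) can detect anomalies but cannot classify the attack type.
The appropriate threshold depends on the dataset and may need tuning.
The approach works best when the model is trained exclusively on normal traffic.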

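Key Takeaways

Denoising autoencoders are effective at detecting previously unseen zero-day attacks.
Batch normalization, a larger batch size, and early stopping improve training stability.
Visualizations (loss curve, error histogram, ROC) make the model's behavior interpretable.
The approach can be combined with other components in a hybrid setup for attack classification or real-time network intrusion detection.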

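Conclusion

This article showed how to use a denoising autoencoder (DAE) to detect zero-day attacks in the UNSW-NB15 dataset. By learning robust patterns of normal network traffic, the model can flag anomalous behavior in attack data it has never seen. DAEs provide a strong foundation for modern intrusion detection systems and can be combined with more advanced architectures or supervised classifiers to build a comprehensive IDS.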

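Frequently Asked Questions

Q1: What is the purpose of using a denoising autoencoder (DAE) on the UNSW-NB15 dataset?

A: The purpose is to detect zero-day attacks in network traffic. The DAE is trained only on normal traffic and identifies anomalous or attack traffic by its high reconstruction error.

Q2: How is noise added in a denoising autoencoder?

A: During training, the input data is corrupted by adding Gaussian noise. Although the input is slightly corrupted, the autoencoder is trained to reconstruct the original clean input, which lets it capture more robust and meaningful feature representations.

Q3: Can an autoencoder classify different attack types?

A: No. An autoencoder is an unsupervised model that only detects anomalies; it cannot classify attack types. It does not distinguish which attack is occurring and only identifies traffic that deviates from normal network behavior, which may indicate a zero-day attack.

Q4: How is zero-day attack detection performed?

A: After training, the reconstruction error of each test sample is evaluated. If the error exceeds a set threshold (for example, the 95th percentile of the errors on normal traffic), the sample is flagged as anomalous. In this article's example, Shellcode attack traffic is treated as the zero-day traffic.

Q5: Why is the model in this example called a denoising autoencoder?

A: Because noise is added to the input during training while the model learns to reconstruct the clean input. This improves the model's ability to generalize and to recognize deviations, which is the core idea of the denoising autoencoder.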

Original title: Zero-Day Attack Detection using Denoising Autoencoder on UNSW-NB15, by Nitin Wankhade

Editor: Pang Guiyu. Source: 51CTO

