
Master Manual Hyperparameter Optimization of Machine Learning Models in 5 Minutes

Author: Anonymous 2021-04-27 10:16:51

Artificial Intelligence, Machine Learning

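In this tutorial, you will discover how to manually optimize the hyperparameters of machine learning algorithms.

Machine learning algorithms have hyperparameters that allow the algorithms to be tailored to specific datasets.

Although the general effect of a hyperparameter is usually understood, its specific effect on a given dataset, and how it interacts with other hyperparameters during learning, may not be known. Tuning the values of algorithm hyperparameters is therefore an important part of a machine learning project.

Hyperparameters are commonly tuned with naive optimization algorithms, such as grid search and random search. An alternative is to use a stochastic optimization algorithm, such as stochastic hill climbing.

After completing this tutorial, you will know:

Stochastic optimization algorithms can be used instead of grid and random search for hyperparameter optimization.
How to use a stochastic hill climbing algorithm to tune the hyperparameters of the Perceptron algorithm.
How to manually optimize the hyperparameters of the XGBoost gradient boosting algorithm.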

Tutorial Overview

This tutorial is divided into three parts:

Manual Hyperparameter Optimization
Perceptron Hyperparameter Optimization
XGBoost Hyperparameter Optimization

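Manual Hyperparameter Optimization

Machine learning models have hyperparameters that must be set in order to customize the model to a dataset. The general effect of each hyperparameter is often known, but finding the best value for a given dataset, and the best combination of interacting hyperparameters, is challenging.

A better approach is to objectively search different values for the model hyperparameters and choose the subset that gives the model its best performance on the dataset. This is called hyperparameter optimization or hyperparameter tuning. A range of optimization algorithms can be used, although the two simplest and most common methods are random search and grid search.

Random Search. Define the search space as a bounded domain of hyperparameter values and randomly sample points in that domain.

Grid Search. Define the search space as a grid of hyperparameter values and evaluate every position in the grid.

Grid search is well suited to spot-checking combinations that are known to perform well in general. Random search is well suited to discovering hyperparameter combinations that you would not have guessed intuitively, although it often takes more time to execute.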
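The difference between the two procedures can be sketched in a few lines of plain Python. This is only an illustration: toy_score() is a hypothetical stand-in for a cross-validated accuracy, and the names eta and alpha, the grid values, and the sample count are assumptions chosen for the example.

```python
import itertools
import random


def toy_score(eta, alpha):
    """Hypothetical stand-in for a model score; it peaks at eta=0.3, alpha=0.1."""
    return 1.0 - (eta - 0.3) ** 2 - (alpha - 0.1) ** 2


def grid_search(score, eta_values, alpha_values):
    """Evaluate every position in the grid and return the best (config, score)."""
    candidates = itertools.product(eta_values, alpha_values)
    best = max(candidates, key=lambda cfg: score(*cfg))
    return best, score(*best)


def random_search(score, n_samples, rng):
    """Sample points uniformly from the bounded domain [0, 1] x [0, 1]."""
    candidates = [(rng.random(), rng.random()) for _ in range(n_samples)]
    best = max(candidates, key=lambda cfg: score(*cfg))
    return best, score(*best)


grid_cfg, grid_best = grid_search(toy_score, [0.0, 0.5, 1.0], [0.0, 0.5, 1.0])
rand_cfg, rand_best = random_search(toy_score, 50, random.Random(1))
print('grid:  ', grid_cfg, round(grid_best, 4))
print('random:', rand_cfg, round(rand_best, 4))
```

The grid can only ever return one of its nine predefined positions, while the random search can land anywhere in the domain, which is why it may find combinations that were not anticipated when the grid was drawn up.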

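For more on grid and random search for hyperparameter tuning, see the tutorial:

Hyperparameter Optimization With Random Search and Grid Search

https://machinelearningmastery.com/hyperparameter-optimization-with-random-search-and-grid-search/

Grid and random search are primitive optimization algorithms, and any optimization algorithm we like can be used to tune the performance of a machine learning algorithm. For example, we can use a stochastic optimization algorithm. This may be desirable when good or excellent performance is required and sufficient resources are available to tune the model. Next, let's look at how to use a stochastic hill climbing algorithm to tune the Perceptron algorithm.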

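Perceptron Hyperparameter Optimization

The Perceptron algorithm is the simplest type of artificial neural network. It is a model of a single neuron that can be used for two-class classification problems and provides the foundation for later developing much larger networks.

In this section, we will explore how to manually optimize the hyperparameters of the Perceptron model. First, let's define a synthetic binary classification problem that we can use as the focus of optimizing the model. We can use the make_classification() function to define a binary classification problem with 1,000 rows and five input variables. The example below creates the dataset and summarizes the shape of the data.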

# define a binary classification dataset  
from sklearn.datasets import make_classification  
# define dataset  
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)  
# summarize the shape of the dataset  
print(X.shape, y.shape)

Running the example prints the shape of the created dataset, confirming our expectations.

(1000, 5) (1000,)

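scikit-learn provides an implementation of the Perceptron model via the Perceptron class.

Before tuning the hyperparameters of the model, we can establish a performance baseline using the default hyperparameters.

We will evaluate the model using the good practice of repeated stratified k-fold cross-validation via the RepeatedStratifiedKFold class. The complete example of evaluating the Perceptron model with default hyperparameters on our synthetic binary classification dataset is listed below.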

# perceptron default hyperparameters for binary classification
from numpy import mean
from numpy import std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.linear_model import Perceptron
# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# define model
model = Perceptron()
# define evaluation procedure: 10 folds repeated 3 times gives 30 scores
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate model
scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
# report result
print('Mean Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))

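Running the example evaluates the model and reports the mean and standard deviation of the classification accuracy.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, we can see that the model with default hyperparameters achieved a classification accuracy of about 78.6 percent.

We would hope to achieve better performance than this with optimized hyperparameters.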

Mean Accuracy: 0.786 (0.069)
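The reported mean and standard deviation are taken over 30 scores: 10 folds, repeated 3 times with a different shuffle each time. The sketch below shows where that count comes from and what "stratified" means, using only the standard library. It is a simplified illustration and not scikit-learn's implementation; the function name repeated_stratified_folds() is hypothetical, and the labels are assumed to be exactly balanced.

```python
import random
from collections import defaultdict


def repeated_stratified_folds(y, n_splits, n_repeats, seed):
    """Yield lists of test indices; every fold keeps the class ratio of y."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        # group row indices by class label
        by_class = defaultdict(list)
        for index, label in enumerate(y):
            by_class[label].append(index)
        # shuffle each class, then deal its rows out to the folds in turn
        folds = [[] for _ in range(n_splits)]
        for indices in by_class.values():
            rng.shuffle(indices)
            for position, index in enumerate(indices):
                folds[position % n_splits].append(index)
        for fold in folds:
            yield fold


# assumed toy labels: 1,000 rows, perfectly balanced between two classes
y = [0] * 500 + [1] * 500
test_folds = list(repeated_stratified_folds(y, n_splits=10, n_repeats=3, seed=1))
positives = [sum(y[i] for i in fold) for fold in test_folds]
print('number of test folds:', len(test_folds))
print('positives per fold: min=%d max=%d' % (min(positives), max(positives)))
```

Each test fold serves once as the held-out set, so a model is fit and scored 30 times per configuration. This is worth keeping in mind later, because every step of a search pays that cost.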

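Next, we can optimize the hyperparameters of the Perceptron model using a stochastic hill climbing algorithm. There are many hyperparameters we could optimize, although we will focus on the two that perhaps have the most impact on the learning behavior of the model. They are:

Learning Rate (eta0)
Regularization (alpha)

The learning rate controls the amount the model is updated based on prediction errors, and so controls the speed of learning. The default value of eta0 is 1.0. Reasonable values are larger than zero (e.g. larger than 1e-8 or 1e-10) and probably less than 1.0.

By default, the Perceptron does not use any regularization, but we will enable "elastic net" regularization, which applies both L1 and L2 regularization during learning. This encourages the model to seek small model weights, which often leads to better performance. We will tune the "alpha" hyperparameter that controls the weighting of the regularization, i.e. how much it affects learning. If set to 0.0, it is as though no regularization is being used. Reasonable values are between 0.0 and 1.0.

First, we need to define the objective function for the optimization algorithm. We will evaluate a configuration using mean classification accuracy with repeated stratified k-fold cross-validation, and we will seek to maximize that accuracy. The objective() function below implements this, taking the dataset and a list of config values. The config values (learning rate and regularization weighting) are unpacked and used to configure the model, the model is evaluated, and the mean accuracy is returned.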

# objective function
from numpy import mean
from sklearn.linear_model import Perceptron
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold

def objective(X, y, cfg):
    # unpack config
    eta, alpha = cfg
    # define model
    model = Perceptron(penalty='elasticnet', alpha=alpha, eta0=eta)
    # define evaluation procedure
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
    # evaluate model
    scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
    # calculate mean accuracy
    result = mean(scores)
    return result
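To make the role of alpha more concrete, here is a small sketch of an elastic net penalty term. It assumes the convention described in scikit-learn's documentation for its SGD-based linear models, where an l1_ratio parameter mixes the L1 and L2 terms and alpha scales the result. The function name elastic_net_penalty() and the example weights are made up for illustration, and the default l1_ratio of 0.15 is an assumption about the library default.

```python
def elastic_net_penalty(weights, alpha, l1_ratio=0.15):
    """Penalty added to the loss: alpha * (l1_ratio * L1 + (1 - l1_ratio) * L2 / 2)."""
    l1_term = sum(abs(w) for w in weights)
    l2_term = 0.5 * sum(w * w for w in weights)
    return alpha * (l1_ratio * l1_term + (1.0 - l1_ratio) * l2_term)


# hypothetical weight vector, used only to show how alpha scales the penalty
weights = [1.0, -2.0]
for alpha in [0.0, 0.002, 0.1, 1.0]:
    print('alpha=%.3f penalty=%.5f' % (alpha, elastic_net_penalty(weights, alpha)))
```

With alpha set to 0.0 the penalty vanishes regardless of the weights, which matches the statement that a value of 0.0 behaves as though no regularization is used.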

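Next, we need a function to take a step in the search space. The search space is defined by two variables (eta and alpha). A step in the search space must have some relationship to the previous values and must be bound to sensible values (e.g. between 0 and 1).

We will use a "step size" hyperparameter that controls how far the algorithm is allowed to move from the existing configuration. A new configuration is chosen probabilistically using a Gaussian distribution, with the current value as the mean of the distribution and the step size as the standard deviation. We can use the randn() NumPy function to generate random numbers with a Gaussian distribution. The step() function below implements this: it takes a step in the search space and generates a new configuration from an existing one.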

# take a step in the search space
from numpy.random import randn

def step(cfg, step_size):
    # unpack the configuration
    eta, alpha = cfg
    # step eta
    new_eta = eta + randn() * step_size
    # check the bounds of eta
    if new_eta <= 0.0:
        new_eta = 1e-8
    # step alpha
    new_alpha = alpha + randn() * step_size
    # check the bounds of alpha
    if new_alpha < 0.0:
        new_alpha = 0.0
    # return the new configuration
    return [new_eta, new_alpha]
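Because the step size is the standard deviation of the Gaussian, it sets the typical distance of a proposal rather than a hard limit. The short simulation below illustrates this. It uses random.gauss from the standard library instead of NumPy's randn(), and the current value of 0.5 and the sample count are arbitrary assumptions.

```python
import random

rng = random.Random(1)
step_size = 0.1
current = 0.5

# draw many proposals around the current value, as a step function would
proposals = [current + rng.gauss(0.0, 1.0) * step_size for _ in range(10000)]

# fraction of proposals that land within one and within three step sizes
within_one = sum(abs(p - current) <= step_size for p in proposals) / len(proposals)
within_three = sum(abs(p - current) <= 3 * step_size for p in proposals) / len(proposals)
print('within 1 step size:  %.3f' % within_one)
print('within 3 step sizes: %.3f' % within_three)
```

Roughly two thirds of proposals stay within one step size of the current value, and almost all stay within three, so occasional longer jumps are possible but rare.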

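Next, we need to implement the stochastic hill climbing algorithm, which will call our objective() function to evaluate candidate solutions and our step() function to take a step in the search space. The search first generates a random initial solution, in this case with eta and alpha values in the range 0 to 1. The initial solution is then evaluated and taken as the current best working solution.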

# starting point for the search  
solution = [rand(), rand()]  
# evaluate the initial point  
solution_eval = objective(X, y, solution)

Next, the algorithm iterates for a fixed number of iterations, which is provided as a hyperparameter to the search. Each iteration involves taking a step and evaluating the new candidate solution.

# take a step
candidate = step(solution, step_size)
# evaluate candidate point
candidate_eval = objective(X, y, candidate)

If the new solution is at least as good as the current working solution, it is taken as the new current working solution.

# check if we should keep the new point
if candidate_eval >= solution_eval:
    # store the new point
    solution, solution_eval = candidate, candidate_eval
    # report progress
    print('>%d, cfg=%s %.5f' % (i, solution, solution_eval))
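Note that the comparison uses `>=` rather than `>`, so a candidate with an equal score is also accepted. The sketch below suggests why that matters, using a deliberately flat objective where every configuration scores the same. The functions flat_objective() and count_moves() are hypothetical helpers written for this illustration.

```python
import operator
import random


def flat_objective(x):
    # hypothetical plateau: every configuration receives the same score
    return 0.5


def count_moves(is_acceptable, n_iter=50, seed=1):
    """Run a 1-D hill climb and count how many candidates were accepted."""
    rng = random.Random(seed)
    solution = 0.5
    solution_eval = flat_objective(solution)
    moves = 0
    for _ in range(n_iter):
        candidate = solution + rng.gauss(0.0, 0.1)
        candidate_eval = flat_objective(candidate)
        if is_acceptable(candidate_eval, solution_eval):
            solution, solution_eval = candidate, candidate_eval
            moves += 1
    return moves


moves_ge = count_moves(operator.ge)
moves_gt = count_moves(operator.gt)
print('accepted with >=:', moves_ge)
print('accepted with > :', moves_gt)
```

With `>=` the search keeps drifting across a plateau and may eventually reach a region where scores improve, whereas with `>` it would stay where it first landed.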

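At the end of the search, the best solution and its performance are returned. Tying this together, the hillclimbing() function below implements the stochastic hill climbing algorithm for tuning the Perceptron algorithm, taking the dataset, objective function, number of iterations, and step size as arguments.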

# hill climbing local search algorithm
# (relies on the objective() and step() functions defined earlier)
from numpy.random import rand

def hillclimbing(X, y, objective, n_iter, step_size):
    # starting point for the search
    solution = [rand(), rand()]
    # evaluate the initial point
    solution_eval = objective(X, y, solution)
    # run the hill climb
    for i in range(n_iter):
        # take a step
        candidate = step(solution, step_size)
        # evaluate candidate point
        candidate_eval = objective(X, y, candidate)
        # check if we should keep the new point
        if candidate_eval >= solution_eval:
            # store the new point
            solution, solution_eval = candidate, candidate_eval
            # report progress
            print('>%d, cfg=%s %.5f' % (i, solution, solution_eval))
    return [solution, solution_eval]

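We can then call the algorithm and report the results of the search. In this case, we will run the algorithm for 100 iterations and use a step size of 0.1, chosen after a little trial and error.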

# define the total iterations  
n_iter = 100  
# step size in the search space  
step_size = 0.1  
# perform the hill climbing search  
cfg, score = hillclimbing(X, y, objective, n_iter, step_size)  
print('Done!')  
print('cfg=%s: Mean Accuracy: %f' % (cfg, score))
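The kind of trial and error behind a step size can be sketched on a cheap toy problem before spending time on cross-validation. In the example below, toy_objective() is a hypothetical function with its maximum at (0.5, 0.5), the search starts from the corner (0, 0), and the three step sizes are arbitrary choices. Unlike the step() function above, this sketch also clips values at an upper bound of 1.0.

```python
import random


def toy_objective(cfg):
    # hypothetical objective: the best possible score is 0.0 at (0.5, 0.5)
    x, y = cfg
    return -((x - 0.5) ** 2 + (y - 0.5) ** 2)


def clip(value, low=0.0, high=1.0):
    return max(low, min(high, value))


def hill_climb(objective, start, n_iter, step_size, rng):
    """Stochastic hill climbing with Gaussian steps clipped to [0, 1]."""
    solution, solution_eval = list(start), objective(start)
    for _ in range(n_iter):
        candidate = [clip(v + rng.gauss(0.0, 1.0) * step_size) for v in solution]
        candidate_eval = objective(candidate)
        if candidate_eval >= solution_eval:
            solution, solution_eval = candidate, candidate_eval
    return solution, solution_eval


results = {}
for step_size in [0.001, 0.1, 10.0]:
    _, score = hill_climb(toy_objective, [0.0, 0.0], 100, step_size, random.Random(1))
    results[step_size] = score
    print('step_size=%g best=%.5f' % (step_size, score))
```

A very small step size barely moves away from the starting point within 100 iterations, while a very large one produces proposals that mostly end up clipped to the bounds. A value in between is usually a reasonable starting point, but the right choice depends on the scale of each hyperparameter.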

Tying this together, the complete example of manually tuning the Perceptron algorithm is listed below.

# manually search perceptron hyperparameters for binary classification
from numpy import mean
from numpy.random import randn
from numpy.random import rand
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.linear_model import Perceptron

# objective function
def objective(X, y, cfg):
    # unpack config
    eta, alpha = cfg
    # define model
    model = Perceptron(penalty='elasticnet', alpha=alpha, eta0=eta)
    # define evaluation procedure
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
    # evaluate model
    scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
    # calculate mean accuracy
    result = mean(scores)
    return result

# take a step in the search space
def step(cfg, step_size):
    # unpack the configuration
    eta, alpha = cfg
    # step eta
    new_eta = eta + randn() * step_size
    # check the bounds of eta
    if new_eta <= 0.0:
        new_eta = 1e-8
    # step alpha
    new_alpha = alpha + randn() * step_size
    # check the bounds of alpha
    if new_alpha < 0.0:
        new_alpha = 0.0
    # return the new configuration
    return [new_eta, new_alpha]

# hill climbing local search algorithm
def hillclimbing(X, y, objective, n_iter, step_size):
    # starting point for the search
    solution = [rand(), rand()]
    # evaluate the initial point
    solution_eval = objective(X, y, solution)
    # run the hill climb
    for i in range(n_iter):
        # take a step
        candidate = step(solution, step_size)
        # evaluate candidate point
        candidate_eval = objective(X, y, candidate)
        # check if we should keep the new point
        if candidate_eval >= solution_eval:
            # store the new point
            solution, solution_eval = candidate, candidate_eval
            # report progress
            print('>%d, cfg=%s %.5f' % (i, solution, solution_eval))
    return [solution, solution_eval]

# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# define the total iterations
n_iter = 100
# step size in the search space
step_size = 0.1
# perform the hill climbing search
cfg, score = hillclimbing(X, y, objective, n_iter, step_size)
print('Done!')
print('cfg=%s: Mean Accuracy: %f' % (cfg, score))

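Running the example reports the configuration and result each time an improvement is seen during the search. At the end of the run, the best configuration and result are reported.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, we can see that the best result used a learning rate slightly above 1, at 1.004, and a regularization weight of about 0.002. This achieved a mean accuracy of about 79.1 percent, better than the roughly 78.6 percent achieved by the default configuration.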

>0, cfg=[0.5827274503894747, 0.260872709578015] 0.70533  
>4, cfg=[0.5449820307807399, 0.3017271170801444] 0.70567  
>6, cfg=[0.6286475606495414, 0.17499090243915086] 0.71933  
>7, cfg=[0.5956196828965779, 0.0] 0.78633  
>8, cfg=[0.5878361167354715, 0.0] 0.78633  
>10, cfg=[0.6353507984485595, 0.0] 0.78633  
>13, cfg=[0.5690530537610675, 0.0] 0.78633  
>17, cfg=[0.6650936023999641, 0.0] 0.78633  
>22, cfg=[0.9070451625704087, 0.0] 0.78633  
>23, cfg=[0.9253366187387938, 0.0] 0.78633  
>26, cfg=[0.9966143540220266, 0.0] 0.78633  
>31, cfg=[1.0048613895650054, 0.002162219228449132] 0.79133  
Done!  
cfg=[1.0048613895650054, 0.002162219228449132]: Mean Accuracy: 0.791333

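Now that we are familiar with how to use a stochastic hill climbing algorithm to tune the hyperparameters of a simple machine learning algorithm, let's look at tuning a more advanced algorithm, such as XGBoost.

XGBoost Hyperparameter Optimization

XGBoost is short for Extreme Gradient Boosting and is an efficient implementation of the stochastic gradient boosting machine learning algorithm. The stochastic gradient boosting algorithm, also called gradient boosting machines or tree boosting, is a powerful machine learning technique that performs well, or even best, on a wide range of challenging machine learning problems.

First, the XGBoost library must be installed. You can install it using pip, as follows: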

sudo pip install xgboost

Once installed, you can confirm that it was installed successfully and that you are using a modern version by running the following code:

# xgboost  
import xgboost  
print("xgboost", xgboost.__version__)

Running the code, you should see the following version number or higher:

xgboost 1.0.1

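Although the XGBoost library has its own Python API, we can use XGBoost models with the scikit-learn API via the XGBClassifier wrapper class. An instance of the model can be created and used just like any other scikit-learn class for model evaluation. For example: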

# define model  
model = XGBClassifier()

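Before tuning the hyperparameters of XGBoost, we can establish a performance baseline using the default hyperparameters. We will use the same synthetic binary classification dataset from the previous section and the same test harness of repeated stratified k-fold cross-validation. The complete example of evaluating the performance of XGBoost with default hyperparameters is listed below.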

# xgboost with default hyperparameters for binary classification
from numpy import mean
from numpy import std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from xgboost import XGBClassifier
# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# define model
model = XGBClassifier()
# define evaluation procedure
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# evaluate model
scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
# report result
print('Mean Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))

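Running the example evaluates the model and reports the mean and standard deviation of the classification accuracy.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, we can see that the model with default hyperparameters achieved a classification accuracy of about 84.9 percent. We would hope to achieve better performance than this with optimized hyperparameters.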

Mean Accuracy: 0.849 (0.040)

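Next, we can adapt the stochastic hill climbing optimization algorithm to tune the hyperparameters of the XGBoost model. There are many hyperparameters that we may want to optimize for the XGBoost model.

For an overview of how to tune the XGBoost model, see the tutorial:

How to Configure the Gradient Boosting Algorithm

https://machinelearningmastery.com/configure-gradient-boosting-algorithm/

We will focus on four key hyperparameters. They are:

Learning Rate (learning_rate)
Number of Trees (n_estimators)
Subsample Percentage (subsample)
Tree Depth (max_depth)

The learning rate controls the contribution of each tree to the ensemble. Sensible values are less than 1.0 and slightly above 0.0 (e.g. 1e-8).

The number of trees controls the size of the ensemble, and often more trees is better, up to a point of diminishing returns. Sensible values are between 1 tree and hundreds or thousands of trees.

The subsample percentage defines the size of the random sample used to train each tree, as a percentage of the size of the original dataset. Values are between slightly above 0.0 (e.g. 1e-8) and 1.0.

The tree depth is the number of levels in each tree. Deeper trees are more specific to the training dataset and may overfit. Shorter trees often generalize better. Sensible values are between 1 and 10 or 20.

First, we must update the objective() function to unpack the hyperparameters of the XGBoost model, configure it, and then evaluate the mean classification accuracy.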

# objective function
from numpy import mean
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from xgboost import XGBClassifier

def objective(X, y, cfg):
    # unpack config
    lrate, n_tree, subsam, depth = cfg
    # define model
    model = XGBClassifier(learning_rate=lrate, n_estimators=n_tree, subsample=subsam, max_depth=depth)
    # define evaluation procedure
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
    # evaluate model
    scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
    # calculate mean accuracy
    result = mean(scores)
    return result
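The learning rate and the number of trees interact, which is one reason to tune them together. The toy calculation below gives a rough intuition. It is a heavily simplified model of boosting and not how XGBoost works internally: it assumes each new "tree" predicts the remaining error exactly, and that prediction is then shrunk by the learning rate before being added to the ensemble. The function name remaining_residual() is hypothetical.

```python
def remaining_residual(learning_rate, n_trees, initial_residual=1.0):
    """Error left after n_trees idealized boosting rounds with shrinkage."""
    residual = initial_residual
    for _ in range(n_trees):
        # the idealized tree predicts the residual; only a fraction is applied
        residual -= learning_rate * residual
    return residual


for learning_rate, n_trees in [(1.0, 1), (0.1, 10), (0.1, 100), (0.01, 100)]:
    left = remaining_residual(learning_rate, n_trees)
    print('learning_rate=%.2f n_trees=%3d remaining=%.6f' % (learning_rate, n_trees, left))
```

In this simplified picture, a smaller learning rate needs more trees to remove the same amount of error. On real data, trees only approximate the error and smaller steps often generalize better, so the best combination has to be found empirically.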

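Next, we need to define the step() function used to take a step in the search space.

Each hyperparameter has a quite different range, so we will define the step size (the standard deviation of the distribution) separately for each hyperparameter. To keep things simple, we will also define the step sizes in-line rather than as arguments to the function.

The number of trees and the depth are integers, so the stepped values are rounded. The step sizes chosen are arbitrary and were selected after a little trial and error. The updated step function is listed below.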

# take a step in the search space
from numpy.random import randn

def step(cfg):
    # unpack config
    lrate, n_tree, subsam, depth = cfg
    # learning rate
    lrate = lrate + randn() * 0.01
    if lrate <= 0.0:
        lrate = 1e-8
    if lrate > 1:
        lrate = 1.0
    # number of trees
    n_tree = round(n_tree + randn() * 50)
    if n_tree <= 0.0:
        n_tree = 1
    # subsample percentage
    subsam = subsam + randn() * 0.1
    if subsam <= 0.0:
        subsam = 1e-8
    if subsam > 1:
        subsam = 1.0
    # max tree depth
    depth = round(depth + randn() * 7)
    if depth <= 1:
        depth = 1
    # return new config
    return [lrate, n_tree, subsam, depth]
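As the number of hyperparameters grows, the per-parameter step sizes and bounds can be moved into a small table instead of being written in-line. The sketch below is one possible way to do that using only the standard library. The SPACE dictionary, its layout, and this variant of step() are assumptions made for illustration, and the bounds handling differs very slightly from the in-line version.

```python
import random

# hypothetical search-space table:
# name -> (step standard deviation, lower bound, upper bound or None, is_integer)
SPACE = {
    'learning_rate': (0.01, 1e-8, 1.0, False),
    'n_estimators': (50, 1, None, True),
    'subsample': (0.1, 1e-8, 1.0, False),
    'max_depth': (7, 1, None, True),
}


def step(cfg, space, rng):
    """Return a new config with a Gaussian step applied to every entry."""
    new_cfg = {}
    for name, (std, low, high, is_integer) in space.items():
        value = cfg[name] + rng.gauss(0.0, 1.0) * std
        # integer hyperparameters are rounded after stepping
        if is_integer:
            value = round(value)
        # keep the value inside its sensible range
        if value < low:
            value = low
        if high is not None and value > high:
            value = high
        new_cfg[name] = value
    return new_cfg


rng = random.Random(1)
cfg = {'learning_rate': 0.1, 'n_estimators': 100, 'subsample': 1.0, 'max_depth': 7}
for _ in range(5):
    cfg = step(cfg, SPACE, rng)
    print(cfg)
```

Keeping the step sizes next to the bounds makes it easier to see that each one is scaled to the range of its own hyperparameter, and adding a fifth hyperparameter becomes a one-line change.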

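Finally, the hillclimbing() algorithm must be updated to define an initial solution with appropriate values. In this case, we will define the initial solution by taking a step from sensible values that match the default hyperparameters or are close to them.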

# starting point for the search  
solution = step([0.1, 100, 1.0, 7])

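Tying this together, the complete example of manually tuning the hyperparameters of the XGBoost algorithm using a stochastic hill climbing algorithm is listed below.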

# xgboost manual hyperparameter optimization for binary classification
from numpy import mean
from numpy.random import randn
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from xgboost import XGBClassifier

# objective function
def objective(X, y, cfg):
    # unpack config
    lrate, n_tree, subsam, depth = cfg
    # define model
    model = XGBClassifier(learning_rate=lrate, n_estimators=n_tree, subsample=subsam, max_depth=depth)
    # define evaluation procedure
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
    # evaluate model
    scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
    # calculate mean accuracy
    result = mean(scores)
    return result

# take a step in the search space
def step(cfg):
    # unpack config
    lrate, n_tree, subsam, depth = cfg
    # learning rate
    lrate = lrate + randn() * 0.01
    if lrate <= 0.0:
        lrate = 1e-8
    if lrate > 1:
        lrate = 1.0
    # number of trees
    n_tree = round(n_tree + randn() * 50)
    if n_tree <= 0.0:
        n_tree = 1
    # subsample percentage
    subsam = subsam + randn() * 0.1
    if subsam <= 0.0:
        subsam = 1e-8
    if subsam > 1:
        subsam = 1.0
    # max tree depth
    depth = round(depth + randn() * 7)
    if depth <= 1:
        depth = 1
    # return new config
    return [lrate, n_tree, subsam, depth]

# hill climbing local search algorithm
def hillclimbing(X, y, objective, n_iter):
    # starting point for the search
    solution = step([0.1, 100, 1.0, 7])
    # evaluate the initial point
    solution_eval = objective(X, y, solution)
    # run the hill climb
    for i in range(n_iter):
        # take a step
        candidate = step(solution)
        # evaluate candidate point
        candidate_eval = objective(X, y, candidate)
        # check if we should keep the new point
        if candidate_eval >= solution_eval:
            # store the new point
            solution, solution_eval = candidate, candidate_eval
            # report progress
            print('>%d, cfg=[%s] %.5f' % (i, solution, solution_eval))
    return [solution, solution_eval]

# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# define the total iterations
n_iter = 200
# perform the hill climbing search
cfg, score = hillclimbing(X, y, objective, n_iter)
print('Done!')
print('cfg=[%s]: Mean Accuracy: %f' % (cfg, score))

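Running the example reports the configuration and result each time an improvement is seen during the search. At the end of the run, the best configuration and result are reported.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, we can see that the best result used a learning rate of about 0.02, 52 trees, a subsample rate of about 50 percent, and a large depth of 53 levels. This configuration achieved a mean accuracy of about 87.4 percent, better than the roughly 84.9 percent achieved by the default configuration.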

>0, cfg=[[0.1058242692126418, 67, 0.9228490731610172, 12]] 0.85933  
>1, cfg=[[0.11060813799692253, 51, 0.859353656735739, 13]] 0.86100  
>4, cfg=[[0.11890247679234153, 58, 0.7135275461723894, 12]] 0.86167  
>5, cfg=[[0.10226257987735601, 61, 0.6086462443373852, 17]] 0.86400  
>15, cfg=[[0.11176962034280596, 106, 0.5592742266405146, 13]] 0.86500  
>19, cfg=[[0.09493587069112454, 153, 0.5049124222437619, 34]] 0.86533  
>23, cfg=[[0.08516531024154426, 88, 0.5895201311518876, 31]] 0.86733  
>46, cfg=[[0.10092590898175327, 32, 0.5982811365027455, 30]] 0.86867  
>75, cfg=[[0.099469211050998, 20, 0.36372573610040404, 32]] 0.86900  
>96, cfg=[[0.09021536590375884, 38, 0.4725379807796971, 20]] 0.86900  
>100, cfg=[[0.08979482274655906, 65, 0.3697395430835758, 14]] 0.87000  
>110, cfg=[[0.06792737273465625, 89, 0.33827505722318224, 17]] 0.87000  
>118, cfg=[[0.05544969684589669, 72, 0.2989721608535262, 23]] 0.87200  
>122, cfg=[[0.050102976159097, 128, 0.2043203965148931, 24]] 0.87200  
>123, cfg=[[0.031493266763680444, 120, 0.2998819062922256, 30]] 0.87333  
>128, cfg=[[0.023324201169625292, 84, 0.4017169945431015, 42]] 0.87333  
>140, cfg=[[0.020224220443108752, 52, 0.5088096815056933, 53]] 0.87367  
Done!  
cfg=[[0.020224220443108752, 52, 0.5088096815056933, 53]]: Mean Accuracy: 0.873667

Editor: Pang Guiyu (龐桂玉). Source: Python中文社区 (ID: python-china)
