一種基于學(xué)習(xí)的電池壽命預(yù)測(cè)(Python)
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import pandas as pd
import glob
import time
from qore_sdk.client import WebQoreClient
from tqdm import tqdm_notebook as tqdm
from ipywidgets import IntProgress
df_35 = pd.read_csv('CS2_35.csv',index_col=0).dropna()
df_36 = pd.read_csv('CS2_36.csv',index_col=0).dropna()
df_37 = pd.read_csv('CS2_37.csv',index_col=0).dropna()
df_38 = pd.read_csv('CS2_38.csv',index_col=0).dropna()
Plot Parameters
Only discharge capacity is used for prediction
fig = plt.figure(figsize=(9,8),dpi=150)
names = ['capacity','resistance','CCCT','CVCT']
titles = ['Discharge Capacity (mAh)', 'Internal Resistance (Ohm)',
'Constant Current Charging Time (s)','Constant Voltage Charging Time (s)']
plt.subplots_adjust(hspace=0.25)
for i in range(4):
plt.subplot(2,2,i+1)
plt.plot(df_35[names[i]],'o',ms=2,label='#35')
plt.plot(df_36[names[i]],'o',ms=2,label='#36')
plt.plot(df_37[names[i]],'o',ms=2,label='#37')
plt.plot(df_38[names[i]],'o',ms=2,label='#38')
plt.title(titles[i],fontsize=14)
plt.legend(loc='upper right')
if(i == 3):
plt.ylim(1000,5000)
plt.xlim(-20,1100)
X_train,y_train = ConvertData([df_35,df_37,df_38],50)
X_test,y_test = ConvertData([df_36],50)
print(X_train.shape,X_test.shape,y_train.shape,y_test.shape)
(2588, 50) (873, 50) (2588,) (873,)
idx = np.arange(0,X_train.shape[0],1)
idx = np.random.permutation(idx)
idx_lim = idx[:500]
X_train = X_train[idx_lim]
y_train = y_train[idx_lim]
X_train = X_train.reshape([X_train.shape[0], X_train.shape[1], 1])
X_test = X_test.reshape([X_test.shape[0], X_test.shape[1], 1])
print(X_train.shape,X_test.shape,y_train.shape,y_test.shape)
(500, 50, 1) (873, 50, 1) (500,) (873,)
%%time
client = WebQoreClient(username, password, endpoint=endpoint)
time_ = client.regression_train(X_train, y_train)
print('Time:', time_['train_time'], 's')
Time: 1.8583974838256836 s
Wall time: 2.7 s
res = client.regression_predict(X_train)
fig=plt.figure(figsize=(12, 4),dpi=150)
plt.plot(res['Y'],alpha=0.7,label='Prediction')
plt.plot(y_train,alpha=0.7,label='True')
plt.legend(loc='upper right',fnotallow=12)
res = client.regression_predict(X_test[300:800])
x_range = np.linspace(301,800,500)
fig=plt.figure(figsize=(12, 4),dpi=150)
plt.plot(x_range,y_test[300:800],label='true',c='gray')
plt.plot(x_range,res['Y'],label='Qore',c='red',alpha=0.8)
plt.xlabel('Number of Cycle',fnotallow=13)
plt.ylabel('DIscharge Capacity (Ah)',fnotallow=13)
plt.title('Prediction of Discharge Capacity of Test Data (CS2-36)',fnotallow=14)
plt.legend(loc='upper right',fnotallow=12)
from sklearn.metrics import mean_squared_error
mean_squared_error(y_test[300:800], res['Y'])
initial = X_test[500]
results = []
for i in tqdm(range(50)):
if(i == 0):
res = client.regression_predict([initial])['Y']
results.append(res[0])
time.sleep(1)
else:
initial = np.vstack((initial[1:],np.array(res)))
res = client.regression_predict([initial])['Y']
results.append(res[0])
time.sleep(1)
HBox(children=(IntProgress(value=0, max=50), HTML(value='')))
fig=plt.figure(figsize=(12,4),dpi=150)
plt.plot(np.linspace(501,550,50),results,'o-',ms=4,lw=1,label='predict')
plt.plot(np.linspace(401,550,150),y_test[400:550],'o-',lw=1,ms=4,label='true')
plt.legend(loc='upper right',fnotallow=12)
plt.xlabel('Number of Cycle',fnotallow=13)
plt.ylabel('Discharge Capacity (Ah)',fnotallow=13)
知乎學(xué)術(shù)咨詢:
https://www.zhihu.com/consult/people/792359672131756032?isMe=1
擔(dān)任《Mechanical System and Signal Processing》《中國(guó)電機(jī)工程學(xué)報(bào)》等期刊審稿專家,擅長(zhǎng)領(lǐng)域:信號(hào)濾波/降噪,機(jī)器學(xué)習(xí)/深度學(xué)習(xí),時(shí)間序列預(yù)分析/預(yù)測(cè),設(shè)備故障診斷/缺陷檢測(cè)/異常檢測(cè)。
分割線分割線分割線
渦扇發(fā)動(dòng)機(jī)全稱為渦輪風(fēng)扇發(fā)動(dòng)機(jī),是一種先進(jìn)的空中引擎,由渦輪噴氣發(fā)動(dòng)機(jī)發(fā)展而來(lái)。渦扇發(fā)動(dòng)機(jī)主要特點(diǎn)是首級(jí)壓縮機(jī)的面積比渦輪噴氣發(fā)動(dòng)機(jī)大。同時(shí),空氣螺旋槳(扇)將部分吸入的空氣從噴射引擎噴射出來(lái),并將其向后推動(dòng),以達(dá)到推進(jìn)的目的。
渦扇發(fā)動(dòng)機(jī)結(jié)構(gòu)是一種特殊的引擎結(jié)構(gòu),它在渦輪噴氣發(fā)動(dòng)機(jī)的基礎(chǔ)上再增加了 1-2 級(jí)低壓(低速)渦輪,這些渦輪可以驅(qū)動(dòng)一定數(shù)量的風(fēng)扇,消耗掉一部分渦噴發(fā)動(dòng)機(jī)(核心機(jī))的燃?xì)馀艢鈩?dòng)能,從而進(jìn)一步降低燃?xì)馀懦鏊俣?,提高引擎的性能。從前至后,渦扇發(fā)動(dòng)機(jī)由風(fēng)扇,壓氣機(jī),燃燒室,導(dǎo)向葉片,高壓低壓渦輪,加力燃燒室和尾噴管組成。
目前大多數(shù)論文的實(shí)驗(yàn)中使用的數(shù)據(jù)集是美國(guó)國(guó)家航空航天局的 C-MAPSS 數(shù)據(jù)集。C-MAPSS 數(shù)據(jù)集是由模擬航空發(fā)動(dòng)機(jī)的模擬軟件生成。模擬發(fā)動(dòng)機(jī)的結(jié)構(gòu)圖如下:
主要部件有低壓渦輪(low pressure turbine,LPT)、高壓渦輪(high pressure turbine,HPT)、高壓壓氣機(jī)(high pressure compressor,HPC)、低壓壓氣機(jī)(low pressure compressor,LPC)、風(fēng)扇(fan)、噴嘴(nozzle)、燃燒室(combustor)、低壓轉(zhuǎn)子轉(zhuǎn)速(N1)和高壓轉(zhuǎn)子轉(zhuǎn)速(N2)等。風(fēng)扇在正常飛行條件下工作,將空氣分流至內(nèi)、外涵道。LPC 和 HPC 通過(guò)增溫增壓技術(shù)將高溫、壓力混合氣體輸送到 combustor。LPT 可以有效減少空氣的流速,大大提高飛機(jī)煤油的化學(xué)能轉(zhuǎn)換效率。而 HPT 則是利用高溫、高壓氣體對(duì)渦輪葉片施加做功,從而產(chǎn)生機(jī)械能。N1、N2 和 nozzle 的應(yīng)用大大提升了發(fā)動(dòng)機(jī)的燃燒效率。
監(jiān)控渦扇發(fā)動(dòng)機(jī)狀況為21個(gè)傳感器。由于傳感器的單位不同,傳感器記錄的數(shù)值的量級(jí)也有所差異,位于10的-2次方到10的3次方之間。例如,燃燒室油氣比數(shù)值的量級(jí)是10的-2次方;低壓渦輪冷氣流量數(shù)值的量級(jí)是10的1次方。表2-2描述了NASA的C-MAPSS數(shù)據(jù)集。由于不同的操作條件和故障模式,數(shù)據(jù)集可以分為四個(gè)子數(shù)據(jù)集,依次是FD001、FD002、FD003和FD004。每個(gè)子數(shù)據(jù)分為訓(xùn)練集和測(cè)試集,記錄了發(fā)動(dòng)機(jī)的3種操作設(shè)置和21個(gè)傳感器數(shù)據(jù)。每個(gè)子數(shù)據(jù)集通過(guò).txt文件單獨(dú)保存。在.txt文件中,每一行記錄了一個(gè)引擎某個(gè)時(shí)間刻的3種操作設(shè)置和21個(gè)傳感器數(shù)據(jù)。關(guān)于故障模式和操作條件方面,F(xiàn)D001和FD002子數(shù)據(jù)集包含一種故障模式(高壓壓氣機(jī)退化),F(xiàn)D003和FD004包含兩種故障模式(高壓壓氣機(jī)退化和風(fēng)扇退化);FD001和FD003只有一種操作條件,F(xiàn)D002和FD004有六種操作條件。由于FD002和FD004子數(shù)據(jù)集引擎的操作環(huán)境復(fù)雜多變,F(xiàn)D002和FD004子數(shù)據(jù)集中RUL的預(yù)測(cè)更加困難。
可見(jiàn),渦扇發(fā)動(dòng)機(jī)數(shù)據(jù)集包含大量的按時(shí)間順序采集的傳感器數(shù)據(jù),這些數(shù)據(jù)包括發(fā)動(dòng)機(jī)溫度、壓力、振動(dòng)等多個(gè)方面的指標(biāo)。并且渦扇發(fā)動(dòng)機(jī)可能會(huì)出現(xiàn)多種故障模式。這適合用時(shí)間序列模型去對(duì)渦扇發(fā)動(dòng)機(jī)的剩余使用壽命做預(yù)測(cè)。
鑒于此,采用機(jī)器學(xué)習(xí)和深度學(xué)習(xí)對(duì)C-MAPSS渦扇發(fā)動(dòng)機(jī)進(jìn)行剩余使用壽命RUL預(yù)測(cè),Python代碼,Jupyter Notebook環(huán)境,方法如下:
(1)VAR with LSTM
(2)VAR with Logistic Regression
(3)LSTM (Lookback=20,10,5,1)
LSTM Lookback=10的結(jié)果如下:
final_pred = []
count = 0
for i in range(100):
j = max_cycles[i]
temp = pred[count+j-1]
count=count+j
final_pred.append(int(temp))
print(final_pred)
fig = plt.figure(figsize=(18,10))
plt.plot(final_pred,color='red', label='prediction')
plt.plot(y_test,color='blue', label='y_test')
plt.legend(loc='upper left')
plt.grid()
plt.show()
print("mean_squared_error >> ", mean_squared_error(y_test,final_pred))
print("root_mean_squared_error >> ", math.sqrt(mean_squared_error(y_test,final_pred)))
print("mean_absolute_error >>",mean_absolute_error(y_test,final_pred))
VAR with Logistic Regression結(jié)果:
fig = plt.figure(figsize=(18,10))
plt.plot(logistic_pred,color='red', label='prediction')
plt.plot(y_test,color='blue', label='y_test')
plt.legend(loc='upper left')
plt.grid()
plt.show()
VAR with LSTM結(jié)果:
fig = plt.figure(figsize=(18,10))
plt.plot(lstm_pred,color='red', label='prediction')
plt.plot(y_test,color='blue', label='y_test')
plt.legend(loc='upper left')
plt.grid()
plt.show()
完整代碼可通過(guò)知乎學(xué)術(shù)咨詢獲得:
https://www.zhihu.com/consult/people/792359672131756032?isMe=1
基于機(jī)器學(xué)習(xí)和深度學(xué)習(xí)的NASA渦扇發(fā)動(dòng)機(jī)剩余使用壽命預(yù)測(cè)(C-MAPSS數(shù)據(jù)集,Python代碼,ipynb 文件)
以美國(guó)航空航天局提供的航空渦扇發(fā)動(dòng)機(jī)退化數(shù)據(jù)集為研究對(duì)象,該數(shù)據(jù)集包含多臺(tái)發(fā)動(dòng)機(jī)從啟動(dòng)到失效期間多個(gè)運(yùn)行周期的多源傳感器時(shí)序狀態(tài)監(jiān)測(cè)數(shù)據(jù),它們共同表征了發(fā)動(dòng)機(jī)的性能退化情況。為減小計(jì)算成本,需要對(duì)原始多源傳感器監(jiān)測(cè)數(shù)據(jù)進(jìn)行數(shù)據(jù)篩選,剔除與發(fā)動(dòng)機(jī)性能退化情況無(wú)關(guān)的傳感器數(shù)據(jù)項(xiàng),保留有用數(shù)據(jù),為對(duì)多源傳感器數(shù)據(jù)進(jìn)行有效甄別,考慮綜合多種數(shù)據(jù)篩選方式,以保證篩選結(jié)果的準(zhǔn)確性。
主要內(nèi)容如下:
Data Visualization:
- Maximum life chart and engine life distribution chart for each unit.
- Correlation coefficient chart between sensors and RUL.
- Line chart showing the relationship between sensors and RUL for each engine.
- Value distribution chart for each sensor.
Feature Engineering:
- Based on the line chart showing the relationship between sensors and engine RUL, sensors 1, 5, 10, 16, 18, and 19 are found to be constant. Hence, these features are removed. Finally, the data is normalized.
Machine Learning Model:
- "Rolling mean feature" is added to the data, representing the average value of features over 10 time periods.
- Seven models are built: Linear regression, Light GBM, Random Forest, KNN, XGBoost, SVR, and Extra Tree.
- MAE, RMSE, and R2 are used as evaluation metrics. SVR performs the best with an R2 of 0.61 and RMSE = 25.7.
Deep Learning Model:
- The time window length is set to 30, and the shift length is set to 1. The training and test data are processed to be in a three-dimensional format for input to the models.
- Six deep learning models are built: CNN, LSTM, Stacked LSTM, Bi-LSTM, GRU, and a hybrid model combining CNN and LSTM.
- Convergence charts and evaluation of test data predictions are plotted. Each model has an R2 higher than 0.85, with Bi-LSTM achieving an R2 of 0.89 and RMSE of 13.5.?
機(jī)器學(xué)習(xí)模型所用模塊:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import random
import warnings
warnings.filterwarnings('ignore')
from sklearn.metrics import mean_squared_error, r2_score,mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler,MinMaxScaler
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor,ExtraTreesRegressor
from sklearn.neighbors import KNeighborsRegressor
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor
結(jié)果如下:
深度學(xué)習(xí)所用模塊:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import random
import time
import warnings
warnings.filterwarnings('ignore')
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler,MinMaxScaler
#from google.colab import drive
#drive.mount('/content/drive')
# model
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Conv1D
from tensorflow.keras.layers import BatchNormalization, Dropout
from tensorflow.keras.layers import TimeDistributed, Flatten
from tensorflow.keras.layers.experimental import preprocessing
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ReduceLROnPlateau,EarlyStopping
完整代碼可通過(guò)知乎學(xué)術(shù)咨詢獲得:
https://www.zhihu.com/consult/people/792359672131756032?isMe=1
Python環(huán)境下基于機(jī)器學(xué)習(xí)的NASA渦輪風(fēng)扇發(fā)動(dòng)機(jī)剩余使用壽命RUL預(yù)測(cè)
本例所用的數(shù)據(jù)集為C-MAPSS數(shù)據(jù)集,C-MAPSS數(shù)據(jù)集是美國(guó)NASA發(fā)布的渦輪風(fēng)扇發(fā)動(dòng)機(jī)數(shù)據(jù)集,其中包含不同工作條件和故障模式下渦輪風(fēng)扇發(fā)動(dòng)機(jī)多源性能的退化數(shù)據(jù),共有 4 個(gè)子數(shù)據(jù)集,每個(gè)子集又可分為訓(xùn)練集、 測(cè)試集和RUL標(biāo)簽。其中,訓(xùn)練集包含航空發(fā)動(dòng)機(jī)從開(kāi)始運(yùn)行到發(fā)生故障的所有狀態(tài)參數(shù);測(cè)試集包含一定數(shù)量發(fā)動(dòng)機(jī)從開(kāi)始運(yùn)行到發(fā)生故障前某一時(shí)間點(diǎn)的全部狀態(tài)參數(shù);RUL標(biāo)簽記錄測(cè)試集中發(fā)動(dòng)機(jī)的 RUL 值,可用于評(píng)估模 型的RUL預(yù)測(cè)能力。C-MAPSS數(shù)據(jù)集包含的基本信息如下:
本例只采用FD001子數(shù)據(jù)集:
關(guān)于python的集成環(huán)境,我一般Anaconda 和 winpython 都用,windows下主要用Winpython,IDE為spyder(類MATLAB界面)。
正如peng wang老師所說(shuō)
winpython, anaconda 哪個(gè)更好?- peng wang的回答 - 知乎 https://www.zhihu.com/question/27615938/answer/71207511
winpython脫胎于pythonxy,面向科學(xué)計(jì)算,兼顧數(shù)據(jù)分析與挖掘;Anaconda主要面向數(shù)據(jù)分析與挖掘方面,在大數(shù)據(jù)處理方面有自己特色的一些包;winpython強(qiáng)調(diào)便攜性,被做成綠色軟件,不寫(xiě)入注冊(cè)表,安裝其實(shí)就是解壓到某個(gè)文件夾,移動(dòng)文件夾甚至放到U盤里在其他電腦上也能用;Anaconda則算是傳統(tǒng)的軟件模式。winpython是由個(gè)人維護(hù);Anaconda由數(shù)據(jù)分析服務(wù)公司維護(hù),意味著Winpython在很多方面都從簡(jiǎn),而Anaconda會(huì)提供一些人性化設(shè)置。Winpython 只能在windows上用,Anaconda則有l(wèi)inux的版本。
拋開(kāi)軟件包的差異,我個(gè)人也推薦初學(xué)者用winpython,正因?yàn)槠浜?jiǎn)單,問(wèn)題也少點(diǎn),由于便攜性的特點(diǎn)系統(tǒng)壞了,重裝后也能直接用。
請(qǐng)直接安裝、使用winPython:WinPython download因?yàn)楹芏嗄K以及集成的模塊
可以選擇版本,不一定要用最新版本,否則可能出現(xiàn)不兼容問(wèn)題。
下載、解壓后如下
打開(kāi)spyder就可以用了。
采用8種機(jī)器學(xué)習(xí)方法對(duì)NASA渦輪風(fēng)扇發(fā)動(dòng)機(jī)進(jìn)行剩余使用壽命RUL預(yù)測(cè),8種方法分別為:Linear Regression,SVM regression,Decision Tree regression,KNN model,Random Forest,Gradient Boosting Regressor,Voting Regressor,ANN Model。
首先導(dǎo)入相關(guān)模塊
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
import tensorflow as tf
from tensorflow.keras.layers import Dense
版本如下:
tensorflow=2.8.0
keras=2.8.0
sklearn=1.0.2
導(dǎo)入數(shù)據(jù)
path = ''
# define column names
col_names=["unit_nb","time_cycle"]+["set_1","set_2","set_3"] + [f's_{i}' for i in range(1,22)]
# read data
df_train = train_data = pd.read_csv(path+"train_FD001.txt", index_col=False,
sep= "\s+", header = None,nam
es=col_names )
df_test和y_test同理導(dǎo)入,看一下訓(xùn)練樣本
df_train.head()
進(jìn)行探索性數(shù)據(jù)分析
df_train[col_names[1:]].describe().T
數(shù)據(jù)可視化分析:
sns.set_style("darkgrid")
plt.figure(figsize=(16,10))
k = 1
for col in col_names[2:] :
plt.subplot(6,4,k)
sns.histplot(df_train[col],color='Green')
k+=1
plt.tight_layout()
plt.show()
def remaining_useful_life(df):
# Get the total number of cycles for each unit
grouped_by_unit = df.groupby(by="unit_nb")
max_cycle = grouped_by_unit["time_cycle"].max()
# Merge the max cycle back into the original frame
result_frame = df.merge(max_cycle.to_frame(name='max_cycle'), left_notallow='unit_nb', right_index=True)
# Calculate remaining useful life for each row
remaining_useful_life = result_frame["max_cycle"] - result_frame["time_cycle"]
result_frame["RUL"] = remaining_useful_life
# drop max_cycle as it's no longer needed
result_frame = result_frame.drop("max_cycle", axis=1)
return result_frame
df_train = remaining_useful_life(df_train)
df_train.head()
繪制最大RUL的直方圖分布
plt.figure(figsize=(10,5))
sns.histplot(max_ruls.RUL, color='r')
plt.xlabel('RUL')
plt.ylabel('Frequency')
plt.axvline(x=max_ruls.RUL.mean(), ls='--',color='k',label=f'mean={max_ruls.RUL.mean()}')
plt.axvline(x=max_ruls.RUL.median(),color='b',label=f'median={max_ruls.RUL.median()}')
plt.legend()
plt.show()
plt.figure(figsize=(20, 8))
cor_matrix = df_train.corr()
heatmap = sns.heatmap(cor_matrix, vmin=-1, vmax=1, annot=True)
heatmap.set_title('Correlation Heatmap', fnotallow={'fontsize':12}, pad=10);
添加圖片注釋,不超過(guò) 140 字(可選)
col = df_train.describe().columns
#we drop colummns with standard deviation is less than 0.0001
sensors_to_drop = list(col[df_train.describe().loc['std']<0.001]) + ['s_14']
print(sensors_to_drop)
#
df_train.drop(sensors_to_drop,axis=1,inplace=True)
df_test.drop(sensors_to_drop,axis=1,inplace=True)
sns.set_style("darkgrid")
fig, axs = plt.subplots(4,4, figsize=(25, 18), facecolor='w', edgecolor='k')
fig.subplots_adjust(hspace = .22, wspace=.2)
i=0
axs = axs.ravel()
index = list(df_train.unit_nb.unique())
for sensor in df_train.columns[1:-1]:
for idx in index[1:-1:15]:
axs[i].plot('RUL', sensor,data=df_train[df_train.unit_nb==idx])
axs[i].set_xlim(350,0)
axs[i].set(xticks=np.arange(0, 350, 25))
axs[i].set_ylabel(sensor)
axs[i].set_xlabel('Remaining Use Life')
i=i+1
X_train = df_train[df_train.columns[3:-1]]
y_train = df_train.RUL
X_test = df_test.groupby('unit_nb').last().reset_index()[df_train.columns[3:-1]]
y_train = y_train.clip(upper=155)
# create evalute function for train and test data
def evaluate(y_true, y_hat):
RMSE = np.sqrt(mean_squared_error(y_true, y_hat))
R2_score = r2_score(y_true, y_hat)
return [RMSE,R2_score];
#Make Dataframe which will contain results
Results = pd.DataFrame(columns=['RMSE-Train','R2-Train','RMSE-Test','R
2-Test','time-train (s)'])
訓(xùn)練線性回歸模型
import time
Sc = StandardScaler()
X_train1 = Sc.fit_transform(X_train)
X_test1 = Sc.transform(X_test)
# create and fit model
start = time.time()
lm = LinearRegression()
lm.fit(X_train1, y_train)
end_fit = time.time()- start
# predict and evaluate
y_pred_train = lm.predict(X_train1)
y_pred_test = lm.predict(X_test1)
Results.loc['LR']=evaluate(y_train, y_pred_train)+evaluate(y_test, y_pred_test)+[end_fit]
Results
def plot_prediction(y_test,y_pred_test,score):
plt.style.use("ggplot")
fig, ax = plt.subplots(1, 2, figsize=(17, 4), gridspec_kw={'width_ratios': [1.2, 3]})
fig.subplots_adjust(wspace=.12)
ax[0].plot([min(y_test),max(y_test)],
[min(y_test),max(y_test)],lw=3,c='r')
ax[0].scatter(y_test,y_pred_test,lw=3,c='g')
ax[0].annotate(text=('RMSE: ' + "{:.2f}".format(score[0]) +'\n' +
'R2: ' + "{:.2%}".format(score[1])), xy=(0,140), size='large');
ax[0].set_title('Actual vs predicted RUL')
ax[0].set_xlabel('Actual')
ax[0].set_ylabel('Predicted');
ax[1].plot(range(0,100),y_test,lw=2,c='r',label = 'actual')
ax[1].plot(range(0,100),y_pred_test,lw=1,ls='--', c='b',label = 'prediction')
ax[1].legend()
ax[1].set_title('Actual vs predicted RUL')
ax[1].set_xlabel('Engine num')
ax[1].set_ylabel('RUL');
plot_prediction(y_test.RUL,y_pred_test,evaluate(y_test, y_pred_test))
訓(xùn)練支持向量機(jī)模型
# create and fit model
start = time.time()
svr = SVR(kernel="rbf", gamma=0.25, epsilnotallow=0.05)
svr.fit(X_train1, y_train)
end_fit = time.time()-start
# predict and evaluate
y_pred_train = svr.predict(X_train1)
y_pred_test = svr.predict(X_test1)
Results.loc['SVM']=evaluate(y_train, y_pred_train)+evaluate(y_test, y_pred_test)+[end_fit]
Results
plot_prediction(y_test.RUL,y_pred_test,evaluate(y_test, y_pred_test))
訓(xùn)練決策樹(shù)模型
start=time.time()
dtr = DecisionTreeRegressor(random_state=42, max_features='sqrt', max_depth=10, min_samples_split=10)
dtr.fit(X_train1, y_train)
end_fit =time.time()-start
# predict and evaluate
y_pred_train = dtr.predict(X_train1)
y_pred_test = dtr.predict(X_test1)
Results.loc['DTree']=evaluate(y_train, y_pred_train)+evaluate(y_test, y_pred_test)+[end_fit]
Results
plot_prediction(y_test.RUL,y_pred_test,evaluate(y_test, y_pred_test
))
訓(xùn)練KNN模型
from sklearn.neighbors import KNeighborsRegressor
# Evaluating on Train Data Set
start = time.time()
Kneigh = KNeighborsRegressor(n_neighbors=7)
Kneigh.fit(X_train1, y_train)
end_fit =time.time()-start
# predict and evaluate
y_pred_train = Kneigh.predict(X_train1)
y_pred_test = Kneigh.predict(X_test1)
Results.loc['KNeigh']=evaluate(y_train, y_pred_train)+evaluate(y_test, y_pred_test)+[end_fit]
Results
plot_prediction(y_test.RUL,y_pred_test,evaluate(y_test, y_pred_test))
訓(xùn)練隨機(jī)森林模型
start = time.time()
rf = RandomForestRegressor(n_jobs=-1, n_estimators=130,max_features='sqrt', min_samples_split= 2, max_depth=10, random_state=42)
rf.fit(X_train1, y_train)
y_hat_train1 = rf.predict(X_train1)
end_fit = time.time()-start
# predict and evaluate
y_pred_train = rf.predict(X_train1)
y_pred_test = rf.predict(X_test1)
Results.loc['RF']=evaluate(y_train, y_pred_train)+evaluate(y_test, y_pred_test)+[end_fit]
Results
plot_prediction(y_test.RUL,y_pred_test,evaluate(y_test, y_pred_test))
訓(xùn)練Gradient Boosting Regressor模型
from sklearn.ensemble import GradientBoostingRegressor
# Evaluating on Train Data Set
start = time.time()
xgb_r = GradientBoostingRegressor(n_estimators=45, max_depth=10, min_samples_leaf=7, max_features='sqrt', random_state=42,learning_rate=0.11)
xgb_r.fit(X_train1, y_train)
end_fit =time.time()-start
# predict and evaluate
y_pred_train = xgb_r.predict(X_train1)
y_pred_test = xgb_r.predict(X_test1)
Results.loc['XGboost']=evaluate(y_train, y_pred_train)+evaluate(y_test, y_pred_test)+[end_fit]
Results
plot_prediction(y_test.RUL,y_pred_test,evaluate(y_test, y_pred_test))
訓(xùn)練Voting Regressor模型
from sklearn.ensemble import VotingRegressor
start=time.time()
Vot_R = VotingRegressor([("rf", rf), ("xgb", xgb_r)],weights=[1.5,1],n_jobs=-1)
Vot_R.fit(X_train1, y_train)
end_fit =time.time()-start
# predict and evaluate
y_pred_train = Vot_R.predict(X_train1)
y_pred_test = Vot_R.predict(X_test1)
Results.loc['VotingR']=evaluate(y_train, y_pred_train)+evaluate(y_test, y_pred_test)+[end_fit]
Results
plot_prediction(y_test.RUL,y_pred_test,evaluate(y_test, y_pred_test))
訓(xùn)練ANN模型
star=time.time()
model = tf.keras.models.Sequential()
model.add(Dense(32, activatinotallow='relu'))
model.add(Dense(64, activatinotallow='relu'))
model.add(Dense(128, activatinotallow='relu'))
model.add(Dense(128, activatinotallow='relu'))
model.add(Dense(1, activatinotallow='linear'))
model.compile(loss= 'msle', optimizer='adam', metrics=['msle'])
history = model.fit(x=X_train1,y=y_train, epochs = 40, batch_size = 64)
end_fit = time.time()-star
# predict and evaluate
y_pred_train = model.predict(X_train1)
y_pred_test = model.predict(X_test1)
Results.loc['ANN']=evaluate(y_train, y_pred_train)+evaluate(y_test, y_pred_test)+[end_fit]
Results
本文轉(zhuǎn)載自??高斯的手稿??
