Getting Started with Knowledge Graphs: Building a Knowledge Graph in Python, Analyzing It, and Training an Embedding Model

Author: Diego Lopez Yse 2023-08-22 15:34:01


In this article we explain how to build a knowledge graph (KG), analyze it, and train an embedding model on it.

Building the Knowledge Graph

First, load the data. In this article we create a simple KG from scratch.

import pandas as pd

# Define the heads, relations, and tails
head = ['drugA', 'drugB', 'drugC', 'drugD', 'drugA', 'drugC', 'drugD', 'drugE', 'gene1', 'gene2', 'gene3', 'gene4', 'gene50', 'gene2', 'gene3', 'gene4']
relation = ['treats', 'treats', 'treats', 'treats', 'inhibits', 'inhibits', 'inhibits', 'inhibits', 'associated', 'associated', 'associated', 'associated', 'associated', 'interacts', 'interacts', 'interacts']
tail = ['fever', 'hepatitis', 'bleeding', 'pain', 'gene1', 'gene2', 'gene4', 'gene20', 'obesity', 'heart_attack', 'hepatitis', 'bleeding', 'cancer', 'gene1', 'gene20', 'gene50']

# Create a dataframe with one (head, relation, tail) triple per row
df = pd.DataFrame({'head': head, 'relation': relation, 'tail': tail})
df  # in a notebook, this displays the dataframe

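Next, create a NetworkX graph (G) to represent the KG. Each row of the DataFrame (df) corresponds to one triple (head, relation, tail) in the KG. The add_edge function adds an edge between the head and tail entities and stores the relation as the edge's label attribute. Note that nx.Graph() is undirected, so the head-to-tail direction of each triple is not preserved.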

import networkx as nx
import matplotlib.pyplot as plt

# Create a knowledge graph: one edge per triple, labelled with its relation
G = nx.Graph()
for _, row in df.iterrows():
    G.add_edge(row['head'], row['tail'], label=row['relation'])
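The same triples can also be queried directly, without going through the graph. The following is a minimal sketch in plain Python; the helper names tails_of and heads_of are illustrative assumptions and are not part of the original article or of NetworkX.

```python
# Triples copied from the article's head / relation / tail lists.
TRIPLES = [
    ("drugA", "treats", "fever"), ("drugB", "treats", "hepatitis"),
    ("drugC", "treats", "bleeding"), ("drugD", "treats", "pain"),
    ("drugA", "inhibits", "gene1"), ("drugC", "inhibits", "gene2"),
    ("drugD", "inhibits", "gene4"), ("drugE", "inhibits", "gene20"),
    ("gene1", "associated", "obesity"), ("gene2", "associated", "heart_attack"),
    ("gene3", "associated", "hepatitis"), ("gene4", "associated", "bleeding"),
    ("gene50", "associated", "cancer"), ("gene2", "interacts", "gene1"),
    ("gene3", "interacts", "gene20"), ("gene4", "interacts", "gene50"),
]


def tails_of(head, relation):
    """Return every tail t such that (head, relation, t) is a triple."""
    return [t for h, r, t in TRIPLES if h == head and r == relation]


def heads_of(relation, tail):
    """Return every head h such that (h, relation, tail) is a triple."""
    return [h for h, r, t in TRIPLES if r == relation and t == tail]


print(tails_of("drugA", "treats"))          # ['fever']
print(heads_of("inhibits", "gene2"))        # ['drugC']
print(heads_of("associated", "hepatitis"))  # ['gene3']
```

Unlike the undirected graph, these lookups keep the direction of each triple, which matters for relations such as treats and inhibits.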

Then draw the nodes (entities) and edges (relations) together with their labels.

# Visualize the knowledge graph
pos = nx.spring_layout(G, seed=42, k=0.9)
labels = nx.get_edge_attributes(G, 'label')
plt.figure(figsize=(12, 10))
nx.draw(G, pos, with_labels=True, font_size=10, node_size=700, node_color='lightblue', edge_color='gray', alpha=0.6)
nx.draw_networkx_edge_labels(G, pos, edge_labels=labels, font_size=8, label_pos=0.3, verticalalignment='baseline')
plt.title('Knowledge Graph')
plt.show()

Now we can carry out some analysis.

Analysis

The first thing we can do with a KG is check how many nodes and edges it has and look at the ratio between them.

num_nodes = G.number_of_nodes()
num_edges = G.number_of_edges()
print(f'Number of nodes: {num_nodes}')
print(f'Number of edges: {num_edges}')
print(f'Ratio edges to nodes: {round(num_edges / num_nodes, 2)}')
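As a cross-check that does not depend on NetworkX, the counts can be recomputed from the edge list itself. The sketch below is plain Python under the assumption that the graph is treated as undirected, as in the article; the connected_components helper is illustrative and also reports how many separate pieces the graph falls into.

```python
from collections import deque

# (head, tail) pairs from the article's triples; relations are not needed here.
EDGES = [
    ("drugA", "fever"), ("drugB", "hepatitis"), ("drugC", "bleeding"),
    ("drugD", "pain"), ("drugA", "gene1"), ("drugC", "gene2"),
    ("drugD", "gene4"), ("drugE", "gene20"), ("gene1", "obesity"),
    ("gene2", "heart_attack"), ("gene3", "hepatitis"), ("gene4", "bleeding"),
    ("gene50", "cancer"), ("gene2", "gene1"), ("gene3", "gene20"),
    ("gene4", "gene50"),
]

# Undirected adjacency map: node -> set of neighbours.
adj = {}
for u, v in EDGES:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

num_nodes = len(adj)
num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2  # each edge is seen twice


def connected_components(adj):
    """Group nodes into connected components with breadth-first search."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in component:
                    component.add(nbr)
                    queue.append(nbr)
        seen |= component
        components.append(component)
    return components


components = connected_components(adj)
print(num_nodes, num_edges, round(num_edges / num_nodes, 2))  # 18 16 0.89
print(sorted(len(c) for c in components))                     # [5, 13]
```

The graph has two disconnected parts: a 13-node component and a 5-node component made up of drugB, hepatitis, gene3, gene20, and drugE.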

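1. Node centrality analysis

Node centrality measures the importance or influence of a node in the graph. It helps identify the nodes that are central to the graph's structure. Some of the most common centrality measures are:

Degree centrality counts the edges incident to a node (NetworkX normalizes this count by the maximum possible degree, n - 1). The higher a node's degree centrality, the more connected it is.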

degree_centrality = nx.degree_centrality(G)
for node, centrality in degree_centrality.items():
    print(f'{node}: Degree Centrality = {centrality:.2f}')
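To make the normalization explicit, here is a minimal plain-Python sketch of the same quantity: the number of neighbours of a node divided by n - 1. It rebuilds the article's graph from an edge list and is meant as an illustration of the formula, not as a replacement for the NetworkX call.

```python
# (head, tail) pairs from the article's triples.
EDGES = [
    ("drugA", "fever"), ("drugB", "hepatitis"), ("drugC", "bleeding"),
    ("drugD", "pain"), ("drugA", "gene1"), ("drugC", "gene2"),
    ("drugD", "gene4"), ("drugE", "gene20"), ("gene1", "obesity"),
    ("gene2", "heart_attack"), ("gene3", "hepatitis"), ("gene4", "bleeding"),
    ("gene50", "cancer"), ("gene2", "gene1"), ("gene3", "gene20"),
    ("gene4", "gene50"),
]

adj = {}
for u, v in EDGES:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

n = len(adj)
# Degree centrality: degree divided by the largest possible degree, n - 1.
degree_centrality = {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}

# Show the three best-connected nodes (ties broken alphabetically).
top = sorted(degree_centrality.items(), key=lambda item: (-item[1], item[0]))[:3]
for node, value in top:
    print(f"{node}: {value:.2f}")
# gene1: 0.18
# gene2: 0.18
# gene4: 0.18
```

In this small graph the maximum degree is 3, shared by gene1, gene2, and gene4, so each has a degree centrality of 3/17.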

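Betweenness centrality measures how often a node lies on the shortest paths between other nodes, or in other words how much influence a node has over the flow of information between other nodes. Nodes with high betweenness can act as bridges between different parts of the graph.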

betweenness_centrality = nx.betweenness_centrality(G)
for node, centrality in betweenness_centrality.items():
    print(f'Betweenness Centrality of {node}: {centrality:.2f}')
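The definition above can be turned into code almost literally. The sketch below is a straightforward pair-counting version in plain Python, not the faster Brandes algorithm that libraries typically use: for every pair of nodes it adds up the fraction of shortest paths that pass through each intermediate node, then rescales by 2 / ((n - 1)(n - 2)), which is the normalization for undirected graphs. The function names are illustrative.

```python
from collections import deque
from itertools import combinations

# (head, tail) pairs from the article's triples.
EDGES = [
    ("drugA", "fever"), ("drugB", "hepatitis"), ("drugC", "bleeding"),
    ("drugD", "pain"), ("drugA", "gene1"), ("drugC", "gene2"),
    ("drugD", "gene4"), ("drugE", "gene20"), ("gene1", "obesity"),
    ("gene2", "heart_attack"), ("gene3", "hepatitis"), ("gene4", "bleeding"),
    ("gene50", "cancer"), ("gene2", "gene1"), ("gene3", "gene20"),
    ("gene4", "gene50"),
]

adj = {}
for u, v in EDGES:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)


def bfs_counts(adj, source):
    """Return (distance, number of shortest paths) from source to each reachable node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:            # first time we reach w
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:   # u is a predecessor of w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness(adj):
    nodes = list(adj)
    info = {s: bfs_counts(adj, s) for s in nodes}
    score = dict.fromkeys(nodes, 0.0)
    for s, t in combinations(nodes, 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s:              # s and t are in different components
            continue
        dist_t, sigma_t = info[t]
        for v in nodes:
            if v in (s, t) or v not in dist_s:
                continue
            if dist_s[v] + dist_t[v] == dist_s[t]:   # v lies on a shortest s-t path
                score[v] += sigma_s[v] * sigma_t[v] / sigma_s[t]
    n = len(nodes)
    scale = 2 / ((n - 1) * (n - 2))
    return {v: score[v] * scale for v in nodes}


bc = betweenness(adj)
best = max(bc, key=bc.get)
print(best, round(bc[best], 2))  # gene2 0.29
```

Leaf nodes such as fever or cancer score 0 because no shortest path passes through them, while gene2 scores highest because it joins three branches of the larger component.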

Closeness centrality quantifies how quickly a node can reach all other nodes in the graph. Nodes with higher closeness centrality are considered more central because they can communicate with other nodes more efficiently.

closeness_centrality = nx.closeness_centrality(G)
for node, centrality in closeness_centrality.items():
    print(f'Closeness Centrality of {node}: {centrality:.2f}')
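Because this graph is disconnected, some nodes cannot reach all others, and the closeness score has to account for that. The plain-Python sketch below follows the convention NetworkX applies by default (the Wasserman and Faust scaling): the inverse of the average distance to the reachable nodes, multiplied by the fraction of the graph that is reachable. Treat it as an illustration of the formula rather than a reference implementation.

```python
from collections import deque

# (head, tail) pairs from the article's triples.
EDGES = [
    ("drugA", "fever"), ("drugB", "hepatitis"), ("drugC", "bleeding"),
    ("drugD", "pain"), ("drugA", "gene1"), ("drugC", "gene2"),
    ("drugD", "gene4"), ("drugE", "gene20"), ("gene1", "obesity"),
    ("gene2", "heart_attack"), ("gene3", "hepatitis"), ("gene4", "bleeding"),
    ("gene50", "cancer"), ("gene2", "gene1"), ("gene3", "gene20"),
    ("gene4", "gene50"),
]

adj = {}
for u, v in EDGES:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)


def bfs_distances(adj, source):
    """Hop distance from source to every reachable node (including source itself)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness(adj, node):
    dist = bfs_distances(adj, node)
    reachable = len(dist) - 1          # nodes other than `node` that can be reached
    total = sum(dist.values())         # sum of shortest-path lengths
    if total == 0:
        return 0.0
    # (reachable / total) is the inverse average distance;
    # (reachable / (n - 1)) scales it down for nodes in small components.
    return (reachable / total) * (reachable / (len(adj) - 1))


closeness_centrality = {node: closeness(adj, node) for node in adj}
print(f"hepatitis: {closeness_centrality['hepatitis']:.2f}")  # hepatitis: 0.13
print(f"drugC: {closeness_centrality['drugC']:.2f}")          # drugC: 0.26
```

The scaling factor is why nodes in the 5-node component score lower than similarly placed nodes in the 13-node component.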

Visualization

# Calculate centrality measures
degree_centrality = nx.degree_centrality(G)
betweenness_centrality = nx.betweenness_centrality(G)
closeness_centrality = nx.closeness_centrality(G)

# Visualize centrality measures (node size and color scale with the score)
plt.figure(figsize=(15, 10))

# Degree centrality
plt.subplot(131)
nx.draw(G, pos, with_labels=True, font_size=10, node_size=[v * 3000 for v in degree_centrality.values()], node_color=list(degree_centrality.values()), cmap=plt.cm.Blues, edge_color='gray', alpha=0.6)
plt.title('Degree Centrality')

# Betweenness centrality
plt.subplot(132)
nx.draw(G, pos, with_labels=True, font_size=10, node_size=[v * 3000 for v in betweenness_centrality.values()], node_color=list(betweenness_centrality.values()), cmap=plt.cm.Oranges, edge_color='gray', alpha=0.6)
plt.title('Betweenness Centrality')

# Closeness centrality
plt.subplot(133)
nx.draw(G, pos, with_labels=True, font_size=10, node_size=[v * 3000 for v in closeness_centrality.values()], node_color=list(closeness_centrality.values()), cmap=plt.cm.Greens, edge_color='gray', alpha=0.6)
plt.title('Closeness Centrality')

plt.tight_layout()
plt.show()

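2. Shortest path analysis

Shortest path analysis focuses on finding the shortest path between two nodes in the graph. It helps in understanding how different entities are connected and the minimum number of relations needed to link them. For example, suppose you want to find the shortest path between the nodes "gene2" and "cancer":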

source_node = 'gene2'
target_node = 'cancer'

# Find the shortest path
shortest_path = nx.shortest_path(G, source=source_node, target=target_node)

# Visualize the shortest path
plt.figure(figsize=(10, 8))
# Consecutive node pairs along the path are the edges to highlight
path_edges = [(shortest_path[i], shortest_path[i + 1]) for i in range(len(shortest_path) - 1)]
nx.draw(G, pos, with_labels=True, font_size=10, node_size=700, node_color='lightblue', edge_color='gray', alpha=0.6)
nx.draw_networkx_edges(G, pos, edgelist=path_edges, edge_color='red', width=2)
plt.title(f'Shortest Path from {source_node} to {target_node}')
plt.show()
print('Shortest Path:', shortest_path)

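The shortest path between the source node "gene2" and the target node "cancer" is highlighted in red, and the nodes and edges of the whole graph are shown as well. This helps in understanding the most direct path between two entities and the relations along that path.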
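For an unweighted graph like this one, a shortest path can be found with breadth-first search. The sketch below is a plain-Python illustration of that idea, under the assumption that edges are undirected as in the article; it also prints the relation on each hop, which the plot only shows as edge labels. The shortest_path function here is a hypothetical helper, not the NetworkX function of the same name.

```python
from collections import deque

# Triples copied from the article's head / relation / tail lists.
TRIPLES = [
    ("drugA", "treats", "fever"), ("drugB", "treats", "hepatitis"),
    ("drugC", "treats", "bleeding"), ("drugD", "treats", "pain"),
    ("drugA", "inhibits", "gene1"), ("drugC", "inhibits", "gene2"),
    ("drugD", "inhibits", "gene4"), ("drugE", "inhibits", "gene20"),
    ("gene1", "associated", "obesity"), ("gene2", "associated", "heart_attack"),
    ("gene3", "associated", "hepatitis"), ("gene4", "associated", "bleeding"),
    ("gene50", "associated", "cancer"), ("gene2", "interacts", "gene1"),
    ("gene3", "interacts", "gene20"), ("gene4", "interacts", "gene50"),
]

adj, relation_of = {}, {}
for h, r, t in TRIPLES:
    adj.setdefault(h, []).append(t)
    adj.setdefault(t, []).append(h)
    relation_of[frozenset((h, t))] = r   # undirected edge -> relation label


def shortest_path(adj, source, target):
    """Breadth-first search; returns the node list, or None if target is unreachable."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for nbr in adj[node]:
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    if target not in parent:
        return None
    path, node = [], target
    while node is not None:              # walk back from target to source
        path.append(node)
        node = parent[node]
    return path[::-1]


path = shortest_path(adj, "gene2", "cancer")
print(path)  # ['gene2', 'drugC', 'bleeding', 'gene4', 'gene50', 'cancer']
for u, v in zip(path, path[1:]):
    print(u, "--", relation_of[frozenset((u, v))], "--", v)
```

The path has five hops and mixes relation types (inhibits, treats, associated, interacts, associated), which is a reminder that a short path in an undirected KG is not necessarily a meaningful causal chain.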

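Graph Embeddings

A graph embedding is a mathematical representation of the nodes or edges of a graph in a continuous vector space. These embeddings capture the graph's structural and relational information, which lets us perform analyses such as computing node similarity and visualizing the graph in a low-dimensional space.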
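Node similarity in an embedding space is commonly measured with cosine similarity, the cosine of the angle between two vectors. The sketch below shows the calculation in plain Python. The three vectors are made up purely for illustration and are not learned embeddings from this article's graph.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional vectors (illustration only).
toy_embeddings = {
    "node_a": [3.0, 4.0, 0.0],
    "node_b": [6.0, 8.0, 0.0],   # same direction as node_a
    "node_c": [0.0, 0.0, 2.0],   # orthogonal to node_a
}

sim_ab = cosine_similarity(toy_embeddings["node_a"], toy_embeddings["node_b"])
sim_ac = cosine_similarity(toy_embeddings["node_a"], toy_embeddings["node_c"])
print(sim_ab)  # 1.0
print(sim_ac)  # 0.0
```

Vectors pointing in the same direction score 1 regardless of their length, and orthogonal vectors score 0.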

We will use the node2vec algorithm, which learns embeddings by performing random walks on the graph and optimizing them to preserve each node's local neighborhood structure.

from node2vec import Node2Vec

# Generate node embeddings using node2vec
node2vec = Node2Vec(G, dimensions=64, walk_length=30, num_walks=200, workers=4)  # You can adjust these parameters
model = node2vec.fit(window=10, min_count=1, batch_words=4)  # Training the model

# Visualize node embeddings using t-SNE
from sklearn.manifold import TSNE
import numpy as np

# Get embeddings for all nodes, in the same order as G.nodes()
embeddings = np.array([model.wv[node] for node in G.nodes()])

# Reduce dimensionality using t-SNE
# (newer scikit-learn releases use max_iter in place of n_iter)
tsne = TSNE(n_components=2, perplexity=10, n_iter=400)
embeddings_2d = tsne.fit_transform(embeddings)

# Visualize embeddings in 2D space with node labels
plt.figure(figsize=(12, 10))
plt.scatter(embeddings_2d[:, 0], embeddings_2d[:, 1], c='blue', alpha=0.7)

# Add node labels
for i, node in enumerate(G.nodes()):
    plt.text(embeddings_2d[i, 0], embeddings_2d[i, 1], node, fontsize=8)
plt.title('Node Embeddings Visualization')
plt.show()

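The node2vec algorithm learns 64-dimensional embeddings for the nodes in the KG. t-SNE then reduces the embeddings to 2 dimensions, and the result is visualized as a scatter plot. Disconnected subgraphs can end up represented separately in the vector space.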
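The random-walk step that node2vec relies on can be sketched in a few lines. The version below is a simplification: it uses unbiased walks, which corresponds to node2vec with its return parameter p and in-out parameter q both set to 1, and it stops before the training step in which the walks are fed to a skip-gram model as if they were sentences. The function name and the walk settings are illustrative assumptions.

```python
import random

# (head, tail) pairs from the article's triples.
EDGES = [
    ("drugA", "fever"), ("drugB", "hepatitis"), ("drugC", "bleeding"),
    ("drugD", "pain"), ("drugA", "gene1"), ("drugC", "gene2"),
    ("drugD", "gene4"), ("drugE", "gene20"), ("gene1", "obesity"),
    ("gene2", "heart_attack"), ("gene3", "hepatitis"), ("gene4", "bleeding"),
    ("gene50", "cancer"), ("gene2", "gene1"), ("gene3", "gene20"),
    ("gene4", "gene50"),
]

adj = {}
for u, v in EDGES:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)


def random_walk(adj, start, walk_length, rng):
    """Uniform random walk: at each step move to a neighbour chosen at random."""
    walk = [start]
    while len(walk) < walk_length:
        neighbours = adj[walk[-1]]
        if not neighbours:               # dead end (cannot happen in this graph)
            break
        walk.append(rng.choice(neighbours))
    return walk


rng = random.Random(42)                  # fixed seed so the walks are repeatable
walks_per_node = 5
walks = [
    random_walk(adj, node, walk_length=10, rng=rng)
    for node in adj
    for _ in range(walks_per_node)
]

print(len(walks))   # 90 walks: 18 nodes x 5 walks each
print(walks[0])     # one walk starting at drugA
```

A walk can never leave the connected component it starts in, which is one reason nodes from the two components tend to end up in separate regions of the embedding space.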

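Clustering

Clustering is a technique for finding groups of observations with similar characteristics. Because it is unsupervised, you do not have to tell the algorithm how to group the observations; it works out from the data which observations (or data points) are more similar to each other than to those in other groups.

1. K-means

K-means uses iterative refinement to produce a final clustering from the dataset and a user-defined number of clusters (denoted by the variable K).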
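The iterative refinement alternates two steps: assign every point to its nearest centroid, then move each centroid to the mean of the points assigned to it, and repeat until nothing changes. The sketch below implements that loop in plain Python on made-up 2D points. It takes fixed starting centroids to stay deterministic, which is an assumption of this example; scikit-learn's KMeans chooses its starting centroids differently.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, max_iterations=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iterations):
        # Assignment step: index of the nearest centroid for each point.
        labels = [
            min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])   # keep an empty cluster's centroid
        if new_centroids == centroids:               # converged
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical toy data: two well-separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels, centroids = kmeans(points, centroids=[(0.0, 0.0), (5.0, 5.0)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

With K = 2 the centroids settle at the means of the two groups after a single update.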

We can run K-means on the embedding space. This gives a clear picture of how the algorithm clusters nodes based on their embeddings:

from sklearn.cluster import KMeans

# Perform K-Means clustering on node embeddings
num_clusters = 3  # Adjust the number of clusters
kmeans = KMeans(n_clusters=num_clusters, random_state=42)
cluster_labels = kmeans.fit_predict(embeddings)

# Visualize K-Means clustering in the embedding space with node labels
plt.figure(figsize=(12, 10))
plt.scatter(embeddings_2d[:, 0], embeddings_2d[:, 1], c=cluster_labels, cmap=plt.cm.Set1, alpha=0.7)

# Add node labels
for i, node in enumerate(G.nodes()):
    plt.text(embeddings_2d[i, 0], embeddings_2d[i, 1], node, fontsize=8)

plt.title('K-Means Clustering in Embedding Space with Node Labels')

plt.colorbar(label="Cluster Label")
plt.show()

Each color represents a different cluster. Now let's go back to the original graph and interpret this information in the original space:

from sklearn.cluster import KMeans

# Perform K-Means clustering on node embeddings
num_clusters = 3  # Adjust the number of clusters
kmeans = KMeans(n_clusters=num_clusters, random_state=42)
cluster_labels = kmeans.fit_predict(embeddings)

# Visualize clusters on the original graph (labels follow the order of G.nodes())
plt.figure(figsize=(12, 10))
nx.draw(G, pos, with_labels=True, font_size=10, node_size=700, node_color=cluster_labels, cmap=plt.cm.Set1, edge_color='gray', alpha=0.6)
plt.title('Graph Clustering using K-Means')

plt.show()

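2. DBSCAN

DBSCAN is a density-based clustering algorithm that does not require the number of clusters to be set in advance. It can also identify outliers as noise. Below is an example of graph clustering with DBSCAN, which clusters the nodes based on the embeddings obtained from the node2vec algorithm.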

from sklearn.cluster import DBSCAN

# Perform DBSCAN clustering on node embeddings
dbscan = DBSCAN(eps=1.0, min_samples=2)  # Adjust eps and min_samples
cluster_labels = dbscan.fit_predict(embeddings)

# Visualize clusters (noise points get the label -1)
plt.figure(figsize=(12, 10))
nx.draw(G, pos, with_labels=True, font_size=10, node_size=700, node_color=cluster_labels, cmap=plt.cm.Set1, edge_color='gray', alpha=0.6)
plt.title('Graph Clustering using DBSCAN')
plt.show()

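The eps parameter above defines the maximum distance between two samples for them to count as neighbors, and the min_samples parameter sets the minimum number of samples that must lie in a point's neighborhood for it to be considered a core point. DBSCAN assigns nodes to clusters and identifies noise points that do not belong to any cluster.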
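How eps and min_samples interact is easiest to see in a small from-scratch version. The sketch below is a simplified plain-Python DBSCAN on made-up 2D points, not scikit-learn's implementation. As in scikit-learn, a point counts as its own neighbor, and noise is labelled -1.

```python
import math
from collections import deque


def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    n = len(points)
    labels = [None] * n                  # None means "not visited yet"

    def neighbours(i):
        # All points within eps of point i, including i itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_samples:      # not a core point: noise for now
            labels[i] = -1
            continue
        cluster += 1                     # i is a core point: start a new cluster
        labels[i] = cluster
        queue = deque(nbrs)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:          # earlier "noise" turns out to be a border point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs_j = neighbours(j)
            if len(nbrs_j) >= min_samples:   # j is also a core point: keep expanding
                queue.extend(nbrs_j)
    return labels


# Hypothetical toy data: a group of three, a group of two, and one isolated point.
points = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.0), (5.0, 5.0), (5.0, 5.5), (10.0, 0.0)]
labels = dbscan(points, eps=1.0, min_samples=2)
print(labels)  # [0, 0, 0, 1, 1, -1]
```

The isolated point has no other point within eps, so it never becomes part of a cluster and keeps the noise label. Raising eps far enough would merge everything into one cluster, while raising min_samples to 3 would turn the two-point group into noise.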

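Summary

Analyzing KGs can provide valuable insight into the complex relationships and interactions between entities. By combining data preprocessing, analysis techniques, embeddings, and cluster analysis, you can uncover hidden patterns and gain a deeper understanding of the underlying data structure.

The methods in this article offer an effective way to visualize and explore KGs, and they are essential introductory knowledge for anyone learning about knowledge graphs.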

Editor: 華軒 | Source: DeepHub IMBA
