
Comparing 7 Python Data Visualization Tools

2015-12-02 09:44:04

The Python scientific stack is fairly mature, with libraries for a wide range of use cases, including machine learning and data analysis. Data visualization is an important part of exploring data and communicating results, but it has historically lagged behind tools such as R.

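Fortunately, many new Python data visualization libraries have appeared over the past few years and closed some of that gap. matplotlib has become the de facto main library for data visualization, and there are many others, such as vispy, bokeh, seaborn, pygal, folium, and networkx. Some of them are built on top of matplotlib, and others offer functionality it lacks.

In this article we use these libraries to visualize a real-world dataset. Through the comparison we hope to learn where each library fits best and how to make better use of the whole Python data visualization ecosystem.

We have built an interactive course at Dataquest that teaches the Python data visualization tools, for readers who want to go deeper.

Exploring the dataset

Before we look at visualizing the data, let's take a quick look at the dataset we will be working with. The data comes from OpenFlights. We will use the routes, airports, and airlines datasets. Each row in the routes data corresponds to a flight route between two airports; each row in the airports data corresponds to one airport in the world, with information about it; and each row in the airlines data represents a single airline.

We first read in the data: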

# Import the pandas library. 
import pandas 
# Read in the airports data. 
airports = pandas.read_csv("airports.csv", header=None, dtype=str) 
airports.columns = ["id", "name", "city", "country", "code", "icao", "latitude", "longitude", "altitude", "offset", "dst", "timezone"] 
# Read in the airlines data. 
airlines = pandas.read_csv("airlines.csv", header=None, dtype=str) 
airlines.columns = ["id", "name", "alias", "iata", "icao", "callsign", "country", "active"] 
# Read in the routes data. 
routes = pandas.read_csv("routes.csv", header=None, dtype=str) 
routes.columns = ["airline", "airline_id", "source", "source_id", "dest", "dest_id", "codeshare", "stops", "equipment"]

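The data has no column headers, so we add them by assigning to the columns attribute. We read every column in as a string because that simplifies later steps, where we compare rows across dataframes by matching on id. Setting the dtype parameter when reading the data achieves this.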
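To make the effect of header=None and dtype=str concrete, here is a minimal sketch on a two-row extract. The inline sample data and the shortened three-column layout are assumptions for illustration, not the real airports.csv.

```python
import io
import pandas

# Hypothetical header-less sample; the real airports.csv has more columns.
raw = io.StringIO('1,"Goroka","Goroka"\n2,"Madang","Madang"\n')

# header=None keeps the first row as data; dtype=str keeps ids as text.
sample = pandas.read_csv(raw, header=None, dtype=str)
sample.columns = ["id", "name", "city"]

# Because ids are strings, they compare equal to ids read as strings elsewhere.
first_id = sample["id"].iloc[0]
matches = sample[sample["id"] == "2"]
print(repr(first_id), len(matches))
```

Had the ids been parsed as integers in one frame and as strings in another, the equality test would silently match nothing, which is the situation reading everything as strings avoids.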

We can take a quick look at each dataframe.

airports.head()

airlines.head()

routes.head()

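We could explore each dataset on its own in many interesting ways, but we gain the most by combining them. Pandas will help with the analysis because it can easily filter tabular data or apply functions across it. We will dig into a few interesting metrics, such as those for airlines and routes.

Before that, we need to do a bit of data cleaning.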

routes = routes[routes["airline_id"] != "\\N"]

This line ensures that the airline_id column contains only numeric data.
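As a small illustration of this kind of filter, the sketch below applies it to a hypothetical three-row frame in which one airline_id holds the literal two-character string \N (a backslash followed by N) as a missing-value placeholder. The sample values are assumptions.

```python
import pandas

# Hypothetical routes sample; "\\N" in Python source is backslash + N.
sample_routes = pandas.DataFrame({
    "airline": ["2B", "XX", "2G"],
    "airline_id": ["410", "\\N", "1654"],
})

# Keep only rows whose airline_id is not the missing-value placeholder.
cleaned = sample_routes[sample_routes["airline_id"] != "\\N"]

# Every remaining id can now be parsed as an integer.
ids = [int(v) for v in cleaned["airline_id"]]
print(ids)
```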

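Making a histogram

Now that we understand the structure of the data, we can start plotting to explore it further. We begin with matplotlib, a relatively low-level plotting library in the Python stack, so it generally takes more commands to produce a good-looking plot than with other libraries. On the other hand, you can make almost any kind of plot with matplotlib. It is very flexible, and the price of that flexibility is that it can be hard to use.

We start with a histogram showing the distribution of route lengths by airline. A histogram divides all the route lengths into ranges (or "bins") and counts how many routes fall into each range. This can tell us whether airlines fly more short routes or more long routes.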
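The binning step itself is simple enough to sketch in plain Python. The helper bin_counts and the sample lengths below are hypothetical; the sketch assumes equal-width bins with the largest value counted in the last bin.

```python
def bin_counts(values, bins):
    """Count how many values fall into each of `bins` equal-width ranges."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # If all values are equal, everything lands in the first bin.
        # Otherwise clamp the index so the largest value lands in the last bin.
        idx = 0 if width == 0 else min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts

# Hypothetical route lengths in kilometers.
sample_lengths = [120, 450, 800, 1500, 2300, 5200, 9800]
counts = bin_counts(sample_lengths, 4)
print(counts)
```

Most of the sample values land in the first bin, which is the kind of skew a histogram makes visible at a glance.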

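To do this, we first need to compute the route lengths. The first step is a distance formula: we use the haversine distance, which computes the distance between two points given as latitude and longitude pairs.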
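For reference, the haversine formula can be written as follows, where \varphi denotes latitude and \lambda longitude (both in radians) and R is the Earth radius. The symbols are notation chosen here, not taken from the article.

```latex
a = \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
    + \cos\varphi_1 \, \cos\varphi_2 \, \sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right),
\qquad
d = 2R \arcsin\!\left(\sqrt{a}\right)
```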

import math 
def haversine(lon1, lat1, lon2, lat2): 
    # Convert coordinates to floats. 
    lon1, lat1, lon2, lat2 = [float(lon1), float(lat1), float(lon2), float(lat2)] 
    # Convert to radians from degrees. 
    lon1, lat1, lon2, lat2 = map(math.radians, [lon1, lat1, lon2, lat2]) 
    # Compute distance. 
    dlon = lon2 - lon1  
    dlat = lat2 - lat1  
    a = math.sin(dlat/2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon/2)**2 
    c = 2 * math.asin(math.sqrt(a))  
    km = 6367 * c 
    return km
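
A quick way to sanity-check a haversine implementation is a case with a known answer: two points on the equator 90 degrees of longitude apart are a quarter of a great circle apart, so the distance should be the radius times pi / 2. The sketch below uses a compact restatement of the formula (named haversine_km here, a hypothetical name, to keep it separate from the function above) with the same radius of 6367 km.

```python
import math

def haversine_km(lon1, lat1, lon2, lat2, radius_km=6367):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, [lon1, lat1, lon2, lat2])
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

# Quarter of the equator: expected distance is radius * pi / 2.
quarter = haversine_km(0, 0, 90, 0)
expected = 6367 * math.pi / 2
print(round(quarter, 2), round(expected, 2))
```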

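Then we can write a function that computes the one-way distance between the source and destination airports of a route. We take the source_id and dest_id from the routes dataframe, match them against the id column of the airports dataframe, and then do the calculation. Here is the function: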

def calc_dist(row): 
    dist = 0 
    try: 
        # Match source and destination to get coordinates. 
        source = airports[airports["id"] == row["source_id"]].iloc[0] 
        dest = airports[airports["id"] == row["dest_id"]].iloc[0] 
        # Use coordinates to compute distance. 
        dist = haversine(dest["longitude"], dest["latitude"], source["longitude"], source["latitude"]) 
    except (ValueError, IndexError): 
        pass 
    return dist

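If the source_id or dest_id column does not contain a valid value, the lookup fails with an error, so we wrap it in a try/except block to catch these invalid cases. In that case the function returns a distance of 0.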
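The same guard can be sketched with a plain dictionary lookup, which avoids filtering the airports dataframe once per route. This is an alternative formulation, not the article's code; the coords mapping, its sample values, and the helper name lookup_endpoints are hypothetical.

```python
# Hypothetical id -> (longitude, latitude) mapping; with the real data it could be
# built once from the id, longitude and latitude columns of the airports dataframe.
coords = {
    "1": ("145.39", "-6.08"),
    "2": ("145.79", "-5.21"),
}

def lookup_endpoints(row, coords):
    """Return ((lon, lat), (lon, lat)) as floats, or None if an id is unknown or malformed."""
    try:
        src = coords[row["source_id"]]
        dst = coords[row["dest_id"]]
        return tuple(map(float, src)), tuple(map(float, dst))
    except (KeyError, ValueError):
        return None

good = lookup_endpoints({"source_id": "1", "dest_id": "2"}, coords)
bad = lookup_endpoints({"source_id": "1", "dest_id": "\\N"}, coords)
print(good, bad)
```

Returning None rather than 0 keeps invalid routes distinguishable from genuinely short ones, at the cost of having to drop them before plotting.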

Finally, we use pandas to apply the distance function across the routes dataframe. This gives us a pandas series containing all the route lengths, in kilometers.

route_lengths = routes.apply(calc_dist, axis=1)

現(xiàn)在我們就有了航線距離的序列了，我們將會創(chuàng)建一個柱狀圖，它將會將數(shù)據(jù)歸類到對應(yīng)的范圍之內(nèi)，然后計數(shù)分別有多少的航線落入到不同的每個范圍：

import matplotlib.pyplot as plt  
%matplotlib inline  
 
plt.hist(route_lengths, bins=20)

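We import the matplotlib plotting functions with import matplotlib.pyplot as plt. We then use %matplotlib inline to set up matplotlib to draw plots inside the IPython notebook. Finally, plt.hist(route_lengths, bins=20) produces the histogram. As we can see, airlines tend to fly more short routes than long ones.

Using seaborn

We can make a similar plot with seaborn, a higher-level plotting library for Python. Seaborn builds on matplotlib and specializes in certain types of plots, usually ones related to simple statistical analysis. We can use the distplot function to plot a histogram with a kernel density estimate on top of it. A kernel density estimate is a curve: essentially a smoothed version of the histogram that makes patterns easier to see.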

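import seaborn
# distplot is the function this article uses; seaborn 0.11 and later deprecate it
# in favor of histplot and displot.
seaborn.distplot(route_lengths, bins=20)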

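As you can see, seaborn also has nicer default styles than matplotlib. Seaborn does not have an equivalent for every matplotlib chart, but it is a great tool for making quick plots, and it can help us understand the meaning behind the data better than matplotlib's default charts. It is also a good library if you want to go deeper into statistical work.

Bar charts

Histograms are great, but sometimes we want to see the average route length per airline. For that we can use a bar chart, with one bar per airline showing the average length of that airline's routes. This lets us see which carriers are regional and which are international. We can use pandas, a Python data analysis library, to compute the average route length for each airline.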

# Put relevant columns into a dataframe.
route_length_df = pandas.DataFrame({"length": route_lengths, "id": routes["airline_id"]})
# Compute the mean route length per airline.
airline_route_lengths = route_length_df.groupby("id").aggregate("mean")
# Sort by length so we can make a better chart.
# (DataFrame.sort was removed from pandas; sort_values replaces it.)
airline_route_lengths = airline_route_lengths.sort_values("length", ascending=False)

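We first build a new dataframe from the route lengths and the airline ids. We split route_length_df into groups based on airline_id, essentially making one dataframe per airline. We then use the pandas aggregate function to take the mean of the length column in each group and recombine the results into a new dataframe. Finally, we sort that dataframe so that the airlines with the longest average routes come first.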
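The split, aggregate, and sort steps can be seen on a tiny example. The five rows below are made-up values, and the sketch uses sort_values, the sorting method in current pandas versions.

```python
import pandas

# Hypothetical route lengths (km) tagged with airline ids.
df = pandas.DataFrame({
    "length": [100.0, 300.0, 5000.0, 7000.0, 900.0],
    "id": ["10", "10", "20", "20", "30"],
})

# Split by airline id, take the mean length per group, sort longest first.
means = df.groupby("id")["length"].mean().sort_values(ascending=False)
print(means)
```

Airline "20" comes first with a mean of 6000 km, followed by "30" (900 km) and "10" (200 km).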

We can then plot the result with matplotlib.

plt.bar(range(airline_route_lengths.shape[0]), airline_route_lengths["length"])

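The matplotlib plt.bar method draws one bar for each airline in the dataframe, with its height given by that airline's average route length (airline_route_lengths["length"]).

The problem is that it is hard to tell which airline has which route length. To fix this, we need to be able to see the axis labels. That is difficult with so many airlines. One way to make it easier is to make the chart interactive, so that we can zoom in and out to read the labels. We can use the bokeh library for this: it makes it simple to create interactive, zoomable plots.

To use bokeh, we first need to preprocess the data: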

def lookup_name(row): 
    try: 
        # Match the row id to the id in the airlines dataframe so we can get the name. 
        name = airlines["name"][airlines["id"] == row["id"]].iloc[0] 
    except (ValueError, IndexError): 
        name = "" 
    return name 
# Add the index (the airline ids) as a column. 
airline_route_lengths["id"] = airline_route_lengths.index.copy() 
# Find all the airline names. 
airline_route_lengths["name"] = airline_route_lengths.apply(lookup_name, axis=1) 
# Remove duplicate values in the index. 
airline_route_lengths.index = range(airline_route_lengths.shape[0])

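The code above looks up the name for each row in airline_route_lengths and adds it to the name column, which holds the name of each airline. We also add the id column so that the lookup is possible (the apply function does not pass in the index).

Finally, we reset the index so that all of its values are unique. Bokeh does not work properly without this step.

Now we can move on to the chart itself: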

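from bokeh.io import output_notebook
# bokeh.charts is the high-level charts API of the Bokeh releases this article
# was written against; later Bokeh releases no longer ship it in the core package.
from bokeh.charts import Bar, show
output_notebook()
p = Bar(airline_route_lengths, 'name', values='length', title="Average airline route lengths")
show(p)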

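We call output_notebook to set up bokeh to display plots in the IPython notebook. We then make a bar chart from our dataframe and the specified columns. Finally, the show function displays the plot.

The plot is not actually an image; it is a JavaScript widget. Because of this, the article shows a screenshot in place of the live chart.

With the live chart we can zoom in and see which airlines fly the longest routes. In the screenshot the labels look crammed together, but they are much easier to read after zooming in.

Horizontal bar charts

Pygal is a data visualization library that makes attractive charts quickly. We can use it to break routes down by length. We first divide the routes into short, medium, and long, and compute the percentage of route_lengths that falls into each category.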

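# Share of routes in each length category, in kilometers.
# The boundary values 2000 and 10000 count as medium so that no route is dropped.
long_routes = len([k for k in route_lengths if k > 10000]) / len(route_lengths)
medium_routes = len([k for k in route_lengths if 2000 <= k <= 10000]) / len(route_lengths)
short_routes = len([k for k in route_lengths if k < 2000]) / len(route_lengths)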

Then we can plot each of them as a bar in a Pygal horizontal bar chart:

import pygal 
from IPython.display import SVG 
chart = pygal.HorizontalBar() 
chart.title = 'Long, medium, and short routes' 
chart.add('Long', long_routes * 100) 
chart.add('Medium', medium_routes * 100) 
chart.add('Short', short_routes * 100) 
chart.render_to_file('routes.svg') 
SVG(filename='routes.svg')

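First, we use the pandas apply method to compute the length of each name, that is, the number of characters in each airline's name. Then we use matplotlib to make a scatter plot comparing airline ids with name lengths. When plotting, we convert the id column of airlines to an integer type; the plot will not work otherwise, because the x-axis needs numeric values. We can see that quite a few of the long names appear among the earlier ids. This might mean that airlines founded earlier tend to have longer names.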
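The code for this step does not appear here, so the following is only a sketch of what the paragraph describes. A small stand-in frame (sample_airlines, with hypothetical values) is used so that the snippet runs on its own.

```python
import pandas

# Hypothetical stand-in for the airlines dataframe (ids read as strings).
sample_airlines = pandas.DataFrame({
    "id": ["1", "2", "3"],
    "name": ["Private flight", "135 Airways", "1Time Airline"],
})

# Number of characters in each airline name.
sample_name_lengths = sample_airlines["name"].apply(lambda x: len(str(x)))

# Numeric ids for the x-axis of a scatter plot.
sample_ids = sample_airlines["id"].astype(int)

# With matplotlib imported as plt, the scatter plot would then be:
# plt.scatter(sample_ids, sample_name_lengths)
print(list(sample_ids), list(sample_name_lengths))
```

On the real data, the same two expressions applied to the airlines dataframe would presumably produce the name_lengths series used in the next snippet.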

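We can check this hunch with seaborn. Seaborn has an augmented version of the scatter plot, the joint plot, which shows how correlated the two variables are as well as the individual distribution of each.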

data = pandas.DataFrame({"lengths": name_lengths, "ids": airlines["id"].astype(int)})
seaborn.jointplot(x="ids", y="lengths", data=data)

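The plot above shows that there is no clear correlation between the two variables: the r squared value is low.

Drawing great circles

It would be cool to see all the airline routes on a map, and fortunately we can use basemap to do this. We will draw great circles connecting the source and destination airports, where each arc shows the path of a single route. Unfortunately, there are too many routes to show them all without making a mess, so we only show the first 3000.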

# Import the basemap toolkit.
from mpl_toolkits.basemap import Basemap
# Make a base map with a mercator projection.  Draw the coastlines.
m = Basemap(projection='merc',llcrnrlat=-80,urcrnrlat=80,llcrnrlon=-180,urcrnrlon=180,lat_ts=20,resolution='c')
m.drawcoastlines() 
# Iterate through the first 3000 rows. 
for name, row in routes[:3000].iterrows(): 
    try: 
        # Get the source and dest airports. 
        source = airports[airports["id"] == row["source_id"]].iloc[0] 
        dest = airports[airports["id"] == row["dest_id"]].iloc[0] 
        # Don't draw overly long routes. 
        if abs(float(source["longitude"]) - float(dest["longitude"])) < 90: 
            # Draw a great circle between source and dest airports. 
            m.drawgreatcircle(float(source["longitude"]), float(source["latitude"]), float(dest["longitude"]), float(dest["latitude"]),linewidth=1,color='b') 
    except (ValueError, IndexError): 
        pass 
 
# Show the map. 
plt.show()

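The code above draws a map and then draws the routes on top of it. We add a filter so that overly long routes do not obscure the others.

Drawing network graphs

The final exploration is to draw a network diagram of airports. Each airport is a node in the network, and we draw an edge between two nodes if there is a route between the airports. If there are multiple routes, we add to the edge weight to show that the airports are more strongly connected. We use the networkx library to do this.

First, we compute the edge weights between airports.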

# Initialize the weights dictionary. 
weights = {} 
# Keep track of keys that have been added once -- we only want edges with a weight of more than 1 to keep our network size manageable. 
added_keys = [] 
# Iterate through each route. 
for name, row in routes.iterrows(): 
    # Extract the source and dest airport ids. 
    source = row["source_id"] 
    dest = row["dest_id"] 
 
    # Create a key for the weights dictionary. 
    # This corresponds to one edge, and has the start and end of the route. 
    key = "{0}_{1}".format(source, dest) 
    # If the key is already in weights, increment the weight. 
    if key in weights: 
        weights[key] += 1 
    # If the key is in added keys, initialize the key in the weights dictionary, with a weight of 2. 
    elif key in added_keys: 
        weights[key] = 2 
    # If the key isn't in added_keys yet, append it. 
    # This ensures that we aren't adding edges with a weight of 1. 
    else: 
        added_keys.append(key)

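Once the code above has run, the weights dictionary contains every edge between two airports that has a weight of 2 or more. So any pair of airports connected by two or more routes will show up.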
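The same weights can be sketched more compactly with collections.Counter: count every source/destination pair, then keep only the pairs seen at least twice. The sample route list is hypothetical, and this is an alternative formulation rather than the article's code.

```python
from collections import Counter

# Hypothetical (source_id, dest_id) pairs, one per route.
sample_routes = [("1", "2"), ("1", "2"), ("2", "3"), ("1", "2"), ("3", "4"), ("3", "4")]

# Count how many routes share each directed source_dest key.
counts = Counter("{0}_{1}".format(s, d) for s, d in sample_routes)

# Keep only edges backed by two or more routes.
sample_weights = {key: n for key, n in counts.items() if n >= 2}
print(sample_weights)
```

The pair 2_3 appears only once and is dropped, while 1_2 and 3_4 are kept with weights 3 and 2.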

# Import networkx and initialize the graph. 
import networkx as nx 
graph = nx.Graph() 
# Keep track of added nodes in this set so we don't add twice. 
nodes = set() 
# Iterate through each edge. 
for k, weight in weights.items(): 
    try: 
        # Split the source and dest ids and convert to integers. 
        source, dest = k.split("_") 
        source, dest = [int(source), int(dest)] 
        # Add the source if it isn't in the nodes. 
        if source not in nodes: 
            graph.add_node(source) 
        # Add the dest if it isn't in the nodes. 
        if dest not in nodes: 
            graph.add_node(dest) 
        # Add both source and dest to the nodes set. 
        # Sets don't allow duplicates. 
        nodes.add(source) 
        nodes.add(dest) 
 
        # Add the edge to the graph. 
        graph.add_edge(source, dest, weight=weight) 
    except (ValueError, IndexError): 
        pass 
pos=nx.spring_layout(graph) 
# Draw the nodes and edges. 
nx.draw_networkx_nodes(graph,pos, node_color='red', node_size=10, alpha=0.8) 
nx.draw_networkx_edges(graph,pos,width=1.0,alpha=1) 
# Show the plot. 
plt.show()

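Conclusion

The number of Python libraries for data visualization keeps growing, and it is possible to make almost any kind of visualization. Most of the libraries build on matplotlib and make certain use cases simpler. If you want to learn in more depth how to visualize data with matplotlib, seaborn, and other tools, see the other Dataquest courses.

Editor: Wang Xueyan. Source: oschina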
