Six Best Practices for Reusable Python Functions

Author: 云朵君 2023-08-26 20:51:25


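For data scientists working on teams with a mix of roles, writing clean code is an essential skill because:

Clean code improves readability, making it easier for teammates to understand and contribute to the codebase.
Clean code improves maintainability, simplifying tasks such as debugging, modifying, and extending existing code.

To be maintainable, our Python functions should:

Be small
Do only one task
Contain no duplication
Have one level of abstraction
Have a descriptive name
Have fewer than four arguments

Let's start by looking at the get_data function below.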

import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path
import gdown

def get_data(
    url: str,
    zip_path: str,
    raw_train_path: str,
    raw_test_path: str,
    processed_train_path: str,
    processed_test_path: str,
):
    # Download data from Google Drive
    gdown.download(url, zip_path, quiet=False)

    # Unzip data
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        zip_ref.extractall(".")

    # Extract texts from files in the train directory
    t_train = []
    for file_path in Path(raw_train_path).glob("*.xml"):
        list_train_doc_1 = [r.text for r in ET.parse(file_path).getroot()[0]]
        train_doc_1 = " ".join(t for t in list_train_doc_1)
        t_train.append(train_doc_1)
    t_train_docs = " ".join(t_train)

    # Extract texts from files in the test directory
    t_test = []
    for file_path in Path(raw_test_path).glob("*.xml"):
        list_test_doc_1 = [r.text for r in ET.parse(file_path).getroot()[0]]
        test_doc_1 = " ".join(t for t in list_test_doc_1)
        t_test.append(test_doc_1)
    t_test_docs = " ".join(t_test)

    # Write processed data to a train file
    with open(processed_train_path, "w") as f:
        f.write(t_train_docs)

    # Write processed data to a test file
    with open(processed_test_path, "w") as f:
        f.write(t_test_docs)


if __name__ == "__main__":
    get_data(
        url="https://drive.google.com/uc?id=1jI1cmxqnwsmC-vbl8dNY6b4aNBtBbKy3",
        zip_path="Twitter.zip",
        raw_train_path="Data/train/en",
        raw_test_path="Data/test/en",
        processed_train_path="Data/train/en.txt",
        processed_test_path="Data/test/en.txt",
    )

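Although this function contains many comments, it is hard to understand what it does because:

The function is long.
The function tries to do several tasks.
The code inside the function is at different levels of abstraction.
The function has many arguments.
There are several duplicated pieces of code.
The function lacks a descriptive name.

We will refactor this code using the six practices listed at the beginning of the article.

Small

A function should be kept small to improve readability. Ideally, a function should be no longer than 20 lines, and its indentation should not go deeper than one or two levels.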

import zipfile
import gdown

def get_raw_data(url: str, zip_path: str) -> None:
    gdown.download(url, zip_path, quiet=False)
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        zip_ref.extractall(".")
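
The 20-line guideline above can be checked mechanically. The following is a minimal sketch, not part of the original article: the helper name find_long_functions and its max_lines parameter are hypothetical, and it counts every line from the def line through the last line of the body, including blank lines and comments.

```python
import ast


def find_long_functions(source: str, max_lines: int = 20) -> list[tuple[str, int]]:
    """Return (function name, line count) for every function longer than max_lines."""
    long_functions = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # lineno and end_lineno are 1-based and inclusive (Python 3.8+)
            line_count = node.end_lineno - node.lineno + 1
            if line_count > max_lines:
                long_functions.append((node.name, line_count))
    return long_functions


# A function as short as get_raw_data stays well under the limit
short_source = '''
def get_raw_data(url, zip_path):
    print(url)
    print(zip_path)
'''
print(find_long_functions(short_source))  # prints []
```

A check like this could run in a pre-commit hook or CI job, though the threshold is a guideline rather than a hard rule.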

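Do Only One Task

A function should have a single focus and perform a single task. The get_data function tries to do several things: retrieving data from Google Drive, extracting text, and saving the extracted text.

Therefore, this function should be split into several smaller functions, as shown below: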

def main(
    url: str,
    zip_path: str,
    raw_train_path: str,
    raw_test_path: str,
    processed_train_path: str,
    processed_test_path: str,
) -> None:
    get_raw_data(url, zip_path)
    t_train, t_test = get_train_test_docs(raw_train_path, raw_test_path)
    save_train_test_docs(processed_train_path, processed_test_path, t_train, t_test)

Each of these functions should have a single purpose:

def get_raw_data(url: str, zip_path: str) -> None:
    gdown.download(url, zip_path, quiet=False)
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        zip_ref.extractall(".")

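The function get_raw_data performs only one action: getting the raw data.

No Duplication

We should avoid duplication because:

Duplicated code reduces readability.
Duplicated code makes changes more complicated. A change has to be made in several places, which increases the chance of errors.

The code below contains duplication: the code used to retrieve the training data and the test data is almost identical.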

import xml.etree.ElementTree as ET
from pathlib import Path

# Extract texts from files in the train directory
t_train = []
for file_path in Path(raw_train_path).glob("*.xml"):
    list_train_doc_1 = [r.text for r in ET.parse(file_path).getroot()[0]]
    train_doc_1 = " ".join(t for t in list_train_doc_1)
    t_train.append(train_doc_1)
t_train_docs = " ".join(t_train)

# Extract texts from files in the test directory
t_test = []
for file_path in Path(raw_test_path).glob("*.xml"):
    list_test_doc_1 = [r.text for r in ET.parse(file_path).getroot()[0]]
    test_doc_1 = " ".join(t for t in list_test_doc_1)
    t_test.append(test_doc_1)
t_test_docs = " ".join(t_test)

We can eliminate the duplication by merging the repeated code into a single function named extract_texts_from_multiple_files, which extracts text from multiple files in a specified location.

def extract_texts_from_multiple_files(folder_path) -> str:
    all_docs = []
    for file_path in Path(folder_path).glob("*.xml"):
        list_of_text_in_one_file = [r.text for r in ET.parse(file_path).getroot()[0]]
        text_in_one_file = " ".join(list_of_text_in_one_file)
        all_docs.append(text_in_one_file)

    return " ".join(all_docs)

Now you can use this function to extract text from different locations without repeating code.

t_train = extract_texts_from_multiple_files(raw_train_path)
t_test  = extract_texts_from_multiple_files(raw_test_path)

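One Level of Abstraction

The level of abstraction refers to how much of a system's complexity is exposed. A high level refers to a more general view of the system, while a low level refers to its more specific aspects.

Keeping the same level of abstraction within a code segment is good practice because it makes the code easier to understand.

The following function illustrates the problem: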

def extract_texts_from_multiple_files(folder_path) -> str:

    all_docs = []
    for file_path in Path(folder_path).glob("*.xml"):
        list_of_text_in_one_file = [r.text for r in ET.parse(file_path).getroot()[0]]
        text_in_one_file = " ".join(list_of_text_in_one_file)
        all_docs.append(text_in_one_file)

    return " ".join(all_docs)

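The function itself is at a high level, but the code inside the for loop involves lower-level operations related to XML parsing, text extraction, and string manipulation.

To resolve this mix of abstraction levels, we can encapsulate the low-level operations in the extract_texts_from_each_file function: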

def extract_texts_from_multiple_files(folder_path: str) -> str:
    all_docs = []
    for file_path in Path(folder_path).glob("*.xml"):
        text_in_one_file = extract_texts_from_each_file(file_path)
        all_docs.append(text_in_one_file)

    return " ".join(all_docs)
    

def extract_texts_from_each_file(file_path: Path) -> str:
    list_of_text_in_one_file = [r.text for r in ET.parse(file_path).getroot()[0]]
    return " ".join(list_of_text_in_one_file)

This introduces a higher level of abstraction for the text extraction process and makes the code more readable.
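
The main function shown earlier calls get_train_test_docs and save_train_test_docs, but the article does not show their bodies. The following is a minimal sketch of how they might look, assuming they simply delegate to the extraction functions above and to the save_data function that appears in the article's final section. Both bodies are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def extract_texts_from_each_file(file_path: Path) -> str:
    # Same logic as the article: join the text of each child of the root's first element
    list_of_text_in_one_file = [r.text for r in ET.parse(file_path).getroot()[0]]
    return " ".join(list_of_text_in_one_file)


def extract_texts_from_multiple_files(folder_path: str) -> str:
    all_docs = []
    for file_path in Path(folder_path).glob("*.xml"):
        all_docs.append(extract_texts_from_each_file(file_path))
    return " ".join(all_docs)


def get_train_test_docs(raw_train_path: str, raw_test_path: str) -> tuple[str, str]:
    # Hypothetical body: one call per directory, no duplicated loop
    t_train = extract_texts_from_multiple_files(raw_train_path)
    t_test = extract_texts_from_multiple_files(raw_test_path)
    return t_train, t_test


def save_data(processed_path: str, processed_data: str) -> None:
    with open(processed_path, "w") as f:
        f.write(processed_data)


def save_train_test_docs(
    processed_train_path: str, processed_test_path: str, t_train: str, t_test: str
) -> None:
    # Hypothetical body: reuse save_data for both outputs
    save_data(processed_train_path, t_train)
    save_data(processed_test_path, t_test)
```

Each helper stays at a single level of abstraction: main talks about train and test documents, while the XML details stay inside extract_texts_from_each_file.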

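Descriptive Name

A function's name should be descriptive enough that users can understand its purpose without reading the code. A longer, descriptive name is better than a vague one. For example, naming a function get_texts is less clear than naming it extract_texts_from_multiple_files.

However, if a function's name becomes too long, such as retrieve_data_extract_text_and_save_data, this suggests that the function may be doing too much and should be split into smaller functions.

Fewer Than Four Arguments

As the number of function arguments grows, it becomes harder to keep track of the order, purpose, and relationships of the arguments. This makes the function difficult for developers to understand and use.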

def main(
    url: str,
    zip_path: str,
    raw_train_path: str,
    raw_test_path: str,
    processed_train_path: str,
    processed_test_path: str,
) -> None:
    get_raw_data(url, zip_path)
    t_train, t_test = get_train_test_docs(raw_train_path, raw_test_path)
    save_train_test_docs(processed_train_path, processed_test_path, t_train, t_test)

To improve readability, you can wrap multiple related arguments in a single data structure using a data class or a Pydantic model.

from pydantic import BaseModel

class RawLocation(BaseModel):
    url: str
    zip_path: str
    path_train: str
    path_test: str


class ProcessedLocation(BaseModel):
    path_train: str
    path_test: str


def main(raw_location: RawLocation, processed_location: ProcessedLocation) -> None:
    get_raw_data(raw_location)
    t_train, t_test = get_train_test_docs(raw_location)
    save_train_test_docs(processed_location, t_train, t_test)
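
The text mentions data classes as an alternative to Pydantic models. The following is a minimal sketch of the same two structures using the standard library's dataclasses module. The frozen=True option is an assumption added here to make the locations read-only; it is not something the article specifies. Unlike Pydantic, a plain dataclass does not validate field types at runtime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen=True is an optional choice, not from the article
class RawLocation:
    url: str
    zip_path: str
    path_train: str
    path_test: str


@dataclass(frozen=True)
class ProcessedLocation:
    path_train: str
    path_test: str


# Values taken from the get_data call at the start of the article
raw_location = RawLocation(
    url="https://drive.google.com/uc?id=1jI1cmxqnwsmC-vbl8dNY6b4aNBtBbKy3",
    zip_path="Twitter.zip",
    path_train="Data/train/en",
    path_test="Data/test/en",
)
processed_location = ProcessedLocation(
    path_train="Data/train/en.txt",
    path_test="Data/test/en.txt",
)
```

With these objects, main(raw_location, processed_location) takes two arguments instead of six, and each helper reads only the fields it needs.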

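How Do I Write Functions Like This?

You don't need to memorize all of these best practices when writing Python functions. A good indicator of a Python function's quality is its testability. If a function can be tested easily, that suggests it is modular, performs a single task, and contains no duplicated code.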

def save_data(processed_path: str, processed_data: str) -> None:
    with open(processed_path, "w") as f:
        f.write(processed_data)


def test_save_data(tmp_path):
    processed_path = tmp_path / "processed_data.txt"
    processed_data = "Sample processed data"

    save_data(processed_path, processed_data)

    assert processed_path.exists()
    assert processed_path.read_text() == processed_data
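
The same testability check can be applied to the extraction step. The following is a sketch of a test for extract_texts_from_each_file, using pytest's tmp_path fixture as in the test above. The sample XML layout (an author root wrapping a documents element) is a hypothetical stand-in, since the article does not show the dataset's files.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def extract_texts_from_each_file(file_path: Path) -> str:
    list_of_text_in_one_file = [r.text for r in ET.parse(file_path).getroot()[0]]
    return " ".join(list_of_text_in_one_file)


def test_extract_texts_from_each_file(tmp_path: Path) -> None:
    # Hypothetical sample file; the real dataset's layout is assumed
    xml_path = tmp_path / "sample.xml"
    xml_path.write_text(
        "<author><documents>"
        "<document>first tweet</document>"
        "<document>second tweet</document>"
        "</documents></author>"
    )

    assert extract_texts_from_each_file(xml_path) == "first tweet second tweet"
```

Because the function does only one thing, the test needs no network access and no real dataset. The original get_data could not be tested this way without downloading and unzipping the archive first.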

Reference: Martin, R. C. (2009). Clean Code: A Handbook of Agile Software Craftsmanship. Upper Saddle River: Prentice Hall.

Editor: 武曉燕 Source: 數據STUDIO
