Meta's Official Prompt Engineering Guide: Using Llama 2 More Effectively
As large language model (LLM) technology matures, prompt engineering is becoming increasingly important. Several research organizations have published LLM prompt engineering guides, including Microsoft and OpenAI.
Recently Meta, creator of the open-source Llama model family, also released an interactive prompt engineering guide for Llama 2, covering prompt engineering techniques and best practices for the model.

Below are the guide's key points.
Llama Models
In 2023, Meta released the Llama and Llama 2 models. Smaller models are cheaper to deploy and run, while larger models are more capable.
The Llama 2 family is available in three parameter sizes: 7B, 13B, and 70B, each with both a pretrained and a chat fine-tuned (Llama 2 Chat) variant.
Code Llama is a code-focused LLM built on top of Llama 2, likewise available in several parameter sizes (7B, 13B, and 34B) and fine-tuned variants (base, Python, and Instruct).
Deploying LLMs
LLMs can be deployed and accessed in several ways, including:
Self-hosting: use local hardware to run inference, e.g. running Llama 2 on a MacBook Pro with llama.cpp. Advantage: self-hosting is best when you have privacy/security requirements or when you own enough GPUs.
Cloud hosting: rely on a cloud provider to deploy an instance hosting a specific model, e.g. running Llama 2 via cloud providers such as AWS, Azure, or GCP. Advantage: cloud hosting is the best way to customize a model and its runtime.
Hosted APIs: call the LLM directly through an API. Many companies offer Llama 2 inference APIs, including AWS Bedrock, Replicate, Anyscale, and Together. Advantage: hosted APIs are the simplest option overall.
Hosted APIs
Hosted APIs usually expose two main endpoints:
1. completion: generates a response to a given prompt.
2. chat_completion: generates the next message in a list of messages, providing more explicit instructions and context for use cases such as chatbots.
Tokens
LLMs process input and output in chunks called tokens, and each model has its own tokenization scheme. Take the following sentence:
Our destiny is written in the stars.
Llama 2 tokenizes it as ["our", "dest", "iny", "is", "written", "in", "the", "stars"]. Tokens matter most when considering API pricing and internal behavior (e.g. hyperparameters). Each model also has a maximum context length that a prompt cannot exceed: 4,096 tokens for Llama 2 and 100K tokens for Code Llama.
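To inspect tokenization yourself, one option is to load the Llama 2 tokenizer via Hugging Face transformers, as in the sketch below (this assumes you have been granted access to the gated meta-llama repository; the exact subword split you see may differ slightly from the illustration above):

from transformers import AutoTokenizer

# Assumes access to the gated meta-llama repository on Hugging Face
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

text = "Our destiny is written in the stars."
print(tokenizer.tokenize(text))     # The subword pieces
print(len(tokenizer.encode(text)))  # The token count, including special tokens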
Notebook Setup
As an example, we use Replicate to call Llama 2 chat and LangChain to easily set up a chat completion API.
First, install the prerequisites:
pip install langchain replicate

from typing import Dict, List

from langchain.llms import Replicate
from langchain.memory import ChatMessageHistory
from langchain.schema.messages import get_buffer_string
import os

# Get a free API key from https://replicate.com/account/api-tokens
os.environ["REPLICATE_API_TOKEN"] = "YOUR_KEY_HERE"

LLAMA2_70B_CHAT = "meta/llama-2-70b-chat:2d19859030ff705a87c746f7e96eea03aefb71f166725aee39692f1476566d48"
LLAMA2_13B_CHAT = "meta/llama-2-13b-chat:f4e2de70d66816a838a89eeeb621910adffb0dd0baba3976c96980970978018d"

# We'll default to the smaller 13B model for speed; change to LLAMA2_70B_CHAT for more advanced (but slower) generations
DEFAULT_MODEL = LLAMA2_13B_CHAT

def completion(
    prompt: str,
    model: str = DEFAULT_MODEL,
    temperature: float = 0.6,
    top_p: float = 0.9,
) -> str:
    llm = Replicate(
        model=model,
        model_kwargs={"temperature": temperature, "top_p": top_p, "max_new_tokens": 1000},
    )
    return llm(prompt)
def chat_completion(
    messages: List[Dict],
    model: str = DEFAULT_MODEL,
    temperature: float = 0.6,
    top_p: float = 0.9,
) -> str:
    history = ChatMessageHistory()
    for message in messages:
        if message["role"] == "user":
            history.add_user_message(message["content"])
        elif message["role"] == "assistant":
            history.add_ai_message(message["content"])
        else:
            raise Exception("Unknown role")
    return completion(
        get_buffer_string(
            history.messages,
            human_prefix="USER",
            ai_prefix="ASSISTANT",
        ),
        model,
        temperature,
        top_p,
    )

def assistant(content: str):
    return {"role": "assistant", "content": content}

def user(content: str):
    return {"role": "user", "content": content}

def complete_and_print(prompt: str, model: str = DEFAULT_MODEL):
    print(f'==============\n{prompt}\n==============')
    response = completion(prompt, model)
    print(response, end='\n\n')

Completion API
complete_and_print("The typical color of the sky is:")

complete_and_print("which model version are you?")

Chat Completion
Chat completion models provide additional structure for interacting with an LLM: an array of structured message objects is sent to the LLM instead of a single piece of text. This message list gives the LLM some "context" or "history" to continue from.
Typically, each message contains a role and content:
Messages with the system role let developers give core instructions to the LLM.
Messages with the user role are typically human-provided messages.
Messages with the assistant role are typically generated by the LLM.
response = chat_completion(messages=[
    user("My favorite color is blue."),
    assistant("That's great to hear!"),
    user("What is my favorite color?"),
])
print(response)
# "Sure, I can help you with that! Your favorite color is blue."

LLM Hyperparameters
LLM APIs usually take parameters that affect how creative or deterministic the output is. At each step, the LLM generates a list of tokens with probabilities. The least likely tokens are "cut" from the list (based on top_p), and then one token is randomly sampled from the remaining candidates (based on the temperature parameter). In other words: top_p controls the breadth of vocabulary in a generation and temperature controls its randomness; a temperature of 0 produces almost deterministic results.
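To make that mechanism concrete, here is a toy sketch of a single decoding step. It is purely an illustration of the rule described above, not Replicate's actual implementation; real decoders typically apply temperature to the logits before the top-p cut:

import math
import random

def sample_next_token(token_logprobs: dict, temperature: float, top_p: float) -> str:
    # Convert log-probabilities to probabilities and rank highest-first
    probs = {tok: math.exp(lp) for tok, lp in token_logprobs.items()}
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    # top_p: keep the smallest set of top tokens whose cumulative probability reaches top_p
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # temperature: re-weight the survivors; p ** (1/T) is equivalent to softmax(logit / T),
    # so a low temperature sharpens the distribution toward the most likely token
    weights = [p ** (1.0 / max(temperature, 1e-6)) for _, p in kept]
    return random.choices([tok for tok, _ in kept], weights=weights, k=1)[0]

# Hypothetical next-token distribution
token_logprobs = {"blue": math.log(0.6), "gray": math.log(0.25), "red": math.log(0.1), "plaid": math.log(0.05)}
print(sample_next_token(token_logprobs, temperature=0.01, top_p=0.9))  # Almost always "blue"
print(sample_next_token(token_logprobs, temperature=1.0, top_p=1.0))   # May return any of the four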
def print_tuned_completion(temperature: float, top_p: float):
    response = completion("Write a haiku about llamas", temperature=temperature, top_p=top_p)
    print(f'[temperature: {temperature} | top_p: {top_p}]\n{response.strip()}\n')

print_tuned_completion(0.01, 0.01)
print_tuned_completion(0.01, 0.01)
# These two generations are highly likely to be the same

print_tuned_completion(1.0, 1.0)
print_tuned_completion(1.0, 1.0)
# These two generations are highly likely to be different

Prompting Techniques
Detailed, explicit instructions produce better results than open-ended prompts:
complete_and_print(prompt="Describe quantum physics in one short sentence of no more than 12 words")
# Returns a succinct explanation of quantum physics that mentions particles and states existing simultaneously.

We can give explicit instructions in the form of rules and restrictions to follow:
- Stylization, for example:
  - Explain this to me like a topic on a children's educational TV show teaching elementary school students;
  - I'm a software engineer using large language models for summarization. Summarize the following text in under 250 words;
  - Give your answer like a private investigator tracking down a case step by step.
- Formatting, for example (see the sketch after this list):
  - Use bullet points;
  - Return as a JSON object;
  - Use less technical jargon, suitable for workplace communication.
- Restrictions, for example:
  - Only use academic papers;
  - Never give sources older than 2020;
  - If you don't know the answer, say that you don't know.
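Here is a hypothetical prompt combining a formatting rule with a restriction, using the complete_and_print helper defined above (the exact output varies between runs):

complete_and_print(
    "Give me three facts about llamas. "
    "Return the answer as a JSON object with a single key 'facts' mapping to a list of strings. "
    "If you are not sure a fact is true, leave it out."
)
# Expected shape (content will vary): {"facts": ["...", "...", "..."]}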
Here are examples of giving explicit instructions:
complete_and_print("Explain the latest advances in large language models to me.")
# More likely to cite sources from 2017

complete_and_print("Explain the latest advances in large language models to me. Always cite your sources. Never cite sources older than 2020.")
# Gives more specific advances and only cites sources from 2020

Zero-Shot Prompting
Some large language models (such as Llama 2) can follow instructions and produce a response without having previously seen an example of the task. Prompting without examples is called "zero-shot prompting". For example:
complete_and_print("Text: This was the best movie I've ever seen! \n The sentiment of the text is:")
# Returns positive sentiment

complete_and_print("Text: The director was trying too hard. \n The sentiment of the text is:")
# Returns negative sentiment

Few-Shot Prompting
Adding specific examples of the desired output usually produces more accurate and consistent output. This approach is called "few-shot prompting". For example:
def sentiment(text):
    response = chat_completion(messages=[
        user("You are a sentiment classifier. For each message, give the percentage of positive/neutral/negative."),
        user("I liked it"),
        assistant("70% positive 30% neutral 0% negative"),
        user("It could be better"),
        assistant("0% positive 50% neutral 50% negative"),
        user("It's fine"),
        assistant("25% positive 50% neutral 25% negative"),
        user(text),
    ])
    return response

def print_sentiment(text):
    print(f'INPUT: {text}')
    print(sentiment(text))

print_sentiment("I thought it was okay")
# More likely to return a balanced mix of positive, neutral, and negative

print_sentiment("I loved it!")
# More likely to return 100% positive

print_sentiment("Terrible service 0/10")
# More likely to return 100% negative

Role Prompting
Llama 2 often gives more consistent responses when assigned a role; the role gives the LLM background on the kind of answer that is wanted.
For example, to get Llama 2 to give a more focused technical answer to a question about the pros and cons of using PyTorch:
complete_and_print("Explain the pros and cons of using PyTorch.")
# More likely to explain the pros and cons of PyTorch covering general areas like documentation and the PyTorch community, and to mention a steep learning curve

complete_and_print("Your role is a machine learning expert who gives highly technical advice to senior engineers who work with complicated datasets. Explain the pros and cons of using PyTorch.")
# Often results in more technical benefits and drawbacks that provide more technical details on how model layers work

Chain-of-Thought
Simply adding a phrase that encourages step-by-step thinking can significantly improve a large language model's ability to perform complex reasoning (Wei et al. (2022)). This approach is called CoT or chain-of-thought prompting:
complete_and_print("Who lived longer Elvis Presley or Mozart?")
# Often gives incorrect answer of "Mozart"

complete_and_print("Who lived longer Elvis Presley or Mozart? Let's think through this carefully, step by step.")
# Gives the correct answer "Elvis"

Self-Consistency
LLMs are probabilistic, so even with chain-of-thought, a single generation may produce an incorrect result. Self-consistency improves accuracy by selecting the most frequent answer from multiple generations (at the cost of more compute):
import re
from statistics import mode

def gen_answer():
    response = completion(
        "John found that the average of 15 numbers is 40."
        "If 10 is added to each number then the mean of the numbers is?"
        "Report the answer surrounded by three backticks, for example: ```123```",
        model=LLAMA2_70B_CHAT,
    )
    match = re.search(r'```(\d+)```', response)
    if match is None:
        return None
    return match.group(1)

answers = [gen_answer() for _ in range(5)]

print(
    f"Answers: {answers}\n",
    f"Final answer: {mode(answers)}",
)

# Sample runs of Llama-2-70B (all correct):
# [50, 50, 750, 50, 50] -> 50
# [130, 10, 750, 50, 50] -> 50
# [50, None, 10, 50, 50] -> 50

Retrieval-Augmented Generation
Sometimes we want to use factual knowledge in an application. Common facts can be extracted from large models out of the box (i.e. using only the model weights):
complete_and_print("What is the capital of the California?", model=LLAMA2_70B_CHAT)
# Gives the correct answer "Sacramento"

However, LLMs often cannot reliably retrieve more specific facts or private information. The model will either state that it does not know or hallucinate an incorrect answer:
complete_and_print("What was the temperature in Menlo Park on December 12th, 2023?")
# "I'm just an AI, I don't have access to real-time weather data or historical weather records."

complete_and_print("What time is my dinner reservation on Saturday and what should I wear?")
# "I'm not able to access your personal information [..] I can provide some general guidance"

Retrieval-augmented generation (RAG) refers to including information retrieved from an external database in the prompt (Lewis et al. (2020)). RAG is an effective way to incorporate facts into an LLM application, and it is more affordable than fine-tuning, which can be expensive and can negatively affect the base model's capabilities.
MENLO_PARK_TEMPS = {
    "2023-12-11": "52 degrees Fahrenheit",
    "2023-12-12": "51 degrees Fahrenheit",
    "2023-12-13": "51 degrees Fahrenheit",
}

def prompt_with_rag(retrieved_info, question):
    complete_and_print(
        f"Given the following information: '{retrieved_info}', respond to: '{question}'"
    )

def ask_for_temperature(day):
    temp_on_day = MENLO_PARK_TEMPS.get(day) or "unknown temperature"
    prompt_with_rag(
        f"The temperature in Menlo Park was {temp_on_day} on {day}",  # Retrieved fact
        f"What is the temperature in Menlo Park on {day}?",  # User question
    )

ask_for_temperature("2023-12-12")
# "Sure! The temperature in Menlo Park on 2023-12-12 was 51 degrees Fahrenheit."

ask_for_temperature("2023-07-18")
# "I'm not able to provide the temperature in Menlo Park on 2023-07-18 as the information provided states that the temperature was unknown."

Program-Aided Language Models
LLMs are inherently bad at performing calculations. For example:
complete_and_print("""
Calculate the answer to the following math problem:
((-5 + 93 * 4 - 0) * (4^4 + -7 + 0 * 5))
""")
# Gives incorrect answers like 92448, 92648, 95463

Gao et al. (2022) introduced the concept of "program-aided language models" (PAL). While LLMs are bad at arithmetic, they are very good at code generation. PAL solves computational tasks by instructing the LLM to write code.
complete_and_print(
    """
    # Python code to calculate: ((-5 + 93 * 4 - 0) * (4^4 + -7 + 0 * 5))
    """,
    model="meta/codellama-34b:67942fd0f55b66da802218a19a8f0e1d73095473674061a6ea19f2dc8c053152",
)

# The following code was generated by Code Llama 34B:
num1 = (-5 + 93 * 4 - 0)
num2 = (4**4 + -7 + 0 * 5)
answer = num1 * num2
print(answer)
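To close the PAL loop, the generated snippet can itself be executed to obtain the final answer. Below is a minimal sketch, assuming the generation looks like the snippet above (note that executing model-generated code is unsafe outside a sandbox):

generated_code = """
num1 = (-5 + 93 * 4 - 0)
num2 = (4**4 + -7 + 0 * 5)
answer = num1 * num2
"""

namespace = {}
exec(generated_code, namespace)  # WARNING: only run model output in a sandboxed environment
print(namespace["answer"])  # 91383, the correct result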