
Building an Agentic RAG Pipeline with Deep Thinking Capabilities to Solve Complex Queries

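At present, our Policy Agent (the component that decides whether to CONTINUE or FINISH) relies on a general-purpose LLM such as GPT-4o and calls it on every iteration. This works, but it can be slow and expensive in production. Recent research points to a better path.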

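Many RAG systems fail not because the LLM lacks intelligence, but because their architecture is too simple. They try to handle an inherently cyclical, multi-step problem with a linear, one-shot approach.

Many complex queries require reasoning, reflection, and smart decisions about when to act, much like the way we look up information when we face a problem ourselves. This is where agent-driven behavior in the RAG pipeline comes in. Let's look at what a typical deep thinking RAG pipeline looks like.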

Deep Thinking RAG Pipeline (Created by Fareed Khan)

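  1. Plan: first, the agent breaks the complex user query into a structured, multi-step research plan and decides which tool each step needs (internal document search or web search).
  2. Retrieve: for each step, it runs an adaptive, multi-stage retrieval funnel in which a supervisor dynamically selects the best search strategy (vector, keyword, or hybrid).
  3. Refine: a high-precision Cross-Encoder reranks the initial results, and a distiller agent compresses the best evidence into a concise context.
  4. Reflect: after each step, the agent summarizes its findings and updates the research history, gradually building a cumulative understanding of the problem.
  5. Critique: a policy agent then reviews this history and decides strategically whether to continue to the next step, revise the plan after hitting a dead end, or finish.
  6. Synthesize: once the research is complete, a final agent combines the evidence from all sources into a single, comprehensive, citable answer.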
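The six stages above form a loop rather than a straight line. As a rough sketch only (this is not the article's LangGraph implementation), the control flow could look like the following. The function run_deep_thinking_loop and the four callables passed to it are hypothetical stand-ins, and plan revision is left out for brevity.

```python
from typing import Callable, Dict, List

MAX_ITERATIONS = 7  # assumed safeguard against an endless loop


def run_deep_thinking_loop(
    question: str,
    plan_fn: Callable[[str], List[str]],
    research_fn: Callable[[str], str],
    policy_fn: Callable[[List[str], List[str]], str],
    synthesize_fn: Callable[[str, List[str]], str],
) -> Dict[str, object]:
    """Plan once, then loop over the steps until the policy says FINISH."""
    plan = plan_fn(question)                      # 1. Plan
    history: List[str] = []
    step_index = 0
    for _ in range(MAX_ITERATIONS):
        if step_index >= len(plan):
            break
        summary = research_fn(plan[step_index])   # 2-4. Retrieve, Refine, Reflect
        history.append(summary)
        step_index += 1
        if policy_fn(plan, history) == "FINISH":  # 5. Critique
            break
    answer = synthesize_fn(question, history)     # 6. Synthesize
    return {"answer": answer, "history": history}


# Toy run with stub functions in place of real agents
result = run_deep_thinking_loop(
    "toy question",
    plan_fn=lambda q: ["step A", "step B", "step C"],
    research_fn=lambda sub_q: f"summary of {sub_q}",
    policy_fn=lambda plan, history: "FINISH" if len(history) >= 2 else "CONTINUE",
    synthesize_fn=lambda q, history: " | ".join(history),
)
print(result["answer"])
```

Here the stub policy stops after two steps, so the third planned step is never researched. That early exit is the behavior a linear chain cannot express.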

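In this article, we will implement the complete deep thinking RAG pipeline and compare it against a basic RAG pipeline to show how it solves complex multi-hop queries.

All the code and theory are available in my GitHub repository: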

GitHub - FareedKhan-dev/deep-thinking-rag: A Deep Thinking RAG Pipeline to Solve Complex Queries

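Table of Contents

  • Setting up the Environment
  • Sourcing the Knowledge Base
  • Understanding our Multi-Source, Multi-Hop Query
  • Building a Shallow RAG Pipeline that Will Fail
  • Defining the RAG State for the Central Agent System
  • Strategic Planning and Query Formulation
  • Decomposing the Problem with a Tool-Aware Planner
  • Optimizing Retrieval with a Query Rewriter Agent
  • Improving Precision with Metadata-Aware Chunking
  • Creating the Multi-Stage Retrieval Funnel
  • Dynamically Choosing a Strategy with a Supervisor
  • Broad Recall with Hybrid, Keyword, and Semantic Search
  • High Precision with a Cross-Encoder Reranker
  • Synthesizing with Contextual Distillation
  • Augmenting Knowledge with Web Search
  • Self-Critique and Control Flow Policy
  • Updating and Reflecting on the Cumulative Research History
  • Building the Policy Agent for Control Flow
  • Defining the Graph Nodes
  • Defining the Conditional Edges
  • Wiring the Deep Thinking RAG Machine
  • Compiling and Visualizing the Iterative Workflow
  • Running the Deep Thinking Pipeline
  • Analyzing the Final High-Quality Answer
  • Side-by-Side Comparison
  • Evaluation Framework and Analysis of Results
  • Summarizing the Entire Pipeline
  • Learning the Policy with Markov Decision Processes (MDP)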

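Setting up the Environment

Before we start coding the Deep RAG pipeline, we need a solid foundation, because a production-grade AI system is not only the final algorithm but also the deliberate choices made during setup.

Each step we implement will directly affect the effectiveness and reliability of the final system.

While you are developing a pipeline and iterating by trial and error, it is best to define the configuration as a simple dictionary. As the pipeline grows more complex, you can return to this dictionary, adjust a setting, and observe the impact on overall performance.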

# Central Configuration Dictionary to manage all system parameters
config = {
    "data_dir": "./data",                           # Directory to store raw and cleaned data
    "vector_store_dir": "./vector_store",           # Directory to persist our vector store
    "llm_provider": "openai",                       # The LLM provider we are using
    "reasoning_llm": "gpt-4o",                      # The powerful model for planning and synthesis
    "fast_llm": "gpt-4o-mini",                      # A faster, cheaper model for simpler tasks like the baseline RAG
    "embedding_model": "text-embedding-3-small",    # The model for creating document embeddings
    "reranker_model": "cross-encoder/ms-marco-MiniLM-L-6-v2", # The model for precision reranking
    "max_reasoning_iterations": 7,                  # A safeguard to prevent the agent from getting into an infinite loop
    "top_k_retrieval": 10,                          # Number of documents for initial broad recall
    "top_n_rerank": 3,                              # Number of documents to keep after precision reranking
}

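Most of these keys are self-explanatory, but three are worth highlighting:

  • llm_provider: the LLM provider we are using, here OpenAI. LangChain makes it easy to switch models and providers, so you can choose whichever suits you, such as Ollama.
  • reasoning_llm: this must be the most capable model in the whole system, because it handles planning and synthesis.
  • fast_llm: used for simpler tasks (such as the baseline RAG), so it should be faster and cheaper.

Next, we import the libraries the pipeline uses and set the API keys as environment variables so they are not exposed in the code.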

import os
import re
import json
from getpass import getpass
from pprint import pprint
import uuid
from typing import List, Dict, TypedDict, Literal, Optional

def _set_env(var: str):
    # Prompt for the key only if it is not already set in the environment
    if not os.environ.get(var):
        os.environ[var] = getpass(f"Enter your {var}: ")

_set_env("OPENAI_API_KEY")
_set_env("LANGSMITH_API_KEY")
_set_env("TAVILY_API_KEY")

os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_PROJECT"] = "Advanced-Deep-Thinking-RAG"

We also enable LangSmith tracing. In an agentic system with a complex, cyclical workflow, tracing is essential rather than optional: it helps you visualize the internal process and makes the agent's reasoning path much easier to debug.

Sourcing the Knowledge Base

A production-grade RAG system needs a knowledge base that is both complex and challenging to truly demonstrate its effectiveness. We will use NVIDIA's 2023 10-K filing, a document of more than one hundred pages that details the company's business operations, financial performance, and risk factor disclosures.

Sourcing the Knowledge Base (Created by Fareed Khan)

First, we implement a custom function that downloads the 10-K filing directly from the SEC EDGAR database, parses the raw HTML, and converts it into clean, structured text for our RAG pipeline to ingest.

import requests
from bs4 import BeautifulSoup
from langchain.docstore.document import Document

def download_and_parse_10k(url, doc_path_raw, doc_path_clean):
    if os.path.exists(doc_path_clean):
        print(f"Cleaned 10-K file already exists at: {doc_path_clean}")
        return

    print(f"Downloading 10-K filing from {url}...")
    headers = {'User-Agent': 'Mozilla/5.0'}
    response = requests.get(url, headers=headers)
    response.raise_for_status()

    with open(doc_path_raw, 'w', encoding='utf-8') as f:
        f.write(response.text)
    print(f"Raw document saved to {doc_path_raw}")

    soup = BeautifulSoup(response.content, 'html.parser')

    text = ''
    for p in soup.find_all(['p', 'div', 'span']):
        text += p.get_text(strip=True) + '\n\n'

    clean_text = re.sub(r'\n{3,}', '\n\n', text).strip()
    clean_text = re.sub(r'\s{2,}', ' ', clean_text).strip()

    with open(doc_path_clean, 'w', encoding='utf-8') as f:
        f.write(clean_text)
    print(f"Cleaned text content extracted and saved to {doc_path_clean}")

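The code is straightforward: it uses beautifulsoup4 to parse the HTML and extract the text, which makes it easy to navigate the HTML structure and keep the useful content while ignoring irrelevant elements such as scripts and styles.

Now let's run it and see the result.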

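# Source URL and local paths (values taken from the output shown below)
url_10k = "https://www.sec.gov/Archives/edgar/data/1045810/000104581023000017/nvda-20230129.htm"
doc_path_raw = os.path.join(config["data_dir"], "nvda_10k_2023_raw.html")
doc_path_clean = os.path.join(config["data_dir"], "nvda_10k_2023_clean.txt")
os.makedirs(config["data_dir"], exist_ok=True)

print("Downloading and parsing NVIDIA's 2023 10-K filing...")
download_and_parse_10k(url_10k, doc_path_raw, doc_path_clean)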

with open(doc_path_clean, 'r', encoding='utf-8') as f:
    print("\n--- Sample content from cleaned 10-K ---")
    print(f.read(1000) + "...")
#### OUTPUT ####
Downloading and parsing NVIDIA 2023 10-K filing...
Successfully downloaded 10-K filing from https://www.sec.gov/Archives/edgar/data/1045810/000104581023000017/nvda-20230129.htm
Raw document saved to ./data/nvda_10k_2023_raw.html
Cleaned text content extracted and saved to ./data/nvda_10k_2023_clean.txt

# --- Sample content from cleaned 10-K ---
Item 1. Business. 
 OVERVIEW 
 NVIDIA is the pioneer of accelerated computing. We are a full-stack computing company with a platform strategy that brings together hardware, systems, software, algorithms, libraries, and services to create unique value for the markets we serve. Our work in accelerated computing and AI is reshaping the worlds largest industries and profoundly impacting society. 
 Founded in 1993, we started as a PC graphics chip company, inventing the graphics processing unit, or GPU. The GPU was essential for the growth of the PC gaming market and has since been repurposed to revolutionize computer graphics, high performance computing, or HPC, and AI. 
 The programmability of our GPUs made them ...

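We call the function and save the content to a text file that will serve as context for the RAG pipeline. Running the code above downloads the filing automatically and prints a sample for preview.

Understanding our Multi-Source, Multi-Hop Query

To test the pipeline we build and compare it with the basic RAG, we need a highly complex query that covers different aspects of the documents we use.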

Our Complex Query:

"Based on NVIDIA's 2023 10-K filing, identify their key risks related to
competition. Then, find recent news (post-filing, from 2024) about AMD's
AI chip strategy and explain how this new strategy directly addresses or
exacerbates one of NVIDIA's stated risks."

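Why would this query defeat a standard RAG pipeline?

  1. Multi-hop reasoning: it cannot be answered in one step. The system must first identify the risks, then find the AMD news, and finally synthesize the two.
  2. Multi-source knowledge: the required information lives in completely different places. The risks are in an internal, static document (the 10-K), while the AMD news is external and requires live web access.
  3. Synthesis and analysis: the query does not ask for a simple list of facts. It asks how the latter exacerbates the former, which requires genuine synthesis and reasoning.

In the next section we implement a basic RAG pipeline and see how it fails.

Building a Shallow RAG Pipeline that Will Fail

With the environment and a challenging knowledge base in place, the next step is to build a standard "vanilla" RAG pipeline. This matters because we want to start from the simplest viable solution, run the complex query against it, and observe exactly how and why it fails.

We will do the following:

Shallow RAG Pipeline (Created by Fareed Khan)

  • Load and chunk the document: read the cleaned 10-K and split it into fixed-size chunks, a common but semantically naive approach.
  • Create the vector store: embed the chunks and index them with ChromaDB to support basic semantic search.
  • Assemble the RAG chain: use the LangChain Expression Language (LCEL) to connect the retriever, the prompt template, and the LLM into a linear flow.
  • Demonstrate the key failure: run our multi-hop, multi-source query and analyze the inadequate answer.

We start by loading the cleaned document and splitting it, using LangChain's RecursiveCharacterTextSplitter.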

from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

print("Loading and chunking the document...")
loader = TextLoader(doc_path_clean, encoding='utf-8')
documents = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
doc_chunks = text_splitter.split_documents(documents)

print(f"Document loaded and split into {len(doc_chunks)} chunks.")
#### OUTPUT ####
Loading and chunking the document...
Document loaded and split into 378 chunks.

With 378 chunks in hand, the next step is to make them searchable by creating vectors and storing them in a database. We use ChromaDB as the vector store and OpenAI's text-embedding-3-small as the embedding model (as defined in the config).

from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

print("Creating baseline vector store...")
embedding_function = OpenAIEmbeddings(model=config['embedding_model'])

baseline_vector_store = Chroma.from_documents(
    documents=doc_chunks,
    embedding=embedding_function
)
baseline_retriever = baseline_vector_store.as_retriever(search_kwargs={"k": 3})

print(f"Vector store created with {baseline_vector_store._collection.count()} embeddings.")
#### OUTPUT ####
Creating baseline vector store...
Vector store created with 378 embeddings.

Chroma.from_documents orchestrates this process and stores the vectors in a searchable index. Finally, we use LCEL to assemble everything into a single runnable RAG chain. The data flow is: user question -> retriever -> prompt -> LLM.

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

template = """You are an AI financial analyst. Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
llm = ChatOpenAI(model=config["fast_llm"], temperature=0)

def format_docs(docs):
    # Join the retrieved chunks into one context string, separated by a divider
    return "\n\n---\n\n".join(doc.page_content for doc in docs)

baseline_rag_chain = (
    {"context": baseline_retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

Note that the first step is a dictionary. The context value is produced by a sub-chain (input question -> baseline_retriever -> format_docs), while question is passed through unchanged (RunnablePassthrough).
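To make the fan-out concrete, here is a plain-Python sketch of what that dictionary step does conceptually: the same input is handed to every branch, and the results are collected under each key. It does not use LangChain, and toy_retriever, join_docs, and run_parallel are hypothetical names introduced only for illustration.

```python
from typing import Callable, Dict, List


def toy_retriever(question: str) -> List[str]:
    """Stand-in for baseline_retriever: returns fixed 'documents'."""
    return ["chunk about competition", "chunk about AMD"]


def join_docs(docs: List[str]) -> str:
    """Plays the role of format_docs for plain strings."""
    return "\n\n---\n\n".join(docs)


def run_parallel(branches: Dict[str, Callable[[str], object]], value: str) -> Dict[str, object]:
    """Feed the same input to every branch and collect the results by key."""
    return {key: fn(value) for key, fn in branches.items()}


prompt_inputs = run_parallel(
    {
        "context": lambda q: join_docs(toy_retriever(q)),  # retriever -> format_docs
        "question": lambda q: q,                            # pass-through
    },
    "What are the competitive risks?",
)
print(prompt_inputs["question"])
print(prompt_inputs["context"])
```

The resulting dictionary has exactly the two keys that the prompt template expects, which is why the next stage of the chain can fill in {context} and {question} directly.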

Let's run it and see where it fails.

from rich.console import Console
from rich.markdown import Markdown

console = Console()
complex_query_adv = "Based on NVIDIA's 2023 10-K filing, identify their key risks related to competition. Then, find recent news (post-filing, from 2024) about AMD's AI chip strategy and explain how this new strategy directly addresses or exacerbates one of NVIDIA's stated risks."

print("Executing complex query on the baseline RAG chain...")
baseline_result = baseline_rag_chain.invoke(complex_query_adv)

console.print("\n--- BASELINE RAG FAILED OUTPUT ---")
console.print(Markdown(baseline_result))
#### OUTPUT ####
Executing complex query on the baseline RAG chain...

--- BASELINE RAG FAILED OUTPUT ---
Based on the provided context, NVIDIA operates in an intensely competitive semiconductor
industry and faces competition from companies like AMD. The context mentions
that the industry is characterized by rapid technological change. However, the provided documents do not contain any specific information about AMD's recent AI chip strategy from 2024 or how it might impact NVIDIA's stated risks.

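Three problems stand out:

  • Irrelevant context: the retriever pulls generic passages about NVIDIA, competition, and AMD, but nothing specific about AMD's 2024 strategy.
  • Missing information: data from 2023 cannot cover events in 2024, and the system does not recognize that it is missing key information.
  • No planning or tool use: it treats a complex problem as simple question answering. It cannot break the query into steps and cannot use web search to fill the gap.

The system fails not because the LLM is unintelligent but because the architecture is too simple. It uses a linear, one-shot process to tackle a cyclical, multi-step problem.

Now that we understand the problems with basic RAG, we can start implementing the deep thinking methodology and see how it solves complex queries.

Defining the RAG State for the Central Agent System

To build a reasoning agent, we first need to manage its state. Every step of a simple RAG chain is stateless, but an intelligent agent needs memory: it must remember the original question, the plan it made, and the evidence it has gathered so far.

RAG State (Created by Fareed Khan)

RAGState will serve as the central memory that is passed between nodes in our LangGraph workflow. We start by defining the data structures, beginning with the most basic building block: a single step in the research plan.

We want to define the atomic unit of the agent's plan. Each Step must contain not only a sub-question to answer but also the reasoning behind it and, most importantly, the tool to use. This forces the agent's planning process to be explicit and structured.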

from langchain_core.documents import Document
from langchain_core.pydantic_v1 import BaseModel, Field

class Step(BaseModel):
    sub_question: str = Field(description="A specific, answerable question for this step.")
    justification: str = Field(description="A brief explanation of why this step is necessary to answer the main query.")
    tool: Literal["search_10k", "search_web"] = Field(description="The tool to use for this step.")
    keywords: List[str] = Field(description="A list of critical keywords for searching relevant document sections.")
    document_section: Optional[str] = Field(description="A likely document section title (e.g., 'Item 1A. Risk Factors') to search within. Only for 'search_10k' tool.")

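The Step class (built on Pydantic's BaseModel) acts as a strict contract for the Planner Agent's output. The tool: Literal[...] field forces the LLM to choose explicitly between internal knowledge (search_10k) and external knowledge (search_web).

This kind of structured output is far more reliable than parsing a plan written in natural language.

With a single Step defined, we need a container for the whole sequence of steps. We create a Plan class, which holds a list of Step objects and represents the agent's end-to-end research strategy.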

class Plan(BaseModel):
    steps: List[Step] = Field(description="A detailed, multi-step plan to answer the user's query.")

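The Plan class gives structure to the entire research process. When we call the Planner Agent, we ask it to return a JSON object that conforms to this schema. That way the agent's strategy is clear, ordered, and machine-readable before any retrieval takes place.

While executing the plan, the agent needs to remember what it has learned. We define a PastStep dictionary that stores the result of each completed step. Together these entries form the agent's research history, or lab notebook.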

class PastStep(TypedDict):
    step_index: int
    sub_question: str
    retrieved_docs: List[Document]
    summary: str

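This structure is essential to the agent's self-critique loop. After each step, we fill in one of these dictionaries and add it to the state. By reviewing this growing list of summaries, the agent can tell what it knows and what it does not, and decide whether it has the information needed to finish the task.

Finally, we assemble these pieces into the main RAGState dictionary. It flows through the whole graph and contains the original question, the full plan, the history of past steps, and the intermediate data for the step currently being executed.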

class RAGState(TypedDict):
    original_question: str
    plan: Plan
    past_steps: List[PastStep]
    current_step_index: int
    retrieved_docs: List[Document]
    reranked_docs: List[Document]
    synthesized_context: str
    final_answer: str

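RAGState is the agent's mind. Every node in the graph takes this dictionary as input and returns an updated version.

For example, plan_node fills in the plan field, retrieval_node fills in retrieved_docs, and so on. This shared, persistent state is what makes complex, iterative reasoning possible, and it is exactly what a simple RAG chain lacks.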
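The node pattern described above can be sketched without LangGraph. In this simplified illustration each node returns only the fields it changes, and a small helper merges that partial update into the shared state, which is roughly what a graph runtime does between nodes. ToyState, toy_plan_node, toy_reflect_node, and apply_update are hypothetical names, and the state is a trimmed-down version of RAGState.

```python
from typing import Dict, List, TypedDict


class ToyState(TypedDict):
    original_question: str
    plan: List[str]
    current_step_index: int
    past_steps: List[Dict[str, object]]


def toy_plan_node(state: ToyState) -> Dict[str, object]:
    """Return only the fields this node changes (a partial update)."""
    return {"plan": ["find risks in 10-K", "find AMD news"], "current_step_index": 0}


def toy_reflect_node(state: ToyState) -> Dict[str, object]:
    """Record the current step in the history and advance the step pointer."""
    index = state["current_step_index"]
    step = state["plan"][index]
    record = {"step_index": index, "sub_question": step, "summary": f"notes on: {step}"}
    return {"past_steps": state["past_steps"] + [record], "current_step_index": index + 1}


def apply_update(state: ToyState, update: Dict[str, object]) -> ToyState:
    """Merge a node's partial update into the shared state."""
    merged = dict(state)
    merged.update(update)
    return merged  # type: ignore[return-value]


state: ToyState = {"original_question": "toy", "plan": [], "current_step_index": 0, "past_steps": []}
state = apply_update(state, toy_plan_node(state))
state = apply_update(state, toy_reflect_node(state))
print(state["current_step_index"], state["past_steps"][0]["sub_question"])
```

Because every node reads from and writes to the same dictionary, a later node can inspect past_steps to decide what to do next, which a stateless chain cannot do.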

With the blueprint for the agent's memory ready, we can build the first cognitive component: the Planner Agent.

Strategic Planning and Query Formulation

Strategic Planning (Created by Fareed Khan)

This section breaks down into three engineering steps:

  • Tool-Aware Planner: build an LLM-driven agent that decomposes the user query into a structured Plan and selects a tool for each step.
  • Query Rewriter: create a specialized agent that rewrites the planner's simple sub-questions into queries that retrieve well.
  • Metadata-Aware Chunking: reprocess the source document to add section-level metadata, which is the key to high-precision filtered retrieval.

Decomposing the Problem with a Tool-Aware Planner

Decomposing Step (Created by Fareed Khan)

We cannot throw the full question at the database and hope for the best. We need to teach the agent to break the problem into smaller, more manageable parts.

To do that, we create a dedicated Planner Agent and give it very clear instructions (a prompt) that tell it how to work.

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from rich.pretty import pprint as rprint

planner_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are an expert research planner. Your task is to create a clear, multi-step plan to answer a complex user query by retrieving information from multiple sources.
You have two tools available:
1. `search_10k`: Use this to search for information within NVIDIA's 2023 10-K financial filing. This is best for historical facts, financial data, and stated company policies or risks from that specific time period.
2. `search_web`: Use this to search the public internet for recent news, competitor information, or any topic that is not specific to NVIDIA's 2023 10-K.
Decompose the user's query into a series of simple, sequential sub-questions. For each step, decide which tool is more appropriate.
For `search_10k` steps, also identify the most likely section of the 10-K (e.g., 'Item 1A. Risk Factors', 'Item 7. Management's Discussion and Analysis...').
It is critical to use the exact section titles found in a 10-K filing where possible."""),
    ("human", "User Query: {question}")
])

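Here we give the LLM a new persona: an expert research planner. We tell it explicitly that it has two tools (search_10k and search_web) and when each one applies. This is the tool-aware part.

We ask it for a plan that maps directly onto the system's capabilities rather than a vague description.

Next, we initialize the reasoning model and chain it to the prompt. The key is to tell the LLM that its final output must match the Pydantic Plan class, which keeps the output structured and predictable.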

reasoning_llm = ChatOpenAI(model=config["reasoning_llm"], temperature=0)

planner_agent = planner_prompt | reasoning_llm.with_structured_output(Plan)
print("Tool-Aware Planner Agent created successfully.")

print("\n--- Testing Planner Agent ---")
test_plan = planner_agent.invoke({"question": complex_query_adv})

rprint(test_plan)

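We pipe planner_prompt into reasoning_llm wrapped with .with_structured_output(Plan), which tells LangChain to use the model's function-calling capability and return an object that matches the Plan schema. This is far more reliable than parsing plain text.

The test output is as follows: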

#### OUTPUT ####
Tool-Aware Planner Agent created successfully.

--- Testing Planner Agent ---
Plan(
│   steps=[
│   │   Step(
│   │   │   sub_question="What are the key risks related to competition as stated in NVIDIA's 2023 10-K filing?",
│   │   │   justification="This step is necessary to extract the foundational information about competitive risks directly from the source document as requested by the user.",
│   │   │   tool='search_10k',
│   │   │   keywords=['competition', 'risk factors', 'semiconductor industry', 'competitors'],
│   │   │   document_section='Item 1A. Risk Factors'
│   │   ),
│   │   Step(
│   │   │   sub_question="What are the recent news and developments in AMD's AI chip strategy in 2024?",
│   │   │   justification="This step requires finding up-to-date, external information that is not available in the 2023 10-K filing. A web search is necessary to get the latest details on AMD's strategy.",
│   │   │   tool='search_web',
│   │   │   keywords=['AMD', 'AI chip strategy', '2024', 'MI300X', 'Instinct accelerator'],
│   │   │   document_section=None
│   │   )
│   ]
)

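As the output shows, the agent not only produced a clear Plan but also correctly recognized that the query has two parts:

  1. The answer to the first part is in the 10-K, so it chose search_10k and correctly guessed the likely section.
  2. The second part asks for news from 2024, which cannot be in the 10-K, so it correctly chose search_web. This suggests that the reasoning side of our pipeline is on the right track.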
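To see why a schema-checked plan is safer than free text, consider this standard-library sketch that parses a plan from JSON and rejects any tool outside the allowed set. It only mimics the idea behind the Pydantic models; ToyStep, ALLOWED_TOOLS, and parse_plan are hypothetical names, and the real pipeline relies on Pydantic and structured output instead.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

ALLOWED_TOOLS = {"search_10k", "search_web"}  # mirrors the Literal on Step.tool


@dataclass
class ToyStep:
    sub_question: str
    tool: str
    document_section: Optional[str] = None


def parse_plan(raw_json: str) -> List[ToyStep]:
    """Parse a JSON plan and fail loudly on an unknown tool."""
    data = json.loads(raw_json)
    steps = []
    for item in data["steps"]:
        if item["tool"] not in ALLOWED_TOOLS:
            raise ValueError(f"Unknown tool: {item['tool']}")
        steps.append(ToyStep(item["sub_question"], item["tool"], item.get("document_section")))
    return steps


raw_plan = json.dumps({
    "steps": [
        {"sub_question": "What are the competition risks?", "tool": "search_10k",
         "document_section": "Item 1A. Risk Factors"},
        {"sub_question": "What is AMD doing in 2024?", "tool": "search_web"},
    ]
})
plan_steps = parse_plan(raw_plan)
print([step.tool for step in plan_steps])
```

A plan that names a tool the system does not have fails at parse time, long before any retrieval call is made.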

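Optimizing Retrieval with a Query Rewriter Agent

We now have a plan with good sub-questions.

However, a question like "What are the risks?" is not a good retrieval query because it is too generic. Both vector databases and web search engines work better with specific, keyword-rich queries.

Query Rewriting Agent (Created by Fareed Khan)

To address this, we build a small, specialized agent: the Query Rewriter. Its only job is to take the sub-question for the current step, combine it with the context already known, and rewrite it into a query better suited to retrieval.

We start by designing the prompt: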

from langchain_core.output_parsers import StrOutputParser

query_rewriter_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a search query optimization expert. Your task is to rewrite a given sub-question into a highly effective search query for a vector database or web search engine, using keywords and context from the research plan.
The rewritten query should be specific, use terminology likely to be found in the target source (a financial 10-K or news articles), and be structured to retrieve the most relevant text snippets."""),
    ("human", "Current sub-question: {sub_question}\n\nRelevant keywords from plan: {keywords}\n\nContext from past steps:\n{past_context}")
])

We have this agent act as a search query optimization expert. It receives three pieces of information (sub_question, keywords, and past_context) and uses them to build a stronger query.

Initialize the agent:

query_rewriter_agent = query_rewriter_prompt | reasoning_llm | StrOutputParser()
print("Query Rewriter Agent created successfully.")

print("\n--- Testing Query Rewriter Agent ---")

test_sub_q = "How does AMD's 2024 AI chip strategy potentially exacerbate the competitive risks identified in NVIDIA's 10-K?"
test_keywords = ['impact', 'threaten', 'competitive pressure', 'market share', 'technological change']
test_past_context = "Step 1 Summary: NVIDIA's 10-K lists intense competition and rapid technological change as key risks. Step 2 Summary: AMD launched its MI300X AI accelerator in 2024 to directly compete with NVIDIA's H100."

rewritten_q = query_rewriter_agent.invoke({
    "sub_question": test_sub_q,
    "keywords": test_keywords,
    "past_context": test_past_context
})

print(f"Original sub-question: {test_sub_q}")
print(f"Rewritten Search Query: {rewritten_q}")

The result is as follows:

#### OUTPUT ####
Query Rewriter Agent created successfully.

--- Testing Query Rewriter Agent ---
Original sub-question: How does AMD 2024 AI chip strategy potentially exacerbate the competitive risks identified in NVIDIA 10-K?
Rewritten Search Query: analysis of how AMD 2024 AI chip strategy, including products like the MI300X, exacerbates NVIDIA's stated competitive risks such as rapid technological change and market share erosion in the data center and AI semiconductor industry

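The original question is written for an analyst, while the rewritten query is written for a search engine. It contains more specific terms such as "MI300X", "market share erosion", and "data center", all derived from the keywords and the context of past steps. A query like this is more likely to retrieve the right documents, which improves both accuracy and efficiency.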
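The past_context string used in the test above was written by hand. One way it might be assembled from the research history is sketched below. The helper build_past_context is hypothetical, and it assumes each history entry has a zero-based step_index and a summary, as in the PastStep structure.

```python
from typing import Dict, List


def build_past_context(past_steps: List[Dict[str, object]]) -> str:
    """Turn completed-step summaries into one context string for the rewriter."""
    if not past_steps:
        return "No prior steps."
    return " ".join(
        f"Step {step['step_index'] + 1} Summary: {step['summary']}" for step in past_steps
    )


toy_history = [
    {"step_index": 0, "summary": "The 10-K lists intense competition as a key risk."},
    {"step_index": 1, "summary": "AMD launched the MI300X accelerator."},
]
past_context = build_past_context(toy_history)
print(past_context)
```

Keeping this formatting in one helper means every agent that reads the history sees it in the same shape.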

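Improving Precision with Metadata-Aware Chunking

The Planner Agent gives us an extra clue: it does not just say "find the risks", it also hints "look in the Item 1A. Risk Factors section".

But the current retriever cannot use this hint. The vector store is just a flat list of 378 chunks with no notion of a section.

Meta aware chunking (Created by Fareed Khan)

We need to rebuild the chunks. This time, each chunk gets a metadata tag recording which section of the 10-K it came from, so the agent can run a more precise, filtered search.

First, we need to locate the start of each section in the raw text programmatically. Looking at the document format, each major section begins with "ITEM" followed by a number (for example "ITEM 1A" or "ITEM 7"), which makes it a good fit for a regular expression.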

section_pattern = r"(ITEM\s+\d[A-Z]?\.\s*.*?)(?=\nITEM\s+\d[A-Z]?\.|$)"

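This pattern detects section headings. It has to be flexible enough to handle formatting variations yet specific enough to avoid false matches.

We apply the pattern to split the document into two lists: one of section titles and one of the corresponding content.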

raw_text = documents[0].page_content

section_titles = re.findall(section_pattern, raw_text, re.IGNORECASE | re.DOTALL)
section_titles = [title.strip().replace('\n', ' ') for title in section_titles]

sections_content = re.split(section_pattern, raw_text, flags=re.IGNORECASE | re.DOTALL)
sections_content = [content.strip() for content in sections_content if content.strip() and not content.strip().lower().startswith('item ')]
print(f"Identified {len(section_titles)} document sections.")
assert len(section_titles) == len(sections_content), "Mismatch between titles and content sections"

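This is an efficient way to parse a semi-structured document: one findall call collects all the section titles, and one split call divides the full text at those titles. The assert is a sanity check that the parsing worked.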
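If the findall/split pairing turns out to be fragile on a particular filing, an alternative is to locate the headings with finditer and slice the text between consecutive matches. The sketch below is a different technique from the code above, not a drop-in copy of it. It assumes every heading sits on its own line and looks like "Item 1A. Risk Factors."; HEADING_RE and split_into_sections are hypothetical names.

```python
import re
from typing import List, Tuple

# Assumption: a heading occupies a full line, e.g. "Item 1A. Risk Factors."
HEADING_RE = re.compile(r"^(ITEM\s+\d+[A-Z]?\.[^\n]*)$", re.IGNORECASE | re.MULTILINE)


def split_into_sections(text: str) -> List[Tuple[str, str]]:
    """Return (title, body) pairs; each body runs until the next heading."""
    matches = list(HEADING_RE.finditer(text))
    sections = []
    for i, match in enumerate(matches):
        body_end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[match.end():body_end].strip()
        sections.append((match.group(1).strip(), body))
    return sections


toy_text = (
    "Item 1. Business.\n"
    "We design GPUs.\n"
    "Item 1A. Risk Factors.\n"
    "Our industry is intensely competitive.\n"
    "More text."
)
sections = split_into_sections(toy_text)
for title, body in sections:
    print(title, "->", body[:30])
```

Because titles and bodies come from the same list of matches, they cannot drift out of alignment, so no separate length check is needed. A body line that happens to start with "Item" and a number would still be misread as a heading, so the pattern should be checked against the actual document.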

Next, we pair each title with its content to produce the final chunks with metadata.

import uuid

doc_chunks_with_metadata = []

for i, content in enumerate(sections_content):
    section_title = section_titles[i]
    section_chunks = text_splitter.split_text(content)
    
    for chunk in section_chunks:
        chunk_id = str(uuid.uuid4())
        doc_chunks_with_metadata.append(
            Document(
                page_content=chunk,
                metadata={
                    "section": section_title,
                    "source_doc": doc_path_clean,
                    "id": chunk_id
                }
            )
        )

print(f"Created {len(doc_chunks_with_metadata)} chunks with section metadata.")
print("\n--- Sample Chunk with Metadata ---")

sample_chunk = next(c for c in doc_chunks_with_metadata if "Risk Factors" in c.metadata.get("section", ""))
print(sample_chunk)

The key point is that each chunk now carries metadata, with section_title as its tag. The output is as follows:

#### OUTPUT ####
Processing document and adding metadata...
Identified 22 document sections.
Created 381 chunks with section metadata.

--- Sample Chunk with Metadata ---
Document(
│   page_content='Our industry is intensely competitive. We operate in the semiconductor\nindustry, which is intensely competitive and characterized by rapid\ntechnological change and evolving industry standards. ...
│   metadata={
│   │   'section': 'Item 1A. Risk Factors.',
│   │   'source_doc': './data/nvda_10k_2023_clean.txt',
│   │   'id': '...'
│   }
)

Notice 'section': 'Item 1A. Risk Factors.' in the metadata. Now, when the agent needs to find risks, it can tell the retriever to search only the chunks where section='Item 1A. Risk Factors'. This simple change turns retrieval from a blunt instrument into a scalpel, and it is a key principle for building production-grade RAG.
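The effect of a section filter can be illustrated on plain dictionaries. This is a minimal sketch, not the vector store's own filter. It also normalizes case and a trailing period, on the assumption that the planner may emit 'Item 1A. Risk Factors' while the stored tag reads 'Item 1A. Risk Factors.'; toy_chunks and filter_by_section are hypothetical names.

```python
from typing import Dict, List, Optional

toy_chunks: List[Dict[str, str]] = [
    {"text": "Our industry is intensely competitive.", "section": "Item 1A. Risk Factors."},
    {"text": "We compete on performance and price.", "section": "Item 1. Business."},
    {"text": "Competition could reduce our market share.", "section": "Item 1A. Risk Factors."},
]


def normalize_section(title: str) -> str:
    """Lowercase and drop a trailing period so near-identical titles compare equal."""
    return title.strip().rstrip(".").lower()


def filter_by_section(chunks: List[Dict[str, str]], section: Optional[str]) -> List[Dict[str, str]]:
    """Keep only chunks from the requested section; no section means no filtering."""
    if not section:
        return list(chunks)
    wanted = normalize_section(section)
    return [chunk for chunk in chunks if normalize_section(chunk["section"]) == wanted]


risk_chunks = filter_by_section(toy_chunks, "Item 1A. Risk Factors")
print(len(risk_chunks), "of", len(toy_chunks), "chunks kept")
```

An exact-match filter would return nothing when the two titles differ only by punctuation, so it is worth checking how section titles are stored before relying on the filter.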

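Creating the Multi-Stage Retrieval Funnel

So far we have added intelligent planning and enriched the documents with metadata. Now we build the core of the system: a sophisticated retrieval pipeline.

A simple, one-shot semantic search is no longer enough. A production-grade agent needs an adaptive, multi-stage retrieval process.

Multi Stage Funnel (Created by Fareed Khan)

  • Retrieval Supervisor: build a supervisor agent that acts as a dynamic router, analyzing each sub-question and choosing the best retrieval strategy (vector, keyword, or hybrid).
  • Stage 1 (broad recall): implement the retrieval strategies the supervisor can choose from, capturing as many potentially relevant documents as possible.
  • Stage 2 (high precision): use a Cross-Encoder model to rerank the initial results, filtering out noise and moving the most relevant documents to the top.
  • Stage 3 (synthesis): create a Distiller Agent that compresses the top documents into a single, concise context.

Dynamically Choosing a Strategy with a Supervisor

Not all queries are alike. A query such as "What was the revenue growth of the Compute & Networking segment in fiscal year 2023?" contains very specific terms, so keyword search is a better fit. A query such as "What is the company's overall attitude toward market competition?" is conceptual, so semantic search works better.

Supervisor Agent (Created by Fareed Khan)

Instead of hard-coding a strategy, we build a small, intelligent agent, the Retrieval Supervisor. Its job is to analyze the query and decide which retrieval method to use.

First we define the structure of its output: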

class RetrievalDecision(BaseModel):
    strategy: Literal["vector_search", "keyword_search", "hybrid_search"]
    justification: str

Then the prompt:

retrieval_supervisor_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a retrieval strategy expert. Based on the user's query, you must decide the best retrieval strategy.
You have three options:
1. `vector_search`: Best for conceptual, semantic, or similarity-based queries.
2. `keyword_search`: Best for queries with specific, exact terms, names, or codes (e.g., 'Item 1A', 'Hopper architecture').
3. `hybrid_search`: A good default that combines both, but may be less precise than a targeted strategy."""),
    ("human", "User Query: {sub_question}")
])

Assemble the agent and test it:

retrieval_supervisor_agent = retrieval_supervisor_prompt | reasoning_llm.with_structured_output(RetrievalDecision)
print("Retrieval Supervisor Agent created.")

print("\n--- Testing Retrieval Supervisor Agent ---")
query1 = "revenue growth for the Compute & Networking segment in fiscal year 2023"
decision1 = retrieval_supervisor_agent.invoke({"sub_question": query1})

print(f"Query: '{query1}'")
print(f"Decision: {decision1.strategy}, Justification: {decision1.justification}")

query2 = "general sentiment about market competition and technological innovation"
decision2 = retrieval_supervisor_agent.invoke({"sub_question": query2})
print(f"\nQuery: '{query2}'")
print(f"Decision: {decision2.strategy}, Justification: {decision2.justification}")
#### OUTPUT ####
Retrieval Supervisor Agent created.


# --- Testing Retrieval Supervisor Agent ---
Query: 'revenue growth for the Compute & Networking segment in fiscal year 2023'
Decision: keyword_search, Justification: The query contains specific keywords like 'revenue growth', 'Compute & Networking', and 'fiscal year 2023' which are ideal for a keyword-based search to find exact financial figures.

Query: 'general sentiment about market competition and technological innovation'
Decision: vector_search, Justification: This query is conceptual and seeks to understand sentiment and broader themes. Vector search is better suited to capture the semantic meaning of 'market competition' and 'technological innovation' rather than relying on exact keywords.

It correctly picks keyword_search for specific terms and vector_search for conceptual questions. Making this decision dynamically works much better than a one-size-fits-all approach.
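Once the supervisor has returned a strategy name, something has to turn that name into a function call. One simple option is a lookup table, sketched below with toy search functions. All names here (STRATEGY_TABLE, route_retrieval, and the toy_* functions) are hypothetical, and falling back to hybrid search for an unrecognized strategy is an assumption based on the prompt describing it as a good default.

```python
from typing import Callable, Dict, List


def toy_vector_search(query: str) -> List[str]:
    return [f"vector hit for: {query}"]


def toy_keyword_search(query: str) -> List[str]:
    return [f"keyword hit for: {query}"]


def toy_hybrid_search(query: str) -> List[str]:
    # Naive concatenation for illustration; a real hybrid search would fuse rankings
    return toy_keyword_search(query) + toy_vector_search(query)


STRATEGY_TABLE: Dict[str, Callable[[str], List[str]]] = {
    "vector_search": toy_vector_search,
    "keyword_search": toy_keyword_search,
    "hybrid_search": toy_hybrid_search,
}


def route_retrieval(query: str, strategy: str) -> List[str]:
    """Dispatch to the chosen strategy, defaulting to hybrid if the name is unknown."""
    search_fn = STRATEGY_TABLE.get(strategy, toy_hybrid_search)
    return search_fn(query)


hits = route_retrieval("Hopper architecture", "keyword_search")
fallback_hits = route_retrieval("Hopper architecture", "unknown_strategy")
print(hits)
print(fallback_hits)
```

A table keeps the routing logic in one place, so adding a fourth strategy means adding one entry rather than another branch.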

Broad Recall with Hybrid, Keyword, and Semantic Search

With the supervisor choosing the strategy, we now need to implement the strategies themselves. The goal of the first stage is recall: capture as many potentially relevant documents as possible, even if some noise comes along.

Broad Recall (Created by Fareed Khan)

We implement three search functions:

  1. Vector Search: standard semantic search, upgraded to support a metadata filter.
  2. Keyword Search (BM25): a classic, powerful algorithm that excels at matching specific terms.
  3. Hybrid Search: combines the two and fuses their rankings with RRF (Reciprocal Rank Fusion).
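The fusion step in item 3 is easy to work through by hand. The sketch below implements Reciprocal Rank Fusion on two toy rankings, with each list contributing 1 / (k + rank) per document and k = 60, a commonly used constant. The function name and the document IDs are made up for illustration.

```python
from typing import Dict, List


def reciprocal_rank_fusion(ranked_lists: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several rankings; each list adds 1 / (k + rank) to a document's score."""
    scores: Dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


bm25_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

doc_b wins with 1/62 + 1/61, narrowly ahead of doc_a with 1/61 + 1/63, because it ranks well in both lists. Documents that appear in only one list (doc_d and doc_c) fall to the bottom. RRF uses ranks only, so the BM25 and vector scores never need to be put on a common scale.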

First, we create an advanced vector store from the chunks with metadata.

import numpy as np
from rank_bm25 import BM25Okapi

print("Creating advanced vector store with metadata...")

advanced_vector_store = Chroma.from_documents(
    documents=doc_chunks_with_metadata,
    embedding=embedding_function
)
print(f"Advanced vector store created with {advanced_vector_store._collection.count()} embeddings.")

Next, we build the BM25 index:

print("\nBuilding BM25 index for keyword search...")

tokenized_corpus = [doc.page_content.split(" ") for doc in doc_chunks_with_metadata]
doc_ids = [doc.metadata["id"] for doc in doc_chunks_with_metadata]
doc_map = {doc.metadata["id"]: doc for doc in doc_chunks_with_metadata}
bm25 = BM25Okapi(tokenized_corpus)

Now we define the three retrieval functions:

def vector_search_only(query: str, section_filter: str = None, k: int = 10):
    # Apply the section filter only when the planner supplied a usable section title
    filter_dict = {"section": section_filter} if section_filter and "Unknown" not in section_filter else None
    return advanced_vector_store.similarity_search(query, k=k, filter=filter_dict)

def bm25_search_only(query: str, k: int = 10):
    tokenized_query = query.split(" ")
    bm25_scores = bm25.get_scores(tokenized_query)
    # Indices of the k highest BM25 scores, best first
    top_k_indices = np.argsort(bm25_scores)[::-1][:k]
    return [doc_map[doc_ids[i]] for i in top_k_indices]

def hybrid_search(query: str, section_filter: str = None, k: int = 10):
    bm25_docs = bm25_search_only(query, k=k)
    semantic_docs = vector_search_only(query, section_filter=section_filter, k=k)
    ranked_lists = [[doc.metadata["id"] for doc in bm25_docs], [doc.metadata["id"] for doc in semantic_docs]]

    # Reciprocal Rank Fusion: each list contributes 1 / (60 + rank), with rank starting at 1
    rrf_scores = {}
    for doc_list in ranked_lists:
        for i, doc_id in enumerate(doc_list):
            if doc_id not in rrf_scores:
                rrf_scores[doc_id] = 0
            rrf_scores[doc_id] += 1 / (i + 61)
    sorted_doc_ids = sorted(rrf_scores.keys(), key=lambda x: rrf_scores[x], reverse=True)
    final_docs = [doc_map[doc_id] for doc_id in sorted_doc_ids[:k]]
    return final_docs

print("\nAll retrieval strategy functions ready.")

A quick test checks whether keyword search can pinpoint the target section:

print("\n--- Testing Keyword Search ---")
test_query = "Item 1A. Risk Factors"
test_results = bm25_search_only(test_query)
print(f"Query: {test_query}")
print(f"Found {len(test_results)} documents. Top result section: {test_results[0].metadata['section']}")
#### OUTPUT ####
Creating advanced vector store with metadata...
Advanced vector store created with 381 embeddings.

Building BM25 index for keyword search...
All retrieval strategy functions ready.

--- Testing Keyword Search ---
Query: Item 1A. Risk Factors
Found 10 documents. Top result section: Item 1A. Risk Factors.

As expected, BM25 retrieves the documents for "Item 1A. Risk Factors" quickly and precisely. When a query contains specific keywords such as a section title, the supervisor can pick this exact-match tool.
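
One caveat: `split(" ")` is sensitive to case and punctuation, so "Risk" and "risk," count as different tokens. If that matters for your corpus, a normalizing tokenizer could be applied to both the corpus and the query before building the BM25 index. The sketch below is an assumption and not part of the original pipeline; `simple_tokenize` is a hypothetical helper.

```python
import re
from typing import List

# Alphanumeric runs, optionally joined by hyphens (so "10-k" stays one token)
_TOKEN_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def simple_tokenize(text: str) -> List[str]:
    """Lowercase the text and drop punctuation (hypothetical helper)."""
    return _TOKEN_RE.findall(text.lower())


print("Item 1A. Risk Factors".split(" "))        # whitespace split keeps "1A." with its period
print(simple_tokenize("Item 1A. Risk Factors"))  # normalized tokens
print(simple_tokenize("NVIDIA's 10-K, risk factors:"))
```

Whichever tokenizer is used, the same one has to be applied to the corpus and to the query, otherwise BM25 scores silently drop to zero for mismatched tokens.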

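Next comes the high-precision stage: reranking.

Achieving High Precision with a Cross-Encoder Reranker

The first-stage recall step gives us 10 potentially relevant documents.

But "potentially relevant" is not enough. Feeding all 10 chunks straight to the main reasoning LLM is both inefficient and risky: it costs more, and noise can distract the model.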

High Precision (Created by Fareed Khan)

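We need a precision stage that uses a reranker to pick the few most relevant documents out of these 10 candidates. The difference lies in how the models work:

  1. Initial retrieval uses a bi-encoder (the embedding model), which encodes the query and the documents separately. It is fast and suits large-scale search.
  2. A cross-encoder takes the query and a single document together as one input pair and compares them more deeply. It is slower but more accurate.

We will write a function that scores the 10 documents with the cross-encoder, reranks them, and keeps only the top 3 configured in config.

Initialize the cross-encoder model: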

from sentence_transformers import CrossEncoder

print("Initializing CrossEncoder reranker...")

reranker = CrossEncoder(config["reranker_model"])

Define the reranking function:

def rerank_documents_function(query: str, documents: List[Document]) -> List[Document]:
    if not documents: 
        return []
        
    pairs = [(query, doc.page_content) for doc in documents]
    scores = reranker.predict(pairs)
    doc_scores = list(zip(documents, scores))
    doc_scores.sort(key=lambda x: x[1], reverse=True)
    reranked_docs = [doc for doc, score in doc_scores[:config["top_n_rerank"]]]
    return reranked_docs

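This function scores every (query, doc) pair with the cross-encoder, sorts by score, and keeps the top 3. The result is a short, highly relevant document list that serves as focused context for the downstream agents. This funnel (high recall first, then high precision) is a key pattern in production-grade RAG.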
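
The sort-and-truncate logic can be tried without downloading a model. In the sketch below, `overlap_score` is a deliberately crude stand-in for `reranker.predict` (it only counts shared words and is not a cross-encoder), and `rerank_top_n` and the candidate texts are hypothetical.

```python
from typing import Callable, List, Tuple


def overlap_score(query: str, text: str) -> float:
    """Stand-in scorer: fraction of query words that also appear in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / max(len(query_words), 1)


def rerank_top_n(
    query: str,
    texts: List[str],
    top_n: int = 3,
    score_fn: Callable[[str, str], float] = overlap_score,
) -> List[Tuple[str, float]]:
    """Score every (query, text) pair, sort descending, keep the top_n best."""
    scored = [(text, score_fn(query, text)) for text in texts]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


candidates = [
    "the company sells gaming laptops",
    "competition in the semiconductor industry is intense",
    "risks from competition include price pressure",
    "our fiscal year ends in january",
]
top = rerank_top_n("competition risks in the semiconductor industry", candidates, top_n=2)
for text, score in top:
    print(f"{score:.2f}  {text}")
```

Swapping `score_fn` for a function that wraps a real cross-encoder would keep the rest of the logic unchanged.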

Synthesis Through Contextual Distillation

We have now narrowed 10 documents down to 3 highly relevant ones, but they are still three separate chunks. To refine further, we add a final compression step, contextual distillation: the top 3 documents are distilled into one concise, clean paragraph that removes redundancy and packs the information densely.

Synthesization (Created by Fareed Khan)

This step only processes text; it does not answer the question. We create the Distiller Agent:

distiller_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a helpful assistant. Your task is to synthesize the following retrieved document snippets into a single, concise paragraph.
The goal is to provide a clear and coherent context that directly answers the question: '{question}'.
Focus on removing redundant information and organizing the content logically. Answer only with the synthesized context."""),
    ("human", "Retrieved Documents:\n{context}")
])

distiller_agent = distiller_prompt | reasoning_llm | StrOutputParser()
print("Contextual Distiller Agent created.")

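In the main loop, each step will run as follows:

  1. Supervisor: choose a retrieval strategy (vector, keyword, or hybrid).
  2. Recall: run the chosen strategy and take the top 10 documents.
  3. Precision: use rerank_documents_function to keep the top 3.
  4. Distillation: use distiller_agent to compress them into a single paragraph.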
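
The four stages above can be sketched as one function that takes each stage as a callable. Everything here is a hypothetical stand-in (`run_funnel`, the stub strategies, and the lambda-based reranker and distiller); in the real pipeline these roles are played by the supervisor agent, the retrieval functions, `rerank_documents_function`, and `distiller_agent`.

```python
from typing import Callable, Dict, List


def run_funnel(
    query: str,
    choose_strategy: Callable[[str], str],
    strategies: Dict[str, Callable[[str, int], List[str]]],
    rerank: Callable[[str, List[str]], List[str]],
    distill: Callable[[str, List[str]], str],
    top_k: int = 10,
) -> str:
    """Run supervisor -> recall -> precision -> distillation for one query."""
    strategy_name = choose_strategy(query)                # 1. Supervisor
    candidates = strategies[strategy_name](query, top_k)  # 2. Recall
    best = rerank(query, candidates)                      # 3. Precision
    return distill(query, best)                           # 4. Distillation


# Stub components so the sketch runs without any model or index
corpus = [f"chunk {i}" for i in range(20)]
stub_strategies = {
    "keyword_search": lambda q, k: corpus[:k],
    "vector_search": lambda q, k: corpus[-k:],
}
result = run_funnel(
    "Item 1A. Risk Factors",
    choose_strategy=lambda q: "keyword_search" if "Item" in q else "vector_search",
    strategies=stub_strategies,
    rerank=lambda q, docs: docs[:3],
    distill=lambda q, docs: " | ".join(docs),
)
print(result)
```

Passing the stages in as callables keeps each one replaceable and makes the funnel easy to test in isolation.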

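This gives each step the cleanest evidence the funnel can produce. Next, we give the agent the ability to look at the outside world: web search.

Augmenting Knowledge with Web Search

The retrieval funnel is strong, but it has a critical blind spot:

It can only see what is in the 2023 10-K. Our challenge query needs the latest news on AMD's AI chip strategy in 2024, which simply does not exist in the static knowledge base.

A true deep-thinking agent must recognize the limits of its own knowledge and look elsewhere for answers. We need to give it a window.

Augmentation using Web (Created by Fareed Khan)

In this step we add a new tool to the system, web search, turning it from a document-specific Q&A bot into a genuine multi-source research assistant.

We use the Tavily Search API, a search engine built for LLMs that returns clean, ad-free, relevant results. It is well suited to RAG and integrates smoothly with LangChain.

Initialize the Tavily search tool: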

from langchain_community.tools.tavily_search import TavilySearchResults

web_search_tool = TavilySearchResults(max_results=3)

The raw API response needs to be wrapped into a standard list of Document objects so it plugs into our reranker and distiller. A small wrapper function does this:

def web_search_function(query: str) -> List[Document]:
    results = web_search_tool.invoke({"query": query})
    # Wrap each raw result as a Document, keeping the URL as its source
    return [
        Document(
            page_content=res["content"],
            metadata={"source": res["url"]}
        ) for res in results
    ]
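
If a search result is missing a field, the list comprehension above raises a KeyError. A more defensive variant might skip malformed entries instead. The sketch below assumes failures can show up either as a non-list value or as entries without the expected keys; it uses a stand-in `Doc` dataclass rather than LangChain's `Document`, and the sample payload is made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Doc:
    """Stand-in for LangChain's Document (assumption for this sketch)."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def to_docs(results: Any) -> List[Doc]:
    """Convert raw search results to Doc objects, skipping malformed entries."""
    if not isinstance(results, list):
        return []  # e.g. an error message instead of a result list
    docs: List[Doc] = []
    for res in results:
        if not isinstance(res, dict):
            continue
        content = res.get("content")
        if not content:
            continue  # nothing to rerank or distill
        docs.append(Doc(page_content=content, metadata={"source": res.get("url", "unknown")}))
    return docs


sample = [
    {"content": "AMD launched the MI300X.", "url": "https://example.com/a"},
    {"url": "https://example.com/b"},              # no content: skipped
    {"content": "Cloud providers adopt MI300X."},  # no url: source falls back to "unknown"
]
docs = to_docs(sample)
print(len(docs), [d.metadata["source"] for d in docs])
```

An empty list is a safe fallback here because rerank_documents_function already returns an empty list when it receives no documents.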

Test it:

print("\n--- Testing Web Search Tool ---")
test_query_web = "AMD AI chip strategy 2024"
test_results_web = web_search_function(test_query_web)
print(f"Found {len(test_results_web)} results for query: '{test_query_web}'")
if test_results_web:
    print(f"Top result snippet: {test_results_web[0].page_content[:250]}...")
#### OUTPUT ####
Web search tool (Tavily) initialized.

--- Testing Web Search Tool ---
Found 3 results for query: 'AMD AI chip strategy 2024'
Top result snippet: AMD has intensified its battle with Nvidia in the AI chip market with the release of the Instinct MI300X accelerator, a powerful GPU designed to challenge Nvidia's H100 in training and inference for large language models. Major cloud providers like Microsoft Azure and Oracle Cloud are adopting the MI300X, indicating strong market interest...

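The result is what we wanted: 3 relevant web pages. The snippet mentions AMD's Instinct MI300X going up against NVIDIA's H100, which is exactly the evidence needed for the second part of the question. The agent now has a window to the outside world, and the planner can decide when to use it. The next step is to let the agent reflect and decide when to stop researching.

Self-Evaluation and Control Flow Policy

So far the agent can make a plan, choose tools, and run a sophisticated retrieval funnel. It still lacks one key ability: thinking about its own progress. An agent that blindly follows its plan step by step is not truly intelligent; it needs a self-critique mechanism.

Self Critique and Policy Making (Created by Fareed Khan)

After every research step, the agent pauses to reflect, compares the new information with what it already knows, and makes a strategic decision: is the research complete, or should it continue?

This self-critique loop lifts the system from a scripted workflow to an autonomous agent. It can judge whether it has gathered enough evidence to answer the user's question with confidence.

We will implement two specialized agents:

  1. Reflection Agent: reads the distilled context of the current step and writes a concise one-sentence summary that is added to the research history.
  2. Policy Agent: acts as the overall director. After reflection, it reviews the full history and the initial plan and makes the key decision: CONTINUE_PLAN or FINISH.

Updating and Reflecting on the Cumulative Research History

After finishing a step (for example, retrieving and distilling NVIDIA's risks), we do not move straight to the next one. The new knowledge first has to be integrated into the agent's memory.

Reflective Cumulative (Created by Fareed Khan)

Build the Reflection Agent: its job is to read the distilled context of the current step, write a factual one-sentence summary, and add it to past_steps in RAGState.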

reflection_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a research assistant. Based on the retrieved context for the current sub-question, write a concise, one-sentence summary of the key findings.
This summary will be added to our research history. Be factual and to the point."""),
    ("human", "Current sub-question: {sub_question}\n\nDistilled context:\n{context}")
])

reflection_agent = reflection_prompt | reasoning_llm | StrOutputParser()
print("Reflection Agent created.")

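It is an important part of the cognitive loop: these concise summaries build a clean, readable research history that becomes the input for the next and most important agent, the policy decision-maker.

Building a Policy Agent for Control Flow

This is the brain of the agent's autonomy. After reflection_agent updates the history, the Policy Agent steps in as the top-level dispatcher. It looks at the original question, the initial plan, and the full history of completed step summaries, and makes the high-level strategic decision.

Policy Agent (Created by Fareed Khan)

Define the structure of the decision output: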

class Decision(BaseModel):
    next_action: Literal["CONTINUE_PLAN", "FINISH"]
    justification: str

Design the prompt:

policy_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a master strategist. Your role is to analyze the research progress and decide the next action.
You have the original question, the initial plan, and a log of completed steps with their summaries.
- If the collected information in the Research History is sufficient to comprehensively answer the Original Question, decide to FINISH.
- Otherwise, if the plan is not yet complete, decide to CONTINUE_PLAN."""),
    ("human", "Original Question: {question}\n\nInitial Plan:\n{plan}\n\nResearch History (Completed Steps):\n{history}")
])

policy_agent = policy_prompt | reasoning_llm.with_structured_output(Decision)
print("Policy Agent created.")

Test it in two states:

print("\n--- Testing Policy Agent (Incomplete State) ---")
plan_str = json.dumps([s.dict() for s in test_plan.steps])
incomplete_history = "Step 1 Summary: NVIDIA's 10-K states that the semiconductor industry is intensely competitive and subject to rapid technological change."
decision1 = policy_agent.invoke({"question": complex_query_adv, "plan": plan_str, "history": incomplete_history})
print(f"Decision: {decision1.next_action}, Justification: {decision1.justification}")

print("\n--- Testing Policy Agent (Complete State) ---")
complete_history = incomplete_history + "\nStep 2 Summary: In 2024, AMD launched its MI300X accelerator to directly compete with NVIDIA in the AI chip market, gaining adoption from major cloud providers."
decision2 = policy_agent.invoke({"question": complex_query_adv, "plan": plan_str, "history": complete_history})
print(f"Decision: {decision2.next_action}, Justification: {decision2.justification}")
#### OUTPUT ####
Policy Agent created.

--- Testing Policy Agent (Incomplete State) ---
Decision: CONTINUE_PLAN, Justification: The research has only identified NVIDIA's competitive risks from the 10-K. It has not yet gathered the required external information about AMD's 2024 strategy, which is the next step in the plan.

--- Testing Policy Agent (Complete State) ---
Decision: FINISH, Justification: The research history now contains comprehensive summaries of both NVIDIA's stated competitive risks and AMD's recent AI chip strategy. All necessary information has been gathered to perform the final synthesis and answer the user's question.

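In the incomplete state it correctly chooses CONTINUE_PLAN; in the complete state it correctly chooses FINISH. With policy_agent in place, the system has the decision-making core it needs to act autonomously. Next, we wire all the components together with LangGraph.

Defining the Graph Nodes

We have designed the specialized agents. Now we turn them into the building blocks of a workflow. That is what a node in LangGraph is: each node is a Python function that does one specific job, receives the RAGState, and returns a dictionary of state updates.

Graph Nodes (Created by Fareed Khan)

First, a utility function that formats the research history in past_steps into a readable string for the prompts: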

def get_past_context_str(past_steps: List[PastStep]) -> str:
    # Separate steps with a blank line; "\n" must be a real newline, not an escaped backslash
    return "\n\n".join([f"Step {s['step_index']}: {s['sub_question']}\nSummary: {s['summary']}" for s in past_steps])
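
To see what this history string looks like, here is a small standalone sketch that applies the same formatting to plain dictionaries. The name `format_history` and the two sample steps are made up for illustration.

```python
from typing import Dict, List


def format_history(past_steps: List[Dict]) -> str:
    """Same formatting idea as get_past_context_str, applied to plain dicts."""
    return "\n\n".join(
        f"Step {s['step_index']}: {s['sub_question']}\nSummary: {s['summary']}"
        for s in past_steps
    )


history = format_history([
    {
        "step_index": 1,
        "sub_question": "What competitive risks does the 10-K list?",
        "summary": "The 10-K describes intense competition.",
    },
    {
        "step_index": 2,
        "sub_question": "What is AMD's 2024 AI chip strategy?",
        "summary": "AMD launched the MI300X.",
    },
])
print(history)
```

Each step becomes a two-line block, and blocks are separated by a blank line, which keeps the history easy for both the LLM and a human reader to scan.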

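The first node is plan_node, which calls planner_agent to fill the plan field. Because the graph routes back to this node after every step, it must keep an existing plan rather than regenerate it and reset the progress.

def plan_node(state: RAGState) -> Dict:
    # On loop-back from reflect, keep the existing plan and step index;
    # the no-op write below simply hands control to route_by_tool
    if state.get("plan") is not None:
        return {"current_step_index": state["current_step_index"]}
    console.print("--- Generating Plan ---")
    plan = planner_agent.invoke({"question": state["original_question"]})
    rprint(plan)
    return {"plan": plan, "current_step_index": 0, "past_steps": []}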

Next come the two retrieval nodes: internal documents and web.

def retrieval_node(state: RAGState) -> Dict:
    current_step_index = state["current_step_index"]
    current_step = state["plan"].steps[current_step_index]
    console.print(f"--- Retrieving from 10-K (Step {current_step_index + 1}: {current_step.sub_question}) ---")
    
    past_context = get_past_context_str(state['past_steps'])
    rewritten_query = query_rewriter_agent.invoke({
        "sub_question": current_step.sub_question,
        "keywords": current_step.keywords,
        "past_context": past_context
    })
    console.print(f"  Rewritten Query: {rewritten_query}")
    
    retrieval_decision = retrieval_supervisor_agent.invoke({"sub_question": rewritten_query})
    console.print(f"  Supervisor Decision: Use `{retrieval_decision.strategy}`. Justification: {retrieval_decision.justification}")

    if retrieval_decision.strategy == 'vector_search':
        retrieved_docs = vector_search_only(rewritten_query, section_filter=current_step.document_section, k=config['top_k_retrieval'])
    elif retrieval_decision.strategy == 'keyword_search':
        retrieved_docs = bm25_search_only(rewritten_query, k=config['top_k_retrieval'])
    else:
        retrieved_docs = hybrid_search(rewritten_query, section_filter=current_step.document_section, k=config['top_k_retrieval'])
    
    return {"retrieved_docs": retrieved_docs}

def web_search_node(state: RAGState) -> Dict:
    current_step_index = state["current_step_index"]
    current_step = state["plan"].steps[current_step_index]
    console.print(f"--- Searching Web (Step {current_step_index + 1}: {current_step.sub_question}) ---")
    
    past_context = get_past_context_str(state['past_steps'])
    rewritten_query = query_rewriter_agent.invoke({
        "sub_question": current_step.sub_question,
        "keywords": current_step.keywords,
        "past_context": past_context
    })
    console.print(f"  Rewritten Query: {rewritten_query}")
    retrieved_docs = web_search_function(rewritten_query)
    return {"retrieved_docs": retrieved_docs}

Then the precision and distillation nodes:

def rerank_node(state: RAGState) -> Dict:
    console.print("--- Reranking Documents ---")
    current_step_index = state["current_step_index"]
    current_step = state["plan"].steps[current_step_index]
    reranked_docs = rerank_documents_function(current_step.sub_question, state["retrieved_docs"])
    console.print(f"  Reranked to top {len(reranked_docs)} documents.")
    return {"reranked_docs": reranked_docs}

def compression_node(state: RAGState) -> Dict:
    console.print("--- Distilling Context ---")
    current_step_index = state["current_step_index"]
    current_step = state["plan"].steps[current_step_index]
    context = format_docs(state["reranked_docs"])
    synthesized_context = distiller_agent.invoke({"question": current_step.sub_question, "context": context})
    console.print(f"  Distilled Context Snippet: {synthesized_context[:200]}...")
    return {"synthesized_context": synthesized_context}

Reflect and update the history:

def reflection_node(state: RAGState) -> Dict:
    console.print("--- Reflecting on Findings ---")
    current_step_index = state["current_step_index"]
    current_step = state["plan"].steps[current_step_index]
    summary = reflection_agent.invoke({"sub_question": current_step.sub_question, "context": state['synthesized_context']})
    console.print(f"  Summary: {summary}")
    
    new_past_step = {
        "step_index": current_step_index + 1,
        "sub_question": current_step.sub_question,
        "retrieved_docs": state['reranked_docs'],
        "summary": summary
    }
    return {"past_steps": state["past_steps"] + [new_past_step], "current_step_index": current_step_index + 1}

Generate the final answer:

def final_answer_node(state: RAGState) -> Dict:
    console.print("--- Generating Final Answer with Citations ---")
    # Build one context string that keeps each chunk next to its source label
    final_context = ""
    for i, step in enumerate(state['past_steps']):
        final_context += f"\n--- Findings from Research Step {i+1} ---\n"
        for doc in step['retrieved_docs']:
            # 10-K chunks carry a 'section'; web results carry a 'source' URL
            source = doc.metadata.get('section') or doc.metadata.get('source')
            final_context += f"Source: {source}\nContent: {doc.page_content}\n\n"
    
    final_answer_prompt = ChatPromptTemplate.from_messages([
        ("system", """You are an expert financial analyst. Synthesize the research findings from internal documents and web searches into a comprehensive, multi-paragraph answer for the user's original question.
Your answer must be grounded in the provided context. At the end of any sentence that relies on specific information, you MUST add a citation. For 10-K documents, use [Source: <section title>]. For web results, use [Source: <URL>]."""),
        ("human", "Original Question: {question}\n\nResearch History and Context:\n{context}")
    ])
    
    final_answer_agent = final_answer_prompt | reasoning_llm | StrOutputParser()
    final_answer = final_answer_agent.invoke({"question": state['original_question'], "context": final_context})
    return {"final_answer": final_answer}

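With the nodes in place, the next step is to define the edges that determine how they connect and how control flows.

Defining the Conditional Edges

We need two key conditional edges:

  1. Tool router (route_by_tool): after plan, it checks which tool the current step should use and routes to retrieve_10k or retrieve_web.
  2. Main control loop (should_continue_node): after each reflection, it calls policy_agent to decide whether to continue to the next step or to finish and generate the answer.

Tool router: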

def route_by_tool(state: RAGState) -> str:
    current_step_index = state["current_step_index"]
    current_step = state["plan"].steps[current_step_index]
    return current_step.tool

Main control loop:

def should_continue_node(state: RAGState) -> str:
    console.print("--- Evaluating Policy ---")
    current_step_index = state["current_step_index"]
    
    # Guard 1: every step of the plan has been executed
    if current_step_index >= len(state["plan"].steps):
        console.print("  -> Plan complete. Finishing.")
        return "finish"
    
    # Guard 2: hard cap on the number of reasoning iterations
    if current_step_index >= config["max_reasoning_iterations"]:
        console.print("  -> Max iterations reached. Finishing.")
        return "finish"

    # Guard 3: the last retrieval returned nothing, so skip the policy call
    if state.get("reranked_docs") is not None and not state["reranked_docs"]:
        console.print("  -> Retrieval failed for the last step. Continuing with next step in plan.")
        return "continue"

    history = get_past_context_str(state['past_steps'])
    plan_str = json.dumps([s.dict() for s in state['plan'].steps])

    decision = policy_agent.invoke({"question": state["original_question"], "plan": plan_str, "history": history})
    console.print(f"  -> Decision: {decision.next_action} | Justification: {decision.justification}")
    
    if decision.next_action == "FINISH":
        return "finish"
    else:
        return "continue"
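
The order of these guards can be exercised without an LLM by injecting a stub policy. The sketch below mirrors the same checks on a plain dictionary; `decide`, the `plan_steps` key, and the stub policies are hypothetical names, and `max_iterations` stands in for config["max_reasoning_iterations"].

```python
from typing import Callable, Dict


def decide(state: Dict, policy: Callable[[Dict], str], max_iterations: int) -> str:
    """Mirror the guard order of the control loop with an injected policy."""
    step = state["current_step_index"]
    if step >= len(state["plan_steps"]):
        return "finish"    # plan exhausted
    if step >= max_iterations:
        return "finish"    # hard iteration cap
    if state.get("reranked_docs") is not None and not state["reranked_docs"]:
        return "continue"  # last retrieval came back empty, policy is not consulted
    return "finish" if policy(state) == "FINISH" else "continue"


def always_finish(state: Dict) -> str:
    return "FINISH"


def always_continue(state: Dict) -> str:
    return "CONTINUE_PLAN"


plan = ["step A", "step B", "step C"]
print(decide({"current_step_index": 3, "plan_steps": plan}, always_continue, max_iterations=7))
print(decide({"current_step_index": 1, "plan_steps": plan, "reranked_docs": []}, always_finish, max_iterations=7))
print(decide({"current_step_index": 1, "plan_steps": plan, "reranked_docs": ["doc"]}, always_finish, max_iterations=7))
```

Note that an empty retrieval result short-circuits the policy: the loop continues even when the policy would have answered FINISH.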

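With the nodes (the specialists) and the conditional edges (the rules of conversation), we can build the full StateGraph.

Wiring the Deep Thinking RAG Machine

We now use LangGraph's StateGraph to define the complete cognitive architecture, the blueprint of the agent's thought process.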

from langgraph.graph import StateGraph, END

graph = StateGraph(RAGState)

Add the nodes:

graph.add_node("plan", plan_node)
graph.add_node("retrieve_10k", retrieval_node)
graph.add_node("retrieve_web", web_search_node)
graph.add_node("rerank", rerank_node)
graph.add_node("compress", compression_node)
graph.add_node("reflect", reflection_node)
graph.add_node("generate_final_answer", final_answer_node)

Connect the edges and conditional edges:

graph.set_entry_point("plan")

graph.add_conditional_edges(
    "plan",
    route_by_tool,
    {
        "search_10k": "retrieve_10k",
        "search_web": "retrieve_web",
    },
)

graph.add_edge("retrieve_10k", "rerank")
graph.add_edge("retrieve_web", "rerank")
graph.add_edge("rerank", "compress")
graph.add_edge("compress", "reflect")

graph.add_conditional_edges(
    "reflect",
    should_continue_node,
    {
        "continue": "plan",
        "finish": "generate_final_answer",
    },
)

graph.add_edge("generate_final_answer", END)
print("StateGraph constructed successfully.")

Flow recap:

  1. Start at plan.
  2. route_by_tool decides between retrieve_10k and retrieve_web.
  3. The flow then always runs rerank -> compress -> reflect.
  4. After reflect, should_continue_node decides:
  • on CONTINUE_PLAN, return to plan and route the next step;
  • on FINISH, go to generate_final_answer.
  5. The run ends once the final answer is generated.
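
The node order described above can be traced in plain Python, with no LLM and no LangGraph. This is only a sketch: `trace_run` is a hypothetical helper, the plan is assumed to be kept across iterations, and `finish_after` is a stand-in for the step at which the policy would answer FINISH.

```python
from typing import Dict, List


def trace_run(plan: List[Dict[str, str]], finish_after: int) -> List[str]:
    """Return the sequence of node names visited for a given plan."""
    trace = ["plan"]
    step = 0
    while True:
        tool = plan[step]["tool"]  # what route_by_tool would look at
        trace.append("retrieve_10k" if tool == "search_10k" else "retrieve_web")
        trace += ["rerank", "compress", "reflect"]
        step += 1                  # reflect advances the step index
        if step >= len(plan) or step >= finish_after:
            trace.append("generate_final_answer")
            return trace
        trace.append("plan")       # loop back and route the next step


two_step_plan = [{"tool": "search_10k"}, {"tool": "search_web"}]
visited = trace_run(two_step_plan, finish_after=2)
print(" -> ".join(visited))
```

For a two-step plan the trace passes through plan twice: once to start and once when the loop routes the second step.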

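That completes the cyclic architecture of the deep-thinking agent. The next step is to compile and visualize it.

Compiling and Visualizing the Iterative Workflow

Calling .compile() turns the abstract definition of nodes and edges into an executable application. We can also generate a diagram with the built-in tooling, which helps with understanding and debugging.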

deep_thinking_rag_graph = graph.compile()
print("Graph compiled successfully.")

try:
    from IPython.display import Image, display
    png_image = deep_thinking_rag_graph.get_graph().draw_png()
    display(Image(png_image))
except Exception as e:
    print(f"Graph visualization failed: {e}. Please ensure pygraphviz is installed.")

Deep Thinking Simpler Pipeline Flow (Created by Fareed Khan)

You will see:

  • the branch where route_by_tool chooses internal or external retrieval;
  • the linear processing of each research step (rerank -> compress -> reflect);
  • the key feedback loop: should_continue sends the flow back to plan to start the next round;
  • the exit into generate_final_answer once the research is done.

This is a system that can "think". Now let's actually run it.

Running the Deep Thinking Pipeline

We test the system with the same multi-hop, multi-source query to see whether it succeeds.

Here we call .stream() to observe the state after each node update and trace the agent's thought process in real time.

final_state = None
graph_input = {"original_question": complex_query_adv}

print("--- Invoking Deep Thinking RAG Graph ---")
for chunk in deep_thinking_rag_graph.stream(graph_input, stream_mode="values"):
    final_state = chunk
print("\n--- Graph Stream Finished ---")
#### OUTPUT ####

--- Invoking Deep Thinking RAG Graph ---

--- Generating Plan ---
plan:
  steps:
  - sub_question: What are the key risks related to competition as stated in NVIDIA's 2023 10-K filing?
    tool: search_10k
    ...
  - sub_question: What are the recent news and developments in AMD's AI chip strategy in 2024?
    tool: search_web
    ...

--- Retrieving from 10-K (Step 1: ...) ---
  Rewritten Query: key competitive risks for NVIDIA in the semiconductor industry...
  Supervisor Decision: Use `hybrid_search`. ...

--- Reranking Documents ---
  Reranked to top 3 documents.

--- Distilling Context ---
  Distilled Context Snippet: NVIDIA operates in the intensely competitive semiconductor industry...

--- Reflecting on Findings ---
  Summary: According to its 2023 10-K, NVIDIA operates in an intensely competitive semiconductor industry...

--- Evaluating Policy ---
  -> Decision: CONTINUE_PLAN | Justification: The first step...has been completed. The next step...is still pending...

--- Searching Web (Step 2: ...) ---
  Rewritten Query: AMD AI chip strategy news and developments 2024...

--- Reranking Documents ---
  Reranked to top 3 documents.

--- Distilling Context ---
  Distilled Context Snippet: AMD has ramped up its challenge to Nvidia in the AI accelerator market with its Instinct MI300 series...

--- Reflecting on Findings ---
  Summary: In 2024, AMD is aggressively competing with NVIDIA in the AI chip market through its Instinct MI300X accelerator...

--- Evaluating Policy ---
  -> Decision: FINISH | Justification: The research history now contains comprehensive summaries of both NVIDIA's stated risks and AMD's recent strategy...

--- Generating Final Answer with Citations ---

--- Graph Stream Finished ---

The system ran the full flow we designed: plan -> step 1 -> self-evaluation (continue) -> step 2 -> self-evaluation (finish) -> final synthesis.

Analyzing the Final High-Quality Answer

Print the final answer:

console.print("--- DEEP THINKING RAG FINAL ANSWER ---")
console.print(Markdown(final_state['final_answer']))
#### OUTPUT ####
--- DEEP THINKING RAG FINAL ANSWER ---
Based on an analysis of NVIDIA's 2023 10-K filing and recent news from 2024 regarding AMD's AI chip strategy, the following synthesis can be made:

**NVIDIA's Stated Competitive Risks:**
In its 2023 10-K filing, NVIDIA identifies its operating environment as the "intensely competitive" semiconductor industry, which is characterized by rapid technological change. A primary risk is that competitors, including AMD, could introduce new products with better performance or lower costs that gain significant market acceptance, which could materially and adversely affect its business [Source: Item 1A. Risk Factors.].

**AMD's 2024 AI Chip Strategy:**
In 2024, AMD has moved aggressively to challenge NVIDIA's dominance in the AI hardware market with its Instinct MI300 series of accelerators, particularly the MI300X. This product is designed to compete directly with NVIDIA's H100 GPU. AMD's strategy has gained significant traction, with major cloud providers such as Microsoft Azure and Oracle announcing plans to use the new chips [Source: https://www.reuters.com/technology/amd-forecasts-35-billion-ai-chip-revenue-2024-2024-01-30/].

**Synthesis and Impact:**
AMD's 2024 AI chip strategy directly exacerbates the competitive risks outlined in NVIDIA's 10-K. The successful launch and adoption of the MI300X is a materialization of the specific risk that a competitor could introduce a product with comparable performance. The adoption of AMD's chips by major cloud providers signifies a direct challenge to NVIDIA's market share in the lucrative data center segment, validating NVIDIA's stated concerns about rapid technological change [Source: Item 1A. Risk Factors. and https://www.cnbc.com/2023/12/06/amd-launches-new-mi300x-ai-chip-to-compete-with-nvidias-h100.html].

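This answer succeeds on every part of the query:

  • it correctly summarizes the risks from the 10-K;
  • it correctly summarizes AMD's moves in 2024;
  • most importantly, the "Synthesis and Impact" section completes the multi-hop reasoning and explains how the latter aggravates the former;
  • it provides source attribution (internal section and external URLs).

Side-by-Side Comparison

Let's put the two results side by side.

The comparison shows that a cyclic, tool-aware, self-critiquing agent architecture delivers a clear, measurable improvement on complex real-world queries.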

Evaluation Framework and Results Analysis

We succeeded on one hard question, but production use requires objective, quantitative, automated validation.

Evaluation Framework (Created by Fareed Khan)

We use the RAGAs (RAG Assessment) library and focus on four key metrics:

  • Context Precision & Recall: measure retrieval quality. Precision asks how many of the retrieved documents are truly relevant. Recall asks how many of all relevant documents we found.
  • Answer Faithfulness: whether the answer is grounded in the provided context. This is the main check against LLM hallucination.
  • Answer Correctness: the final quality measure. It compares the answer with a human-written ground-truth answer to assess factual accuracy and completeness.
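
For intuition only, the two retrieval questions above can be written as simple set arithmetic when relevance labels are given by hand. This is not how RAGAs computes its metrics (the library relies on LLM judgments), and the chunk ids and relevance labels below are made up.

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Set-based precision and recall for hand-labeled chunks (simplified illustration)."""
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical labels: three chunks are needed to answer the question
relevant_chunks = {"10k_risk_competition", "web_amd_mi300x", "web_cloud_adoption"}

single_source = {"10k_risk_competition", "10k_business_overview"}
multi_source = {"10k_risk_competition", "web_amd_mi300x", "web_cloud_adoption", "10k_business_overview"}

print(precision_recall(single_source, relevant_chunks))
print(precision_recall(multi_source, relevant_chunks))
```

A retriever limited to one source cannot reach the web chunks at all, so its recall is capped no matter how well it ranks what it does find.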

Prepare the evaluation dataset (the question, both systems' answers and contexts, and the ground truth) and run the evaluation:

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    context_precision,
    context_recall,
    faithfulness,
    answer_correctness,
)
import pandas as pd

print("Preparing evaluation dataset...")

ground_truth_answer_adv = "NVIDIA's 2023 10-K lists intense competition and rapid technological change as key risks. This risk is exacerbated by AMD's 2024 strategy, specifically the launch of the MI300X AI accelerator, which directly competes with NVIDIA's H100 and has been adopted by major cloud providers, threatening NVIDIA's market share in the data center segment."

retrieved_docs_for_baseline_adv = baseline_retriever.invoke(complex_query_adv)
baseline_contexts = [[doc.page_content for doc in retrieved_docs_for_baseline_adv]]

advanced_contexts_flat = []
for step in final_state['past_steps']:
    advanced_contexts_flat.extend([doc.page_content for doc in step['retrieved_docs']])

advanced_contexts = [list(set(advanced_contexts_flat))]

eval_data = {
    'question': [complex_query_adv, complex_query_adv],
    'answer': [baseline_result, final_state['final_answer']],
    'contexts': baseline_contexts + advanced_contexts,
    'ground_truth': [ground_truth_answer_adv, ground_truth_answer_adv]
}

eval_dataset = Dataset.from_dict(eval_data)

metrics = [
    context_precision,
    context_recall,
    faithfulness,
    answer_correctness,
]
print("Running RAGAs evaluation...")

result = evaluate(eval_dataset, metrics=metrics, is_async=False)
print("Evaluation complete.")

results_df = result.to_pandas()
results_df.index = ['baseline_rag', 'deep_thinking_rag']

print("\n--- RAGAs Evaluation Results ---")
print(results_df[['context_precision', 'context_recall', 'faithfulness', 'answer_correctness']].T)

Sample output:

#### OUTPUT ####
Preparing evaluation dataset...
Running RAGAs evaluation...
Evaluation complete.


--- RAGAs Evaluation Results ---
                     baseline_rag  deep_thinking_rag
context_precision        0.500000           0.890000
context_recall           0.333333           1.000000
faithfulness             1.000000           1.000000
answer_correctness       0.395112           0.991458

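The quantitative results give the Deep Thinking architecture a clear, objective advantage:

  • Context Precision (0.50 vs 0.89): only half of the baseline's context is relevant, because it can only retrieve generic information about competition. The advanced agent raises precision markedly through multi-step, multi-tool retrieval.
  • Context Recall (0.33 vs 1.00): the baseline misses the web information entirely, so its recall is low. The advanced agent finds all the required information through planning and tool use and reaches a perfect score.
  • Faithfulness (1.00 vs 1.00): both systems are faithful. The baseline correctly states that it lacks the information; the advanced agent correctly uses what it found. A faithful but incorrect answer is still of little use.
  • Answer Correctness (0.40 vs 0.99): the final quality metric. The baseline scores below 40% because the second part of the analysis is missing; the advanced agent is close to perfect.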

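Summary of the Whole Pipeline

In this article we went from a simple, brittle RAG pipeline to a sophisticated autonomous reasoning agent:

  • we first built a vanilla RAG pipeline and demonstrated its inevitable failure on a complex multi-source query;
  • we systematically built the Deep Thinking Agent, giving it planning, multi-tool use, and adaptive retrieval strategies;
  • we built a multi-stage retrieval funnel: broad recall first (hybrid search), then high precision (cross-encoder reranker), and finally synthesis (distiller agent);
  • we orchestrated the whole cognitive architecture with LangGraph, creating a cyclic, stateful workflow that enables real multi-step reasoning;
  • we added a self-critique loop so the agent can recognize failure, revise its plan, and exit gracefully when no answer can be found;
  • finally, we ran a production-grade evaluation with RAGAs that objectively quantifies the advanced agent's better performance.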

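Learning the Policy with a Markov Decision Process (MDP)

Currently our Policy Agent (which decides CONTINUE or FINISH) relies on a general-purpose LLM such as GPT-4o and calls it every time. This works, but it can be slow and expensive in production. Recent research suggests a better path.

  • Model RAG as a decision process: treat the agent's reasoning loop as a Markov Decision Process (MDP). In this model each RAGState is a state, and each action (CONTINUE, REVISE, FINISH) moves the system into a new state and earns some reward (for example, for finding the correct answer).
  • Learn from experience: the thousands of successful and failed reasoning traces we log in LangSmith are valuable training data. Each trace is one example of the agent acting in this MDP.
  • Train a policy model: with this data, reinforcement learning can be used to train a smaller, more specialized policy model.
  • Goal: speed and efficiency. Distill the decision-making ability of a complex model such as GPT-4o into a smaller model (for example, 7B) so that the CONTINUE/FINISH decision becomes faster and cheaper while being closely tuned to our domain. This is the core idea behind a number of research efforts (such as DeepRAG) and the next stage of optimizing autonomous RAG systems.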
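
As a rough sketch of the data preparation this implies, logged trajectories could be turned into (state features, action, return) tuples for training. Everything below is hypothetical: the trajectory format, the two state features, the reward of 1.0 for a correct final answer, and the discount factor of 0.9. It is not taken from any particular paper or library.

```python
from typing import Dict, List, Tuple

# (steps_done, plan_length), action, discounted return
Transition = Tuple[Tuple[int, int], str, float]


def to_training_tuples(
    trajectory: List[Dict],
    final_reward: float,
    gamma: float = 0.9,
) -> List[Transition]:
    """Attach a discounted return to every decision in one logged trajectory."""
    tuples: List[Transition] = []
    n = len(trajectory)
    for t, step in enumerate(trajectory):
        state = (step["steps_done"], step["plan_length"])
        # The reward only arrives at the end, so earlier decisions see it discounted
        discounted_return = final_reward * gamma ** (n - 1 - t)
        tuples.append((state, step["action"], discounted_return))
    return tuples


logged = [
    {"steps_done": 1, "plan_length": 2, "action": "CONTINUE"},
    {"steps_done": 2, "plan_length": 2, "action": "FINISH"},
]
rows = to_training_tuples(logged, final_reward=1.0)
for row in rows:
    print(row)
```

A small policy model would then be trained to prefer actions with higher returns in similar states; the choice of state features is the main design decision and would need to reflect what the policy actually needs to see.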
Editor: 武曉燕  Source: AI大模型觀察站