
Does Your AI Keep Forgetting? 9 Techniques to Supercharge Your Agent's Memory

Testing and analysis, from sliding windows to OS-like memory

9 Techniques for Optimizing AI Agent Memory: From Beginner to Advanced

One way to optimize an AI agent is to design a multi-sub-agent architecture for better accuracy. In conversational AI, however, optimization goes far beyond that: memory becomes critical.

As your conversations with an AI agent grow longer and deeper, its memory usage grows too, because the agent relies on components such as stored conversation history, tool calls, and database searches.

In this post, we will write code for and evaluate 9 memory optimization techniques, from beginner to advanced, covering how to apply each one along with its strengths and weaknesses, from a simple sequential approach all the way to an advanced OS-like memory management implementation.

(Figure: summary of the nine techniques)

To keep things clear and practical, we will use one simple AI agent throughout, observing the inner mechanics of each technique so the strategies are easy to extend into more complex systems.

All the code (theory + notebooks) is available in my GitHub repository:
https://github.com/PulsarPioneers/Multi-Agent-AI-System

Table of Contents

  • Environment Setup
  • Creating Helper Functions
  • Creating the Base Agent and Memory Class
  • The Problem with the Sequential Approach
  • Sliding Window Approach
  • Summarization Based Optimization
  • Retrieval Based Memory
  • Memory Augmented Transformers
  • Hierarchical Optimization for Multi-tasks
  • Graph Based Optimization
  • Compression & Consolidation Memory
  • OS-Like Memory Management
  • Choosing the Right Strategy

Environment Setup

To optimize and test the AI agent's memory techniques, we first need to initialize a few components. Before that, install the necessary Python libraries:

  • openai: the client library for interacting with the LLM API.
  • numpy: for numerical operations, especially on embeddings.
  • faiss-cpu: Facebook AI's library for efficient similarity search, which powers our retrieval memory; a perfect in-memory vector database.
  • networkx: for creating and managing the knowledge graph in Graph-Based Memory.
  • tiktoken: for counting tokens precisely and managing context window limits.

Install the modules:

pip install openai numpy faiss-cpu networkx tiktoken

Next, initialize the client module for calling the LLM:

import os
from openai import OpenAI

API_KEY = "YOUR_LLM_API_KEY"
BASE_URL = "https://api.studio.nebius.com/v1/"

client = OpenAI(
    base_url=BASE_URL,
    api_key=API_KEY
)

print("OpenAI client configured successfully.")

We will use open-source models through API providers such as Nebius or Together AI. Next, import and select the open-source LLM used to build the AI agent:

import tiktoken
import time

GENERATION_MODEL = "meta-llama/Meta-Llama-3.1-8B-Instruct"
EMBEDDING_MODEL = "BAAI/bge-multilingual-gemma2"

The main tasks use the LLaMA 3.1 8B Instruct model. Some of the optimizations depend on an embedding model, for which we will use the multilingual BGE Gemma-2 embedding model.

Next, let's define several helper functions that we will use throughout the post.

Creating Helper Functions

To avoid repeating code and to follow good coding practice, we define three helper functions:

  • generate_text: generates content from a system prompt and a user prompt.
  • generate_embedding: generates embeddings for the retrieval-based approaches.
  • count_tokens: counts the total number of tokens used by each approach.

First, let's code the generate_text function, which generates text from an input prompt:

def generate_text(system_prompt: str, user_prompt: str) -> str:
    """
    調(diào)用LLM API生成文本響應(yīng)。
    
    參數(shù):
        system_prompt: 定義AI角色和行為的指令。
        user_prompt: 用戶輸入,AI需對(duì)此響應(yīng)。
        
    返回:
        AI生成的文本內(nèi)容,或錯(cuò)誤信息。
    """
    response = client.chat.completions.create(
        model=GENERATION_MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ]
    )
    return response.choices[0].message.content

The generate_text function takes a system prompt and a user prompt and generates a response with LLaMA 3.1 8B.

Next, let's code the generate_embedding function, which creates embeddings with the Gemma-2 model:

def generate_embedding(text: str) -> list[float]:
    """
    使用嵌入模型為給定文本生成數(shù)值嵌入。
    
    參數(shù):
        text: 要轉(zhuǎn)換為嵌入的輸入字符串。
        
    返回:
        表示嵌入向量的浮點(diǎn)數(shù)列表,或錯(cuò)誤時(shí)返回空列表。
    """
    response = client.embeddings.create(
        model=EMBEDDING_MODEL,
        input=text
    )
    return response.data[0].embedding

The embedding function returns the embedding of the input text using the Gemma-2 model.
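As a quick sanity check, you can print the vector's length; it should match the embedding_dim we pass to FAISS later (3584 for BAAI/bge-multilingual-gemma2). A minimal sketch, assuming the client above is configured:

# The embedding length must match the dimension given to the FAISS index later.
vec = generate_embedding("Hello, world!")
print(f"Embedding dimension: {len(vec)}")  # expected: 3584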

Finally, let's create a function that counts tokens over the entire AI/user chat history, which will help us see how well each optimization works.

We will use a common modern tokenizer, OpenAI's cl100k_base, which is a Byte Pair Encoding (BPE) tokenizer. In simple terms, BPE is an algorithm that efficiently splits text into subword units.

A BPE example:
"lower", "lowest" → ["low", "er"], ["low", "est"]

Initialize the tokenizer:

tokenizer = tiktoken.get_encoding("cl100k_base")

Now create the function that tokenizes text and counts the total tokens:

def count_tokens(text: str) -> int:
    """
    使用預(yù)加載的tokenizer計(jì)算給定字符串的token數(shù)。
    
    參數(shù):
        text: 要分詞的字符串。
        
    返回:
        token數(shù)的整數(shù)。
    """
    return len(tokenizer.encode(text))
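A quick usage sketch (counts depend on the tokenizer; cl100k_base splits this example into four tokens):

# "Hello, world!" -> ["Hello", ",", " world", "!"] under cl100k_base
print(count_tokens("Hello, world!"))  # 4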

Done! With the helper functions in place, we can start exploring and evaluating the different techniques.

Creating the Base Agent and Memory Class

Now we need the agent's core design structure, which we will use throughout this guide. With respect to memory, an AI agent has three key components:

  • Adding past messages to the agent's memory so it knows the context.
  • Retrieving relevant content to help the AI generate its response.
  • Clearing the agent's memory after each strategy has been tested.

Object-Oriented Programming (OOP) is the cleanest way to build this memory functionality, so let's implement it:

import abc

class BaseMemoryStrategy(abc.ABC):
    """所有memory策略的抽象基類。"""
    
    @abc.abstractmethod
    def add_message(self, user_input: str, ai_response: str):
        """添加新的用戶-AI交互到memory存儲(chǔ)。"""
        pass

    @abc.abstractmethod
    def get_context(self, query: str) -> str:
        """從memory檢索并格式化相關(guān)上下文發(fā)送給LLM。"""
        pass

    @abc.abstractmethod
    def clear(self):
        """重置memory,適用于開始新對(duì)話。"""
        pass

We use @abc.abstractmethod, a common coding pattern when subclasses must each supply their own implementation. Every strategy (subclass) implements these methods differently, which is exactly what the abstract methods in this design enforce.
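A small illustration of why this matters: abc enforces the contract, so a strategy that forgets to implement any of the three methods cannot even be instantiated (the class name below is just for demonstration):

class IncompleteMemory(BaseMemoryStrategy):
    def add_message(self, user_input: str, ai_response: str): ...
    def get_context(self, query: str) -> str: ...
    # clear() is missing

# IncompleteMemory()  # TypeError: Can't instantiate abstract class IncompleteMemory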

Building on the memory base class and helper functions we just defined, let's use OOP principles to build the AI agent structure:

class AIAgent:
    """主AI代理類,設(shè)計(jì)為可與任何memory策略配合使用。"""
    
    def __init__(self, memory_strategy: BaseMemoryStrategy, system_prompt: str = "You are a helpful AI assistant."):
        """
        初始化代理。
        
        參數(shù):
            memory_strategy: 繼承自BaseMemoryStrategy的實(shí)例,決定代理如何記憶對(duì)話。
            system_prompt: 給LLM的初始指令,定義其角色和任務(wù)。
        """
        self.memory = memory_strategy
        self.system_prompt = system_prompt
        print(f"Agent initialized with {type(memory_strategy).__name__}.")

    def chat(self, user_input: str):
        """
        處理對(duì)話中的一個(gè)回合。
        
        參數(shù):
            user_input: 用戶的最新消息。
        """
        print(f"\n{'='*25} NEW INTERACTION {'='*25}")
        print(f"User > {user_input}")
        
        start_time = time.time()
        context = self.memory.get_context(query=user_input)
        retrieval_time = time.time() - start_time
        
        full_user_prompt = f"### MEMORY CONTEXT\n{context}\n\n### CURRENT REQUEST\n{user_input}"
        
        prompt_tokens = count_tokens(self.system_prompt + full_user_prompt)
        print("\n--- Agent Debug Info ---")
        print(f"Memory Retrieval Time: {retrieval_time:.4f} seconds")
        print(f"Estimated Prompt Tokens: {prompt_tokens}")
        print(f"\n[Full Prompt Sent to LLM]:\n---\nSYSTEM: {self.system_prompt}\nUSER: {full_user_prompt}\n---")
        
        start_time = time.time()
        ai_response = generate_text(self.system_prompt, full_user_prompt)
        generation_time = time.time() - start_time
        
        self.memory.add_message(user_input, ai_response)
        
        print(f"\nAgent > {ai_response}")
        print(f"(LLM Generation Time: {generation_time:.4f} seconds)")
        print(f"{'='*70}")

The agent works in 6 simple steps:

1. Retrieve context from memory according to the strategy in use, recording the retrieval time.

2. Combine the retrieved memory context with the current user input into the full prompt.

3. Print debug info, such as the prompt's token count and the context retrieval time.

4. Send the full prompt (system + user + context) to the LLM and wait for the response.

5. Update memory with the new interaction for future context retrieval.

6. Display the AI response and the generation time, ending the turn.

Alright! With the components coded, let's start understanding and implementing each memory optimization technique.

The Problem with the Sequential Approach

This is the most basic and simplest optimization approach. Many developers use it, and it was the standard way early chatbots managed conversation history.

The approach appends every new message to a running log and feeds the entire conversation back to the model each time, forming a linear memory chain that preserves everything. Here's a visualization:



How the Sequential Approach works:

1. The user starts a conversation with the AI agent.

2. The agent responds.

3. The user-AI interaction (one "turn") is saved as a single block of text.

4. On the next turn, the agent takes the entire history (turn 1 + turn 2 + turn 3…) and combines it with the new user query.

5. This giant block of text is sent to the LLM to generate the next response.

Let's implement sequential optimization using our memory class:

class SequentialMemory(BaseMemoryStrategy):
    def __init__(self):
        """初始化memory,包含一個(gè)空列表存儲(chǔ)對(duì)話歷史。"""
        self.history = []

    def add_message(self, user_input: str, ai_response: str):
        """將新的用戶-AI交互添加到歷史。"""
        self.history.append({"role": "user", "content": user_input})
        self.history.append({"role": "assistant", "content": ai_response})

    def get_context(self, query: str) -> str:
        """檢索整個(gè)對(duì)話歷史,格式化為單一字符串作為L(zhǎng)LM的上下文。"""
        return "\n".join([f"{turn['role'].capitalize()}: {turn['content']}" for turn in self.history])

    def clear(self):
        """通過清空列表重置對(duì)話歷史。"""
        self.history = []
        print("Sequential memory cleared.")

Code walkthrough:

  • __init__(self): initializes an empty self.history list to store the conversation.
  • add_message(...): appends the user input and the AI response to the history.
  • get_context(...): formats the history into "Role: Content" lines to serve as context.
  • clear(): resets the history for a new conversation.

Initialize the memory class and build the AI agent:

sequential_memory = SequentialMemory()
agent = AIAgent(memory_strategy=sequential_memory)

To test the sequential approach, let's run a multi-turn conversation:

agent.chat("Hi there! My name is Sam.")
agent.chat("I'm interested in learning about space exploration.")
agent.chat("What was the first thing I told you?")

Output:

==== NEW INTERACTION ====
User: Hi there! My name is Sam.  
Bot: Hello Sam! Nice to meet you. What brings you here today?  
>>>> Tokens: 23 | Response Time: 2.25s

==== NEW INTERACTION ====
User: I am interested in learning about space exploration.  
Bot: Awesome! Are you curious about:  
- Mars missions  
- Space agencies  
- Private companies (e.g., SpaceX)  
- Space tourism  
- Search for alien life?  
...  
>>>> Tokens: 92 | Response Time: 4.46s

==== NEW INTERACTION ====
User: What was the first thing I told you?  
Bot: You said, "Hi there! My name is Sam."
...  
>>>> Tokens: 378 | Response Time: 0.52s

The conversation flows well, but look at the token counts: they grow larger after every turn. Our agent doesn't rely on external tools that would add significant tokens, so the growth comes entirely from the sequential accumulation of messages.

Drawback: the longer the conversation grows, the higher the token cost, so the sequential approach gets expensive quickly.
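A rough back-of-envelope shows why the cost grows roughly quadratically: if each turn adds about t tokens, turn n resends all n prior turns. The per-turn figure below is an assumption for illustration:

# Assume ~50 tokens per turn; turn n resends all prior turns.
t, N = 50, 40
print(sum(t * n for n in range(1, N + 1)))  # 41000 cumulative prompt tokens over 40 turns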

Sliding Window Approach

To avoid the huge-context problem, we next look at the sliding window approach: the agent doesn't need to remember every past message, only the context of the most recent N of them.


The agent keeps only the most recent N messages as context; when a new message arrives, the oldest one is dropped and the window slides forward.

The Sliding Window Approach flow:

1. Define a fixed window size, e.g. N = 2 turns.

2. The first two turns fill the window.

3. On the third turn, the first turn is pushed out of the window.

4. The context sent to the LLM is only whatever currently sits inside the window.

Implementing the sliding window memory class:

from collections import deque

class SlidingWindowMemory(BaseMemoryStrategy):
    def __init__(self, window_size: int = 4):
        """
        初始化memory,使用固定大小的deque。
        
        參數(shù):
            window_size: 保留的對(duì)話回合數(shù)(用戶+AI=1回合)。
        """
        self.history = deque(maxlen=window_size)

    def add_message(self, user_input: str, ai_response: str):
        """添加新對(duì)話回合到歷史,deque滿時(shí)自動(dòng)移除最舊回合。"""
        self.history.append([
            {"role": "user", "content": user_input},
            {"role": "assistant", "content": ai_response}
        ])

    def get_context(self, query: str) -> str:
        """Retrieves the conversation history within the current window, formatted as a single string."""
        context_list = []
        for turn in self.history:
            for message in turn:
                context_list.append(f"{message['role'].capitalize()}: {message['content']}")
        return "\n".join(context_list)

    def clear(self):
        """Resets the window by clearing the deque."""
        self.history.clear()
        print("Sliding window memory cleared.")

The sequential and sliding window memory classes are similar; the difference is the added context window. Code walkthrough:

  • __init__(self, window_size): sets up a fixed-size deque, which makes the context window slide automatically.
  • add_message(...): appends the new turn; when the deque is full, the oldest entry is discarded.
  • get_context(...): builds the context only from the messages inside the current sliding window.

Initialize the sliding window memory and build the AI agent:

sliding_memory = SlidingWindowMemory(window_size=2)
agent = AIAgent(memory_strategy=sliding_memory)

To test this optimization, let's run a multi-turn conversation:

agent.chat("My name is Priya and I'm a software developer.")
agent.chat("I work primarily with Python and cloud technologies.")
agent.chat("My favorite hobby is hiking.")

Output:

==== NEW INTERACTION ====
User: My name is Priya and I am a software developer.  
Bot: Nice to meet you, Priya! What can I assist you with today?  
>>>> Tokens: 27 | Response Time: 1.10s

==== NEW INTERACTION ====
User: I work primarily with Python and cloud technologies.  
Bot: That is great! Given your expertise...  
>>>> Tokens: 81 | Response Time: 1.40s

==== NEW INTERACTION ====
User: My favorite hobby is hiking.  
Bot: It seems we had a nice conversation about your background...  
>>>> Tokens: 167 | Response Time: 1.59s

The conversation looks much like the sequential approach. Now let's test what happens when the user asks about information that has left the window:

agent.chat("What is my name?")

Output:

==== NEW INTERACTION ====
User: What is my name?  
Bot: I apologize, but I dont have access to your name from our recent conversation. Could you please remind me?  
>>>> Tokens: 197 | Response Time: 0.60s

The agent can't answer: the relevant context has already slid out of the window. Token counts drop, but important context can be lost, so the window size must be tuned to the kind of agent you're building.

Summarization Based Optimization

The sequential approach has the huge-context problem, and the sliding window can lose important context. We need a way to compress context without losing key information, and that's where summarization comes in.


The Summarization Approach flow:

1. Recent messages are stored in a temporary "buffer".

2. When the buffer reaches a certain size (the "threshold"), the agent pauses and triggers an action.

3. It sends the buffer contents plus the previous summary to the LLM and asks for a new, consolidated summary.

4. The LLM produces a new summary that replaces the old one, and the buffer is emptied.

Implementing summarization optimization:

class SummarizationMemory(BaseMemoryStrategy):
    def __init__(self, summary_threshold: int = 4):
        """
        初始化summarization memory。
        
        參數(shù):
            summary_threshold: 觸發(fā)summarization的消息數(shù)(用戶+AI)。
        """
        self.running_summary = ""
        self.buffer = []
        self.summary_threshold = summary_threshold

    def add_message(self, user_input: str, ai_response: str):
        """添加新交互到buffer,buffer滿時(shí)觸發(fā)memory consolidation。"""
        self.buffer.append({"role": "user", "content": user_input})
        self.buffer.append({"role": "assistant", "content": ai_response})

        if len(self.buffer) >= self.summary_threshold:
            self._consolidate_memory()

    def _consolidate_memory(self):
        """使用LLM總結(jié)buffer內(nèi)容并與現(xiàn)有running summary合并。"""
        print("\n--- [Memory Consolidation Triggered] ---")
        buffer_text = "\n".join([f"{msg['role'].capitalize()}: {msg['content']}" for msg in self.buffer])
        
        summarization_prompt = (
            f"You are a summarization expert. Your task is to create a concise summary of a conversation. "
            f"Combine the 'Previous Summary' with the 'New Conversation' into a single, updated summary. "
            f"Capture all key facts, names, and decisions.\n\n"
            f"### Previous Summary:\n{self.running_summary}\n\n"
            f"### New Conversation:\n{buffer_text}\n\n"
            f"### Updated Summary:"
        )
        
        new_summary = generate_text("You are an expert summarization engine.", summarization_prompt)
        self.running_summary = new_summary
        self.buffer = []
        print(f"--- [New Summary: '{self.running_summary}'] ---")

    def get_context(self, query: str) -> str:
        """Builds the context by combining the long-term running summary with the short-term buffer."""
        buffer_text = "\n".join([f"{msg['role'].capitalize()}: {msg['content']}" for msg in self.buffer])
        return f"### Summary of Past Conversation:\n{self.running_summary}\n\n### Recent Messages:\n{buffer_text}"

    def clear(self):
        """Resets both the running summary and the buffer."""
        self.running_summary = ""
        self.buffer = []
        print("Summarization memory cleared.")

Code walkthrough:

  • __init__(...): sets up an empty running_summary string and buffer list.
  • add_message(...): adds messages to the buffer and calls _consolidate_memory once the summary_threshold is reached.
  • _consolidate_memory(): formats the buffer together with the existing summary, asks the LLM for a new summary, updates running_summary, and clears the buffer.
  • get_context(...): supplies both the long-term summary and the short-term buffer, giving the LLM a complete view of the conversation.

Initialize and test:

summarization_memory = SummarizationMemory(summary_threshold=4)
agent = AIAgent(memory_strategy=summarization_memory)

agent.chat("I'm starting a new company called 'Innovatech'. Our focus is on sustainable energy.")
agent.chat("Our first product will be a smart solar panel, codenamed 'Project Helios'.")

Output:

==== NEW INTERACTION ====
User: I am starting a new company called 'Innovatech'. Ou...
Bot: Congratulations on starting Innovatech! Focusing o ...  
>>>> Tokens: 45 | Response Time: 2.55s

==== NEW INTERACTION ====
User: Our first product will be a smart solar panel....  
--- [Memory Consolidation Triggered] ---  
--- [New Summary: The user started a compan ...  
Bot: That is exciting news about  ....  
>>>> Tokens: 204 | Response Time: 3.58s

After two turns, a summary has been generated. Let's keep going:

agent.chat("The marketing budget is set at $50,000.")
agent.chat("What is the name of my company and its first product?")

Output:

==== NEW INTERACTION ====
User: What is the name of my company and its first product?  
Bot: Your company is called 'Innovatech' and its first product is codenamed 'Project Helios'.  
>>>> Tokens: 147 | Response Time: 1.05s

By the fourth turn the token count has nearly halved: summarization greatly reduces token usage. The summarization prompt, however, must be carefully designed so it captures the key details.

Drawback: key information can be lost during summarization. For example, a 40-turn conversation may contain numeric or factual details (say, sales figures mentioned in turn four) that no longer appear in the summary.

Let's test that scenario after 40 turns:

agent.chat("what was the gross sales of our company in the fiscal year?")

Output:

==== NEW INTERACTION ====
User: what was the gross sales of our company in the fiscal year?  
Bot: I am sorry but I do not have that information. Could you please provide the gross sales figure for the fiscal year?  
>>>> Tokens: 1532 | Response Time: 2.831s

The summarized memory keeps the token count down, but answer quality can degrade significantly. A good mitigation is to add a sub-agent that fact-checks responses against the stored facts, improving reliability, as sketched below.
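A minimal sketch of that idea, reusing the generate_text helper from earlier; the fact_check name and the prompt wording are illustrative assumptions, not part of the original setup:

def fact_check(summary: str, answer: str) -> bool:
    """Asks the LLM whether an answer is supported by the stored summary."""
    verdict = generate_text(
        "You are a strict fact-checking engine. Reply with exactly one word: SUPPORTED or UNSUPPORTED.",
        f"### Known Facts:\n{summary}\n\n### Claim:\n{answer}\n\nIs the claim fully supported by the known facts?"
    )
    return verdict.strip().upper().startswith("SUPPORTED")

Before returning an answer built from the summary, the agent could call fact_check and ask the user for clarification whenever it returns False.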

Retrieval Based Memory

This is the most powerful strategy for many AI agent use cases: RAG-based AI agents. The previous approaches reduce token usage at the risk of losing context; RAG solves this by retrieving the context relevant to the current user query.


The context lives in a database, and embedding models convert text into vector representations, which makes retrieval efficient.

The RAG-Based Memory flow:

1. Each new interaction is saved as a "document" in a database; its numerical representation (embedding) is generated and stored.

2. When the user sends a new message, the agent converts it into an embedding as well.

3. A similarity search is run with the query embedding against all document embeddings.

4. The k most semantically relevant documents are retrieved (e.g., the 3 most similar past turns).

5. Only those relevant documents are injected into the LLM's context window.

We use FAISS for vector storage:

import numpy as np
import faiss

class RetrievalMemory(BaseMemoryStrategy):
    def __init__(self, k: int = 2, embedding_dim: int = 3584):
        """
        初始化retrieval memory系統(tǒng)。
        
        參數(shù):
            k: 檢索的top相關(guān)documents數(shù)。
            embedding_dim: 嵌入模型生成的向量維度,BAAI/bge-multilingual-gemma2為3584。
        """
        self.k = k
        self.embedding_dim = embedding_dim
        self.documents = []
        self.index = faiss.IndexFlatL2(self.embedding_dim)

    def add_message(self, user_input: str, ai_response: str):
        """添加新對(duì)話回合到memory,分別嵌入和索引用戶和AI消息。"""
        docs_to_add = [
            f"User said: {user_input}",
            f"AI responded: {ai_response}"
        ]
        for doc in docs_to_add:
            embedding = generate_embedding(doc)
            if embedding:
                self.documents.append(doc)
                vector = np.array([embedding], dtype='float32')
                self.index.add(vector)

    def get_context(self, query: str) -> str:
        """Retrieves the k most relevant documents based on semantic similarity."""
        if self.index.ntotal == 0:
            return "No information in memory yet."
        
        query_embedding = generate_embedding(query)
        if not query_embedding:
            return "Could not process query for retrieval."
        
        query_vector = np.array([query_embedding], dtype='float32')
        distances, indices = self.index.search(query_vector, self.k)
        
        retrieved_docs = [self.documents[i] for i in indices[0] if i != -1]
        if not retrieved_docs:
            return "Could not find any relevant information in memory."
        
        return "### Relevant Information Retrieved from Memory:\n" + "\n---\n".join(retrieved_docs)

    def clear(self):
        """Resets the memory by clearing the documents and rebuilding the index."""
        self.documents = []
        self.index = faiss.IndexFlatL2(self.embedding_dim)
        print("Retrieval memory cleared.")

Code walkthrough:

  • __init__(...): initializes the documents list and a faiss.IndexFlatL2 of size embedding_dim to store the search vectors.
  • add_message(...): generates an embedding for both the user and AI messages and adds them to documents and the FAISS index.
  • get_context(...): embeds the user query, uses self.index.search to find the k most similar vectors, and returns their original text as context.

Initialize and test:

retrieval_memory = RetrievalMemory(k=2)
agent = AIAgent(memory_strategy=retrieval_memory)

agent.chat("I am planning a vacation to Japan for next spring.")
agent.chat("For my software project, I'm using the React framework for the frontend.")
agent.chat("I want to visit Tokyo and Kyoto while I'm on my trip.")
agent.chat("The backend of my project will be built with Django.")
agent.chat("What cities am I planning to visit on my vacation?")

Output:

==== NEW INTERACTION ====
User: What cities am I planning to visit on my vacation?  
--- Agent Debug Info ---  
[Full Prompt Sent to LLM]:  
---  
SYSTEM: You are a helpful AI assistant.  
USER: MEMORY CONTEXT  
Relevant Information Retrieved from Memory:  
User said: I want to visit Tokyo and Kyoto while I am on my trip.  
---  
User said: I am planning a vacation to Japan for next spring.  
...  

Bot: You are planning to visit Tokyo and Kyoto while on your vacation to Japan next spring.  
>>>> Tokens: 65 | Response Time: 0.53s

The relevant context is retrieved successfully, and the token count stays very low because only relevant information is pulled in. The choice of embedding model and vector storage database is critical here; FAISS is popular for its efficiency. Still, the bigger the database grows, the more complex the agent becomes, calling for optimization techniques such as batched or parallel queries, as sketched below.
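For example, FAISS searches are natively batched: stacking several query vectors into one matrix is usually cheaper than issuing them one at a time. A sketch against the index built above (the query strings are illustrative):

queries = ["my vacation plans", "my project's tech stack"]
vectors = np.array([generate_embedding(q) for q in queries], dtype='float32')
# One call searches all queries at once; row i of `indices` corresponds to queries[i].
distances, indices = retrieval_memory.index.search(vectors, retrieval_memory.k)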

Memory Augmented Transformers

AI systems keep adopting more sophisticated methods, pushing the boundary of what's possible.

Imagine a typical AI as a student with one small notebook: space is limited, so during a long exam they must erase old notes to make room for new ones. Memory-Augmented Transformers are like giving that student a stack of sticky notes: the notebook handles the current work, while the sticky notes preserve key information from earlier.


For example: you're designing a violence-free space video game and mention early on "space setting, no violence". An ordinary AI might forget this, but a memory-augmented AI writes it on a sticky note and can still match the original vision when you ask about it later.

The Memory-Augmented Transformers flow:

  • A SlidingWindowMemory manages the recent chat.
  • After each turn, an LLM acting as a "fact extractor" analyzes the conversation and decides whether it contains a core fact, preference, or decision.
  • If an important fact is found, it is stored as a memory token (a concise string).
  • The final context given to the agent combines the recent chat window with all persistent memory tokens.

Implementation:

class MemoryAugmentedMemory(BaseMemoryStrategy):
    def __init__(self, window_size: int = 2):
        """
        初始化memory-augmented系統(tǒng)。
        
        參數(shù):
            window_size: 短期memory保留的最近回合數(shù)。
        """
        self.recent_memory = SlidingWindowMemory(window_size=window_size)
        self.memory_tokens = []

    def add_message(self, user_input: str, ai_response: str):
        """添加回合到近期memory,并使用LLM決定是否創(chuàng)建持久memory token。"""
        self.recent_memory.add_message(user_input, ai_response)
        
        fact_extraction_prompt = (
            f"Analyze the following conversation turn. Does it contain a core fact, preference, or decision that should be remembered long-term? "
            f"Examples include user preferences ('I hate flying'), key decisions ('The budget is $1000'), or important facts ('My user ID is 12345').\n\n"
            f"Conversation Turn:\nUser: {user_input}\nAI: {ai_response}\n\n"
            f"If it contains such a fact, state the fact concisely in one sentence. Otherwise, respond with 'No important fact.'"
        )
        
        extracted_fact = generate_text("You are a fact-extraction expert.", fact_extraction_prompt)
        
        if "no important fact" not in extracted_fact.lower():
            print(f"--- [Memory Augmentation: New memory token created: '{extracted_fact}'] ---")
            self.memory_tokens.append(extracted_fact)

    def get_context(self, query: str) -> str:
        """Builds the context by combining the short-term recent conversation with the long-term memory tokens."""
        recent_context = self.recent_memory.get_context(query)
        memory_token_context = "\n".join([f"- {token}" for token in self.memory_tokens])
        return f"### Key Memory Tokens (Long-Term Facts):\n{memory_token_context}\n\n### Recent Conversation:\n{recent_context}"

    def clear(self):
        """Resets both the recent memory window and the memory tokens."""
        self.recent_memory.clear()
        self.memory_tokens = []
        print("Memory-augmented memory cleared.")

Code walkthrough:

  • __init__(...): initializes a SlidingWindowMemory and an empty memory_tokens list.
  • add_message(...): adds the turn to the sliding window, then makes an extra LLM call to check whether a key fact should be extracted and appended to memory_tokens.
  • get_context(...): combines the "sticky notes" (memory_tokens) with the recent chat history into one rich prompt.

Initialize and test:

mem_aug_memory = MemoryAugmentedMemory(window_size=2)
agent = AIAgent(memory_strategy=mem_aug_memory)

agent.chat("Please remember this for all future interactions: I am severely allergic to peanuts.")
agent.chat("Okay, let's talk about recipes. What's a good idea for dinner tonight?")
agent.chat("That sounds good. What about a dessert option?")
agent.chat("Could you suggest a Thai green curry recipe? Please ensure it's safe for me.")

Output:

==== NEW INTERACTION ====
User: Please remember this for all future interactions: I am severely allergic to peanuts.  
--- [Memory Augmentation: New memory token created: 'The user has a severe allergy to peanuts.'] ---  
Bot: I have taken note of your long-term fact: You are severely allergic to peanuts. I will keep this in mind...  
>>>> Tokens: 45 | Response Time: 1.32s

...

==== NEW INTERACTION ====
User: Could you suggest a Thai green curry recipe? Please ensure it is safe for me.  
--- Agent Debug Info ---  
[Full Prompt Sent to LLM]:  
---  
SYSTEM: You are a helpful AI assistant.  
USER: MEMORY CONTEXT  
Key Memory Tokens (Long-Term Facts):  
- The user has a severe allergy to peanuts.  
...  
Recent Conversation:  
User: Okay, lets talk about recipes...  
...  

Bot: Of course. Given your peanut allergy, it is very important to be careful with Thai cuisine as many recipes use peanuts or peanut oil. Here is a peanut-free Thai green curry recipe...  
>>>> Tokens: 712 | Response Time: 6.45s

This approach is more complex and more costly, since every turn triggers an extra LLM call for fact extraction, but it retains key information for the long term, which makes it a great fit for building reliable personal assistants.
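One way to trim that cost, offered as an assumption rather than part of the original design, is a cheap keyword pre-filter that only invokes the fact extractor when a turn looks fact-bearing:

# Hypothetical gate: skip the extra LLM call for ordinary chit-chat.
TRIGGER_WORDS = ("remember", "always", "never", "allergic", "budget", "my name", "id is")

def looks_fact_bearing(user_input: str) -> bool:
    return any(t in user_input.lower() for t in TRIGGER_WORDS)

In add_message, the extraction call would then be wrapped in if looks_fact_bearing(user_input): ..., accepting that unusually phrased facts may slip through.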

Hierarchical Optimization for Multi-tasks

So far we've treated memory as a single system. What if the agent, like a human, had different types of memory for different purposes? That's the idea behind Hierarchical Memory: combine several simple memory types into a more sophisticated, organized, intelligent system.

An analogy to human memory:

  • Working Memory: the last few sentences you heard; fast but fleeting.
  • Short-Term Memory: the key points from this morning's meeting; easy to recall for a few hours.
  • Long-Term Memory: your home address, or a key fact learned years ago; durable and deep.

The Hierarchical Approach flow:

1. Capture the user message into working memory.

2. Check whether the information is important enough to promote to long-term memory.

3. Store promoted content in retrieval memory for future use.

4. On new queries, search long-term memory for relevant context.

5. Inject the relevant memories into the context to generate a better response.

Implementation:

class HierarchicalMemory(BaseMemoryStrategy):
    def __init__(self, window_size: int = 2, k: int = 2, embedding_dim: int = 3584):
        """
        初始化hierarchical memory系統(tǒng)。
        
        參數(shù):
            window_size: 短期working memory的回合數(shù)。
            k: 從long-term memory檢索的documents數(shù)。
            embedding_dim: long-term memory的嵌入向量維度。
        """
        print("Initializing Hierarchical Memory...")
        self.working_memory = SlidingWindowMemory(window_size=window_size)
        self.long_term_memory = RetrievalMemory(k=k, embedding_dim=embedding_dim)
        self.promotion_keywords = ["remember", "rule", "preference", "always", "never", "allergic"]

    def add_message(self, user_input: str, ai_response: str):
        """添加消息到working memory,基于內(nèi)容有條件提升到long-term memory。"""
        self.working_memory.add_message(user_input, ai_response)
        
        if any(keyword in user_input.lower() for keyword in self.promotion_keywords):
            print(f"--- [Hierarchical Memory: Promoting message to long-term storage.] ---")
            self.long_term_memory.add_message(user_input, ai_response)

    def get_context(self, query: str) -> str:
        """Builds a rich context by combining the long-term and short-term memory tiers."""
        working_context = self.working_memory.get_context(query)
        long_term_context = self.long_term_memory.get_context(query)
        return f"### Retrieved Long-Term Memories:\n{long_term_context}\n\n### Recent Conversation (Working Memory):\n{working_context}"

    def clear(self):
        """Resets both memory tiers."""
        self.working_memory.clear()
        self.long_term_memory.clear()
        print("Hierarchical memory cleared.")

Code walkthrough:

  • __init__(...): initializes a SlidingWindowMemory and a RetrievalMemory, and defines the promotion_keywords.
  • add_message(...): adds the message to working_memory, checks it for the keywords, and if any are present also adds it to long_term_memory.
  • get_context(...): fetches context from both memory systems and merges them into one rich prompt.

Initialize and test:

hierarchical_memory = HierarchicalMemory()
agent = AIAgent(memory_strategy=hierarchical_memory)

agent.chat("Please remember my User ID is AX-7890.")
agent.chat("Let's chat about the weather. It's very sunny today.")
agent.chat("I'm planning to go for a walk later.")
agent.chat("I need to log into my account, can you remind me of my ID?")

Output:

==== NEW INTERACTION ====
User: Please remember my User ID is AX-7890.  
--- [Hierarchical Memory: Promoting message to long-term storage.] ---  
Bot: You have provided your User ID as AX-7890, which has been stored in long-term memory for future reference.  
...

==== NEW INTERACTION ====
User: I need to log into my account, can you remind me of my ID?  
--- Agent Debug Info ---  
[Full Prompt Sent to LLM]:  
---  
SYSTEM: You are a helpful AI assistant.  
USER: ### MEMORY CONTEXT  
### Retrieved Long-Term Memories:  
### Relevant Information Retrieved from Memory:  
User said: Please remember my User ID is AX-7890.  
...  
### Recent Conversation (Working Memory):  
User: Let's chat about the weather...  
User: I'm planning to go for a walk later...  

Bot: Your User ID is AX-7890. You can use this to log into your account. Is there anything else I can assist you with?  
>>>> Tokens: 452 | Response Time: 2.06s

The agent successfully combines the different memory types, using the fast working memory to keep the conversation flowing while querying long-term memory to retrieve the critical User ID.

Graph Based Optimization

So far, memory has been stored as blocks of text, whether full conversations, summaries, or retrieved documents. What if the agent could understand the relationships between pieces of information? That's the leap Graph-Based Memory makes.

This strategy represents information as a knowledge graph:

  • Nodes (Entities): the "things" in the conversation, such as people (Clara), companies (FutureScape), and concepts (Project Odyssey).
  • Edges (Relations): the connections describing how nodes relate, such as works_for, is_based_in, and manages.

The result is a structured, web-like memory. Instead of the flat fact "Clara works for FutureScape", we store the connection: (Clara) --[works_for]--> (FutureScape).


This is powerful for answering complex queries that require reasoning over relationships. The challenge is populating the graph from unstructured conversation, so we use the LLM to extract structured (Subject, Relation, Object) triples.

Implementation, using the networkx library:

import networkx as nx
import re

class GraphMemory(BaseMemoryStrategy):
    def __init__(self):
        """初始化memory,包含空的NetworkX有向圖。"""
        self.graph = nx.DiGraph()

    def _extract_triples(self, text: str) -> list[tuple[str, str, str]]:
        """使用LLM從文本提取(Subject, Relation, Object)三元組。"""
        print("--- [Graph Memory: Attempting to extract triples from text.] ---")
        extraction_prompt = (
            f"You are a knowledge extraction engine. Your task is to extract Subject-Relation-Object triples from the given text. "
            f"Format your output strictly as a list of Python tuples. For example: [('Sam', 'works_for', 'Innovatech'), ('Innovatech', 'focuses_on', 'Energy')]. "
            f"If no triples are found, return an empty list [].\n\n"
            f"Text to analyze:\n\"""{text}\""""
        )
        
        response_text = generate_text("You are an expert knowledge graph extractor.", extraction_prompt)
        
        try:
            found_triples = re.findall(r"\(['\"](.*?)['\"],\s*['\"](.*?)['\"],\s*['\"](.*?)['\"]\)", response_text)
            print(f"--- [Graph Memory: Extracted triples: {found_triples}] ---")
            return found_triples
        except Exception as e:
            print(f"Could not parse triples from LLM response: {e}")
            return []

    def add_message(self, user_input: str, ai_response: str):
        """從最新對(duì)話回合提取三元組并添加到knowledge graph。"""
        full_text = f"User: {user_input}\nAI: {ai_response}"
        triples = self._extract_triples(full_text)
        for subject, relation, obj in triples:
            self.graph.add_edge(subject.strip(), obj.strip(), relation=relation.strip())

    def get_context(self, query: str) -> str:
        """Finds entities from the query in the graph and returns all of their known relations."""
        if not self.graph.nodes:
            return "The knowledge graph is empty."
        
        query_entities = [word.capitalize() for word in query.replace('?','').split() if word.capitalize() in self.graph.nodes]
        
        if not query_entities:
            return "No relevant entities from your query were found in the knowledge graph."
        
        context_parts = []
        for entity in set(query_entities):
            for u, v, data in self.graph.out_edges(entity, data=True):
                context_parts.append(f"{u} --[{data['relation']}]--> {v}")
            for u, v, data in self.graph.in_edges(entity, data=True):
                context_parts.append(f"{u} --[{data['relation']}]--> {v}")
        
        return "### Facts Retrieved from Knowledge Graph:\n" + "\n".join(sorted(list(set(context_parts))))

    def clear(self):
        """Resets the memory by replacing the graph with an empty one."""
        self.graph = nx.DiGraph()
        print("Graph memory cleared.")

Code walkthrough:

  • _extract_triples(...): the core of the strategy; sends the conversation text to the LLM and asks for structured data back.
  • add_message(...): calls _extract_triples and adds the resulting triples to the networkx graph.
  • get_context(...): looks for known entities in the query and retrieves all of their relations as structured context.

Test:

graph_memory = GraphMemory()
agent = AIAgent(memory_strategy=graph_memory)

agent.chat("A person named Clara works for a company called 'FutureScape'.")
agent.chat("FutureScape is based in Berlin.")
agent.chat("Clara's main project is named 'Odyssey'.")
agent.chat("Tell me about Clara's project.")

Output:

==== NEW INTERACTION ====
User: A person named Clara works for a company called 'FutureScape'.  
--- [Graph Memory: Attempting to extract triples from text.] ---  
--- [Graph Memory: Extracted triples: [('Clara', 'works_for', 'FutureScape')]] ---  
Bot: Understood. I've added the fact that Clara works for FutureScape to my knowledge graph.  
...

==== NEW INTERACTION ====
User: Clara's main project is named 'Odyssey'.  
--- [Graph Memory: Attempting to extract triples from text.] ---  
--- [Graph Memory: Extracted triples: [('Clara', 'manages_project', 'Odyssey')]] ---  
Bot: Got it. I've noted that Clara's main project is Odyssey.  

==== NEW INTERACTION ====
User: Tell me about Clara's project.  
--- Agent Debug Info ---  
[Full Prompt Sent to LLM]:  
---  
SYSTEM: You are a helpful AI assistant.  
USER: ### MEMORY CONTEXT  
### Facts Retrieved from Knowledge Graph:  
Clara --[manages_project]--> Odyssey  
Clara --[works_for]--> FutureScape  
...  

Bot: Based on my knowledge graph, Clara's main project is named 'Odyssey', and Clara works for the company FutureScape.  
>>>> Tokens: 78 | Response Time: 1.5s

The agent answers with all the relevant facts by navigating its internal graph, which makes this approach a great fit for building knowledgeable expert agents.
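The graph also unlocks multi-hop reasoning that flat retrieval cannot do. Assuming a triple like ('FutureScape', 'is_based_in', 'Berlin') was extracted from the second turn above, networkx can chain relations; a small illustrative query:

# "In which city is Clara's employer based?" becomes a path query.
path = nx.shortest_path(graph_memory.graph, source="Clara", target="Berlin")
print(" -> ".join(path))  # Clara -> FutureScape -> Berlin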

Compression & Consolidation Memory

Summarization handles long conversations well, but can we be even more aggressive about cutting token usage? That's Compression & Consolidation Memory, essentially a stronger version of summarization.

The goal is to distill each piece of information into its densest factual form, like turning verbose meeting minutes into crisp one-sentence action items.


The Compression Approach flow:

1. Each turn (user input + AI response) is sent to the LLM.

2. A dedicated prompt asks the LLM to act as a "data compression engine".

3. The LLM rewrites the turn as a single essential statement, stripping greetings, pleasantries, and filler.

4. The compressed fact is stored in a simple list.

5. The agent's memory becomes a lean list of core facts, which is extremely token-efficient.

Implementation:

class CompressionMemory(BaseMemoryStrategy):
    def __init__(self):
        """初始化memory,包含空的compressed facts列表。"""
        self.compressed_facts = []

    def add_message(self, user_input: str, ai_response: str):
        """使用LLM將最新回合壓縮為簡(jiǎn)潔事實(shí)語句。"""
        text_to_compress = f"User: {user_input}\nAI: {ai_response}"
        
        compression_prompt = (
            f"You are a data compression engine. Your task is to distill the following text into its most essential, factual statement. "
            f"Be as concise as possible, removing all conversational fluff. For example, 'User asked about my name and I, the AI, responded that my name is an AI assistant' should become 'User asked for AI's name.'\n\n"
            f"Text to compress:\n\"{text_to_compress}\""
        )
        
        compressed_fact = generate_text("You are an expert data compressor.", compression_prompt)
        print(f"--- [Compression Memory: New fact stored: '{compressed_fact}'] ---")
        self.compressed_facts.append(compressed_fact)

    def get_context(self, query: str) -> str:
        """Returns all compressed facts, formatted as a bulleted list."""
        if not self.compressed_facts:
            return "No compressed facts in memory."
        return "### Compressed Factual Memory:\n- " + "\n- ".join(self.compressed_facts)

    def clear(self):
        """Resets the memory by clearing the list of compressed facts."""
        self.compressed_facts = []
        print("Compression memory cleared.")

Code walkthrough:

  • __init__(...): creates an empty compressed_facts list.
  • add_message(...): sends the turn to the LLM with the compression prompt and stores the concise result.
  • get_context(...): formats the compressed facts as a compact bulleted list.

Test:

compression_memory = CompressionMemory()
agent = AIAgent(memory_strategy=compression_memory)

agent.chat("Okay, I've decided on the venue for the conference. It's going to be the 'Metropolitan Convention Center'.")
agent.chat("The date is confirmed for October 26th, 2025.")
agent.chat("Could you please summarize the key details for the conference plan?")

Output:

==== NEW INTERACTION ====
User: Okay, I've decided on the venue for the conference. It's going to be the 'Metropolitan Convention Center'.  
--- [Compression Memory: New fact stored: 'The conference venue has been decided as the 'Metropolitan Convention Center'.'] ---  
Bot: Great! The Metropolitan Convention Center is an excellent choice. What's next on our planning list?  
...

==== NEW INTERACTION ====
User: The date is confirmed for October 26th, 2025.  
--- [Compression Memory: New fact stored: 'The conference date is confirmed for October 26th, 2025.'] ---  
Bot: Perfect, I've noted the date.  
...

==== NEW INTERACTION ====
User: Could you please summarize the key details for the conference plan?  
--- Agent Debug Info ---  
[Full Prompt Sent to LLM]:  
---  
SYSTEM: You are a helpful AI assistant.  
USER: ### MEMORY CONTEXT  
### Compressed Factual Memory:  
- The conference venue has been decided as the 'Metropolitan Convention Center'.  
- The conference date is confirmed for October 26th, 2025.  
...  

Bot: Of course. Based on my notes, here are the key details for the conference plan:  
- **Venue:** Metropolitan Convention Center  
- **Date:** October 26th, 2025  
>>>> Tokens: 48 | Response Time: 1.2s

This strategy cuts token counts dramatically while preserving the core facts, which suits applications that need long-term factual recall on a tight token budget. For conversations that depend on nuance, tone, and personality, though, compression can be too aggressive.

OS-Like Memory Management

What if we gave the agent a memory system that works like a computer's?


This advanced concept borrows from how a computer's Operating System manages RAM and the hard disk:

  • RAM: the computer's ultra-fast memory for active programs; expensive and limited in capacity. The agent's LLM context window is its RAM: fast to access but strictly limited in size.
  • Hard Disk: long-term storage; huge and cheap, but slower to access. For the agent, this is an external database or set of files holding old conversation history.

The OS-Like Memory Management flow:

  • Active Memory (RAM): the most recent conversation turns are kept in a fast-access buffer.
  • Passive Memory (Disk): when active memory fills up, the oldest information is moved to long-term storage; this is "paging out".
  • Page Fault: when the user asks about something that is not in active memory, a "page fault" occurs.
  • The system finds the relevant information in passive storage and loads it into the active context for the LLM; this is "paging in".

Implementation, simulating active_memory (a deque) and passive_memory (a dictionary):

class OSMemory(BaseMemoryStrategy):
    def __init__(self, ram_size: int = 2):
        """
        初始化OS-like memory系統(tǒng)。

        參數(shù):
            ram_size: active memory (RAM)保留的最大對(duì)話回合數(shù)。
        """
        self.ram_size = ram_size
        self.active_memory = deque()
        self.passive_memory = {}
        self.turn_count = 0

    def add_message(self, user_input: str, ai_response: str):
        """添加回合到active memory,RAM滿時(shí)將最舊回合paging out到passive memory。"""
        turn_id = self.turn_count
        turn_data = f"User: {user_input}\nAI: {ai_response}"
        
        if len(self.active_memory) >= self.ram_size:
            lru_turn_id, lru_turn_data = self.active_memory.popleft()
            self.passive_memory[lru_turn_id] = lru_turn_data
            print(f"--- [OS Memory: Paging out Turn {lru_turn_id} to passive storage.] ---")
        
        self.active_memory.append((turn_id, turn_data))
        self.turn_count += 1

    def get_context(self, query: str) -> str:
        """提供RAM上下文,模擬page fault從passive memory拉取數(shù)據(jù)。"""
        active_context = "\n".join([data for _, data in self.active_memory])
        
        paged_in_context = ""
        for turn_id, data in self.passive_memory.items():
            if any(word in data.lower() for word in query.lower().split() if len(word) > 3):
                paged_in_context += f"\n(Paged in from Turn {turn_id}): {data}"
                print(f"--- [OS Memory: Page fault! Paging in Turn {turn_id} from passive storage.] ---")
        
        return f"### Active Memory (RAM):\n{active_context}\n\n### Paged-In from Passive Memory (Disk):\n{paged_in_context}"

    def clear(self):
        """清空active和passive memory。"""
        self.active_memory.clear()
        self.passive_memory = {}
        self.turn_count = 0
        print("OS-like memory cleared.")

Code walkthrough:

  • __init__(...): sets up a fixed-size active_memory deque and an empty passive_memory dictionary.
  • add_message(...): appends the new turn to active_memory; when full, the oldest turn is popleft()-ed into passive_memory (paging out).
  • get_context(...): always includes active_memory and searches passive_memory, paging matching data in to the context when the query requires it.

To test it, we tell the agent a secret code, force that turn to be paged out to passive memory, then ask for the code:

os_memory = OSMemory(ram_size=2)
agent = AIAgent(memory_strategy=os_memory)

agent.chat("The secret launch code is 'Orion-Delta-7'.")
agent.chat("The weather for the launch looks clear.")
agent.chat("The launch window opens at 0400 Zulu.")
agent.chat("I need to confirm the launch code.")

Output:

...

==== NEW INTERACTION ====
User: The launch window opens at 0400 Zulu.  
--- [OS Memory: Paging out Turn 0 to passive storage.] ---  
Bot: PROCESSING NEW LAUNCH WINDOW INFORMATION...  
...

==== NEW INTERACTION ====
User: I need to confirm the launch code.  
--- [OS Memory: Page fault! Paging in Turn 0 from passive storage.] ---  
--- Agent Debug Info ---  
[Full Prompt Sent to LLM]:  
---  
SYSTEM: You are a helpful AI assistant.  
USER: ### MEMORY CONTEXT  
### Active Memory (RAM):  
User: The weather for the launch looks clear.  
...  
User: The launch window opens at 0400 Zulu.  
...  
### Paged-In from Passive Memory (Disk):  
(Paged in from Turn 0): User: The secret launch code is 'Orion-Delta-7'.  
...  

Bot: CONFIRMING LAUNCH CODE: The stored secret launch code is 'Orion-Delta-7'.  
>>>> Tokens: 539 | Response Time: 2.56s

It works perfectly! The agent paged the old data out to passive storage and intelligently retrieved it only when the query required it.

This model suits large-scale systems that need practically unlimited memory while keeping the active context small and fast.
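To make the "disk" half of the analogy literal, an illustrative extension (not in the original) persists passive memory as JSON so paged-out turns survive restarts:

import json

def save_passive(memory: OSMemory, path: str = "passive_memory.json"):
    """Writes the paged-out turns to disk."""
    with open(path, "w") as f:
        json.dump(memory.passive_memory, f)

def load_passive(memory: OSMemory, path: str = "passive_memory.json"):
    """Restores paged-out turns; JSON keys come back as strings, so cast them back to int."""
    with open(path) as f:
        memory.passive_memory = {int(k): v for k, v in json.load(f).items()}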

Choosing the Right Strategy

We have explored nine memory optimization strategies, from simple to complex. There is no single "best" strategy: the right choice balances your agent's needs, your budget, and your engineering resources.

When to choose what?

  • Simple, short-lived bots: Sequential or Sliding Window are easy to implement and work well.
  • Long, creative conversations: Summarization maintains conversational flow while cutting token overhead.
  • Agents needing precise long-term recall: Retrieval-Based memory is the industry standard; powerful, scalable, and the cornerstone of RAG applications.
  • Highly reliable personal assistants: Memory-Augmented and Hierarchical approaches separate key facts from conversational noise.
  • Expert systems and knowledge bases: Graph-Based memory is unmatched at reasoning over relationships between data points.

The most powerful agents in production typically use hybrid approaches that combine these techniques; you might use a hierarchical system whose long-term memory pairs a vector database with a knowledge graph, as sketched below.
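A minimal sketch of what such a hybrid could look like, composing the classes built in this post behind the same interface (the HybridMemory name and the choice of tiers are my assumptions):

class HybridMemory(BaseMemoryStrategy):
    """Illustrative composition: working window + vector recall + knowledge graph."""
    def __init__(self):
        self.working = SlidingWindowMemory(window_size=2)
        self.facts = RetrievalMemory(k=2)
        self.graph = GraphMemory()

    def add_message(self, user_input: str, ai_response: str):
        # Every tier sees every turn; each keeps what it is built to keep.
        for tier in (self.working, self.facts, self.graph):
            tier.add_message(user_input, ai_response)

    def get_context(self, query: str) -> str:
        # Structured facts first, then semantic recall, then the live window.
        return "\n\n".join([
            self.graph.get_context(query),
            self.facts.get_context(query),
            self.working.get_context(query),
        ])

    def clear(self):
        for tier in (self.working, self.facts, self.graph):
            tier.clear()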

The key is knowing what your agent needs to remember, for how long, and with what precision. Master these memory strategies and you can move beyond simple chatbots to build truly intelligent agents that learn, remember, and perform better over time.


Reprinted from AI大模型觀察站; author: AI大模型觀察站.
