
Building a Multimodal RAG System with CLIP and an LLM

In this article we explore how to build a retrieval-augmented generation (RAG) system with an open-source large multimodal language model. The focus is on doing this without relying on LangChain or LlamaIndex, which avoids extra framework dependencies.

What is RAG

In artificial intelligence, retrieval-augmented generation (RAG) is a technique that extends the capabilities of large language models (LLMs). In essence, RAG lets a model retrieve up-to-date information from external sources at query time, which makes its responses more specific.

The architecture combines generation with a dynamic retrieval step, so the system can keep up with changing information across domains. Unlike fine-tuning or retraining, RAG is a cost-effective way to give a model current and relevant information without changing the model itself.
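
The retrieve-then-generate idea can be sketched in a few lines of plain Python. This is only an illustration under loud assumptions: the word-count "embedding", the `retrieve` and `build_rag_prompt` helpers, and the two sample documents are all hypothetical stand-ins, not part of the pipeline built later in this article, which uses CLIP embeddings and ChromaDB instead.

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': a word-count vector (a stand-in for a real encoder such as CLIP)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(documents, key=lambda doc: cosine(query_vec, embed(doc)), reverse=True)
    return ranked[:k]

def build_rag_prompt(query, documents):
    """Put the retrieved context in front of the question before calling a model."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Roses are woody perennial plants of the genus Rosa.",
    "Tulips are spring-blooming bulbs of the genus Tulipa.",
]
prompt = build_rag_prompt("Which genus do tulips belong to?", docs)
print(prompt)
```

The model never has to be retrained: updating `docs` is enough to change what it can answer from.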

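Benefits of RAG

1. Improved accuracy and reliability

RAG addresses the unpredictability of LLMs by directing them to authoritative knowledge sources. This lowers the risk of false or outdated information and leads to more accurate, reliable responses.

2. More transparency and trust

Generative models such as LLMs often lack transparency, which makes their output hard to trust. RAG gives organizations more control over the generated text, which helps with concerns about bias, reliability, and compliance.

3. Fewer hallucinations

LLMs are prone to hallucination: responses that are coherent but inaccurate or fabricated. By grounding responses in authoritative sources, RAG reduces the risk of misleading advice in critical sectors.

4. Cost-effective adaptability

RAG improves model output without extensive retraining or fine-tuning. Specific details are fetched on demand, so the information stays current and relevant and the system adapts as the underlying information changes.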

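Multimodal models

A multimodal model takes several kinds of input and combines them into a single output. Take CLIP as an example: it is trained on text-image pairs, and through contrastive learning the model learns which texts match which images.

As a result, the model produces the same (or very similar) embedding vectors for different inputs that represent the same thing.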
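
To make the contrastive objective concrete, here is a minimal pure-Python sketch of a CLIP-style symmetric loss. It is an illustration, not CLIP's training code: the tiny 3-dimensional vectors are made up, and the temperature of 0.07 is only an illustrative value. The loss is low when each image embedding is closest to its own text embedding and high when the pairs are mismatched.

```python
import math

def normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def similarity_matrix(image_embs, text_embs, temperature):
    """Cosine similarity of every image with every text, divided by the temperature."""
    images = [normalize(v) for v in image_embs]
    texts = [normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts] for img in images]

def cross_entropy_on_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for row_index, row in enumerate(logits):
        peak = max(row)  # subtract the max for numerical stability
        log_sum = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_sum - row[row_index]
    return total / len(logits)

def clip_style_loss(image_embs, text_embs, temperature=0.07):
    """Average of the image-to-text and text-to-image losses."""
    logits = similarity_matrix(image_embs, text_embs, temperature)
    transposed = [list(column) for column in zip(*logits)]
    return (cross_entropy_on_diagonal(logits) + cross_entropy_on_diagonal(transposed)) / 2

# Made-up embeddings: row i of the images is meant to match row i of the texts.
texts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
matched_images = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
mismatched_images = [matched_images[1], matched_images[2], matched_images[0]]

matched_loss = clip_style_loss(matched_images, texts)
mismatched_loss = clip_style_loss(mismatched_images, texts)
print(matched_loss < mismatched_loss)
```

Training pushes the loss down, which is what pulls matching text and image embeddings together in the shared space.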


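Multimodal large language models

GPT-4V and Gemini Vision are examples of multimodal large language models (MLLMs) that integrate several data types, including images, text, speech, and audio. Large language models such as GPT-3, BERT, and RoBERTa perform well on text-based tasks but struggle to understand and process other data types. Multimodal models address this limitation by combining modalities, which gives a more complete understanding of the data.

Multimodal LLMs go beyond the traditional text-only approach. Taking GPT-4 as an example, these models can process several data types, including images and text, and so understand information more fully.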

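Combining with RAG

Here we use CLIP to embed images and text and store the embeddings in the ChromaDB vector database. A large model then uses the retrieved information to take part in a chat session with the user.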


We will use images from Kaggle and information from Wikipedia to build a flower-expert chatbot.

First, install the packages:

!pip install -q timm einops wikipedia chromadb open_clip_torch
!pip install -q transformers==4.36.0
!pip install -q bitsandbytes==0.41.3 accelerate==0.25.0

Preprocessing is simple: put the images and the text files together in one folder.
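
One way to produce the text files is sketched below. The helper name `write_flower_articles`, the one-file-per-flower naming, and the injected `fetch_summary` callable are assumptions for illustration; the article does not show how its files were created. A placeholder function stands in for the real article source so the sketch runs offline.

```python
import os
import tempfile

def write_flower_articles(flower_names, out_dir, fetch_summary):
    """Write one <name>.txt file per flower and return the written paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for name in flower_names:
        path = os.path.join(out_dir, f"{name}.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write(fetch_summary(name))
        paths.append(path)
    return paths

# Placeholder text source so the example needs no network access.
demo_dir = tempfile.mkdtemp()
written = write_flower_articles(
    ["rose", "tulip"],
    demo_dir,
    lambda name: f"Placeholder article about {name}.",
)
print(sorted(os.listdir(demo_dir)))
```

With the `wikipedia` package installed above, `wikipedia.summary` could plausibly be passed as `fetch_summary` to pull real article summaries; the images would then be copied into the same folder.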

Any vector database will do; here we use ChromaDB.

import chromadb

from chromadb.utils.embedding_functions import OpenCLIPEmbeddingFunction
from chromadb.utils.data_loaders import ImageLoader
from chromadb.config import Settings


# Persist the database on disk in the "DB" folder
client = chromadb.PersistentClient(path="DB")

embedding_function = OpenCLIPEmbeddingFunction()
image_loader = ImageLoader()  # required when reading images from URIs

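ChromaDB also lets you plug in a custom embedding function. The skeleton looks like this:

from chromadb import Documents, EmbeddingFunction, Embeddings

class MyEmbeddingFunction(EmbeddingFunction):
    def __call__(self, input: Documents) -> Embeddings:
        # Skeleton only: compute one embedding vector per document (or image)
        # here and return the list of vectors.
        raise NotImplementedError("compute and return the embeddings here")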

We create two collections, one for text and one for images.

import os

collection_images = client.create_collection(
    name='multimodal_collection_images',
    embedding_function=embedding_function,
    data_loader=image_loader)

collection_text = client.create_collection(
    name='multimodal_collection_text',
    embedding_function=embedding_function,
    )

# Get the images: every file in the folder that is not a .txt file
IMAGE_FOLDER = '/kaggle/working/all_data'


image_uris = sorted([os.path.join(IMAGE_FOLDER, image_name) for image_name in os.listdir(IMAGE_FOLDER) if not image_name.endswith('.txt')])
ids = [str(i) for i in range(len(image_uris))]

collection_images.add(ids=ids, uris=image_uris)  # now we have the images collection

With CLIP, we can retrieve images with a text query like this:

from matplotlib import pyplot as plt

retrieved = collection_images.query(query_texts=["tulip"], include=['data'], n_results=3)
for img in retrieved['data'][0]:
    plt.imshow(img)
    plt.axis("off")
    plt.show()

An image can also be used to retrieve related images.
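
A minimal sketch of image-to-image retrieval follows. It relies on the `query_uris` argument of the collection's `query` method, which this article also uses further down; the helper name `find_similar_images` and the `StubCollection` class are hypothetical. The stub only mimics the shape of a query result so the sketch runs without a database; in the notebook you would pass `collection_images` instead.

```python
def find_similar_images(collection, image_uri, k=3):
    """Return up to k images similar to image_uri, dropping the top hit.

    Assumes the query image is itself stored in the collection, so the
    closest match is the query image and is not worth showing again.
    """
    result = collection.query(query_uris=[image_uri], include=["data"], n_results=k + 1)
    return result["data"][0][1:]

class StubCollection:
    """Stand-in that mimics the shape of a query result for a single query."""
    def query(self, query_uris, include, n_results):
        return {"data": [[f"image-{i}" for i in range(n_results)]]}

print(find_similar_images(StubCollection(), "rose.jpg", k=2))
```

If the query image is not part of the collection, keep the first hit as well by returning `result["data"][0]` unchanged.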

The text collection is built as follows:

# now the text DB
from chromadb.utils import embedding_functions
# Default text embedder (not used below: collection_text embeds with OpenCLIP)
default_ef = embedding_functions.DefaultEmbeddingFunction()

text_pth = sorted([os.path.join(IMAGE_FOLDER, image_name) for image_name in os.listdir(IMAGE_FOLDER) if image_name.endswith('.txt')])

# Read every article into memory
list_of_text = []
for path in text_pth:
    with open(path, 'r') as f:
        list_of_text.append(f.read())

ids_txt_list = ['id'+str(i) for i in range(len(list_of_text))]
ids_txt_list

collection_text.add(
    documents=list_of_text,
    ids=ids_txt_list
)

Then query the text collection above to retrieve the closest document.

results = collection_text.query(
    query_texts=["What is the bellflower?"],
    n_results=1
)

results

The result:

{'ids': [['id0']],
  'distances': [[0.6072186183744086]],
  'metadatas': [[None]],
  'embeddings': None,
  'documents': [['Campanula () is the type genus of the Campanulaceae family of flowering plants. Campanula are commonly known as bellflowers and take both their common and scientific names from the bell-shaped flowers—campanula is Latin for "little bell".\nThe genus includes over 500 species and several subspecies, distributed across the temperate and subtropical regions of the Northern Hemisphere, with centers of diversity in the Mediterranean region, Balkans, Caucasus and mountains of western Asia. The range also extends into mountains in tropical regions of Asia and Africa.\nThe species include annual, biennial and perennial plants, and vary in habit from dwarf arctic and alpine species under 5 cm high, to large temperate grassland and woodland species growing to 2 metres (6 ft 7 in) tall.']],
  'uris': None,
  'data': None}
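
Each field in the result is a list with one entry per query, and each entry is itself a list with one item per hit. A small helper can turn that nested structure into flat records; `flatten_query_result` is a hypothetical name, and the example dictionary is a shortened copy of the output above.

```python
def flatten_query_result(result, query_index=0):
    """Turn the nested result for one query into a list of per-hit records."""
    ids = result["ids"][query_index]
    distances = result["distances"][query_index]
    documents = result["documents"][query_index]
    return [
        {"id": hit_id, "distance": distance, "document": document}
        for hit_id, distance, document in zip(ids, distances, documents)
    ]

# Shortened copy of the result shown above (document text abbreviated).
example = {
    "ids": [["id0"]],
    "distances": [[0.6072186183744086]],
    "documents": [["Campanula () is the type genus of the Campanulaceae family ..."]],
}
hits = flatten_query_result(example)
print(hits[0]["id"], round(hits[0]["distance"], 3))
```

A smaller distance means a closer match, so the records can be filtered by a distance threshold before they are handed to the language model.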

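We can also retrieve text with an image.

import numpy as np
from PIL import Image

query_image = '/kaggle/input/flowers/flowers/rose/00f6e89a2f949f8165d5222955a5a37d.jpg'
raw_image = Image.open(query_image)

# Embed the image pixels with CLIP; a bare path string would be treated as text
doc = collection_text.query(
    query_embeddings=embedding_function([np.array(raw_image)]),
    n_results=1,
)['documents'][0][0]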

The text retrieved for this image:

A rose is either a woody perennial flowering plant of the genus Rosa (), in the family Rosaceae (), or the flower it bears. There are over three hundred species and tens of thousands of cultivars. They form a group of plants that can be erect shrubs, climbing, or trailing, with stems that are often armed with sharp prickles. Their flowers vary in size and shape and are usually large and showy, in colours ranging from white through yellows and reds. Most species are native to Asia, with smaller numbers native to Europe, North America, and northwestern Africa. Species, cultivars and hybrids are all widely grown for their beauty and often are fragrant. Roses have acquired cultural significance in many societies. Rose plants range in size from compact, miniature roses, to climbers that can reach seven meters in height. Different species hybridize easily, and this has been used in the development of the wide range of garden roses.

That completes the text-image matching. So far this has all been CLIP's work; next we bring in the LLM.

from huggingface_hub import hf_hub_download

# Download the custom model and processor code shipped with the checkpoint
hf_hub_download(repo_id="visheratin/LLaVA-3b", filename="configuration_llava.py", local_dir="./", force_download=True)
hf_hub_download(repo_id="visheratin/LLaVA-3b", filename="configuration_phi.py", local_dir="./", force_download=True)
hf_hub_download(repo_id="visheratin/LLaVA-3b", filename="modeling_llava.py", local_dir="./", force_download=True)
hf_hub_download(repo_id="visheratin/LLaVA-3b", filename="modeling_phi.py", local_dir="./", force_download=True)
hf_hub_download(repo_id="visheratin/LLaVA-3b", filename="processing_llava.py", local_dir="./", force_download=True)

We use visheratin/LLaVA-3b.

from modeling_llava import LlavaForConditionalGeneration
import torch

model = LlavaForConditionalGeneration.from_pretrained("visheratin/LLaVA-3b")
model = model.to("cuda")

Load the tokenizer.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("visheratin/LLaVA-3b")

Then define the processor so it is easy to call later.

from processing_llava import LlavaProcessor, OpenCLIPImageProcessor

image_processor = OpenCLIPImageProcessor(model.config.preprocess_config)
processor = LlavaProcessor(image_processor, tokenizer)

Now everything is ready to use.

import numpy as np
from PIL import Image

question = 'Answer with organized answers: What type of rose is in the picture? Mention some of its characteristics and how to take care of it ?'

query_image = '/kaggle/input/flowers/flowers/rose/00f6e89a2f949f8165d5222955a5a37d.jpg'
raw_image = Image.open(query_image)

# Retrieve the article whose embedding is closest to the query image
doc = collection_text.query(
    query_embeddings=embedding_function([np.array(raw_image)]),
    n_results=1,
)['documents'][0][0]

# Show the query image, then its neighbours from the image collection
plt.imshow(raw_image)
plt.show()
imgs = collection_images.query(query_uris=query_image, include=['data'], n_results=3)
for img in imgs['data'][0][1:]:  # drop the top hit
    plt.imshow(img)
    plt.axis("off")
    plt.show()

The retrieved results contain most of the information we need.

That completes the integration. The last step is to create the chat template.

prompt = """<|im_start|>system
A chat between a curious human and an artificial intelligence assistant.
The assistant is an expert in flowers, and gives helpful, detailed, and polite answers to the human's questions.
The assistant does not hallucinate and pays very close attention to the details.<|im_end|>
<|im_start|>user
<image>
{question} Use the following article as an answer source. Do not write outside its scope unless you find your answer better {article} if you think your answer is better add it after document.<|im_end|>
<|im_start|>assistant
""".format(question=question, article=doc)

We won't go into how the chat loop itself is built; the complete code is available here:

https://github.com/nadsoft-opensource/RAG-with-open-source-multi-modal
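
For a chat loop, the template above is easier to reuse when wrapped in a function that is called once per turn. The sketch below is an assumption about how one might do that, not the repository's code: `build_prompt` and its `max_article_chars` limit are hypothetical, and the limit is only a crude guard against very long retrieved articles.

```python
PROMPT_TEMPLATE = """<|im_start|>system
A chat between a curious human and an artificial intelligence assistant.
The assistant is an expert in flowers, and gives helpful, detailed, and polite answers to the human's questions.
The assistant does not hallucinate and pays very close attention to the details.<|im_end|>
<|im_start|>user
<image>
{question} Use the following article as an answer source. Do not write outside its scope unless you find your answer better {article} if you think your answer is better add it after document.<|im_end|>
<|im_start|>assistant
"""

def build_prompt(question, article, max_article_chars=2000):
    """Fill the chat template, cutting the article at a word boundary if it is too long."""
    article = article.strip()
    if len(article) > max_article_chars:
        article = article[:max_article_chars].rsplit(" ", 1)[0]
    return PROMPT_TEMPLATE.format(question=question.strip(), article=article)

demo_prompt = build_prompt(
    "What type of rose is in the picture?",
    "A rose is a woody perennial flowering plant of the genus Rosa.",
)
print(demo_prompt)
```

In each turn, the retrieved `doc` would be passed as `article` and the user's message as `question`, and the resulting string would go to the processor together with the image.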

Editor: Hua Xuan    Source: DeepHub IMBA