MetaMath: A New Mathematical Reasoning Language Model That Trains LLMs to Reason Backward

Author: 机器之心 (Synced) 2023-10-11 12:32:53


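To improve models' forward and backward reasoning, researchers from Cambridge, HKUST, and Huawei built MetaMathQA on top of two widely used math datasets (GSM8K and MATH): a broad-coverage, high-quality mathematical reasoning dataset. MetaMathQA consists of 395K forward and backward math question-answer pairs generated by large language models.

Complex mathematical reasoning is an important measure of a large language model's reasoning ability. Commonly used mathematical reasoning datasets are limited in size and lack question diversity, which contributes to the "Reversal Curse": a language model trained on "A is B" fails to generalize to "B is A" [1]. In mathematical reasoning this takes a specific form: given a math problem, a language model is good at solving it with forward reasoning but struggles to solve it with backward reasoning. Backward reasoning is very common in math problems, as the following two examples show.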

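1. A classic problem: chickens and rabbits in the same cage

Forward reasoning: A cage holds 23 chickens and 12 rabbits. How many heads and how many feet are in the cage?
Backward reasoning: Some chickens and rabbits share a cage. Counting from above, there are 35 heads; counting from below, there are 94 feet. How many chickens and how many rabbits are in the cage?

2. A GSM8K problem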

Forward reasoning: James buys 5 packs of beef that are 4 pounds each. The price of beef is $5.50 per pound. How much did he pay?
Backward reasoning: James buys x packs of beef that are 4 pounds each. The price of beef is $5.50 per pound. How much did he pay? If we know the answer to the above question is 110, what is the value of unknown variable x?
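The two examples above can be checked with a few lines of arithmetic. The following is a minimal sketch, not code from the paper; the function names are chosen here for illustration. Each forward function maps known quantities to the answer, and each backward function recovers an unknown quantity from the answer.

```python
from fractions import Fraction


def heads_and_feet(chickens, rabbits):
    """Forward: animal counts -> (heads, feet)."""
    return chickens + rabbits, 2 * chickens + 4 * rabbits


def chickens_and_rabbits(heads, feet):
    """Backward: (heads, feet) -> (chickens, rabbits).

    Solves c + r = heads and 2c + 4r = feet, so r = (feet - 2 * heads) / 2.
    """
    rabbits, remainder = divmod(feet - 2 * heads, 2)
    chickens = heads - rabbits
    if remainder or rabbits < 0 or chickens < 0:
        raise ValueError("no non-negative integer solution")
    return chickens, rabbits


def beef_cost(packs, pounds_per_pack=4, price_per_pound=Fraction("5.50")):
    """Forward: number of packs -> total price paid."""
    return packs * pounds_per_pack * price_per_pound


def packs_for_cost(total, pounds_per_pack=4, price_per_pound=Fraction("5.50")):
    """Backward: total price paid -> number of packs (the unknown x)."""
    return Fraction(total) / (pounds_per_pack * price_per_pound)


if __name__ == "__main__":
    print(heads_and_feet(23, 12))
    print(chickens_and_rabbits(35, 94))
    print(beef_cost(5))
    print(packs_for_cost(110))
```

Exact fractions are used instead of floats so that the backward step returns an exact pack count rather than a rounded value.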

To improve models' forward and backward reasoning, researchers from Cambridge, HKUST, and Huawei built MetaMathQA on top of two widely used math datasets (GSM8K and MATH): a broad-coverage, high-quality mathematical reasoning dataset. MetaMathQA consists of 395K forward and backward math question-answer pairs generated by large language models. They fine-tuned LLaMA-2 on MetaMathQA to obtain MetaMath, a large language model specialized in forward and backward mathematical reasoning, which reaches SOTA on mathematical reasoning datasets. The MetaMathQA dataset and MetaMath models at several scales have been open-sourced for researchers.

Project page: https://meta-math.github.io/
Paper: https://arxiv.org/abs/2309.12284
Dataset: https://huggingface.co/datasets/meta-math/MetaMathQA
Models: https://huggingface.co/meta-math
Code: https://github.com/meta-math/MetaMath

The paper also constructs a backward reasoning dataset, GSM8K-Backward. Experiments show that current methods (e.g., SFT, RFT, WizardMath) perform far worse on backward reasoning problems than on forward ones. By comparison, MetaMath achieves strong performance on both forward and backward reasoning.

Method

MetaMathQA combines four data augmentation methods:

1. Answer Augmentation: given a question, use a large language model to generate chains of thought that reach the correct result, and add them as augmented data.

Question: James buys 5 packs of beef that are 4 pounds each. The price of beef is $5.50 per pound. How much did he pay?
Answer: James buys 5 packs of beef that are 4 pounds each, so he buys a total of 5 * 4 = 20 pounds of beef. The price of beef is $5.50 per pound, so he pays 20 * $5.50 = $110. The answer is: 110.

2. Rephrasing Question: given a meta-question, use a large language model to rewrite the question and generate a chain of thought that reaches the correct result, and add both as augmented data.

Question: What is the total amount that James paid when he purchased 5 packs of beef, each weighing 4 pounds, at a price of $5.50 per pound?
Answer: Each pack of beef weighs 4 pounds, so 5 packs weigh 4 * 5 = 20 pounds in total. The price per pound of beef is $5.50, so the total cost for 20 pounds is 20 * $5.50 = $110. Therefore, James paid a total of $110. The answer is: 110.

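3. FOBAR Question (FOBAR backward-question augmentation): given a meta-question, mask a number in its conditions as x, supply the original answer, and ask for x, which yields a backward question. A correct chain of thought is then generated for that backward question and added as augmented data (example of the appended query: "If we know the answer to the above question is 110, what is the value of unknown variable x?").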

Question: James buys x packs of beef that are 4 pounds each. The price of beef is $5.50 per pound. How much did he pay? If we know the answer to the above question is 110, what is the value of unknown variable x?
Answer: James buys x packs of beef that are 4 pounds each, so he buys a total of 4x pounds of beef. The price of beef is $5.50 per pound, so the total cost of the beef is 5.50 * 4x = 22x. We are given that the total cost is $110, so we can write: 22x = 110. Dividing both sides by 22, we get: x = 5. The value of x is 5.

4. Self-Verification Question (Self-Verification backward-question augmentation): building on FOBAR, use a large language model to rewrite the question part of the backward problem as a declarative statement (rewrite example: "How much did he pay?" (with the answer 110) becomes "He paid 110").

Question: James buys x packs of beef that are 4 pounds each. The price of beef is $5.50 per pound. He paid 110. What is the value of unknown variable x?
Answer: To solve this problem, we need to determine the value of x, which represents the number of packs of beef that James bought. Each pack of beef weighs 4 pounds and costs $5.50 per pound. The total amount James paid is $110. We can set up the equation as follows: Number of packs of beef * Weight per pack * Price per pound = Total amount paid; x * 4 * $5.50 = $110; Let's simplify and solve for x: 4x * $5.50 = $110; 22x = $110; To isolate x, we divide both sides of the equation by 22: 22x / 22 = $110 / 22; x = $5; The value of x is 5.
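The two backward augmentations can be illustrated with simple string templates. This is a minimal sketch under stated assumptions, not the authors' pipeline: the helper names are hypothetical, the number to mask is chosen by the caller, and the declarative statement for Self-Verification is supplied by hand, whereas the method described above uses a large language model for that rewrite.

```python
import re

# Query appended to a masked question, as in the FOBAR example above.
FOBAR_SUFFIX = (
    "If we know the answer to the above question is {answer}, "
    "what is the value of unknown variable x?"
)
SV_SUFFIX = "What is the value of unknown variable x?"


def mask_number(question, number):
    """Replace the first standalone occurrence of `number` with x.

    The lookarounds skip digits that are part of a larger token such as
    "$5.50", so masking "5" only hits the pack count.
    """
    pattern = r"(?<![\w.$])" + re.escape(number) + r"(?!\w|\.\d)"
    masked, count = re.subn(pattern, "x", question, count=1)
    if count == 0:
        raise ValueError(f"number {number!r} not found in question")
    return masked


def fobar_question(question, number, answer):
    """Build a FOBAR-style backward question from a forward question."""
    return f"{mask_number(question, number)} {FOBAR_SUFFIX.format(answer=answer)}"


def self_verification_question(question, number, query, statement):
    """Build a Self-Verification-style question.

    `statement` stands in for the LLM rewrite of `query` (an assumption
    of this sketch).
    """
    masked = mask_number(question, number)
    if query not in masked:
        raise ValueError("query sentence not found in question")
    return f"{masked.replace(query, statement)} {SV_SUFFIX}"


if __name__ == "__main__":
    q = (
        "James buys 5 packs of beef that are 4 pounds each. "
        "The price of beef is $5.50 per pound. How much did he pay?"
    )
    print(fobar_question(q, "5", "110"))
    print(self_verification_question(q, "5", "How much did he pay?", "He paid 110."))
```

Run on the beef question, the two helpers reproduce the FOBAR and Self-Verification questions quoted in the list above. The chain-of-thought answers still have to come from a model and be filtered for correctness.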

Experimental Results

Experiments on two common mathematical reasoning datasets (GSM8K and MATH) show that, without external tools (e.g., a code interpreter), MetaMath significantly outperforms existing open-source LLMs. MetaMath-7B reaches 66.5% accuracy on GSM8K and 19.8% on MATH, exceeding the state-of-the-art models of the same size by 11.6% and 9.1%, respectively. Notably, MetaMath-70B reaches 82.3% accuracy on GSM8K, surpassing GPT-3.5-Turbo.
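Accuracy figures like these are typically obtained by extracting the final answer from each generated chain of thought and comparing it with the reference. The sketch below assumes completions end with the "The answer is: ..." marker seen in the examples earlier; the function names and the normalization rules are illustrative assumptions, not the paper's evaluation script, and they target numeric GSM8K-style answers.

```python
def extract_answer(completion):
    """Return the text after the last "The answer is:" marker, or None."""
    marker = "The answer is:"
    if marker not in completion:
        return None
    tail = completion.rsplit(marker, 1)[1].strip()
    # Drop a trailing period, thousands separators, and dollar signs.
    return tail.rstrip(".").replace(",", "").replace("$", "").strip()


def is_correct(prediction, reference):
    """Compare numerically when both parse as floats, else as strings."""
    if prediction is None:
        return False
    try:
        return abs(float(prediction) - float(reference)) < 1e-6
    except ValueError:
        return prediction == reference.strip()


def accuracy(completions, references):
    """Fraction of completions whose extracted answer matches."""
    if not references or len(completions) != len(references):
        raise ValueError("need equally sized, non-empty inputs")
    hits = sum(
        is_correct(extract_answer(c), r) for c, r in zip(completions, references)
    )
    return hits / len(references)


if __name__ == "__main__":
    outputs = [
        "He pays 20 * $5.50 = $110. The answer is: 110.",
        "Dividing both sides by 22, we get: x = 5. The value of x is 5.",
    ]
    print(accuracy(outputs, ["110", "5"]))
```

The second completion counts as wrong here only because it lacks the marker, which shows how sensitive such a metric is to the output format.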

According to the Superficial Alignment Hypothesis [2], a large language model's capabilities come from pretraining, while data from downstream tasks activates the inherent capabilities the model learned during pretraining. This raises two important questions: (i) what type of data most effectively activates this latent knowledge, and (ii) why is one dataset better than another at such activation?

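Why is MetaMathQA useful? It improves the quality of chain-of-thought data (Perplexity)

As shown in the figure above, the researchers computed the perplexity of the LLaMA-2-7B model on answer-only data, on GSM8K CoT, and on each part of MetaMathQA. The perplexity of every MetaMathQA part is significantly lower than that of the other two datasets, which highlights its inherent learnability and suggests it may be more effective at eliciting the model's latent knowledge.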
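For reference, perplexity is the exponential of the average negative log-likelihood per token, so lower values mean the model finds the text less surprising. The sketch below uses that standard definition; the article does not describe the exact computation used, and the per-token log-probabilities are assumed to come from a scoring pass of a model such as LLaMA-2-7B.

```python
import math


def perplexity(token_logprobs):
    """Perplexity from per-token natural-log probabilities.

    PPL = exp(-(1/N) * sum(log p(token_i | previous tokens)))
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


if __name__ == "__main__":
    # Toy values: a sequence the model finds likely vs. one it finds unlikely.
    likely = [math.log(0.9)] * 10
    unlikely = [math.log(0.2)] * 10
    print(perplexity(likely), perplexity(unlikely))
```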

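Why is MetaMathQA useful? It increases the diversity of chain-of-thought data (Diversity)

Comparing the diversity gain of the data with the accuracy gain of the model, they found that adding the same amount of augmented data from Rephrasing, FOBAR, and SV brings a clear diversity gain and significantly improves model accuracy. By contrast, using answer augmentation alone leads to clear accuracy saturation; once accuracy has saturated, adding more AnsAug data yields only limited improvement.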
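The article does not state how diversity gain is measured. One plausible way to quantify it, assumed here purely for illustration, is the average squared distance from each new sample to its nearest neighbor in the base dataset, measured in some embedding space. In the sketch below the embedding vectors are passed in directly; which embedding model produces them is left open.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def diversity_gain(new_vectors, base_vectors):
    """Mean squared distance from each new vector to its nearest base vector.

    A value near zero means the new data mostly repeats what the base
    dataset already covers.
    """
    if not new_vectors or not base_vectors:
        raise ValueError("both datasets must be non-empty")
    nearest = (
        min(squared_distance(v, b) for b in base_vectors) for v in new_vectors
    )
    return sum(nearest) / len(new_vectors)


if __name__ == "__main__":
    base = [[0.0, 0.0], [1.0, 0.0]]
    near_duplicates = [[0.0, 0.0], [1.0, 0.0]]
    novel = [[0.0, 2.0], [3.0, 0.0]]
    print(diversity_gain(near_duplicates, base), diversity_gain(novel, base))
```

Under this measure, repeated answer augmentation of the same questions would be expected to add little once the base set already contains similar samples, which is consistent with the saturation described above.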

Editor: 张燕妮 (Zhang Yanni) Source: 机器之心
