Towards Mixed-Modal Retrieval for Universal Retrieval-Augmented Generation | 2025-10-20 | RUC | 28 | http://arxiv.org/abs/2510.17354v1 | https://huggingface.co/papers/2510.17354 | https://github.com/SnowNation101/Nyx
Research background and significance — problem definition and current state: In recent years, retrieval-augmented generation (RAG) has become an important way to strengthen large language models (LLMs), chiefly by retrieving relevant documents from an external corpus to supplement the model's knowledge. However...
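The RAG loop this entry describes — retrieve the documents most relevant to a query from an external corpus, then let the LLM generate conditioned on them — can be sketched minimally. The bag-of-words `embed` below is a hypothetical stand-in for a neural encoder, not the paper's mixed-modal retriever:

```python
import numpy as np

# Toy corpus; a real system would embed documents with a trained encoder.
corpus = [
    "Mamba is a state space model for long sequences.",
    "RAG augments language models with retrieved documents.",
    "CLIP aligns images and text in one embedding space.",
]

def embed(text, vocab):
    # Hypothetical stand-in for a neural text encoder: bag-of-words counts.
    words = text.lower().split()
    return np.array([words.count(w) for w in vocab], dtype=float)

def retrieve(query, corpus, k=1):
    # Shared vocabulary built from the corpus.
    vocab = sorted({w for doc in corpus for w in doc.lower().split()})
    doc_vecs = [embed(d, vocab) for d in corpus]
    q = embed(query, vocab)
    # Cosine similarity between the query and each document.
    sims = [v @ q / (np.linalg.norm(v) * np.linalg.norm(q) + 1e-9) for v in doc_vecs]
    return [corpus[i] for i in np.argsort(sims)[::-1][:k]]

top = retrieve("which documents augment language models?", corpus)
print(top[0])
```

The retrieved passages would then be prepended to the LLM prompt; the paper's contribution is extending this retrieval step beyond text-only corpora.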
2025-10-24 00:19:43 | 1101 views | 0 likes | 0 replies | 0 favorites
PICABench: How Far Are We from Physically Realistic Image Editing | 2025-10-20 | SJTU, Shanghai AI Lab, CUHK MMLab, Krea AI, BUAA, Alibaba Tongyi Lab, USTC, HKU | 53 | http://arxiv.org/abs/2510.17681v1 | https://huggingface.co/papers/2510.17681 | https://picabench.github.io
Research background and significance: With the rapid development of instruction-driven image editing, modern models can already interpret complex editing instructions well and generate semantically coherent images. However, current mainstream research and benchmarks mainly...
2025-10-24 00:16:07 | 443 views | 0 likes | 0 replies | 0 favorites
DeepAnalyze: Agentic Large Language Models for Autonomous Data Science | 2025-10-19 | RUC, THU | 52 | http://arxiv.org/abs/2510.16872v1 | https://huggingface.co/papers/2510.16872 | https://github.com/ruc-datalab/DeepAnalyze
Research background and significance — background in brief: Data science today pursues full-pipeline automation from data sources to analysis reports, i.e., "autonomous data science." The goal is to reduce manual intervention and improve the efficiency and quality of data processing and insight. However, traditional approaches mostly rely on pre...
2025-10-24 00:12:34 | 745 views | 0 likes | 0 replies | 0 favorites
MemMamba: Rethinking Memory Patterns in State Space Model | 2025-09-28 | RUC, SUFE, Gao Ling Institute, Shanghai AI Lab | 63 | http://arxiv.org/abs/2510.03279v1 | https://huggingface.co/papers/2510.03279
Research background and significance: With the explosive growth of data volume, long-sequence modeling has become critical in natural language processing, bioinformatics, and other fields. Traditional recurrent neural networks (RNNs) are hard to scale to ultra-long sequences because of vanishing and exploding gradients; Transformers can capture global dependencies, but their computational comple...
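For contrast with the quadratic cost of attention mentioned above, a plain linear state-space recurrence (h_t = A h_{t-1} + B x_t, y_t = C h_t) processes a sequence in time linear in its length. The toy below illustrates only that property; it is not MemMamba's selective memory mechanism:

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Minimal linear state-space recurrence: h_t = A h_{t-1} + B x_t, y_t = C h_t.
    One state update per token, so cost is O(sequence length) — the property
    that motivates Mamba-style models. Toy sketch, not MemMamba itself."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:                # one step per token: linear in sequence length
        h = A @ h + B * x_t      # state update
        ys.append(C @ h)         # readout
    return np.array(ys)

rng = np.random.default_rng(0)
d = 4
A = 0.9 * np.eye(d)             # stable decay keeps the recurrence well-behaved
B = rng.normal(size=d)
C = rng.normal(size=d)
y = ssm_scan(rng.normal(size=128), A, B, C)
print(y.shape)                  # one scalar output per input step
```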
2025-10-14 00:07:45 | 1180 views | 0 likes | 0 replies | 0 favorites
Agent Learning via Early Experience | 2025-10-09 | OSU, Meta Superintelligence Labs, Meta FAIR | 172 | http://arxiv.org/abs/2510.08558v1 | https://huggingface.co/papers/2510.08558
Research background and significance: Language agents aim to learn and improve from their own experience, ultimately surpassing human performance on complex real-world tasks. However, current training methods face many challenges. Traditional supervised learning relies on expert demonstrations, which are hard to scale and generalize poorly, since demonstration data covers narrow scenarios and lacks environmental diversity. Reinforcement learning, while able to opti...
2025-10-14 00:07:28 | 1999 views | 0 likes | 0 replies | 0 favorites
Video models are zero-shot learners and reasoners | 2025-09-24 | Google DeepMind | 50 | http://arxiv.org/abs/2509.20328v1 | https://huggingface.co/papers/2509.20328 | https://videozeroshot.github.io
Research background and significance — background and current state: In recent years, natural language processing has undergone a revolution from task-specific models to large language models (LLMs): through large-scale generative modeling on massive web data, LLMs achieve unified, general-purpose language understanding. Machine vision is now at a similar...
2025-09-29 07:28:02 | 2960 views | 0 likes | 0 replies | 0 favorites
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey | 2025-09-02 | Oxford, Shanghai AI Lab, NUS, UCL, UIUC, Brown, USTC, Imperial College London, Bristol, CAS, CUHK, Fudan U, UGA, UCSD, DLUT, UCSB | 81 | http://arxiv.org/abs/2509.02547v1 | https://huggingface.co/papers/2509.02547 | https://github.com/xhyumiracle/AwesomeAgenticLLMRLPapers
Research background and significance: This survey focuses on "Agentic Reinforcement Learning (Agentic RL...
2025-09-05 00:17:53 | 3678 views | 0 likes | 0 replies | 0 favorites
ELV-Halluc: Benchmarking Semantic Aggregation Hallucinations in Long Video Understanding | 2025-08-29 | SenseTime | 51 | http://arxiv.org/abs/2508.21496v2 | https://huggingface.co/papers/2508.21496 | https://github.com/hlsv02/ELVHalluc
Research background and significance — research background: Video multimodal large language models (Video MLLMs) have made notable progress in video understanding, but they still suffer from "hallucinations," i.e., generating information inconsistent with or unrelated to the video content. Existing research mostly focuses on short vi...
2025-09-05 00:17:36 | 1170 views | 0 likes | 0 replies | 0 favorites
From Scores to Skills: A Cognitive Diagnosis Framework for Evaluating Financial Large Language Models | 2025-08-19 | WHU, NAU, SWJTU, BUFT, AU, UoM | 53 | http://arxiv.org/abs/2508.13491v1 | https://huggingface.co/papers/2508.13491 | https://github.com/WHUNextGen/FinCDM
Research background and significance — problem definition and overview: Large language models (LLMs) show broad application potential in finance, but existing evaluation methods mostly rely on a single score and can hardly reveal how well a model actually masters specific financial knowledge. Traditional financial LLM...
2025-08-25 01:40:51 | 1368 views | 0 likes | 0 replies | 0 favorites
VeriGUI: Verifiable Long-Chain GUI Dataset | 2025-08-06 | 117 | http://arxiv.org/abs/2508.04026v1 | https://huggingface.co/papers/2508.04026 | https://github.com/VeriGUITeam/VeriGUI
Research background and significance — problem definition and overview: Research on autonomous GUI agents has made initial progress, but it focuses mainly on short-horizon tasks and outcome-based verification, which falls short of the complex, long-chain tasks found in practice. Existing datasets mostly consist of short-step operations, lack support for multi-step, cross-application workflows, and their verification approach...
2025-08-11 06:20:29 | 1334 views | 0 likes | 0 replies | 0 favorites
DesignLab: Designing Slides Through Iterative Detection and Correction | 2025-07-23 | Sony, KAIST | 33 | http://arxiv.org/abs/2507.17202v1 | https://huggingface.co/papers/2507.17202 | https://yeolj00.github.io/personalprojects/designlab
Research background and significance — problem definition and overview: Designing high-quality presentation slides is a complex and challenging task for non-experts, involving content layout, color schemes, font choices, and many other details. Existing auto...
2025-07-28 00:20:47 | 1933 views | 0 likes | 0 replies | 0 favorites
Pixels, Patterns, but No Poetry: To See The World like Humans | 2025-07-21 | UCAS, NJU, NUS, BUPT, NKU, PSU, PKU, BJTU | 46 | http://arxiv.org/abs/2507.16863v1 | https://huggingface.co/papers/2507.16863 | https://TuringEyeTest.github.io
Research background and significance: Multimodal large language models (MLLMs) have made remarkable progress in combining visual understanding with language processing, becoming a major research direction in artificial intelligence. While most existing work focuses on improving the reasoning capabilities of MLLMs...
2025-07-28 00:13:07 | 2389 views | 0 likes | 0 replies | 0 favorites
3D Scene Generation: A Survey | 2025-05-08 | NTU | 10 | http://arxiv.org/abs/2505.05474v1 | https://huggingface.co/papers/2505.05474 | https://github.com/hzxie/Awesome3DSceneGeneration
Research background and significance: 3D scene generation aims to create virtual environments with spatial structure, semantic meaning, and realistic visual appearance, supporting immersive media, robotics, autonomous driving, embodied AI, and many other applications. With growing demand from virtual reality, virtual production, urban planning, and similar areas, realistic, diverse, and...
2025-07-07 06:29:17 | 1659 views | 0 likes | 0 replies | 0 favorites
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models | 2025-05-08 | HIT, Shenzhen | 79 | http://arxiv.org/abs/2505.04921v1 | https://huggingface.co/papers/2505.04921 | https://github.com/HITszTMG/AwesomeLargeMultimodalReasoningModels
Research background and significance — the central role of reasoning in intelligence: As the core of intelligent behavior, reasoning gives artificial intelligence systems the ability to make decisions, draw generalizations, and transfer across domains in changing, uncertain, multimodal environments...
2025-07-07 06:17:39 | 3184 views | 0 likes | 0 replies | 0 favorites
Improved Iterative Refinement for Chart-to-Code Generation via Structured Instruction | 2025-06-15 | SJTU, Shanghai Inno, Lehigh U, BIGAI | 8 | http://arxiv.org/abs/2506.14837v1 | https://huggingface.co/papers/2506.14837
Research background and significance — problem definition and overview: Multimodal large language models (MLLMs) excel at visual understanding, but remain clearly deficient at chart-to-code generation. The task requires not only precise understanding of dense, multi-dimensional chart information, but also accurately trans...
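Iterative refinement of generated code can be sketched as a generic generate–critique–revise loop. The `generate`/`critique` stand-ins below are hypothetical toys (a chart "draft" is just a list of required elements), not the paper's structured-instruction pipeline:

```python
def refine(spec, generate, critique, max_rounds=3):
    """Generic generate-critique-revise loop: regenerate with feedback
    until the critic finds no remaining issues or rounds run out."""
    draft = generate(spec, feedback=None)
    for _ in range(max_rounds):
        issues = critique(spec, draft)
        if not issues:                       # nothing left to fix
            break
        draft = generate(spec, feedback=issues)
    return draft

# Toy demo: the "code" is a list of chart elements; the critic flags
# elements of the spec that the draft is still missing.
def generate(spec, feedback):
    return sorted(set(feedback or []) | {"axes"})

def critique(spec, draft):
    return [e for e in spec if e not in draft]

print(refine({"axes", "legend", "title"}, generate, critique))
```

In the paper's setting the spec would be the chart image, the draft would be plotting code, and the critique would be structured feedback on the mismatch between the rendered and target charts.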
2025-06-23 06:24:31 | 1897 views | 0 likes | 0 replies | 0 favorites
REIMAGINE: Symbolic Benchmark Synthesis for Reasoning Evaluation | 2025-06-18 | MSRC UK, Microsoft Research India | ICML 2025 | 2 | http://arxiv.org/abs/2506.15455v1 | https://huggingface.co/papers/2506.15455
Research background and significance — problem definition and current state: Large language models (LLMs) score highly on many reasoning benchmarks, yet it remains contested whether these results stem from genuine reasoning ability or merely from statistical memorization of the training data. Reasoning, as a cognitive process, involves drawing on facts...
2025-06-23 06:22:11 | 1642 views | 0 likes | 0 replies | 0 favorites
RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics | 2025-06-04 | BUAA, PKU, BAAI | 32 | http://arxiv.org/abs/2506.04308v1 | https://huggingface.co/papers/2506.04308 | https://zhoues.github.io/RoboRefer
Research background and significance — problem definition and current state: Spatial referring is a foundational capability for robots to understand and interact with the three-dimensional physical world. Although existing pre-trained vision-language models (VLMs) excel at 2D visual tasks, in complex 3D scenes their...
2025-06-09 22:40:39 | 1717 views | 0 likes | 0 replies | 0 favorites
Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs | 2025-04-24 | USYD, DeepGlint, Alibaba Group, ICL (Imperial) | 28 | http://arxiv.org/abs/2504.17432v1 | https://huggingface.co/papers/2504.17432 | https://garygutc.github.io/UniME
Research background and significance — background overview: In multimodal representation learning, the CLIP framework is widely adopted for its cross-modal contrastive learning, standing out especially on image-text retrieval and clustering tasks. However, CLIP's text...
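The CLIP-style cross-modal contrastive objective this entry refers to pulls matched image–text pairs together and pushes mismatched pairs apart via a symmetric InfoNCE loss. A minimal numpy sketch on random stand-in embeddings (not the paper's UniME training code):

```python
import numpy as np

def info_nce(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive (InfoNCE) loss over a batch of paired embeddings,
    in the style of CLIP. Matched pairs sit on the diagonal of the
    similarity matrix; the loss is cross-entropy in both directions."""
    # L2-normalize so dot products are cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature      # (batch, batch) similarities
    labels = np.arange(len(logits))         # matched pairs on the diagonal

    def xent(l):
        # Numerically stable log-softmax, then pick the diagonal targets.
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average of image->text and text->image directions.
    return (xent(logits) + xent(logits.T)) / 2

rng = np.random.default_rng(0)
loss = info_nce(rng.normal(size=(8, 32)), rng.normal(size=(8, 32)))
print(round(float(loss), 3))
```

Minimizing this loss is what aligns the two modalities in one embedding space; the paper's point is that CLIP's specific instantiation of it has limitations the MLLM-based approach tries to overcome.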
2025-04-27 23:54:18 | 2740 views | 0 likes | 0 replies | 0 favorites
Step1X-Edit: A Practical Framework for General Image Editing | 2025-04-24 | StepFun | 55 | http://arxiv.org/abs/2504.17761v1 | https://huggingface.co/papers/2504.17761 | https://github.com/stepfun-ai/Step1X-Edit
Research background and significance — field status and challenges: Image editing technology has advanced rapidly in recent years, especially driven by multimodal large models (e.g., GPT-4o, Gemini 2 Flash), enabling high-quality image editing based on natural language. These closed-source models, in understanding complex editing instructions and...
2025-04-27 23:39:05 | 2860 views | 0 likes | 0 replies | 0 favorites