Speaker: Mao Hangyu, Kuaishou Technology. Head of knowledge-enhancement R&D for the KwaiYii (快意) large language model at Kuaishou Technology, and concurrently head of the Intelligent Interaction team. His work focuses on Agents, RAG, Alignment, RL, and LLMs. He has published more than 30 papers at CCF-A/B conferences and journals such as ICLR, NeurIPS, and ICML, filed more than ten international and domestic patents, and deployed the related research in production with substantial business impact. He has served as PC member, Senior PC member, and Area Chair for the conferences above, as forum chair of the China Conference on Data Mining (CCDM), and as an executive committee member of the CCF multi-agent special interest group. He and his team have received the Global Digital Economy Conference award for "Typical Application Cases of Large AI Models", first place in a NeurIPS reinforcement-learning competition, the CCF "Outstanding Doctoral Dissertation Award in Multi-Agent Research", Beijing's "Outstanding (Doctoral) Graduate" award, and Huawei's "Innovation Pioneer President Award".

Talk title: From Reinforcement-Learning (Multi-)Agents to Large-Language-Model (Multi-)Agents

2. Selected representative works: from RL (multi-)agents to LLM (multi-)agents
Background reading: https://lilianweng.github.io/posts/2023-06-23-agent/

Timeline of representative works (the extraction repeated this list verbatim twice; it is shown once here):
- 2016-01: AlphaGo
- Communication: CommNet / BiCNet / ACCNet; ATOC / IC3Net / Gated-ACML
- Transformer; C51 / QR-DQN; Evolution Strategy
- 2017: MADDPG; 2019: ATT-MADDPG
- 2018: VDN / QMIX; Grouping / Role / Graph / Attention
- Cognition Consistency (NCC-MARL); Permutation Invariant / Equivariant
- 2022-05: Generalist Agent; Prompt Tuning
- 2022: Bootstrapped Transformer (BooT); 2023: TIT / PDiT
- Llama / Llama-2; GPT-3.5 / GPT-4
- 2023-03-23: ChatGPT plugins (OpenAI)
- 2023-06-23: LLM Powered Agents (Lilian Weng's blog post)
- 2023-08-07: TPTU; 2023-08-22: survey from Renmin University; 2023-09-14: survey from Fudan University; 2023-11-19: TPTU-2
- DS-Agent; Sheet/SQL Agent; ToolGen
- 2023: Generative Agents (the Stanford "town"); RecAgent / EconAgent; ChatDev / ChatEval; AgentGen / AgentVerse
- 2024: LLM Agent Operating System; Automated Design of Agentic Systems; STEER; o1; TRL Agents

Application-paradigm comparison (slide fragments): LLM agents need no retraining, only fine-tuning or prompting; RL agents operate in comparatively small state spaces; the slide also listed points the two paradigms share.

Open-ended learning:
RL can already solve single-skill tasks such as Atari. But can agents learn continually in open-ended environments? Minecraft has become a natural proving ground.
- Mao, Hangyu, et al. "SEIHAI: A sample-efficient hierarchical AI for the MineRL competition." DAI 2021.
- Guss, William Hebgen, et al. "Towards robust and domain agnostic reinforcement learning competitions: MineRL 2020." NeurIPS 2020 Competition and Demonstration Track. PMLR, 2021.
In SEIHAI, "training the scheduler boils down to a classification task."

Gated-ACML:
Multi-agent communication is a classic research topic, but in real problems communication bandwidth is limited.
- Mao, Hangyu, et al. "Learning agent communication under limited bandwidth by message pruning." AAAI 2020.
How should the pruning threshold T be set: dynamically (as in the figure, omitted in this extraction) or statically(?). Message pruning is thereby cast as a binary classification problem.

PTDE:
Prior work rethinks, at the framework level, what forms cooperative MARL can take. The PTDE framework:
(1) studies how centralized information affects each actor;
(2) studies how to personalize the same centralized information for different agents;
(3) uses knowledge distillation to guarantee decentralized execution, distilling from the global state to each agent's local observation rather than compressing the model. (Overall framework figure omitted.)
- Chen, Yiqun, Mao Hangyu, et al. "PTDE: Personalized training with distilled execution for multi-agent reinforcement learning." arXiv:2210.08872 (2022). Accepted by IJCAI 2024.
See the paper for more results with VDN/MAPPO on Google Research Football and learning-to-rank tasks.

ATT-MADDPG:
How can the mainstream MADDPG method be improved? The deeper question: how do we model teammates' constantly changing joint policy and update our own policy against it?
- Mao, Hangyu, et al. "Modelling the Dynamic Joint Policy of Teammates with Attention Multi-agent DDPG." AAMAS 2019.

NCC-MARL:
How can multiple agents cooperate as well as humans do? What characterizes human cooperation?
- Mao, Hangyu, et al. "Neighborhood cognition consistent multi-agent reinforcement learning." AAAI 2020.

Exploiting the structure of multi-agent systems:
- Hao, Xiaotian, Mao Hangyu, et al. "Breaking the curse of dimensionality in multiagent state space: A unified agent permutation framework." arXiv:2203.05285 (2022).
- Hao, Jianye, Hao Xiaotian, Mao Hangyu, et al. "Boosting multiagent reinforcement learning via permutation invariant and permutation equivariant networks." ICLR 2023.
These papers also give a more detailed survey of MARL and games.

TIT / PDiT:
How should the Transformer architecture be adapted to RL? By modeling perception and decision-making separately.
- Mao, Hangyu, et al. "Transformer in transformer as backbone for deep reinforcement learning." arXiv:2212.14538 (2022).
- Mao, Hangyu, et al. "PDiT: Interleaving Perception and Decision-making Transformers for Deep Reinforcement Learning." AAMAS 2024.
Strong results: with the same algorithm it outperforms DT, GATO, and StARformer.

Stackelberg Decision Transformer:
How should the Transformer be adapted to MARL? Its sequential structure matches asynchronous decision-making exactly.
- Zhang, Bin, Hangyu Mao, et al. "Stackelberg decision transformer for asynchronous action coordination in multi-agent systems." arXiv:2305.07856 (2023).
- Zhang, Bin, Hangyu Mao, et al. "Sequential Asynchronous Action Coordination in Multi-Agent Systems: A Stackelberg Decision Transformer Approach." ICML 2024.
Strong results: under certain conditions it converges to a Stackelberg equilibrium.

TPTU:
LLMs already have broad general knowledge and can be regarded as a general-purpose "world model". The RL notions of action space and external environment map to tool use.
- Ruan, Jingqing, et al. "TPTU: Task Planning and Tool Usage of Large Language Model-based AI Agents." NeurIPS 2023 Foundation Models for Decision Making Workshop.
- Kong, Yilun, et al. "TPTU-v2: Boosting task planning and tool usage of large language model-based agents in real-world systems." ICLR 2024 LLM Agent Workshop.
Paper outline: 3.2 Evaluation on Task Planning Ability; 3.3 Evaluation on Tool Usage Ability; 3.4 Insightful Observations.
How do LLMs empower agents? Viewed from RL, the key lies in long-term, multi-step decision-making.
TPTU-2 notes: "Since the ubiquity and satisfactory performance of existing fine-tuning methods, such as SFT, LoRA, and QLoRA, we shift our ..." (fragment). Can you guess the details of the Demo Retriever?

LLM-based multi-agent systems:
How do LLMs empower (large-scale) multi-agent systems? Viewed from MARL, what are the key points?
- Zhang, Bin, Hangyu Mao, et al. "Controlling large language model-based agents for large-scale decision-making: An actor-critic approach." ICLR 2024 LLM Agent Workshop.
Slide fragments: "... stationary. 2. The accessor balances exploration and [exploitation] ... Feedback makes the information more concise and accurate, reducing the number of iterations and hence the token count."

Technical directions:
- Serious business scenarios, efficiency-oriented (ToC/B/G): news reporting, legal complaint generation, Copilot, Code Raccoon (代码小浣熊), SQLBench / PET-SQL, personal assistants (e.g. leave requests; TPTU / TPTU-2), fake-order and malicious-review detection.
- Entertainment scenarios, emotional-resonance-oriented (ToC): Character.AI, "AI Xiaokuai" (AI小快), creative generation.
Trade-offs noted on the slide:
- News reporting and complaint generation are nearly closed-set, so hallucinations are easier to control and value is easy to measure; yet hallucination remains hard to fully control, and compliance and safety are hard to guarantee.
- For Copilot and Code Raccoon, the value of short code snippets is hard to measure.
- SQLBench / PET-SQL and personal assistants keep hallucinations controllable and value measurable, but are "icing on the cake" features.
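The Gated-ACML idea above (the send-or-prune decision trained as binary classification against a threshold T) can be sketched as follows. Everything here is an illustrative assumption, not the paper's code: the feature shapes, the logistic gate, and the toy quantity `delta_q` standing in for the estimated Q-value gain of communicating.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy data: per-step message features, and an assumed-given gain delta_q
# that sending this message would bring to teammates' decisions.
features = rng.normal(size=(256, 4))
delta_q = features @ np.array([1.0, -0.5, 0.25, 0.0]) + 0.1 * rng.normal(size=256)

T = 0.0                                  # pruning threshold (the static variant)
labels = (delta_q > T).astype(float)     # 1 -> worth sending, 0 -> prune

# Logistic-regression gate trained with cross-entropy via gradient descent.
w = np.zeros(4)
b = 0.0
lr = 0.5
for _ in range(200):
    p = sigmoid(features @ w + b)
    grad = p - labels                    # d(cross-entropy)/d(logit)
    w -= lr * features.T @ grad / len(labels)
    b -= lr * grad.mean()

pred = sigmoid(features @ w + b) > 0.5   # gate decision per message
accuracy = (pred == labels.astype(bool)).mean()
print(f"gate accuracy: {accuracy:.2f}")
```

At execution time only the cheap gate runs per agent, so bandwidth is spent only on messages predicted to help, which is the point of casting pruning as classification.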
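The PTDE-style distillation from global state to local observation can be illustrated with a toy regression. This is a minimal sketch, not the paper's architecture: the linear teacher and student, the embedding sizes, and the assumption that each observation is a noisy slice of the global state are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

state_dim, obs_dim, embed_dim = 8, 3, 4
W_teacher = rng.normal(size=(state_dim, embed_dim))   # frozen teacher head

def teacher_embed(state):
    # Teacher sees the full global state (centralized training).
    return state @ W_teacher

# Assume the local observation is a noisy slice of the global state.
states = rng.normal(size=(512, state_dim))
obs = states[:, :obs_dim] + 0.01 * rng.normal(size=(512, obs_dim))
targets = teacher_embed(states)

# Student sees only the local observation (decentralized execution) and is
# trained with an MSE distillation loss toward the teacher's embedding.
W_student = np.zeros((obs_dim, embed_dim))
lr = 0.1
for _ in range(500):
    pred = obs @ W_student
    grad = obs.T @ (pred - targets) / len(obs)
    W_student -= lr * grad

mse = np.mean((obs @ W_student - targets) ** 2)
print(f"distillation MSE: {mse:.3f}")
```

The student recovers the part of the teacher's embedding that is explained by the local observation, so no centralized information is needed at test time; note this distills a representation, it does not compress the model.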
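The permutation-invariance idea above can be shown in a few lines: embed each agent with shared weights, then pool with an order-insensitive operation, so the encoding cannot depend on how agents are indexed. The shapes and the mean-pooling choice are illustrative, not the papers' exact networks.

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.normal(size=(5, 6))              # shared per-agent embedding weights

def encode(agent_feats):
    """Permutation-invariant set encoding: shared embedding + mean pooling."""
    return np.tanh(agent_feats @ W).mean(axis=0)

x = rng.normal(size=(4, 5))              # 4 agents, 5 features each
perm = x[[2, 0, 3, 1]]                   # same agents, reordered
print(np.allclose(encode(x), encode(perm)))  # -> True
```

Because the pooled output is identical under any agent ordering, the effective input space shrinks by a factor of up to n!, which is the "curse of dimensionality" these works attack.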
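The TPTU separation between task planning and tool usage can be sketched as a two-stage loop. The planner here is a hard-coded stub standing in for an LLM, and the tool registry and plan format are illustrative assumptions, not the paper's actual interface.

```python
import operator

# Tool registry: the agent's "action space" of callable tools.
TOOLS = {
    "add": operator.add,
    "mul": operator.mul,
}

def plan(task: str):
    """Task planner: decompose the task into ordered tool calls.

    A real TPTU agent would prompt an LLM for this decomposition; this stub
    only recognizes one arithmetic pattern, '(a + b) * c', given as 'a b c'.
    """
    a, b, c = (float(x) for x in task.split())
    return [("add", a, b), ("mul", "<prev>", c)]

def execute(steps):
    """Tool-usage loop: run each planned call, feeding results forward."""
    result = None
    for name, *args in steps:
        args = [result if a == "<prev>" else a for a in args]
        result = TOOLS[name](*args)
    return result

answer = execute(plan("2 3 4"))   # (2 + 3) * 4
print(answer)                     # -> 20.0
```

Keeping planning and execution separate is what lets the two abilities be evaluated independently, as in the paper's Sections 3.2 and 3.3.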
References (our work):
- Mao, Hangyu, et al. "SEIHAI: A sample-efficient hierarchical AI for the MineRL competition." Distributed Artificial Intelligence: Third International Conference, DAI 2021, Shanghai, China, December 17-18, 2021, Proceedings 3. Springer International Publishing, 2022.
- Mao, Hangyu, et al. "ACCNet: Actor-coordinator-critic net for 'Learning-to-communicate' with deep multi-agent reinforcement learning." arXiv:1706.03235 (2017).
- Mao, Hangyu, et al. "Learning agent communication under limited bandwidth by message pruning." AAAI 2020.
- Mao, Hangyu, et al. "Modelling the Dynamic Joint Policy of Teammates with Attention Multi-agent DDPG." AAMAS 2019.
- Chen, Yiqun, et al. "PTDE: Personalized training with distilled execution for multi-agent reinforcement learning." arXiv:2210.08872 (2022). Accepted by IJCAI 2024.
- Mao, Hangyu, et al. "Neighborhood cognition consistent multi-agent reinforcement learning." AAAI 2020.
- Hao, Jianye, et al. "Boosting multiagent reinforcement learning via permutation invariant and permutation equivariant networks." ICLR 2023.
- Zhang, Xianjie, et al. "Structural relational inference actor-critic for multi-agent reinforcement learning." Neurocomputing 459 (2021): 383-394.
- Mao, Hangyu, et al. "Transformer in transformer as backbone for deep reinforcement learning." arXiv:2212.14538 (2022).
- Mao, Hangyu, et al. "PDiT: Interleaving Perception and Decision-making Transformers for Deep Reinforcement Learning." arXiv:2312.15863 (2023). Accepted by AAMAS 2024.
- Zhang, Bin, et al. "Stackelberg decision transformer for asynchronous action coordination in multi-agent systems." arXiv:2305.07856 (2023). Published as "Sequential Asynchronous Action Coordination in Multi-Agent Systems: A Stackelberg Decision Transformer Approach", ICML 2024.
- Ruan, Jingqing, et al. "TPTU: Task Planning and Tool Usage of Large Language Model-based AI Agents." NeurIPS 2023 Foundation Models for Decision Making Workshop, 2023.
- Kong, Yilun, et al. "TPTU-v2: Boosting task planning and tool usage of large language model-based agents in real-world systems." arXiv:2311.11315. Accepted by the ICLR 2024 LLM Agent Workshop.
- Zhang, Bin, et al. "Controlling large language model-based agents for large-scale decision-making: An actor-critic approach." arXiv:2311.13884. Accepted by the ICLR 2024 LLM Agent Workshop.

Reflections and summary:
- Zhang, Bin, et al. "Benchmarking the Text-to-SQL Capability of Large Language Models: A Comprehensive Evaluation." arXiv:2403.02951 (2024).
- Li, Zhishuai, et al. "PET-SQL: A Prompt-enhanced Two-stage Text-to-SQL Framework with Cross-consistency." arXiv:2403.09732 (2024).
- Sui, Guanghu, et al. "Reboost Large Language Model-based Text-to-SQL, Text-to-Python, and Text-to-Function -- with Real Applications in Traffic Domain." arXiv:2310.18752 (2023).
- Jiang, Haoyuan, et al. "A general scenario-agnostic reinforcement learning for traffic signal control." IEEE Transactions on Intelligent Transportation Systems (2024).
- Lu, Jiaming, et al. "DuaLight: Enhancing Traffic Signal Control by Leveraging Scenario-Specific and Scenario-Shared Knowledge." arXiv:2312.14532 (2023). Accepted by AAMAS 2024.
- Jiang, Haoyuan, et al. "X-Light: Cross-City Traffic Signal Control Using Transformer on Transformer as Meta Multi-Agent Reinforcement Learner." arXiv:2404.12090 (2024). IJCAI 2024.
- Ruan, Jingqing, et al. "CoSLight: Co-optimizing Collaborator Selection and Decision-making to Enhance Traffic Signal Control." Accepted by KDD 2024.
- Kong, Yilun, et al. "QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning." arXiv:2408.10504 (2024).

References 1 -- DRL foundations (2015-2017):
[1] Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529-533.
[2] Schulman, John, et al. "Trust Region Policy Optimization." arXiv:1502.05477 (2015).
[3] Schulman, John, et al. "High-dimensional continuous control using generalized advantage estimation." arXiv:1506.02438 (2015).
[4] Lillicrap, Timothy P., et al. "Continuous control with deep reinforcement learning." arXiv:1509.02971 (2015).
[5] Silver, David, et al. "Mastering the game of Go with deep neural networks and tree search." Nature 529.7587 (2016): 484-489.
[6] Schulman, John, et al. "Proximal policy optimization algorithms." arXiv:1707.06347 (2017).

DRL in 2018-2020:
- "Rainbow: Combining Improvements in Deep Reinforcement Learning", Hessel et al., 2017. Algorithm: Rainbow DQN.
- "A Distributional Perspective on Reinforcement Learning", Bellemare et al., 2017. Algorithm: C51.
- "Distributional Reinforcement Learning with Quantile Regression", Dabney et al., 2017. Algorithm: QR-DQN.
- "Evolution Strategies as a Scalable Alternative to Reinforcement Learning", Salimans et al., 2017. Algorithm: ES.
- "Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning", Nagabandi et al., 2017. Algorithm: MBMF (model is learned).
- "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm", Silver et al., 2017. Algorithm: AlphaZero (model is given).
- "IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures", Espeholt et al., 2018. Algorithm: IMPALA.
- "Data-Efficient Hierarchical Reinforcement Learning", Nachum et al., 2018. Algorithm: HIRO.
- Mao, Hangyu, et al. "SEIHAI: A sample-efficient hierarchical AI for the MineRL competition." DAI 2021.
- Fu, Justin, et al. "D4RL: Datasets for deep data-driven reinforcement learning." arXiv:2004.07219 (2020).

References 2 -- MARL:
Communication:
- Mao, Hangyu, et al. "ACCNet: Actor-coordinator-critic net for 'Learning-to-communicate' with deep multi-agent reinforcement learning." arXiv:1706.03235 (2017).
- Mao, Hangyu, et al. "Learning agent communication under limited bandwidth by message pruning." AAAI 2020.
CTDE:
- Lowe, Ryan, et al. "Multi-agent actor-critic for mixed cooperative-competitive environments." NeurIPS 30 (2017).
- Mao, Hangyu, et al. "Modelling the Dynamic Joint Policy of Teammates with Attention Multi-agent DDPG." AAMAS 2019.
- Sunehag, Peter, et al. "Value-Decomposition Networks For Cooperative Multi-Agent Learning Based On Team Reward." AAMAS 2018.
- Rashid, Tabish, et al. "QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning." ICML 2018.
- Yu, Chao, et al. "The surprising effectiveness of PPO in cooperative multi-agent games." NeurIPS 35 (2022): 24611-24624.
- Chen, Yiqun, et al. "PTDE: Personalized training with distilled execution for multi-agent reinforcement learning." arXiv:2210.08872 (2022). Accepted by IJCAI 2024.
MARL in 2020-2021:
- Mao, Hangyu, et al. "Neighborhood cognition consistent multi-agent reinforcement learning." AAAI 2020.
- Hao, Jianye, et al. "Boosting multiagent reinforcement learning via permutation invariant and permutation equivariant networks." ICLR 2023.
- Zhang, Xianjie, et al. "Structural relational inference actor-critic for multi-agent reinforcement learning." Neurocomputing 459 (2021): 383-394.

References 3 -- Transformer-based RL (TRL):
Foundations (2021-2022):
- Chen, Lili, et al. "Decision transformer: Reinforcement learning via sequence modeling." NeurIPS 34 (2021): 15084-15097.
- Janner, Michael, Qiyang Li, and Sergey Levine. "Offline reinforcement learning as one big sequence modeling problem." NeurIPS 34 (2021): 1273-1286.
- Reed, Scott, et al. "A generalist agent." arXiv:2205.06175 (2022).
- Brohan, Anthony, et al. "RT-1: Robotics transformer for real-world control at scale." arXiv:2212.06817 (2022).
TRL in 2023-2024:
- Mao, Hangyu, et al. "Transformer in transformer as backbone for deep reinforcement learning." arXiv:2212.14538 (2022).
- Mao, Hangyu, et al. "PDiT: Interleaving Perception and Decision-making Transformers for Deep Reinforcement Learning." arXiv:2312.15863 (2023). Accepted by AAMAS 2024.
- Siebenborn, Max, et al. "How crucial is transformer in decision transformer?" arXiv:2211.14655 (2022).
- Lee, Kuang-Huei, et al. "Multi-game decision transformers." NeurIPS 35 (2022): 27921-27936.
- Paster, Keiran, Sheila McIlraith, and Jimmy Ba. "You can't count on luck: Why decision transformers and RvS fail in stochastic environments." NeurIPS 35 (2022): 38966-38979.
- Wang, Kerong, et al. "Bootstrapped transformer for offline reinforcement learning." NeurIPS 35 (2022): 34748-34761.
- Zheng, Qinqing, Amy Zhang, and Aditya Grover. "Online decision transformer." ICML 2022.
- Xu, Mengdi, et al. "Prompting decision transformer for few-shot policy generalization." ICML 2022.
- Yamagata, Taku, Ahmed Khalil, and Raul Santos-Rodriguez. "Q-learning decision transformer: Leveraging dynamic programming for conditional sequence modelling in offline RL." ICML 2023.
- Hu, Shengchao, et al. "Graph decision transformer." arXiv:2303.03747 (2023).
- Ma, Yi, et al. "Rethinking Decision Transformer via Hierarchical Reinforcement Learning." arXiv:2311.00267 (2023).
- Wang, Yuanfu, et al. "Critic-guided decision transformer for offline reinforcement learning." AAAI 2024.

References 4 -- Transformer-based MARL:
- Wen, M., Kuba, J., Lin, R., Zhang, W., Wen, Y., Wang, J., & Yang, Y. (2022). "Multi-agent reinforcement learning is a sequence modeling problem." NeurIPS 35, 16509-16521.
- Meng, Linghui, et al. "Offline pre-trained multi-agent decision transformer." Machine Intelligence Research 20.2 (2023): 233-248.
- Zhang, Bin, et al. "Stackelberg decision transformer for asynchronous action coordination in multi-agent systems." arXiv:2305.07856 (2023). Published as "Sequential Asynchronous Action Coordination in Multi-Agent Systems: A Stackelberg Decision Transformer Approach", ICML 2024.

References -- NLP:
- Vaswani, Ashish, et al. "Attention is all you need." NeurIPS 30 (2017).
- Devlin, Jacob, et al. "BERT: Pre-training of deep bidirectional transformers for language understanding." arXiv:1810.04805 (2018).
- Radford, Alec, et al. "Language models are unsupervised multitask learners." OpenAI blog 1.8 (2019): 9.
- Brown, Tom, et al. "Language models are few-shot learners." NeurIPS 33 (2020): 1877-1901.
- 2021-01: "Prefix-Tuning: Optimizing Continuous Prompts for Generation".
- 2021-04: "The Power of Scale for Parameter-Efficient Prompt Tuning".
- 2021-10: "P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks".
- Wei, Jason, et al. "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models." NeurIPS 2022.
- Ouyang, Long, et al. "Training language models to follow instructions with human feedback." NeurIPS 35 (2022): 27730-27744.
- ChatGPT (/index/chatgpt/): "By January 2023, it had become what was then the fastest-growing consumer software application in history, gaining over 100 million users."
- Touvron, Hugo, et al. "Llama: Open and efficient foundation language models." arXiv:2302.13971 (2023).
- Touvron, Hugo, et al. "Llama 2: Open foundation and fine-tuned chat models." arXiv:2307.09288 (2023).
- Llama 3 model card: /meta-llama/llama3/blob/main/MODEL_CARD.md
- GPT-4 (/index/gpt-4-research/): Achiam, Josh, et al. "GPT-4 technical report." arXiv:2303.08774 (2023).

References 5 & 6 -- NLP agents:
- AutoGPT: /Significant-Gravitas/AutoGPT
- BabyAGI: /yoheinakajima/babyagi
- ChatGPT plugins (/index/chatgpt-plugins/): "We're also hosting two plugins ourselves, a web browser and code interp..." (truncated in the source)