毛宇航_RLChina23 - 周日上午 - 毛航宇 - 从强化学习(多)智能体到大语言模型(多)智能体(1)_watermark.pdf

Uploaded by: Zhang**

ID: 155525

2024-02-15

PDF, 35 pages, 2.73 MB


From Reinforcement Learning (Multi-)Agents to Large Language Model (Multi-)Agents
Hangyu Mao (毛航宇), SenseTime
RLChina 2023, "Large Models and AI Agent"

Contents
SEIHAI: NeurIPS20, DAI21
TIT/PDiT: Submit to AAMAS24, Arxiv 22.12
TPTU: NeurIPS24-FMDM
TPTU-V2: Arxiv 23.11
Gated-ACML: AAAI20
NCC-MARL: AAAI20
STEER: Submit to AAAI24, Arxiv 23.05
LLaMAC: Arxiv 23.11
Grid axes: Single / Multi; DRL / TRL / LLM-based Agent

SEIHAI
SEIHAI: A Sample-efficient Hierarchical AI for the MineRL Competition

Motivation
Verifying that agents can keep learning in open-ended environments has become an important direction for AI.
Minecraft is a natural proving ground for this.
SEIHAI is the first fully learning-based agent to reach the "Iron Age" in the NeurIPS MineRL Competition.

Minecraft challenges
Item dependencies, sparse rewards combined with long episodes, and no semantic information.

Training the scheduler boils down to a classification task.

Gated-ACML
Learning Agent Communication under Limited Bandwidth by Message Pruning

Motivation
Multi-agent communication is a long-standing research topic that studies how, what, and to whom to communicate.
In practice, however, communication bandwidth is limited. How should agents communicate under limited bandwidth?

Limited bandwidth is recast as message pruning.
Message pruning is recast as binary classification.
How to set T: dynamically (as in the figure on the slide) or statically (?).

NCC-MARL
Neighborhood Cognition Consistent Multi-Agent Reinforcement Learning

Motivation
How can multiple agents cooperate as well as humans do?
What characterizes humans when they cooperate? Cognitive consistency!

Consistency; approximate variational inference.
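The Gated-ACML slides above say only that limited bandwidth is recast as message pruning, that pruning is recast as binary classification, and that T can be set statically or dynamically. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation: it assumes T is a threshold on a per-message usefulness score (called `gain` here, a hypothetical quantity whose computation the slides do not describe), and that "dynamic" means re-deriving T from recent scores so that a message budget is met. The function names and the `budget` parameter are illustrative only.

```python
def prune_labels(gains, T):
    """Binary labels for a gate classifier: 1 = send the message, 0 = prune it.

    Assumption: a message is worth sending when its score strictly exceeds T.
    """
    return [1 if g > T else 0 for g in gains]


def dynamic_threshold(recent_gains, budget):
    """Choose T from recent scores so that at most `budget` of them exceed it.

    Assumption: the bandwidth limit is given as a count of messages allowed
    per window of recent scores (hypothetical formulation).
    """
    if not recent_gains:
        return 0.0
    ranked = sorted(recent_gains, reverse=True)
    if budget <= 0:
        return ranked[0]          # nothing strictly exceeds the maximum
    if budget >= len(ranked):
        return float("-inf")      # every message passes
    return ranked[budget]         # only the top `budget` scores can exceed this


# Hypothetical scores for ten candidate messages.
gains = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6, 0.0]

static_labels = prune_labels(gains, 0.45)        # fixed T chosen by hand
T_dyn = dynamic_threshold(gains, budget=3)       # T derived from the window
dynamic_labels = prune_labels(gains, T_dyn)      # keeps the three top scores
```

In this sketch a static T is simple but gives no control over how many messages get through, while the dynamic variant trades a fixed cutoff for a fixed budget. Either way, the resulting 0/1 labels are what a gate would be trained to predict.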
