From Reinforcement Learning (Multi-)Agents to Large Language Model (Multi-)Agents

Hangyu Mao (毛航宇), SenseTime
RLChina 2023, "Large Models and AI Agents"

Outline
SEIHAI: NeurIPS20, DAI21
TIT/PDiT: submitted to AAMAS24, Arxiv 22.12
TPTU: NeurIPS24-FMDM
TPTU-V2: Arxiv 23.11
Gated-ACML: AAAI20
NCC-MARL: AAAI20
STEER: submitted to AAAI24, Arxiv 23.05
LLaMAC: Arxiv 23.11
Figure axes: Single / Multi; DRL, TRL, LLM-based Agent

SEIHAI: A Sample-efficient Hierarchical AI for the MineRL Competition

Motivation
Verifying that agents can keep learning in open-ended environments has become an important direction in AI.
Minecraft is a natural testbed for this.
SEIHAI is the first fully learning-based agent to reach the "Iron Age" in the NeurIPS MineRL Competition.

Challenges in Minecraft
Item dependencies, sparse rewards combined with long episodes, and no semantic information.

Training the scheduler boils down to a classification task.
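To make the "scheduler as classification" remark concrete, here is a minimal sketch, not the SEIHAI implementation. It assumes the scheduler maps a feature vector to the index of the sub-policy that should run next and is fitted with a plain softmax classifier under cross-entropy loss. The sub-policy names, the two inventory features, and the three training samples are all hypothetical.

```python
import math
import random

# Hypothetical sub-policy names and inventory features, for illustration only.
SUBPOLICIES = ["chop_tree", "mine_stone", "mine_iron"]

# Each sample: ([has_wooden_pickaxe, has_stone_pickaxe], index of sub-policy to run).
SAMPLES = [
    ([0.0, 0.0], 0),
    ([1.0, 0.0], 1),
    ([1.0, 1.0], 2),
]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_for(weights, biases, features):
    """One linear score per sub-policy."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, biases)]


def train_scheduler(samples, num_features, num_classes, epochs=500, lr=0.5, seed=0):
    """Fit a linear softmax classifier with SGD on cross-entropy loss."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.01, 0.01) for _ in range(num_features)]
               for _ in range(num_classes)]
    biases = [0.0] * num_classes
    for _ in range(epochs):
        for features, label in samples:
            probs = softmax(logits_for(weights, biases, features))
            for k in range(num_classes):
                # Gradient of cross-entropy w.r.t. logit k is (p_k - one_hot_k).
                grad = probs[k] - (1.0 if k == label else 0.0)
                biases[k] -= lr * grad
                for j in range(num_features):
                    weights[k][j] -= lr * grad * features[j]
    return weights, biases


def schedule(weights, biases, features):
    """Return the index of the highest-scoring sub-policy."""
    scores = logits_for(weights, biases, features)
    return max(range(len(scores)), key=scores.__getitem__)


weights, biases = train_scheduler(SAMPLES, num_features=2, num_classes=3)
chosen = [SUBPOLICIES[schedule(weights, biases, f)] for f, _ in SAMPLES]
print(chosen)
```

Framing the scheduler this way means it can be trained with ordinary supervised learning on labeled (state, sub-policy) pairs, without a sparse reward signal.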
Gated-ACML: Learning Agent Communication under Limited Bandwidth by Message Pruning

Motivation
Multi-agent communication is a long-standing research topic that studies how, what, and to whom to communicate.
In real-world problems, however, communication bandwidth is limited. How should agents communicate under limited bandwidth?

Approach
Convert the limited-bandwidth constraint into message pruning.
Convert message pruning into binary classification.
How to set the threshold T: dynamic (shown in the slide's figure) or static (?)

NCC-MARL: Neighborhood Cognition Consistent Multi-Agent Reinforcement Learning

Motivation
How can multiple agents cooperate as well as humans do?
What characterizes human cooperation? Cognitive consistency!

Consistency; approximate variational inference
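The Gated-ACML reduction (limited bandwidth to message pruning to binary classification) can be sketched as follows. This is an illustration under stated assumptions, not the method from the talk: the slides do not say what quantity is compared against T, so the sketch assumes each candidate message has a scalar "gain" score and is labeled 1 (send) when its gain is at least T. The gain values, the `budget` parameter, and both threshold rules are hypothetical.

```python
def static_threshold(value):
    """Static T: a constant fixed in advance."""
    return value


def dynamic_threshold(recent_gains, budget):
    """Dynamic T: the budget-th largest recent gain.

    `budget` is an assumed bandwidth limit, expressed as the number of
    messages allowed per window of `recent_gains`.
    """
    if not 1 <= budget <= len(recent_gains):
        raise ValueError("budget must be between 1 and the window size")
    ordered = sorted(recent_gains, reverse=True)
    return ordered[budget - 1]


def prune_labels(gains, threshold):
    """Binary labels for the gate: 1 = send the message, 0 = prune it."""
    return [1 if g >= threshold else 0 for g in gains]


# Hypothetical gain scores for ten candidate messages.
gains = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8, 0.05, 0.4, 0.6]

t_static = static_threshold(0.45)
t_dynamic = dynamic_threshold(gains, budget=3)

static_labels = prune_labels(gains, t_static)
dynamic_labels = prune_labels(gains, t_dynamic)

print(t_dynamic, sum(dynamic_labels), sum(static_labels))
```

With a static T, the number of messages sent varies with the data. With the dynamic rule assumed here, T moves so that roughly `budget` messages pass per window (ties can let more through). Labels produced this way could then serve as targets for training a gate as a binary classifier.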