From Reinforcement Learning (Multi-)Agents to Large Language Model (Multi-)Agents
Hangyu Mao (毛航宇), Kuaishou Technology
October 12, 2024
RLChina 2024

Outline
1. From RL (multi-)agents to LLM (multi-)agents: tracing ten years of research
2. From RL (multi-)agents to LLM (multi-)agents: selected representative works
   Deep RL Agent (DRL)
   Transformer-based RL Agent (TRL)
   LLM-based AI Agent
3. Lessons learned from industry practice

Background: RL Agents

Background: AI Agents
https://lilianweng.github.io/posts/2023-06-23-agent/

Ten Years of Research on RL (Multi-)Agents and LLM (Multi-)Agents
Tracks: Deep RL; Deep MARL; NLP; LLM AI Agent; LLM AI Agents

2015
DRL Foundation: 15-2: DQN; 15-2: TRPO; 15-6: GAE; 15-9: DDPG; 16-1: AlphaGo; 17-7: PPO

2016
Communication: CommNet/BiCNet/ACCNet; ATOC/IC3Net/Gated-ACML

2017
Transformer

2018
Novel Perspective: Rainbow DQN; C51/QR-DQN; Evolution Strategy; Model-based RL; Scaling RL; Hierarchical RL (SEIHAI); Offline RL
CTDE: 17: MADDPG / 19: ATT-MADDPG; 18: VDN/QMIX; 21: IPPO/MAPPO; 22: PTDE
BERT

2019
GPT-2

2020
Novel Perspective: Grouping/Role/Graph/Attention; Cognition Consistency (NCC-MARL); Permutation Invariant/Equivalent
GPT-3

2021
TRL Foundation: 21-6: DT/TT; 22-5: Generalist Agent; 22-12: RT-1
Prompt Tuning

2022
MAT
3-4: InstructGPT
11-30: ChatGPT

2023
Novel Perspective: 22: Prompting DT; 22: Online DT; 22: Bootstrap Tran (BooT); 23: Q-learning DT; 23: Hierarchical DT; 23: TIT/PDiT
MADT
Llama/Llama-2
GPT-3.5/GPT-4
23-3-23: ChatGPT plugins (OpenAI)
23-6-23: LLM Powered Agents (Lil'Log)
GitHub Project: AutoGPT/BabyAGI
23-8-7: TPTU
23-8-22: Survey from Renmin University
23-9-14: Survey from Fudan University
23-11-19: TPTU-2
DS-Agent; Sheet/SQLAgent; ToolGen
23: Generative Agents (Stanford Town)
23: ChatDev/ChatEval
23: AgentGen/AgentVerse
23: LlaMAC
24: LLM Agent Operating System
24: Internet of Agents
24: Automated Design of Agentic

2024
STEER
Llama-3
GPT-4o
O1