Cooperation in Multi-Agent Learning: A Review
Yali Du, King's College London

Multi-Agent Systems Are Everywhere

Types of multi-agent systems
Strategic situations (games) span a spectrum from pure conflict (zero-sum), through mixed motive, to pure common interest:
- Pure common interest: team-reward Markov games, e.g. football, SMAC.
- Mixed motive: mixed-motive Markov games, e.g. self-driving cars, supply chains.

Markov Game
- All agents see the global state s.
- Individual action: each agent i selects an action a_i.
- State transitions: P(s'|s, a): S × A → [0, 1].
- Agent i's reward: r_i(s, a): S × A → R.
Notes: In practice, an agent has limited, partial observability. A partially observable Markov game (POMG) adds an observation o drawn from an observation function O(o|s, a).
[Figure: environment loop — agents 1 and 2 send actions a_1, a_2 to the environment and receive observations and rewards (o_1, R_1), (o_2, R_2); the state evolves as s' ~ P(s'|s, a_1, a_2, ...).]
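The Markov game components above can be sketched in a few lines of Python. This is a minimal, hypothetical example (not from the tutorial): the iterated Prisoner's Dilemma cast as a two-agent Markov game, where the global state is simply the last joint action and each agent receives an individualised reward r_i(s, a).

```python
# Minimal two-agent Markov game sketch (hypothetical example):
# the iterated Prisoner's Dilemma. Actions: 0 = cooperate, 1 = defect.
PAYOFFS = {  # (a1, a2) -> (r1, r2), the classic PD payoff matrix
    (0, 0): (3, 3),
    (0, 1): (0, 5),
    (1, 0): (5, 0),
    (1, 1): (1, 1),
}

class MarkovGame:
    def __init__(self):
        self.state = (0, 0)  # global state, visible to all agents

    def step(self, a1, a2):
        r1, r2 = PAYOFFS[(a1, a2)]  # individualised rewards r_i(s, a)
        self.state = (a1, a2)       # deterministic transition P(s'|s, a)
        return self.state, (r1, r2)

game = MarkovGame()
state, rewards = game.step(0, 1)  # agent 1 cooperates, agent 2 defects
```

Because rewards are individualised and misaligned, this sits on the mixed-motive end of the spectrum described above.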
Team Markov Games
- The team reward equals every individual reward: r(s, a) = r_1(s, a) = ... = r_n(s, a).
- Or, in another similar setting, the team reward is the average: r(s, a) = (1/n) Σ_i r_i(s, a).

Mixed-Motive Markov Games
- General-sum Markov games in which agents have individualised rewards.
- Agents are self-interested, but cooperation may still improve social welfare, as in the Prisoner's Dilemma.

Multi-Agent MDP
- All agents see the global state s.
- Individual action: each agent i selects an action a_i.
- State transitions: P(s'|s, a): S × A → [0, 1].
- Shared team reward: r(s, a): S × A → R.
Notes: Equivalent to an MDP with a factored action space; one can learn a joint policy π(a|s) or a factorized policy Π_i π_i(a_i|s).
Examples: ball games (e.g. soccer, volleyball), industrial robots, etc.
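The joint-versus-factorized policy distinction above can be illustrated concretely. This is a hypothetical sketch (the function names `pi_i` and `joint_pi` are mine, not the tutorial's): with n agents each choosing among k actions, the joint action space has k^n elements, and a factorized policy represents π(a|s) as the product of per-agent policies π_i(a_i|s).

```python
import itertools

# Sketch: factorized policy for a multi-agent MDP (hypothetical example).
n_agents, n_actions = 3, 2

def pi_i(agent, state, action):
    # Per-agent policy pi_i(a_i | s) over the shared global state.
    # Uniform here for simplicity; in practice each would be learned.
    return 1.0 / n_actions

def joint_pi(state, joint_action):
    # Factorized joint policy: the product of the individual policies.
    p = 1.0
    for i, a_i in enumerate(joint_action):
        p *= pi_i(i, state, a_i)
    return p

# The joint action space has n_actions ** n_agents elements, and the
# factorized probabilities sum to 1 over it.
total = sum(joint_pi(0, ja)
            for ja in itertools.product(range(n_actions), repeat=n_agents))
```

The factorization avoids enumerating the exponentially large joint action space during action selection, which is one reason factorized policies scale better in the number of agents.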
Dec-POMDP
A Dec-POMDP (decentralized partially observable MDP) is a tuple with:
- Agents: i = 1, ..., n
- State: s ∈ S
- Actions: a ∈ A
- Transition function: P(s'|s, a)
- Reward: r(s, a)
- Observations: o ∈ O
- Observation function: O(o|s, a)
Examples: StarCraft, Super Mario, traffic signal control, autonomous driving.

Challenges in Multi-Agent Cooperation
Shared challenges:
- Non-stationarity and scalability in the number of agents
- Partial and noisy observations
- A large number of agents
- Coordinated exploration among agents
Cooperating with novel partners (generalisable social behavior):
- Capacity for ad hoc teamwork
- Zero-shot human-AI coordination
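The Dec-POMDP tuple listed above can be written down as a plain container. This is a hypothetical sketch whose field names mirror the slide's components, not any particular library's API; the toy instance below is a degenerate single-state game where the shared team reward counts how many agents chose action 1.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Container for the Dec-POMDP tuple (hypothetical sketch, my naming).
@dataclass
class DecPOMDP:
    n_agents: int
    states: Sequence          # S
    actions: Sequence         # per-agent action set A
    transition: Callable      # P(s' | s, a): joint action -> next state
    reward: Callable          # r(s, a): shared team reward
    observations: Sequence    # O
    observation_fn: Callable  # O(o | s, a): per-agent observation

# Degenerate single-state example: every joint action keeps the state at 0,
# and the team reward is the number of agents that picked action 1.
toy = DecPOMDP(
    n_agents=2,
    states=[0],
    actions=[0, 1],
    transition=lambda s, a: 0,
    reward=lambda s, a: sum(a),
    observations=[0],
    observation_fn=lambda s, a, i: 0,
)
r = toy.reward(0, (1, 1))
```

Note the defining features relative to the earlier models: the reward is shared (as in the multi-agent MDP), but each agent acts only on its own observation rather than the global state.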