A DEEP REINFORCEMENT AGENT FOR RESOURCE SCHEDULING WITH SUNFISH IN A COMPOSABLE DISAGGREGATED INFRASTRUCTURE
Catherine Appleby, Computer Scientist, Applied Machine Intelligence
Michael Aguilar, EIT, PI, Senior Computer Scientist, HPC Research and Development
Sandia National Laboratories
2024 OFA Virtual Workshop

Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA0003525.
SAND2024-04819PE

Co-authors: Christian Pinto, Phil Cayton, Russ Herrell, Richelle Ahlvers, Alex Lovell-Troy
OpenFabrics Alliance Workshop, April 2024

1. Why Composable Disaggregated Infrastructure (CDI)
2. Design Considerations for a Composability Manager on a Large-Scale HPC System
3. Sunfish
4. Deep Reinforcement Learning for Resource Allocation
5. Integrations
6. Acknowledgements and Questions

Why Composable Disaggregated Infrastructure (CDI)
Current Beowulf