Storage Requirements for AI
Training and Checkpointing
John Cardente
Technical Staff, Dell Storage CTO Group
SNIA. All Rights Reserved.

The AI boom is driving incredible demand for GPUs, leading to a need to maximize their utilization

GPUs Essential for AI
- Modern deep learning AI models require millions of matrix operations
- Matrix operations must be parallelized to make AI computationally feasible
- GPUs are designed to do parallel matrix operations quickly and cost-effectively
GPUs are needed to make AI economically feasible

GPUs Expensive and Scarce
- Companies are racing to build AI datacenters
- AI datacenters can contain 100s to 1000s of GPUs
- Demand for GPUs is surpassing supply
GPUs are becoming costly and difficult to acquire

Maximizing GPU Utilization Essential
- Demand, cost, and scarcity are making GPUs the most valuable AI datacenter asset
- Companies must maximize the use of the GPUs they have
Maximizing GPU utilization is becoming the main AI datacenter design goal

Maximizing GPU utilization requires balancing compute, network, and storage performance

[Diagram labels: Server, GPU, NIC, GPU-to-GPU Network, Storage Network, Storage]
- GPU-to-GPU Network: substantial "East-West" network for GPUs to exchange model gradients and weights during training
- Storage Network: "North-South" network to read training data and write model artifacts

Storage plays an important role across the entire AI lifecycle

Critical Capabilities / Key Tasks

Data Preparation
- Scalable and performant storage to support transforming data for AI use
- Protecting valuable raw and derived training data sets

Training & Tuning
- Providing training data to keep expensive GPUs fully utilized
- Saving and restoring model checkpoints to protect training investments

In