AI Hardware & Systems aiandsystems

AI for All: Pushing Infra Boundaries
Manoj Wadekar
AI System Technologist, Meta

Meta AI is used for diverse cases
- AI-enabled creation tools
- Text-to-image generation
- Surrealist painting
- Large language models (LLMs)
- +173% (conversation topic growth on Instagram)
Source: Meta for Business. Culture Rising: 2023 Trends Report. 2023.

GenAI runs on Large Language Models
Llama-2 65B (circa 2023)
- Total Compute (PF/s): 400
- Memory Capacity (TB): 10
- Training Scale (GPUs): 4k

Towards Multi-Modality
- Llama-2 (circa 2023): Text, 1x tokens
- Llama-3 (circa 2024): Text, 7-8x tokens
- Llama-Next (circa 202x): Audio, Images, Videos

AI Cluster Size
[Figure: number of connected accelerators by year (2024, 2026, 2028, 2030), annotated "10x"]

AI Challenging DC Infra

AI needs for DC Infra
CPU-centric scale-out applications
- Millions of small stateless applications
- Failure handling through redundancy
- Scale performance through large number of nodes
Accelerator-centric AI apps
- AI job spread across 1000s of GPUs
- Failure penalty of large job restart
- Performance scaling depends on all the components in the cluster (GPU/Accel, memory, network.)

Diversity of AI system requirements
- Difficult to serve all classes of models with a single system design point
- AI use cases are pushing all the design points through software/hardware co-design
- Need for innovation in all the design points: compute, network, memory, packaging, connectivity, cooling.

Memory Requirements for AI

Memory Capacity and Bandwidth
- Accelerators and models getting larger
- Model sizes are incr