Beyond GPUs: Powering the Next Wave of AI
Anton McGonnell, VP of Product
Sept 10, 2024
v 1.0
Copyright 2024 SambaNova Systems Inc. | Confidential & Proprietary | Internal Use Only

The Need for Speed
Speed and latency are important criteria for Gen AI developers. Artificial Analysis: 65%
Building agents requires many models and faster real-time inference.
Fast tokens: the faster, the better.

The Fastest AI Inference on the Best Model

405B is the Best Open-Source Model

Faster On All Scales
Tokens/Second/User

Model                    SambaNova RDUs    Nvidia GPUs
Llama 3.1 8B 16-bit      1066              93
Llama 3.1 70B 16-bit     570               32
Llama 3.1 405B 16-bit    132               14

10X Faster Than GPUs
No Number of GPUs Can Achieve RDU Performance

A Fundamental Shift of Models Deployment at Scale
Traditional GPU Systems
All models in memory (super low latency model switching)
Individual model endpoints

SN40L: The Best Chip Designed for AI
“Cerulean” Architecture-based Reconfigurable Dataflow Unit
1.5 TB High Capacity Memory
5nm TSMC
3-tier Dataflow Memory
1,040 RDU Cores
102B Transistors
64 GB High Bandwidth Memory
520 MB On-Chip Memory
638 TFLOPS (bf16)
Cerulean SN40L RDU
Generative AI Training and I