NVIDIA Blackwell Platform: Advancing Generative AI and Accelerated Computing
Ajay Tirumala, Raymond Wong | NVIDIA

Agenda
- NVIDIA Blackwell Platform: Data Center Scale Architecture
- Blackwell GPU
- NVIDIA Quasar Quantization System: Enabling Low Precision AI
- Networking for AI, End-to-End Performance and Power Scaling
- Conclusions

NVIDIA Blackwell Platform: Data Center Scale Architecture

The Full Stack Challenge for AI and Accelerated Computing
- Application frameworks: NVIDIA AI, NVIDIA Omniverse
- Platform: DOCA, Base Command, Magnum IO, and acceleration libraries (CUDA-X, RTX)
- System software: CUDA
- Hardware: DGX, HGX, OVX, RTX, and AGX systems built from NIC, DPU, CPU, GPU, NVLink, and switch silicon
Over 400 NVIDIA CUDA-X Libraries
- Optimized libraries for each platform
- Targeting diverse application domains
- Built on our decades-long innovation
- Ever-expanding set of algorithms
- Blackwell optimized to deliver maximum performance
- Application domains: Speech and Translation AI, Computer Vision and VLM, Recommenders, LLM, Scientific Computing, Search, Digital Human, Physical AI, Data Processing
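To make the library layer concrete, here is a minimal sketch (my own example, not taken from the talk) of calling one CUDA-X library, cuBLAS, to run a single-precision GEMM on the GPU; buffer initialization and error checking are omitted for brevity.

    // Minimal cuBLAS example: C = A * B on the GPU.
    #include <cublas_v2.h>
    #include <cuda_runtime.h>

    int main(void) {
        const int n = 1024;                              // square matrices for simplicity
        const float alpha = 1.0f, beta = 0.0f;
        float *dA, *dB, *dC;

        cudaMalloc((void**)&dA, n * n * sizeof(float));  // device buffers (left uninitialized here)
        cudaMalloc((void**)&dB, n * n * sizeof(float));
        cudaMalloc((void**)&dC, n * n * sizeof(float));

        cublasHandle_t handle;
        cublasCreate(&handle);

        // C = alpha * A * B + beta * C, column-major as cuBLAS expects
        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                    n, n, n, &alpha, dA, n, dB, n, &beta, dC, n);

        cudaDeviceSynchronize();
        cublasDestroy(handle);
        cudaFree(dA); cudaFree(dB); cudaFree(dC);
        return 0;
    }

Built with nvcc and linked against cuBLAS (nvcc gemm.cu -lcublas), the same source runs unchanged across GPU generations; the library selects the fastest kernels available on the device, which is the point of the CUDA-X layer.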
NVIDIA Blackwell Platform
- Blackwell GPU, Grace CPU, NVSwitch chip
- BlueField-3, ConnectX-7, ConnectX-8
- Spectrum-4, Quantum-3
- Compute tray, NVSwitch tray
- Spectrum-X800, Quantum-X800

Blackwell GPU

NVIDIA Blackwell
- Transformer Engine: FP4/FP6 Tensor Core
- AI Superchip: 208B transistors
- 5th Generation NVLink: scales to 576 GPUs
- Secure AI: full-performance encryption and TEE
- RAS Engine: 100% in-system self-test
- Decompression Engine: 800 GB/sec
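The FP4/FP6 Tensor Cores (and the Quasar Quantization System on the agenda) revolve around block-scaled low-precision formats. As a rough illustration of the idea only, the host-side sketch below (my assumption of how block-scaled FP4 behaves, not the actual Quasar algorithm or the Tensor Core data path) rounds a block of floats to the eight FP4 E2M1 magnitudes using one shared scale, then dequantizes them.

    // Illustrative block-scaled FP4 (E2M1) quantization sketch.
    #include <math.h>
    #include <stdio.h>

    // The eight non-negative magnitudes representable in FP4 E2M1.
    static const float kFp4[8] = {0.0f, 0.5f, 1.0f, 1.5f, 2.0f, 3.0f, 4.0f, 6.0f};

    static float nearest_fp4(float x) {              // round |x| to the nearest FP4 magnitude
        float mag = fabsf(x), best = kFp4[0];
        for (int i = 1; i < 8; ++i)
            if (fabsf(kFp4[i] - mag) < fabsf(best - mag)) best = kFp4[i];
        return copysignf(best, x);
    }

    static void quantize_block_fp4(const float *in, float *out, int n) {
        float amax = 0.0f;                           // per-block absolute maximum
        for (int i = 0; i < n; ++i) amax = fmaxf(amax, fabsf(in[i]));
        float scale = (amax > 0.0f) ? amax / 6.0f : 1.0f;   // map the block max onto FP4's max (6.0)
        for (int i = 0; i < n; ++i)
            out[i] = nearest_fp4(in[i] / scale) * scale;     // quantize, then dequantize
    }

    int main(void) {
        float x[8] = {0.1f, -0.7f, 2.3f, 5.0f, -9.2f, 0.0f, 1.4f, -3.3f}, q[8];
        quantize_block_fp4(x, q, 8);
        for (int i = 0; i < 8; ++i) printf("%6.2f -> %6.2f\n", x[i], q[i]);
        return 0;
    }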
NVIDIA Blackwell GPU
- Highest AI compute, memory bandwidth, and interconnect bandwidth ever in a single GPU
- Two reticle-limited GPUs merged into one: 208B transistors in TSMC 4NP
- 20 PetaFLOPS FP4 AI
- 8 TB/s memory bandwidth | 8-site HBM3e
- 1.8 TB/s bidirectional NVLink bandwidth
- High-speed NVLink-C2C link to Grace CPU
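These peak numbers fix the machine balance of the chip: dividing 20 PetaFLOPS of FP4 compute by 8 TB/s of HBM3e bandwidth gives the number of operations a kernel must perform per byte moved to be compute-bound rather than bandwidth-bound. The back-of-the-envelope sketch below (my own arithmetic, not from the slides) prints that ratio and compares it with the arithmetic intensity of square FP4 GEMMs at a few sizes.

    // Roofline-style arithmetic from the quoted peak numbers.
    #include <stdio.h>

    int main(void) {
        double peak_flops = 20e15;   // 20 PetaFLOPS FP4 (dense peak from the slide)
        double peak_bw    = 8e12;    // 8 TB/s HBM3e bandwidth

        // Machine balance: FLOPs needed per byte moved to stay compute-bound.
        double balance = peak_flops / peak_bw;       // = 2500 FLOP/byte
        printf("machine balance: %.0f FLOP/byte\n", balance);

        // A square GEMM of size n does ~2*n^3 FLOPs over ~3*n^2 operands;
        // at 4 bits per operand that is ~1.5*n^2 bytes, so intensity grows with n.
        for (int n = 1024; n <= 16384; n *= 4) {
            double flops = 2.0 * (double)n * n * n;
            double bytes = 1.5 * (double)n * n;
            double intensity = flops / bytes;
            printf("n=%6d  intensity = %6.0f FLOP/byte  (%s)\n", n, intensity,
                   intensity > balance ? "compute-bound" : "bandwidth-bound");
        }
        return 0;
    }

At roughly 2500 FLOP per byte, only large or well-tiled kernels keep the FP4 units busy, which is part of why the memory and NVLink bandwidth figures are quoted alongside the compute peak.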
Highest AI Compute in a Single GPU
- Build each GPU to the reticle limit, as intra-GPU communication provides:
  - Highest communica