Source: NVSwitch HotChips 2022 r5.pdf (23 pages)
THE NVLINK-NETWORK SWITCH: NVIDIA'S SWITCH CHIP FOR HIGH COMMUNICATION-BANDWIDTH SUPERPODS
ALEXANDER ISHII AND RYAN WELLS, SYSTEMS ARCHITECTS

4th-Generation NVSwitch Chip
1. Brief History of NVLink
2. 4th-Generation New Features
3. Chip Details

Hopper-Generation SuperPODs
1. NVSwitch-Enabled Platforms
2. NVLink Network SuperPODs
3. SuperPOD Performance

NVLINK MOTIVATIONS
Bandwidth and GPU-Synergistic Operation

GPU Operational Characteristics Match NVLink Spec
  - Thread-Block execution structure efficiently feeds parallelized NVLink architecture
  - NVLink-Port Interfaces match data-exchange semantics of L2 as closely as possible

Faster than PCIe
  - 100Gbps-per-lane (NVLink4) vs 32Gbps-per-lane (PCIe Gen5)
  - Multiple NVLinks can be "ganged" to realize higher aggregate lane counts

Lower Overheads than Traditional Networks
  - Target system scales (256 Hopper GPUs) allow complex features (e.g., end-to-end retry, adaptive routing, packet reordering) to be traded off against increased port counts
  - Simplified Application/Presentation/Session-layer functionality allows all to be embedded directly in CUDA programs/driver

[Figure: four SMs, each running a Thread Block, with SM-to-SM traffic inside GPC0, connected through L2 to the NVLink-Port Interfaces]

NVLINK GENERATIONS
Evolution In-step with GPUs

[Figure: each GPU generation shown attached to an x86 host over PCIe]

Year  GPU and NVLink  NVLinks  Per NVLink  Lanes and signaling  Total
2016  P100, NVLink1   4        40GB/s      x8, 20Gbaud-NRZ      160GB/s
2017  V100, NVLink2   6        50GB/s      x8, 25Gbaud-NRZ      300GB/s
2020  A100, NVLink3   12       50GB/s      x4, 50Gbaud-NRZ      600GB/s
2022  H100, NVLink4   18       50GB/s      x2, 50Gbaud-PAM4     900GB/s

Listed bandwidths are full-duplex (total of both directions).
Whitepaper: http:/

SERVER GENERATIONS
Any-to-Any Connectivity with NVSwitch

2016 DGX-1 (P100): 140GB/s Bisection BW, 40GB/s AllReduce BW
2018 DGX-2 (V100): 2.4TB/s Bisection BW, 75GB/s AllReduce BW
2020 DGX A100: 2.4TB/s Bisection BW, 150GB/s AllReduce BW
2022 DGX H100: 3.6