Taylor Groves, Lightmatter
Scaling-up Next-gen AI with Optics

Outline
This presentation will cover:
  Problems in Scale-up Networking
  3D Integrated Optics with Passage
  Application impact study (4.7T MOE)
  How do we get there together

Passage
3D stack of Photonics and Electronics integrates massive bandwidth, low power, and datacenter reach

Exponential growth of model sizes and training workloads, faster than growth in device compute FLOPs, memory bandwidth, memory capacity
Requires extensive use of network:
  GPU-to-GPU Parallelism: Tensor, Expert, Context, Pipeline, Data
  GPU-to-I/O: Checkpointing, context caching, prefill
Chain of thought, reasoning and retrieval augmented models

AI Scaling
[Figure labels: 2.3X, 5X, 4.5X; Volta to Rubin: 12X]
Continued Scaling Requires:
  Increase package size, number of logic and memory dies per package
  Increasing Power: at both the package and Pod
  Increasing Pod Size and Bandwidth: Tightly coupled Accelerator packages with high bandwidth and low latency
Process improvement not enough (15% per gen)
But, conventional technologies have hit a wall

Challenge: Package Area and Shoreline
Communication happens on the chip perimeter
There is not enough shoreline
[Figure label: SerDes]
Scale Up: 1024 GPUs
Scale Out: 100k+ GPUs

Network Type   No. GPUs   Latency                Bandwidth Per GPU   Energy
Scale-out      100k       2-10 us (multi-hop)    1.6 Tbps            16 pJ/bit
Scale-up       1024       100-250 ns             12.8 Tbps           5 pJ/bit

How do we get the full GPU bandwidth to as many GPUs as possible in nanoseconds?

Easy to scale and cheap
Copper nearest neighbor
Optics in the edges to loop back
XPU needs just two high bandwidth ports per dimension
Dimensions can correspond to a type of parallelism: Pipeline, Data, Tensor
Maps less well to things like MOE
Torus
512 XPUs in 3D Torus

Multi-Rail Single Layer of Switching
Ease of programming: Supports all communication patterns with any proce