
Introduction to Co-Packaged Optics (CPO) - Architecture, Use Cases, and Operational and Software Implications.pdf

Uploaded by: 明**** | ID: 1011502 | 2025-12-21 | 23 pages | 1.05 MB

Introduction to Co-Packaged Optics (CPO): Architecture, Use Cases, and Operational, Software Implications
Wataru Ishida, NTT Innovative Devices
CHIPLETS AND ADVANCED PACKAGING / PHOTONICS

Why NVLink Uses Copper, Not Optics
- Power consumption: "If we had to use optics, we would have had to use transceivers and retimer and those transceivers and retimer alone would have cost 20,000 watts" (Jensen Huang, GTC 2024)
- Reliability: "co-packaged optics, a promising new chip technology designed to reduce energy consumption, is not yet reliable enough for deployment in the company's flagship graphics processing units (GPUs)." (Jensen Huang, GTC 2025)

Scale-up (NVLink/SUE) vs Scale-out (IB/Ethernet) network
[Figure: GPUs and NICs attached to a scale-up network and a scale-out network]
- Bandwidth: 7.2 Tbps (NVLink Gen5) vs 800 Gbps (ConnectX-8). Massive bandwidth needed for scale-up.
- Scalability: 72 vs more than 10k. Scale-up network size is limited by 1.5 m copper reach.

In Synchronous Scale-Up, One Link Down Halts All
[Figure: Impact of Transceiver Failures (FIT 1000) on GPU training; x-axis: GPU Count, y-axis: Rollback Overhead (%)]
- Scale-up networks use tightly synchronized collective communications (e.g., AllReduce)
- Even one link failure breaks the operation: no failover or retry
- This causes costly rollbacks in large-scale GPU training

Why Optical Modules Fail: Dust and Laser
- Environmental contamination (dust, debris) is the leading cause of transceiver failures
- The next major cause is internal laser failures

Laser Lifetime Is All About Temperature
[Figure: Estimated FIT vs Temperature (Arrhenius Model, normalized at 40C = 100 FIT); x-axis: Temperature (C), y-axis: Estimated FIT]

Retimers, Heat, and the Limits of LPO at 200G/lane
Retimer
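The "Estimated FIT vs Temperature" chart above is described as an Arrhenius model normalized to 100 FIT at 40 C. A minimal Python sketch of that kind of scaling follows. The activation energy of 0.7 eV is an assumption chosen for illustration (the slides do not state the value they used), and the function name `arrhenius_fit` is hypothetical, so the numbers will not necessarily match the chart.

```python
import math

# Boltzmann constant in eV/K
BOLTZMANN_EV_PER_K = 8.617333262e-5


def arrhenius_fit(temp_c, ref_temp_c=40.0, ref_fit=100.0, activation_energy_ev=0.7):
    """Scale a reference FIT rate to another temperature (Arrhenius model).

    ref_temp_c and ref_fit follow the chart's normalization (40 C = 100 FIT).
    activation_energy_ev is an ASSUMED value, not taken from the slides.
    """
    temp_k = temp_c + 273.15
    ref_temp_k = ref_temp_c + 273.15
    # Acceleration factor relative to the reference temperature
    accel = math.exp(
        activation_energy_ev / BOLTZMANN_EV_PER_K * (1.0 / ref_temp_k - 1.0 / temp_k)
    )
    return ref_fit * accel


if __name__ == "__main__":
    for temp_c in (40, 60, 80, 100):
        print(f"{temp_c:>3} C -> {arrhenius_fit(temp_c):8.0f} FIT")
```

Under any positive activation energy the estimated failure rate grows exponentially with temperature, which is the point the slide title makes: keeping the laser cool matters more than any other single factor in this model.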

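According to the report, its main points can be summarized as follows:
- **CPO (Co-Packaged Optics) technology**: an emerging chip technology designed to reduce energy consumption, but not yet reliable enough for deployment in high-end GPUs.
- **NVLink vs. optical links**: NVLink uses copper rather than optics because of the power cost and reliability problems of optical links.
- **Network scaling**: NVLink Gen5 provides 7.2 Tbps of bandwidth while ConnectX-8 provides 800 Gbps; scale-up networks need far more bandwidth.
- **Reliability challenges**: optical module failures are caused mainly by dust and laser problems, and laser lifetime depends on temperature.
- **CPO advantages**: placing the optical engine (OE) close to the ASIC enables 200G/lane and beyond while keeping signals clean and lasers cool.
- **Integration models**: CPO integration models include pre-assembled modules and separately assembled modules, each with its own trade-offs.
- **Control architecture**: how OE control should align with existing pluggable transceivers, and whether CMIS register map support is needed.
- **Efficiency gains**: CPO can cut power consumption by up to 50% and supports efficient direct liquid cooling designs.
- **Outlook**: CPO is inevitable; attention is needed on implementation details such as integration models, control architecture, and operational readiness.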
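The reliability argument in the slides (a synchronous scale-up job halts on any single link failure, forcing a rollback) can be made concrete with a back-of-the-envelope estimate. The sketch below uses the standard definition of FIT (failures per 10^9 device-hours). Everything else is an assumption for illustration: the number of transceivers per GPU, the checkpoint interval, the restart time, and the first-order overhead model itself are not taken from the report, and the function names are hypothetical.

```python
def cluster_failure_rate_per_hour(gpu_count, transceivers_per_gpu, fit_per_transceiver):
    """Expected transceiver failures per hour across the whole cluster.

    FIT is failures per 1e9 device-hours, so the rates of all devices add up.
    """
    return gpu_count * transceivers_per_gpu * fit_per_transceiver / 1e9


def rollback_overhead(
    gpu_count,
    transceivers_per_gpu=8,      # ASSUMED, not from the slides
    fit_per_transceiver=1000.0,  # the chart's "FIT 1000" case
    checkpoint_interval_h=1.0,   # ASSUMED
    restart_h=0.25,              # ASSUMED
):
    """First-order fraction of training time lost to rollbacks.

    Assumes every transceiver failure halts the job, and each halt costs on
    average half a checkpoint interval of lost work plus a fixed restart time.
    Only meaningful while the result is well below 1.
    """
    rate = cluster_failure_rate_per_hour(
        gpu_count, transceivers_per_gpu, fit_per_transceiver
    )
    lost_hours_per_failure = checkpoint_interval_h / 2.0 + restart_h
    return rate * lost_hours_per_failure


if __name__ == "__main__":
    for gpus in (5000, 10000, 20000):
        print(f"{gpus:>6} GPUs -> {100 * rollback_overhead(gpus):.1f}% overhead")
```

Because the failure rate scales linearly with the number of transceivers, the estimated overhead in this model grows in proportion to cluster size, which is consistent with the direction of the "Rollback Overhead (%) vs GPU Count" chart, though not a reproduction of it.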