An Overview of CXL Memory Expansion Module Error Handling (RAS) Solution Implementation for Better Reliability
IT Ecosystem: Server & Storage
Anil Agrawal, HW Systems Engineer, Meta
Hiral Patel, HW Systems Engineer, Meta

Agenda
- CXL Type 3 Device Overview
- Fault domains
- Error Coverage
- Call to Action

CXL Type 3 Device - Memory Expansion
CXL: Compute Express Link

Type 1: Caching Devices/Accelerators
- Diagram components: Processor, DDR, Accelerator (NIC), Cache
- Protocols: CXL.io, CXL.cache
- Use cases: PGAS NIC, NIC atomics

Type 2: Accelerators with Memory
- Diagram components: Processor, DDR, Accelerator, Cache
- Protocols: CXL.io, CXL.cache, CXL.mem
- Use cases: GP GPU, Dense Computation

Type 3: Memory Buffers
- Diagram components: Processor, DDR, Memory Buffers, Memory (x4)
- Protocols: CXL.io, CXL.mem
- Use cases: Memory Capacity Expansion, Memory BW Expansion, Alternative Media Types (DDR, LPDDR, emerging)

CXL Type 3 Device - Fault Domains
[Diagram]
- Host side: Processor Core(s), Host Memory Controller, DIMM(s), CXL Root Port (Physical Layer, CXL.io, CXL.mem)
- CXL Link
- Device side: CXL Device, CXL Device Port (Physical Layer, CXL.io, CXL.mem), CXL Event Buffers, Device Memory Controller, DIMM(s)

CXL Memory Faults
- Data Bit (Cell) Error
- Row, Bank, Device Failure
- Multi-device Failure
- Bus Faults (e.g. CMD, ADDR)
- Connector (Pin) Failure

CXL Link Faults
- Bit errors, Bus faults, Protocol errors, Device internal errors

CXL Type 3 Device - Memory Faults Coverage

Fault Type | Possible Causes (Examples) | Fault Coverage (RAS Feature)
Data Bit (Cell) Error | High energy particle strike. Soft Error (SE). Transient error. | ECC, Demand Scrub, Patrol Scrub
Data Bit (Cell) Error | Stuck-at. Persistent error. | OS soft page offline, PPR
Row Failure | Marginality. Persistent error. | OS soft page offline, PPR, Chipkill
Bank Failure | Marginality. Persistent fault. | Chipkill, OS soft pa