An Overview of CXL Memory Expansion Module Error Handling (RAS) Solution Implementation for Better Reliability

Anil Agrawal, HW Systems Engineer, Meta
Hiral Patel, HW Systems Engineer, Meta

IT Ecosystem: Server & Storage

Agenda
- CXL Type 3 Device Overview
- Fault Domains
- Error Coverage
- Call to Action

CXL: Compute Express Link
CXL Type 3 Device - Memory Expansion

Type 1: Caching Devices / Accelerators
- Protocols: CXL.io, CXL.cache
- Use cases: PGAS NIC, NIC atomics
- [Figure: processor with DDR, attached over CXL to an accelerator NIC with cache]

Type 2: Accelerators with Memory
- Protocols: CXL.io, CXL.cache, CXL.mem
- Use cases: GP GPU, dense computation
- [Figure: processor with DDR, attached over CXL to an accelerator with cache and its own DDR]

Type 3: Memory Buffers
- Protocols: CXL.io, CXL.mem
- Use cases: memory capacity expansion, memory BW expansion, alternative media types (DDR, LPDDR, emerging)
- [Figure: processor with DDR, attached over CXL to a memory buffer with attached memory]

CXL Type 3 Device - Fault Domains

[Figure: host processor core(s) and host memory controller with DIMM(s); CXL root port (physical layer, CXL.io, CXL.mem) connected over the CXL link to the CXL device port (physical layer, CXL.io, CXL.mem); CXL device with CXL event buffers, device memory controller, and DIMM(s). Fault locations are marked with an X.]

CXL Memory Faults
- Data bit (cell) error
- Row, bank, device failure
- Multi-device failure
- Bus faults (e.g. CMD, ADDR)
- Connector (pin) failure

CXL Link Faults
- Bit errors
- Bus faults
- Protocol errors
- Device internal errors

CXL Type 3 Device - Memory Faults Coverage

Fault Type | Possible Causes (Examples) | Fault Coverage (RAS Feature)
Data Bit (Cell) Error | High energy particle strike. Soft Error (SE). Transient error. | ECC, Demand Scrub, Patrol Scrub
Data Bit (Cell) Error | Stuck-at. Persistent error. | OS soft page offline, PPR
Row Failure | Marginality. Persistent error. | OS soft page offline, PPR, Chipkill
Bank Failure | Marginality. Persistent fault. | Chipkill, OS soft pa