INSPECT Proactive Link Failure Detection Tool
Granthana Rangaswamy, Arushi Sharma, Matt Bergeron, Herman Chin, Salina Dbritto, Yuvin Weerasinghe, Abhishek Tiwari
ARTIFICIAL INTELLIGENCE (AI)

Preview
This presentation introduces INSPECT, a parametric analysis tool that monitors high-speed interconnect performance for Meta datacenter AI racks.
The following topics will be discussed:
Why is Parametric Analysis needed?
Parametric Analysis Implementation
Data Collection
Data Analysis
Introducing INSPECT
INSPECT Use-Cases
Call to Action

Why is Parametric Analysis Needed?
Meta's AI clusters will have millions of SerDes operating at 112 Gbps, 224 Gbps and beyond.
As next-generation AI systems emerge, the impact of SerDes-related issues is expected to grow due to shrinking margins, increasing speeds and greater complexity.
To minimize unplanned resource unavailability and job restarts, it is crucial to proactively identify SerDes-related anomalies.
[Figure labels: Compute Bank, Switch Bank, Backplane]

SerDes and AI clusters
High-speed fabric channels are typically point-to-point.
The channel behaves like a low-pass filter, attenuating the high-frequency components.
Compensating for inter-symbol interference (ISI) is critical and is mainly done in the digital domain.
The picture above shows a sample DAC/ADC-based architecture and numerous methods that can be used to equalize a lossy channel.
Picture courtesy: Circuits and Systems for Signal Processing (CASSP) Lab.

Parametric Analysis Implementation

Parametric Data Collection
SerDes data
Forward Error Correction (FEC) statistics
Signal to Noise Ratio (SNR)
Bit Error Rate (BER)
Transmitter Equalization Parameters
Feed Forward Equalization (FF