BRKCOM-1008
The Blueprint to Building End-To-End Hybrid-Cloud AI Infrastructure
Nick Geyer, Cisco Systems Inc. | Eugene Minchenko, Cisco Systems Inc.

Cisco Webex App
Questions? Use the Cisco Webex App to chat with the speaker after the session.
How:
1. Find this session in the Cisco Live Mobile App
2. Click "Join the Discussion"
3. Install the Webex App or go directly to the Webex space
4. Enter messages/questions in the Webex space
Webex spaces will be moderated by the speaker until June 7, 2024.

Agenda
Introduction
AI Fundamentals & Impacts on Infrastructure Design Decisions
Training Infrastructure & Network Considerations for AI Environments
Inferencing, Fine-Tuning, & Compute Infrastructure
Sizing for Inferencing
AI Infrastructure Automation & Cisco Validated Designs
Future Trends and Industry Impacts of AI Infrastructure Demands
Summary

AI sets a new standard for infrastructure
AIOps: How can we harness all the data available to us to simplify data center operations?
Scale and Performance: Is our network AI-ready, with the ability to support data training and inferencing use cases? Only 13% of data center management leaders say their network can accommodate AI computational needs.
Sustainability: How are we addressing corporate and regulatory sustainability requirements in our data center design?

What we know
Build the Model | Training
Optimize the Model | Fine-tuning & RAG
Use the Model | Inferencing
Every organization's AI approach and needs are different.
What we're hearing from IT infra and operations
Need consistency; avoid new islands of operations.
Optimize for utilization and efficiency in many dimensions: support multiple projects, leverage GPUs wisely, manage power and cooling needs, handle lifecycle management.
Comprehensive security protocols and measures.
Support a rapidly evolving software ecosystem.
Manage cloud vs. on-prem vs. hosted models.
Straddle the train, fine-tune, inference, repeat model.

Cisco's 2-fold AI strategy and our focus today
Using AI to maximize YOUR experience with Cisco products: develop AI tools across the Cisco portfolio that help manage networks more effectively, delivering better results, providing intelligent guidance, providing better security, and solving day-to-day challenges.
Enabling YOUR infrastructure to support adoption of AI applications: develop products that help accelerate YOUR adoption of AI for your business solutions, with high-speed networking for AI training and inference clusters and flexible compute building blocks to build AI compute clusters.

AI Fundamentals & Impacts on Infrastructure Design Decisions

AI: Level Setting and Definition
Computer Science > Data Science > Machine Learning (Supervised Learning, Unsupervised Learning, Reinforcement Learning) > Generative Adversarial Networks (GANs) and Transformer-based LMs > ChatGPT, LLaMA 2, etc.
AI Infrastructure Requirements Spectrum

Market Profile | Infrastructure Requirements
AI Innovator: Hyperscaler and large enterprise | Compute: 1,000-10,000 GPU clusters; Network: InfiniBand/Ethernet
AI Innovator: Large enterprise training | Compute: 100-1,000 GPU clusters; Network: InfiniBand/Ethernet
AI Adopter: Large production inferencing and AI model lifecycle | Compute: 4-8 GPUs/node; Network: InfiniBand/Ethernet
AI Adopter: Smaller production inferencing and AI model lifecycle; small-parameter training | Compute: 2-4 GPUs/node; Network: Ethernet
AI-as-a-Service: Initial testing of pretrained models | Compute: CPU only up to 2 GPUs/node; Network: Ethernet
Most GenAI projects are here, in the AI Adopter and AI-as-a-Service tiers.

Extensive model customization: custom foundation models or extensive fine-tuning; $10M+ infrastructure and resources; months of development.
Moderate model customization: pre-trained model with RAG, P-tuning, and fine-tuning; $M+ infrastructure and resources; weeks of development.
Low model customization: GenAI-as-a-service; consumption model, $ per inference; fastest time to market.
17、isco Public#CiscoLiveTraining100X100XFine Tuning10X10XInferencing1X1XRelativeRelativeComputeComputeModel StageModel StageAnalogyAnalogyLearning the English language:Patterns of wordsLearning Biology:Words,terms,conceptsAnswering biology questions“What is the role of mitochondria?”RelativeRelativeUti
18、lizationUtilizationLLM Training vs.Fine Tuning vs.InferencingBRKCOM-100811 2024 Cisco and/or its affiliates.All rights reserved.Cisco Public#CiscoLiveAI Maturity ModelAlign customer capabilities to technology investment Business use for AI not yet Business use for AI not yet defineddefinedData cultu
19、re to support AI Data culture to support AI not establishednot establishedExec agenda for AI not a Exec agenda for AI not a prioritypriorityNo AI processes or No AI processes or technologies in place for technologies in place for implementationimplementationNo investment in No investment in infrastr
20、ucture to support AI infrastructure to support AI workloadsworkloadsExploratoryFormulated short term AI Formulated short term AI strategy,proof of concept strategy,proof of concept scenariosscenariosExec,board support for AI,Exec,board support for AI,not across all lines of not across all lines of b
21、usiness.Small skillset of business.Small skillset of data science on staff data science on staff Data advancement with Data advancement with policy and degree of policy and degree of governance using point governance using point solutionssolutionsTrial AI adjacent Trial AI adjacent technologies with
22、 future technologies with future budget allocationbudget allocationExperimentalAI Strategy based on long AI Strategy based on long term roadmap for new term roadmap for new services,services,Framework defined to Framework defined to assure quality,format,assure quality,format,ownershipownershipData
23、available in Realtime Data available in Realtime for predictive analysisfor predictive analysisA centralized platform A centralized platform model with premodel with pre-integrated integrated AI capabilitiesAI capabilities.TransformPlanDefined AI standalone Defined AI standalone strategy,platform in
24、 place strategy,platform in place for quick wins,dedicated for quick wins,dedicated AI budgetAI budgetDecentralized support Decentralized support across staff,adequate across staff,adequate resources for early stagesresources for early stagesData gathering,analytics to Data gathering,analytics to ce
25、ntralized platform for centralized platform for variety of use casesvariety of use casesAI used for internal AI used for internal processes processes Billing Billing automation,segment automation,segment analysisanalysisBRKCOM-100812 2024 Cisco and/or its affiliates.All rights reserved.Cisco Public#
Operationalizing AI/ML is not trivial
Everyone in your organization plays a critical role in a complex process.

AI and Infrastructure Pipelines
Data Preparation (Data Engineer): preparing structured or unstructured data to create a training data set for the model. High storage requirement for ETL and data cleansing, optimized for AI retrieval.
Training and Customization (Data Scientist): a selected model learns from the training data set and builds relationships. Compute-intensive, often with GPU acceleration and a high-speed, low-latency network.
Inference (DevOps | SecOps | Infrastructure): when prompted, the model interprets new, unseen data and creates a response based on its training. Lower compute, GPU-acceleration, and network demands; requirements can increase with scale.
All stages run on storage, compute, and network; the user sends a prompt and receives a response.

Framework and Common Software
AI/ML/DL framework pipeline: data ingest, data preprocessing, model training, model validation, model deployment, inferencing and ingestion endpoint.
Data ingest and preprocessing are IO-intensive; training is compute/GPU-intensive; inferencing is latency-sensitive.
Practices: data engineering, data visualization, feature identification, weekly retraining, model management, model ranking and validation, production deployment, and establishing a feedback loop.

The need for flexible AI acceleration
Example: an AI-enabled video conferencing app mixes workloads. AI acceleration (GPU): inference for real-time transcription or translation; generative AI for meeting summarization. General-purpose compute (CPU): group chat, real-time video and audio streams, recording, screen share.
For the diversity of AI workloads, GPU counts span a spectrum from data ingest and preparation (0-2 GPUs) and edge inferencing (1-4) through data center inferencing (4-64) and fine-tuning (64+), up to large foundational training (10K+ GPUs). Smaller footprints can use a shared fabric; large training clusters warrant dedicated fabrics.
Revolutionizing AI workloads with 5th Gen Intel Xeon Scalable Processors
Kubernetes (Red Hat OpenShift)
High-performance features: Intel AMX with a built-in AI accelerator in each core; accelerated computations and reduced memory-bandwidth pressure; significant memory reductions with BF16/INT8.
Enhanced system capabilities: larger last-level cache for improved data locality; higher core frequency and faster memory with DDR5; Intel AVX-512 for non-deep-learning vector computations.
Software optimization: a suite of optimized open-source frameworks and tools; Intel Xeon optimizations integrated into popular deep learning frameworks.
TCO benefits and compatibility: lower operational costs and a smaller environmental footprint; available on UCS X-Series, C240, and C220 platforms.
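To make the AMX/BF16 point concrete, here is a minimal sketch of CPU-only BF16 inference, assuming PyTorch plus the separately installed intel_extension_for_pytorch package; the model and tensor shapes are illustrative only, not part of any Cisco validated design.

```python
import torch
import intel_extension_for_pytorch as ipex  # Intel's PyTorch extension (assumed installed)

# Illustrative model; any eval-mode nn.Module works the same way.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 1024),
).eval()

# IPEX applies CPU-side optimizations and casts weights to bfloat16,
# which AMX-capable Xeons can execute on the built-in tile accelerator.
model = ipex.optimize(model, dtype=torch.bfloat16)

x = torch.randn(8, 1024)
with torch.no_grad(), torch.cpu.amp.autocast(dtype=torch.bfloat16):
    y = model(x)
print(y.shape, y.dtype)  # torch.Size([8, 1024]) torch.bfloat16
```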
Will organizations build large clusters with over 1,000 GPUs?

Inference and Fine-Tuning
99% of customers will not be building infrastructure to train their own LLMs.
Many customers will build GPU clusters in their existing data centers for training use-case-specific "smaller" models, for fine-tuning existing models, and for inferencing or generative AI.
Sample Large Language Model Use Cases
Summarization: LLMs are highly effective in text summarization tasks, in areas such as academic research, business report summaries, legal analysis, education materials, emails, etc.
Dialog: example use cases for LLM chatbots include customer service, personal assistants, tech support, and news and information.
Sentiment Analysis: use LLMs to determine sentiment in areas such as comments, responses, content moderation, feedback, and market research.
Translation: language translation is a key use case for LLMs in areas such as travel and tourism, legal, emergency services, education, and real-time translation.
Text Generation: use LLMs for content creation, marketing, documentation, business communication, product documentation, etc.
Code Generation: LLMs can increase coding productivity with tools such as copilots in areas like web development, data analysis, education tools, etc.

Enterprise Considerations to Define Requirements
What is the use case? Am I training? Fine-tuning? Inferencing? Using RAG?
How much data am I training on? How many models am I training?
Am I using private data? Who is responsible for management?
Dimensions to balance: cost, accuracy, model size, user experience (response time), data fidelity, and concurrent users/inputs.
Where can this be run?
Enterprises can choose where any model should be trained. Primarily there are two options:
On premises: always available for the enterprise to use; flexibility for a large enterprise to leverage the same cluster for different functions; data is stored locally (data sovereignty).
Public clouds: flexibility, pay for what you need; cost grows with more data and training. Challenges: cost of egress data from the cloud, latency, and lock-in.

Smart Cloud, not Cloud First
Quantitative trading firm, London, UK (12,000 GPUs):
Example hyperscaler cost model: cloud provider Lambda Labs at $1.99 per hour per (H100) GPU. Potential annual cost: $210 million.
Example on-prem cost model: colo, servers, storage, and network. Potential annual cost: $130 million per year (over 3 years).
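As a sanity check on the hyperscaler figure: 12,000 GPUs x $1.99 per GPU-hour x 8,760 hours per year comes to roughly $209 million per year, matching the ~$210 million annual estimate above (assuming full utilization and no committed-use discounts).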
Bringing it all together: a helicopter view of an AI deployment journey (edge and core, customer data, AI Enterprise, Cisco Validated Designs)
1. Deploy AI-ready infrastructure (Cisco Intersight; FlexPod AI, FlashStack AI, HCI AI).
2. Install common AI models from industry repositories (NGC).
3. Prep and inject data to fine-tune the model.
4. Deploy the application for inferencing.
5. Periodic model updates and infrastructure scaling as required.

AI Training Infrastructure & Network Considerations for AI Environments
Breaking down machine learning: the process
Training: a dataset of training data plus an algorithm runs on training infrastructure to produce a model, a function with weighted parameters. Retraining occurs as required.
Inference: new/live data flows through the model on inference infrastructure to produce output, either predictive (decision, recommendation, trend, classification) or generative (recommendation). Feedback loops drive retraining as required.

Architecting an AI/ML training cluster: considerations
AI models and applications consume massive amounts of data, and the data is constantly growing, so the infrastructure faces many challenges in growing at the same scale as the data.
The AI/ML lifecycle (training and retraining, inferencing, feedback) drives the requirements: scalability, high bandwidth, low latency, congestion management, and no traffic drops, all in service of job completion time.

Training and Inference Network Behaviors
Workloads differ across network bandwidth, network latency sensitivity, compute, memory capacity, and memory bandwidth; LLM training, ranking training, LLM inference, and ranking inference each stress these dimensions differently.

AI Networking: RDMA (Remote Direct Memory Access)
Benefits of RDMA: low latency and CPU overhead; high network utilization; efficient data transfer; supported by all major operating systems. Zero-copy networking.
Remote Direct Memory Access (RDMA) and InfiniBand
RDMA allows AI/ML nodes to exchange data over a network by accessing the bytes directly in RAM. Latency is very low because the CPU and kernel can be bypassed.
RDMA data was natively exchanged over InfiniBand fabrics. Later, the RoCEv2 (RDMA over Converged Ethernet) protocol allowed the exchange over Ethernet fabrics, which requires a non-blocking, lossless Ethernet transport using ECN and PFC.
Data path: system memory and GPU memory on each node talk directly to the RDMA NIC over PCIe (direct memory-to-NIC communication), with RoCEv2 carrying the transfer between nodes.

AI Networking: RoCE v1 / RoCE v2 Protocol Stacks (RDMA over Converged Ethernet)
RoCE v1: an Ethernet link-layer protocol with a dedicated EtherType (0x8915); can be used with or without a VLAN tag.
RoCE v2: an internet-layer protocol that can be routed; uses a dedicated UDP port (4791); the UDP source port field carries an opaque flow identifier. The stack splits between software and hardware.
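To make the encapsulation concrete, here is a minimal Python sketch that packs the 12-byte InfiniBand Base Transport Header (BTH) that RoCEv2 carries inside UDP destination port 4791; the field layout follows the InfiniBand specification, while the opcode, QP, and PSN values are purely illustrative.

```python
import struct

ROCEV2_UDP_DPORT = 4791  # IANA-assigned UDP port for RoCEv2

def pack_bth(opcode: int, pkey: int, dest_qp: int, psn: int, ack_req: bool = False) -> bytes:
    """Pack an InfiniBand Base Transport Header (12 bytes).

    Layout: opcode(8) | SE/M/PadCnt/TVer(8) | P_Key(16) |
            rsvd(8) | DestQP(24) | AckReq/rsvd(8) | PSN(24)
    """
    flags = 0x00                       # SE=0, MigReq=0, PadCnt=0, TVer=0
    word0 = (opcode << 24) | (flags << 16) | pkey
    word1 = dest_qp & 0xFFFFFF         # top byte reserved
    word2 = ((0x80 if ack_req else 0x00) << 24) | (psn & 0xFFFFFF)
    return struct.pack("!III", word0, word1, word2)

# RC "RDMA WRITE First" opcode, default partition key, illustrative QP and PSN.
bth = pack_bth(opcode=0x06, pkey=0xFFFF, dest_qp=0x12AB, psn=100)
print(len(bth), bth.hex())  # 12 bytes; on the wire this follows Eth/IP/UDP(dport=4791)
```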
RoCEv2: PFC and ECN together for lossless transport. How does it work?
ECN is a Layer 3 congestion-avoidance mechanism: an IP-layer notification system that allows switches to indirectly tell the sources to slow their throughput. WRED thresholds are set low in the no-drop queue, signaling congestion early with CNPs and giving endpoints enough time to react.
PFC is a Layer 2 congestion-avoidance mechanism, with thresholds set higher than ECN's. Oversubscribed buffers can fill quickly without giving ECN time to react; PFC then reacts and mitigates the congestion.

Data Center Quantized Congestion Notification (DCQCN)
Neither IP ECN nor PFC alone provides a complete congestion-management framework: IP ECN signaling might take too long to relieve congestion, and PFC can introduce other problems such as head-of-line blocking and unfairness between flows.
The two together provide the desired result of lossless RDMA communications across Ethernet networks; this combination is called DCQCN.
The requirements: Ethernet devices compatible with both techniques, and proper configurations applied.
AI/ML Flow Characteristics (Training Focused)

Bringing visibility to AI workloads
With the granular visibility provided by Cisco Nexus Dashboard Insights, the network administrator can observe drops and tune thresholds until congestion hot spots clear and packet drops stop under normal traffic conditions. This is the first and most important step to ensure the AI/ML network copes effectively with regular traffic congestion.

Monitoring these events
DCQCN leaves fabric congestion management in a self-healing state, but it is still important to keep it under control: frequently congested links can be discovered, and QoS policies can be tweaked with direct feedback from the monitoring tools.
Nexus ASICs can stream these metrics directly to Nexus Dashboard Insights (NDI), which collects, aggregates, and visualizes them to provide insights to the operations team.
Nexus Dashboard Insights Congestion Visibility

Designing a network for AI success
Dedicated network; non-blocking, lossless fabric; high throughput; no oversubscription; low jitter, low latency; Clos topology.
Visibility is key! A stalled or idle job wastes expensive resources and time, and on average 25% of jobs fail. Optimize for job completion time.

Do I need a backend network?
Frontend and backend networks: lossless | high-throughput | low-jitter | low-latency; 10G | 25G | 50G | 100G | 400G | 800G; FPGA, DPU, GPU, compute, and storage; monitored via Nexus Dashboard.

Cisco Nexus HyperFabric, in partnership with NVIDIA
Cisco 6000 Series switches, Cisco Nexus HyperFabric, on-prem AI infrastructure, AI cluster.
A solution that will enable you to spend time on AI innovation, not on IT.
Democratize AI infrastructure: a unified stack including NVAIE (NVIDIA AI Enterprise); high-performance Ethernet; an AI-native operational model; cloud-managed operations; visibility into the full AI stack. Pods of plug-and-play data center fabrics built on Cisco Silicon One and optics innovations, with NVIDIA GPUs, NVIDIA BlueField-3 DPU/NICs, servers, and VAST storage.

A simplified backend network for AI environments: Cisco Nexus HyperFabric AI cluster use cases
Build new data centers: ease of use for IT generalists; start small and grow fabrics (1+); self-service for fabric tenants.
AI/ML/HPC fabrics: simple to deploy and manage; scalable AI-ready Ethernet fabric.
Manage multiple customer data centers: managed from cloud; remote-hands assistance; downsized data center tooling footprint.
Extend data centers: data center anywhere with a cloud controller; planning/design tools to help build the rollout; plug-and-play deployment; easily expand to data center edge/colo; small fabrics of 1-2 switches; API.
Cloud SaaS controller: single global UI for all owned fabrics; single global API endpoint; underlay and lifecycle automation.
Building high-performance AI/ML Ethernet fabrics: maximizing customer choice and options
Customizable solution (BYO management, SONiC/BYO-NOS), for hyperscalers and Tier 2 web/AI-as-a-service: Cisco validated SONiC or community sourced; customer assembled and operated; ECMP (shipping) and scheduled Ethernet (FCS target CY25) options; greenfield deployments; 400G-800G; Cisco 8000 Silicon One switches.
Cisco Nexus HyperFabric AI cluster (Cisco cloud managed as a service, full stack), for enterprise/public sector/commercial and service providers: turnkey AI pod; Nexus HyperFabric-managed servers (BMC), NICs, and switches; converged Ethernet infrastructure; greenfield deployments only; 400G-800G; Cisco 6000 (Silicon One) switches; FCS target 2H 2024.
Nexus 9000 with Nexus Dashboard (private cloud managed, interoperable), for enterprise/public sector/commercial: general-purpose AI multi-pod fabric; simplified network operations with Nexus Dashboard; CVDs for converged Ethernet infrastructure; greenfield and brownfield deployments; 100G-400G-800G; Nexus (Cloud Scale and Silicon One) switches; shipping.
Building an AI workload pod for training
Backend network for training: 32 rack servers split across 2 racks; scale up to 30 pods per spine for 960 servers and 1,920 GPUs; full RoCEv2 support on compute; clustered GPUs with direct memory access; RoCE-enabled NICs; 100 Gbps server links and 400 Gbps fabric links; backend spines and leafs alongside a frontend compute fabric and frontend ToR.

Performance testing
Linear scalability demonstrated through benchmark tests on real-life model simulations, showing consistent performance even with varying dataset sizes: weather simulation (MiniWeather), nuclear engineering (Minisweep), and cosmology (High Performance Geometric Multigrid). (CVD link)
Accelerated deployment: centralized management and automation; NVIDIA HPC-X software toolkit setup and configuration; NetApp DataOps Toolkit to help developers and data scientists perform numerous data-management tasks.
Validated hardware: Cisco UCS C240 M7 with Mellanox ConnectX-7 (2x200G), Cisco Nexus N9K-C9364D-GX2A, NetApp A800, and a Cisco Nexus 1/10G copper management switch, connected over 100GbE with 10GbE copper management links.
[Rack diagram: Cisco UCS C245 M6 servers with NVMe SSDs and SATA HDDs cabled to Cisco Nexus N9K-C9336C-FX2 switches; port-map detail omitted.]
Cisco UCS C-Series rack servers and a NetApp AFF A400 storage array connected to a Cisco Nexus 93600CD-GX leaf switch with a Layer 2 configuration for single-rack testing.
Back-end lossless, non-blocking 400G network; front-end network. A converged-infrastructure example for GPU-intensive applications.
The Blueprint for Today
Built to accommodate 1,024 GPUs along with storage devices.

Inferencing, Fine-Tuning, & Compute Infrastructure

Model inferencing use cases (productization phase)
Face recognition and computer vision; self-driving vehicles; conversational agents; machine translation; content generation (images/video/voice); recommender systems; analysis of medical images.

Large Language Models (LLMs): limitations for enterprise use
Hallucination: can make things up; always has an answer.
Sources: where did the information come from?
Outdated: a model may be stale as soon as it is released.
Customize: cannot personalize or use more current data.
Update: cannot edit the model to remove or change data.
Training LLMs: resource-intensive and costly
Large Language Models are pre-trained on a large corpus of publicly available unlabeled data and require periodic re-training to stay up to date. Training takes thousands of GPUs over a span of months.
GPT-3 Large (175B parameters): training set of 300B tokens; vocabulary size 50k; 10,000 V100 GPUs; training time of one month.
Llama (65B parameters): training set of 1-1.3T tokens; vocabulary size 32k; 2,048 A100 GPUs; training time of 21 days.
Building LLMs from scratch is cost-prohibitive for the average enterprise.

Use foundational models: the starting point for most enterprises
Foundational models (FMs) are pre-trained, general-purpose models: BERT, GPT, Llama, Mistral AI, Stable Diffusion, Cohere, Claude, BLOOM. Download one, then customize it or integrate it directly for inferencing in enterprise applications.
Model scale spans LLMs (100B+ parameters), other generative models (1B+), and smaller predictive models. Interactive use targets responsive token generation (>30 per second); user experience is a combination of low latency, throughput, and accuracy.
LLM Inference: Estimating Memory
How much memory does my model need? For a given precision (FP32, FP16, TF16):
Model memory = precision in bytes x number of parameters (P)
Example (Llama 2, 13B parameters): model memory = 13 billion x 2 bytes/parameter = 26 GB
Memory (inference) = model memory + 20% overhead
Example (Llama 2, 13B parameters): memory (inference) = 26 GB + 20% overhead = 31.2 GB
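A minimal sketch of this sizing rule in Python; the bytes-per-parameter values and 20% overhead factor follow the rule of thumb above, the model entries are illustrative, and the GPU count uses a strict ceiling (runtimes that shard with less overhead may fit in fewer devices).

```python
import math

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}

def inference_memory_gb(params_billions: float, precision: str = "fp16",
                        overhead: float = 0.20) -> float:
    """Model memory = bytes/param x params (in billions ~= GB), plus ~20% overhead."""
    model_gb = BYTES_PER_PARAM[precision] * params_billions
    return model_gb * (1.0 + overhead)

def gpus_needed(params_billions: float, gpu_mem_gb: float,
                precision: str = "fp16") -> int:
    """Smallest GPU count whose combined memory fits the model plus overhead."""
    return max(1, math.ceil(inference_memory_gb(params_billions, precision) / gpu_mem_gb))

print(inference_memory_gb(13))         # 31.2 GB for Llama 2 13B at FP16
print(gpus_needed(13, gpu_mem_gb=48))  # 1  -> fits a single 48 GB L40S-class GPU
print(gpus_needed(70, gpu_mem_gb=80))  # 3  -> 168 GB spread across 80 GB A100/H100s
```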
LLM Inference: GPU Estimation. Which GPU do I use?

GPU Model | Memory (GB) | Memory Bandwidth (GB/s) | FP16 Tensor Core (TFLOP/s)
H100 | 80 | 2,000 | 756
A100 | 80 | 1,935 | 312
L40S | 48 | 864 | 362
L4 | 24 | 300 | 121

Based on model memory, a 13B-parameter model loads on any GPU with at least 32 GB. Similarly, a 70B-parameter model would require multiple A100-80 GPUs (168 GB / 80 GB).

LLM Inference: Methodology. How many GPUs do I need for inference?
For a given model and inferencing runtime, start with enough GPUs to load the model based on memory sizing.
Vary concurrent inference requests and measure throughput and latency metrics for a given token length (context).
Vary batch sizes and measure throughput and latency; batching maximizes compute for non-real-time use cases.
Add a second GPU and repeat the concurrent-request and batch-size tests (as needed).
Monitor GPU compute and memory utilization, along with inferencing performance, across all tests.
Select a configuration that optimally balances latency, throughput, and cost.
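A sketch of the sweep this methodology describes, assuming a local OpenAI-compatible HTTP inference endpoint; the URL, model name, and prompt are placeholders, and the concurrency levels mirror the batch sizes tested below.

```python
import time, statistics, requests
from concurrent.futures import ThreadPoolExecutor

ENDPOINT = "http://localhost:8000/v1/completions"   # placeholder inference server
PAYLOAD = {"model": "llama-2-13b",                  # placeholder model name
           "prompt": "What is the role of mitochondria?",
           "max_tokens": 20}                        # ~128-in / 20-out style test

def one_request() -> float:
    """Send one completion request and return its wall-clock latency in seconds."""
    t0 = time.perf_counter()
    requests.post(ENDPOINT, json=PAYLOAD, timeout=120).raise_for_status()
    return time.perf_counter() - t0

for concurrency in (1, 2, 4, 8):
    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda _: one_request(), range(4 * concurrency)))
    wall = time.perf_counter() - t0
    print(f"concurrency={concurrency} "
          f"mean_latency={statistics.mean(latencies) * 1000:.1f} ms "
          f"throughput={len(latencies) / wall:.1f} req/s")
```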
LLM Performance Comparison with NVIDIA A100
Models tested: Llama 2 7B | NV-GPT-8B-Chat-4k-SFT | Llama 2 13B
Llama 2 7B, input token length 128 and output token length 20:

Batch Size | GPUs | Average Latency (ms) | Average Throughput (sentences/s)
1 | 1 | 241.1 | 4.1
2 | 1 | 249.9 | 8.0
4 | 1 | 280.2 | 14.3
8 | 1 | 336.4 | 23.8
1 | 2 | 197.1 | 5.1
2 | 2 | 204.1 | 9.8
4 | 2 | 230.2 | 17.4
8 | 2 | 312.6 | 25.5

Optimized price-to-performance ratio with FlashStack AI.
AI Infrastructure Automation

Policy-based compute to scale operations

Integrate with DevOps to accelerate AI application delivery
Dev and DevOps teams: accelerate CI/CD processes and extend infrastructure-as-code (IaC) workflows by integrating Intersight into your DevOps toolchains.
Infra and Ops: simplify lifecycle management with integrated infrastructure and workload orchestration tools, across data center, edge, and colo.

Day 0/2 Operations (Full-Stack Bare Metal)
Operational challenges: lack of visibility across multiple infrastructure and cluster deployments; difficulty gathering compliance and resource audits; capacity planning and inventory expansion.
K8s admin and hybrid cloud admin tasks across clusters: add/remove bare-metal nodes; cluster lifecycle (upgrade/downgrade); observability; inventory (firmware, network, storage); field alerts and alarms, security advisories; telemetry, metrics, and actionable insights; hardware compatibility; RBAC and multi-tenancy; optimization; security.
Infra health, alerts, alarms, and security, plus infra capacity management for expansion, delivered from a SaaS or on-prem hybrid cloud console; an Intersight Private Appliance is optional (air-gap use case), with UCS-X at the edge sites (edge site 1 through n).
117、fra Health,Alerts,Alarms,SecurityInfra capacity management for expansionSaaSSaaSOnOn-PremPremEdge Site 1Edge Site 2Edge Site nIntersight Private Appliance-Optional(Air-Gap Use Case)UCSUCS-X at the X at the Edge sitesEdge sitesHybrid Cloud ConsoleHybrid Cloud ConsoleBRKCOM-100877 2024 Cisco and/or it
118、s affiliates.All rights reserved.Cisco Public#CiscoLiveOne-click Openshift cluster deploymentBRKCOM-100878 2024 Cisco and/or its affiliates.All rights reserved.Cisco Public#CiscoLiveDeploy AI-ready infrastructureDeploy Red Hat OpenShift and other resourcesPipelines Artifacts RepoImage RegistryModel
119、RepoDeploy two projects*in Openshift AI3Model delivery pipelineApplication inferencing pipeline*Workbenches/namespaces*For demo purposesLoad LLM from Hugging Face and explore/evaluate5Save and upload model to Model Repo6Deploy model serving runtime Deploy LLM for inferencing7Deploy Vector Database f
120、or RAGAttu open-source GUIUnstructured DataIngest Enterprisedata to vector database9AI project deployment workflow exampleCoreEdgeCisco Intersight108421Deploy GUI front-end for Q/A Chatbot 11Deploy Q/A Chatbot App using Enterprise Knowledge BaseDeploy Enterprise Q/A Chatbot for inferencing12BRKCOM-1
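A self-contained sketch of the retrieve-then-generate pattern the workflow above deploys, assuming the sentence-transformers and transformers packages; the embedding model, generator model, and documents are illustrative stand-ins for the vector database (e.g., one fronted by Attu) and the enterprise knowledge base.

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from transformers import pipeline

# Stand-ins for an enterprise knowledge base ingested into a vector DB.
docs = [
    "FlexPod AI pairs Cisco UCS compute with NetApp storage for inferencing.",
    "RoCEv2 requires a lossless Ethernet fabric using ECN and PFC (DCQCN).",
    "Intersight provides policy-based lifecycle management for UCS servers.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")        # illustrative embedding model
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

def retrieve(question: str, k: int = 1) -> list[str]:
    """Cosine-similarity retrieval; a real vector DB replaces this in production."""
    q = embedder.encode([question], normalize_embeddings=True)[0]
    scores = doc_vecs @ q                                  # normalized dot = cosine
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

generator = pipeline("text2text-generation", model="google/flan-t5-small")  # illustrative LLM

question = "What does RoCEv2 need from the network?"
context = " ".join(retrieve(question))
answer = generator(f"Answer using the context.\nContext: {context}\nQuestion: {question}")
print(answer[0]["generated_text"])
```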
Model delivery lifecycle: streamline and scale using MLOps
Iterate: prepare data (gathering and preparing data for AI); experiment/tune the model (apply scientific rigor to understand the data and build/customize the model); serve and integrate with the app (model available for production inferencing); monitor/maintain the model (track model quality, metrics, and drift).
The pace of AI/ML technology shifts requires a strong foundation to adapt: scalability, governance, efficiency, reliability, and adaptability.

Cisco Validated Designs (CVDs) for AI
Cisco Validated Designs (CVDs) on the Cisco Unified Computing System
Accelerate: ready-to-go solutions for faster time to value.
Less risk: reduce risk with tested architectures for standardized, repeatable deployments.
Expert guidance: CVDs provide everything from system designs to implementation guides and Ansible automation.
Cisco TAC support: a single point of contact for the solution; Cisco will coordinate with partners as needed to resolve issues.

Explore Cisco validated AI demos showcasing a broad spectrum of AI technologies and practices ready to transform your business. Cisco Compute coverage spans FlexPod, FlashStack, Nutanix, and UCS-only.
Large Language Models (LLMs), NVIDIA AI Enterprise (GenAI, Hugging Face, NVIDIA TRT-LLM, text-to-text): discover the power of LLM inferencing as it seamlessly processes and generates human-like text in real time.
Retrieval Augmented Generation (RAG), NVIDIA AI Enterprise (GenAI, LangChain, Mistral, vLLM, NVIDIA NIM, text-to-text): experience an enterprise-grade RAG chatbot delivering responses tailored to your enterprise-specific content.
Image Analysis, Intel AMX (predictive AI, Kaggle, Keras, neural network, image-to-text): delve into the realm of image analysis, where advanced algorithms interpret and understand visual data with astonishing accuracy.
Image Synthesis, NVIDIA AI Enterprise (GenAI, Hugging Face, diffusion models, text-to-image): immerse yourself in the innovative world of text-to-image synthesis, where vivid images are conjured from descriptive language or existing photos.
MLOps, Red Hat OpenShift AI (GenAI, vector database): explore the cutting edge of MLOps, where the efficiency of machine learning workflows meets the rigor of operational excellence.
FlexPod for Generative AI Inferencing
Optimized for AI: a comprehensive suite of AI tools and frameworks with NVIDIA AI Enterprise that supports optimization for NVIDIA GPUs; validated NVIDIA NeMo with TRT-LLM, which accelerates inference performance of LLMs on NVIDIA GPUs; a metrics dashboard for insights into cluster and GPU performance and behavior.
Accelerated deployment: deployment validation of popular inferencing servers and AI models such as Stable Diffusion and Llama 2 LLMs, with diverse model-serving options; automated deployment with Ansible playbooks.
AI at scale: scale discretely with a future-ready, modular design.
FlashStack for Generative AI | Inferencing with LLMs
FlashStack infrastructure: Cisco UCS X210c compute nodes; Cisco UCS X440p PCIe nodes; Pure Storage FlashBlade or FlashArray; NVIDIA GPU accelerators.
Virtualization: VMware vSphere. Red Hat OpenShift: control-plane and worker virtual machines. Portworx Enterprise: model repository and storage for applications.
NVIDIA AI Enterprise: advanced AI platform with advanced integration.
Inferencing servers: NVIDIA Triton, Text Generation Inference, PyTorch.
Generative AI models: NeMo GPT, Llama, Stable Diffusion. Managed with Cisco Intersight.
Foundational architecture for GenAI: validated NVIDIA NeMo inference with TensorRT-LLM, which accelerates LLM inference performance on NVIDIA GPUs; validated models using the Text Generation Inference server from Hugging Face; a metrics dashboard for insights into infrastructure, cluster, and GPU performance and behavior.
Consistent performance: consistent average latency and throughput; better price-to-performance ratio.
Simplify and accelerate model deployment: extensive breadth of validation of AI models such as GPT, Stable Diffusion, and Llama 2 LLMs, with diverse model-serving options; automated deployment with Ansible playbooks.

Cisco and Nutanix partner for AI: the power of two
Proven platforms: CVDs and automated playbooks. Secure foundation: end-to-end resiliency. AI everywhere: existing apps and new experiences, including ChatGPT-in-a-box. Cisco compute and networking plus Cisco Intersight, with the Nutanix Cloud Platform.
137、ible playbookBRKCOM-100886 2024 Cisco and/or its affiliates.All rights reserved.Cisco Public#CiscoLiveCisco and Nutanix partner for AI:The Power of TwoProven platformsCVDs and automated playbooksSecure foundationEnd-to-end resiliencyAI EverywhereExisting apps and new experiencesChat GPT-in-a-boxCisc
138、o Computeand NetworkingCisco IntersightNutanixCloud PlatformBRKCOM-100887 2024 Cisco and/or its affiliates.All rights reserved.Cisco Public#CiscoLiveCisco Compute Hyperconverged GPT-in-a-BoxDeploy hybrid-cloud AI-ready clusters with Cisco Validated Designs(CVDs)CiscoIntersightNutanix AOSAHV Virtuali
139、zationKubernetesFoundation ModelsGenerative AI AppsNutanix Files Storage and Object StoragePyTorchKubeflowGPUGPU-enabledenabledOptimized Optimized GenAIGenAI infrastructureinfrastructureStreamlined governance with enterprise Streamlined governance with enterprise softwaresoftwareSustainable energy u
140、seSustainable energy useHybrid cloud is complexHybrid cloud is complexBusiness ChallengesRisk reduction&fast time to marketRisk reduction&fast time to marketStreamline operationsStreamline operationsProven performanceProven performanceProtect valuable dataProtect valuable dataSimplified hybrid cloud
141、 operationsSimplified hybrid cloud operationsBenefitsBRKCOM-100888 2024 Cisco and/or its affiliates.All rights reserved.Cisco Public#CiscoLiveCVDs to simplify end-to-end AI infrastructureCVD playbooks supporting common AI modelsCVDs for simplified AI-ready infrastructureCVD blueprint for AI networks
142、N E W E X P A N D E D R O A D M A PComputer vision models(ResNet,EfficientNet,YOLO)Large language models(GPT3,BERT,T5)Generative models(GANs,VAEs)NVIDIA AIEnterpriseRed Hat OpenShift AIGPT-in-a-boxon Nutanix HyperconvergedGen-AI with Cloudera Data PlatformBest performing AI/ML networks,focus on appl
143、ication performanceIntelligent buffer,low latency,telemetryand RoCEv2Dynamiccongestion avoidanceOne IP network for both front-endand back-endAutomation for day-2 operationsValidated designsfor network and ecosystem partnersNGCDeveloper Cloud1 2 3BRKCOM-100889Future Trends and Industry Impacts of AI
144、Infrastructure Demands 2024 Cisco and/or its affiliates.All rights reserved.Cisco Public#CiscoLiveWith a new kind of data centerAI drives a better futureFuture-readySimplified Cloud operationsSustainability&Power EfficiencyEdge Inferencing and fleet managementArtificial intelligenceMore programmabil
145、ity and controlMore efficient performance for new workloadsLess costly to build,deploy,and operateLess operational complexitySimple,sustainable,future-readyBRKCOM-100891 2024 Cisco and/or its affiliates.All rights reserved.Cisco Public#CiscoLivePower&Cooling TrendsCPU,GPU and Switch ASIC power requi
146、rements moving from 350W TDP today to 400W+and far beyond in the coming year(s)Traditional fan cooling consumes lot of power and less efficient as system power increasesPassive cooling is approaching its limitationLiquid cooling technology to address future cooling requirement with significantly bet
147、ter cooling efficiency&reduced noise levelsClosed loop liquid cooling provides a retrofit solutionFuture Data Center designs will need to provision for Rack level liquid cooling infrastructure(with external Cooling Distribution Unit-CDU)CPU,500,21%Memory,480,20%Storage,360,15%Misc,150,7%GPU,600,25%F
148、AN,240,10%2U Server Power Total 2400 CPUMemoryStorageMiscPCIe-IOGPUFANBRKCOM-100892 2024 Cisco and/or its affiliates.All rights reserved.Cisco Public#CiscoLiveLiquid Cooling TechnologiesSingle-Phase Immersion-PAO6:Zero GWP,cheaper,lower cooling capability-FC-40:Better cooling,higher GWP-Material com
Liquid Cooling Technologies
Single-phase immersion: PAO6 (zero GWP, cheaper, lower cooling capability) or FC-40 (better cooling, higher GWP); material compatibility is a concern.
Two-phase immersion: better cooling with FC-3284; heatsink design uses a boiling-enhancement coating; material compatibility is a concern; high GWP.
Single-phase cold plate: better cooling with PG25; zero GWP; leaks can be catastrophic; requires parallel connections to avoid pre-heat.
Two-phase cold plate: better cooling with R134a, Novec 7000, or another refrigerant; enables highly dense systems, and series connections are OK; leaks are not catastrophic.

Disaggregation Technologies: Compute Express Link (CXL)
An alternate protocol that runs across the standard PCIe physical layer, using a flexible processor port that can auto-negotiate to either the standard PCIe transaction protocol or alternate CXL transaction protocols.
First-generation CXL aligns to 32 Gbps PCIe 5.0; CXL usage is expected to be a key driver for an aggressive timeline to PCIe 6.0. CXL allows you to build fungible platforms.
UCS X-Fabric Technology for Disaggregation
An open, modular design enables compute and accelerator node connectivity: an internal fabric interconnects nodes, carries industry-standard PCIe and CXL traffic, and can be upgraded to future generations.
Compute and GPU nodes connect across the chassis front and rear; no midplane and no cables means easy upgrades; open standards (PCIe 4/5/6, CXL), not just another PCIe switch; expandable to address new use cases in the future (memory and storage nodes).
CXL will evolve out of PCIe for next-generation speeds, cache coherency, shared I/O, and memory.

Expanding Ecosystem of Viable GPU Options
Intel Gaudi accelerators with native RoCE for scale-up and scale-out: available now via the HLS-1 server (x8), SMC server (x8), SDSC, and public cloud (AWS EC2); Gaudi2 (7nm) available via the HLS-Gaudi2 server (x8), SMC server (x8), Aivres/IEI server (x8), and Intel Dev Cloud; a 5nm generation targeted for 1H CY2024; next-generation AI accelerator Falcon Shores in development for 2025.
Ultra Ethernet Consortium (UEC)
https://ultraethernet.org/uec-progresses-towards-v1-0-set-of-specifications/

Open-Standard NVLink Alternatives: Introduction of Ultra Accelerator Link (UALink)
AMD, Broadcom, Cisco, Google, HPE, Intel, Meta, and Microsoft are announcing the formation of a group that will form a new industry standard, UALink, to create the ecosystem: a low-latency, high-bandwidth fabric for hundreds of accelerators, an interconnect for GPU-to-GPU communications.

Silicon Photonics: bringing higher data rates, lower latency, and reduced power consumption
Fiber-optic photonics: over length scales of hundreds or thousands of kilometers (e.g., undersea fiber-optic internet links), the majority of the optical link involves light in fiber-optic cable, with a source laser, periodic repeaters/amplifiers, and a photodetector at the receiver. All components (lasers, amplifiers, photodetectors, optical modulators, splitters, etc.) are discrete and interconnected, which is very costly.
Silicon photonics (integrated photonics technology): all optical components are created directly on the same silicon-on-insulator (SOI) substrate, yielding compact photonics chips that can be closely integrated with CMOS logic. Because all components share one substrate, optical components can be packed far more densely than discrete optics can achieve.
Summary

Takeaways and Closing: Cisco Makes AI Hybrid Cloud Possible
AI is pushing infrastructure requirements across compute (flexible GPU acceleration), network (lossless, high-performance fabrics), and storage (scalability, tight coupling with compute and networking).
Very few customers will train the largest models; most will use pre-trained models with their own data and deploy the associated inference models.
The use cases must drive which AI models, methods, and techniques to utilize.
AI is driving the next push for modernized data center facilities and upgraded networks, compute, storage, and operational models.
Major investments are not required to start: you can get started with CPU-based acceleration and existing infrastructure.
AI consultants play a vital role in assessment, guidance, and adoption.
Complete Your Session Evaluations
Complete a minimum of 4 session surveys and the Overall Event Survey to be entered in a drawing to win 1 of 5 full conference passes to Cisco Live 2025. Earn 100 points per survey completed and compete on the Cisco Live Challenge leaderboard. Level up and earn exclusive prizes! Complete your surveys in the Cisco Live mobile app.

Continue your education
Visit the Cisco Showcase for related demos. Book your one-on-one Meet the Engineer meeting. Attend the interactive education with DevNet, Capture the Flag, and Walk-in Labs. Visit the On-Demand Library for more sessions.

Thank you

Appendix: Congestion in the fabric
Congestion can always happen, even in a non-blocking switch or fabric. Consider the following example with some math:
16 ToRs, each dual-connected to every spine with 2x400 Gbps links, giving every ToR 3.2 Tbps of uplink capacity. Each ToR is attached to 26 dual-homed nodes via 100 Gbps links, so every node could be firing up 200 Gbps of traffic without affecting the uplink capacity. But where is this traffic going?
[Topology: leafs L1-L16, spines S1-S4, 2x400 Gbps uplinks, 100 Gbps host links]
If the traffic aggregated toward a node exceeds its egress bandwidth capacity, we have congestion (for example, flows summing to 300 Gbps toward one dual-homed 200 Gbps host). The impact depends on the data-plane protocol: protocols with congestion-control capabilities, like TCP, can auto-adjust the flow throughput; other protocols, like UDP, have no concept of congestion control.

How does RoCEv2 solve this?
RoCEv2 MUST run over a lossless network; retransmission must be avoided, yet Ethernet networks are lossy by design and drops can happen. RoCEv2 encapsulates data chunks in IP/UDP packets, and UDP has no native congestion-control mechanism. RoCEv2 therefore uses the Data Center Quantized Congestion Notification (DCQCN) scheme, which relies primarily on two existing flow-control techniques:
IP Explicit Congestion Notification (RFC 3168, 1999) and Priority Flow Control (802.1Qbb).

Data Center Quantized Congestion Notification (DCQCN)
Neither IP ECN nor PFC alone provides a complete congestion-management framework: IP ECN signaling might take too long to relieve congestion, and PFC can introduce other problems such as head-of-line blocking and unfairness between flows. The two together provide the desired result of lossless RDMA communications across Ethernet networks (this is called DCQCN). The requirements: Ethernet devices compatible with both techniques, and proper configurations applied.

Explicit Congestion Notification
ECN is implemented via QoS queuing policies leveraging WRED (Weighted Random Early Detection). Buffer utilization is constantly monitored: when the buffer rises above the WRED low threshold, some packets are marked with ECN bits 0b11 (only ECN-capable packets are marked); above the WRED high threshold, all ECN-capable packets are marked 0b11.
ECN bits in the IP header (packet: MAC | IP | UDP | RoCEv2, with DSCP and ECN in the IP header):
0b00 - Not ECN capable
0b01 - ECN capable
0b10 - ECN capable
0b11 - Congestion experienced
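A small Python sketch of the WRED/ECN marking decision described above; the thresholds mirror the sample Nexus queuing policy quoted below (150/3000 kbytes, drop probability 7), and the linear probability ramp between thresholds is standard WRED behavior.

```python
import random

MIN_KB, MAX_KB, MAX_PROB = 150, 3000, 0.07  # mirrors: random-detect minimum-threshold
                                            # 150 kbytes maximum-threshold 3000 kbytes
                                            # drop-probability 7 weight 0 ecn

def mark_ce(queue_depth_kb: float, ecn_capable: bool = True) -> bool:
    """Return True if this packet should be marked Congestion Experienced (0b11)."""
    if not ecn_capable or queue_depth_kb <= MIN_KB:
        return False                         # below low threshold: never mark
    if queue_depth_kb >= MAX_KB:
        return True                          # above high threshold: mark all
    ramp = (queue_depth_kb - MIN_KB) / (MAX_KB - MIN_KB)
    return random.random() < ramp * MAX_PROB  # linear probability ramp in between

for depth in (100, 500, 1500, 3000):
    marks = sum(mark_ce(depth) for _ in range(100_000))
    print(f"queue={depth:>4} kB -> ~{marks / 1000:.1f}% of packets marked")
```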
ECN in action with RoCEv2 (leaf/spine topology: senders S1-S3 behind leafs L1-L3 on 200 Gbps links, receiver R behind L4 on a 100 Gbps link; the state changes below happen in nanoseconds)
1. S1, S2, and S3 each send ~50 Gbps toward R; the flows aggregate to 150 Gbps against R's 100 Gbps link, and the congested queue begins to fill.
2. With the WRED policy applied (random-detect minimum-threshold 150 kbytes maximum-threshold 3000 kbytes drop-probability 7 weight 0 ecn), the switch marks some ECN-capable packets 0b11 (congestion experienced).
3. The receiver R answers each marked flow with a CNP (Congestion Notification Packet), sent once every X msec toward the offending sender (e.g., S:R; D:S2).
4. Senders that receive CNPs throttle (e.g., from 50 Gbps to 25 Gbps); as marking continues, all three senders are notified and the aggregate demand drops (150 to 125 to 75 Gbps) until congestion clears and packets are marked 0b10 again.

Considerations
The latency between ECN marking and the subsequent throttling of the throughput rate can be significant, so CNP packets must be prioritized! While notifications are in flight, buffers might become fully saturated, causing a tail drop. This is why DCQCN combines ECN with PFC.
[Chart: buffer saturation (0-120%) of the RoCEv2 no-drop queue]

Priority Flow Control
With PFC we can define a no-drop queue. Every time the queue reaches a defined threshold (XOFF), the almost-saturated device sends pause frames to the devices causing the buildup. A device that receives a pause frame stops forwarding packets classified into that queue and places them into its own buffer. The process repeats hop by hop until it reaches the original senders, at which point they also temporarily stop sending packets. By the time this happens, all the buffers in the network should be flushed (XON), and forwarding can start again.

PFC and ECN joining forces
As the buffer fills past the WRED low threshold, some packets are marked CE; past the WRED high threshold, all packets are marked CE; past the PFC xOFF threshold, PFC pause frames are sent toward the source; once the buffer drains to the xON threshold, PFC frames are no longer sent.
Priority Flow Control in action with RoCEv2
1. S1-S3 send 50 Gbps each toward R (150 Gbps into a 100 Gbps link); the switch marks packets with ECN 0b11 and R returns CNPs (S:R; D:S1/S2/S3), but the senders have not yet reacted.
2. The congested leaf's no-drop queue crosses its xOFF threshold, so it sends PFC pause frames upstream toward the spine; note that the spine also starts marking packets with ECN 0b11 before sending its own PFCs.
3. Every switch in the path buffers, marks, and propagates PFC hop by hop back toward L1-L3 and the senders.
4. The CNPs take effect and the senders throttle (to ~20 Gbps each, ~60 Gbps aggregate); the queues drain, and pausing stops.
ECN and PFC: what each one brings
RoCEv2 can leverage both ECN and PFC to achieve its goal of lossless transport.
ECN is an IP-layer notification system: it allows the switches to indirectly inform the sources as soon as a threshold is reached and lets them slow down the throughput.
PFC works at Layer 2 and serves as a way to use the buffer capacity of switches in the data path to temporarily ensure the no-drop queue is honored. It effectively happens at each switch, hop by hop, back to the source, giving the source time to react without dropping packets.
ECN should react first, and PFC acts as a fail-safe if the reaction is not fast enough. In any case, the combination helps achieve the lossless outcome required by AI/ML traffic. This collaboration of both is called Data Center Quantized Congestion Notification (DCQCN). All Nexus 9000 Cloud Scale ASICs support DCQCN.

Alternatives to ECN with WRED: Approximate Fair Drop (AFD)
The Nexus 9000 ASIC also implements advanced queuing algorithms that can avoid some non-optimized WRED results. For example, WRED has no knowledge of which flows are consuming most of the bandwidth; its ECN marking happens only on probability. AFD constantly tracks the amount of traffic exchanged and divides flows into two categories:
Elephant flows: long and heavy, which will be penalized (ECN marked).
Mice flows: short and light, which will not be penalized (not ECN marked).
197、-optimized WRED resultsAs an example WRED has no knowledge on which flows are consuming most of the bandwidth.ECN marking happens only based on probabilityAFD constantly tracks the amount of traffic exchanged and divides them in two categories:Elephant Flows:long and heavy which will be penalized(ECN marked)Mice Flows:short and light which will not be penalized(ECN marked)