© 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved.

AIM3319
Industrial-scale agentic web automation with vLLM and Amazon SageMaker AI
Arindam Paul, Senior PMT - AWS ML Services
Vinay Arora, Sr WW Specialist SA - GenAI

Agenda
LLM Deployment Challenges
vLLM AWS Deep Learning Containers
Amazon SageMaker AI
Architecture Deep Dive
Demo Agentic AI
Q/A

More than 30% of enterprise apps will be powered by generative AI and agentic AI by 2028
According to Gartner

In today's presentation
"Equity Analyst" agent generating a detailed report from market/event signals
Leveraging vLLM on SageMaker AI https:/
GLM-4.5V model (vision language model, https://huggingface.co/zai-org/GLM-4.5V)
Sneak Peek

LLM Deployment Challenges
Complex Dependency Management
Compatibility issues across CUDA, PyTorch, NCCL, and drivers

Scaling Across Nodes
Distributed inference adds networking and coordination overhead

Suboptimal GPU Utilization
Idle memory, inefficient batching, and underused accelerators

Model Storage & Loading Bottlenecks
Large checkpoints cause slow startup and I/O constraints

vLLM AWS Deep Learning Containers