© 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved.

CNS348
LLMs in Production: Fast deployment with Amazon ECS
Santiago Flores Kanter
Curtis Rissi

Let's set some expectations
What is a Chalk Talk?
Will you be able to interact and ask questions?
What about my needs?

Let's quickly discuss some fundamentals

GenAI User Types
Generative AI user types and the skills each one needs:

Providers
Entities who build foundation models from scratch themselves and provide them as a product to tuners and consumers.
Skills: Deep end-to-end ML, NLP expertise and data science, labeler "squad".

Tuners
Fine-tune foundation models from providers to fit custom requirements. Orchestrate the deployment of the model as a service for use by consumers.
Skills: Some end-to-end ML expertise and knowledge of model deployment and inference. Strong domain knowledge for tuning, including prompt engineering.

Consumers
Interact with generative AI services from a provider or tuner by text prompting or a visual interface to complete desired actions.
Skills: No ML expertise required. Mostly application developers or end users with an understanding of the service capabilities. Only prompt engineering is required for better results.

Diagram label between user types: "can become"

Spectrum of Inference on AWS
Services along the abstraction axis: Amazon Bedrock, AWS Lambda, Amazon ECS, Amazon EKS
Amazon Bedrock: Fully managed, fastest time to value
AWS Lambda: Event-driven inference, great for bursts and stateless APIs
Amazon ECS: Controlled GPU-powered inference with AWS-native s