Empower Large Language Models (LLMs) in Production With Cloud Native AI Technologies
Lize Cai, Senior Software Engineer, SAP
Yang Che, Senior Engineer, Alibaba Cloud

About us
  Lize Cai: Senior Software Engineer at SAP
  Yang Che: Senior Engineer at Alibaba Cloud

Agenda
  Introduction
  LLM Challenges in Production
  Manage the LLM lifecycle the K8s way: KServe
  Accelerate LLM scaling from a data perspective: Fluid
  Demo
  Future Work
  Q&A

Introduction
  [Figure: maximelabonne]

Introduction
  A common use case is to provide a playground for trying out different models.

Introduction
  But it is not so easy.

LLM Challenges in Production
  New requirements for serving LLMs
    New inference APIs such as text generation and embeddings.
    Streaming responses are required for a real-time user experience.
  Variety of models and runtimes
    Runtimes: TGI, vLLM, TRT-LLM, etc.
    Models: Llama, Mistral, Phi, Qwen, etc.
  LLM services from cloud providers
    Each provider has its own spec (API and token calculation), which leads to a poor user experience and increased maintenance effort.
  High computing cost
    Expensive hardware, high energy consumption, and the associated infrastructure expenses.
  Data privacy
    Model and request data used for inference can be sensitive and private.

Manage the LLM lifecycle the K8s way: KServe

What is KServe?
  A highly scalable, standards-based, cloud-native model inference platform on Kubernetes for trusted AI that encapsulates the complexity of deploying AI models to production.

What is KServe?
  Core Inference
    Transformer/Predictor
    Serving Runtimes
    Custom Runtime SDK
    Open Inference Protocol
    Serverless Autoscaling
    Cloud/PVC Storage
  Advanced Inference
    ModelMesh for Multi-Model Serving
    Inference Graph
    Payload Logging
    Request Batching
    Canary Rollout
  Model Explainability & Monitoring
    Text, Image, Tabular Explainer
    Bias Detector
    Adversarial Detector
    Outlier Detector
    Drift Detector

What is KServe?
  Serving Runtim