ML-Summit 2025 Global Machine Learning Technology Conference

Engineering LLMs on eBay's Cloud-Native Model Inference Platform
Speaker: Xie Jibing, Software Development Engineer, eBay Machine Learning Platform

About the speaker: Xie Jibing holds a master's degree from Shanghai Jiao Tong University and focuses on the AI field. He previously worked at Tencent, where he developed customized AI inference engines and brought them into production. He now works in eBay's Machine Learning Platform team, building eBay's cloud-native AI inference platform. As lead engineer, he drove the development of the Triton-server-based zero-code LLM deployment solution, the Kubernetes-based LLM autoscaling solution, and the automated LLM benchmarking tooling. He is committed to turning AI technology into practical applications and creating business value through technical innovation.

Agenda
- Background
- LLM Deployment and Inference Service
- LLM Service Autoscaling
- LLM Self-Service Benchmarking
- Next Step

Background

Empowering Business with LLM Innovation
- Smart Item Description
- Buyer Assistant Bot
- Recommendation Emails Subject Lines Generation
- Explainable recommendations

Challenges to Serve LLMs

Challenges:
- Larger model size: scalability; model file access; distributed inference (multi-GPU in a single pod, or across multiple pods); GPU fragmentation
- Huge demand for GPUs: insufficient GPU utilization under varying workloads

Goals:
- Provide platform capabilities that enable rapid deployment of LLMs
- Efficiently utilize GPU resources to serve more models

eBay Unified Inference Component Overview

LLM Deployment and Inference Service

Unified Inference Architecture
- Expanded the capability to serve LLMs on top of the current inference platform architecture
- High availability, high scalability, isolation, high performance

eBay LLM Inference Solution
- User-friendly deployment through a unified Triton service
- Container-based method: a standard Triton docker image is provided
- Schema-free and code-free
- Customized pre/post processors supported via ensemble models
- High flexibility: in