Building an Efficient Data + AI Computing Platform on Ray
俞育才 (Yu Yucai), eBay AI Platform Architect

Contents
  1. Background: the eBay AI 2.0 plan
  2. Problems in model development and deployment
  3. Building an efficient Data + AI platform on Ray
  4. Future plans

Part 1. Background: the eBay AI 2.0 plan

eBay AI Strategy
  "We believe eBay is best positioned to capture upside from gen AI in '24, to the extent its seller-focused features drive listing velocity and quality."
  Morgan Stanley Analyst (2024/04/18)
  https:/

Highly Efficient Data + AI Platform
  The generative AI revolution caused a step jump in large and complex models and increased GPU requirements.
  New use cases are rapidly increasing across eBay.
  Our infrastructure must respond quickly.

  [Diagram: ML Platform]
  Unified Feature Store; Training Platform; Unified Inference Platform (UIP); MLP Control Plane; AI Hub; Notebooks & SDKs
  Leverage Ray AI runtime; Experimentation management
  Ray Cluster; Ray Job; Ray Service; Tess (KubeRay)
  Ray Data; Ray Tune; Ray Core; Ray Train; Ray Serve

Part 2. Problems in model development and deployment

Model Development & Deployment
  Run a series of training experiments on eBay datasets to get the best accuracy.
  Explorations: search for the best model candidate, e.g. LLaMA, Mistral, BERT; for CV, e.g. CLIP, ResNet.
  Applied researchers develop models in Python code.
  SW engineers develop Java code for production.
  Serve models in production.
  E.g., model pre/post-processing, model orchestration.
  Magical listing
  Find similar images
  Training datasets
  Scale training and fine-tuning
  Model exploration: coding and training
  Batch/NRT (near-real-time) inference pipelines
  Online inference pipelines
  Model management service
  Model catalog
  Serve LLM or vision models
  Generate embeddings for similarity search
  Vector DB
  Data loading and preprocessing pipelines
  Release model versions
  Researcher runs small training experiments on different models and use-case datasets to find best