AI Inference Performance Acceleration: Methods, Tools, and Deployment Workflows
Yifei Zhang / Lei Qian, ByteDance Cloud Native Development Engineers

AI + Cloud Native Technology Trends
- https://lifearchitect.ai/timeline/
- https://nhlocal.github.io/AiTimeline/
- https://www.cncf.io/reports/cloud-native-artificial-intelligence-whitepaper/ (2024.03)

AI Inference
- https://arxiv.org/abs/1706.03762
- https://huggingface.co/blog/stable_diffusion

Storage Access Acceleration
Scenario: running Stable Diffusion WebUI with serverless Pods
Problem: slow scaling speed; time is consumed pulling container images and reading Stable Diffusion models
Solutions: Image Cache, Fluid + Alluxio

Storage Access Acceleration: Fluid
- Kubernetes-native dataset management (compose, accelerate)
- supports multiple storage runtimes
- CNCF sandbox project: https://fluid-cloudnative.github.io/
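To make this concrete, below is a minimal sketch of registering a model directory as a Fluid Dataset backed by an AlluxioRuntime cache, using the official Kubernetes Python client. The bucket path, namespace, replica count, and cache quota are placeholders, not values from the talk; a real S3 mount would also need endpoint/credential options.

```python
# Minimal sketch: create a Fluid Dataset plus AlluxioRuntime cache
# (CRDs in group data.fluid.io/v1alpha1) with the Kubernetes Python client.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

dataset = {
    "apiVersion": "data.fluid.io/v1alpha1",
    "kind": "Dataset",
    "metadata": {"name": "sd-models", "namespace": "default"},
    "spec": {
        # Hypothetical object-store path holding Stable Diffusion checkpoints.
        "mounts": [{"mountPoint": "s3://my-bucket/stable-diffusion/",
                    "name": "models"}],
    },
}

runtime = {
    "apiVersion": "data.fluid.io/v1alpha1",
    "kind": "AlluxioRuntime",
    "metadata": {"name": "sd-models", "namespace": "default"},
    "spec": {
        "replicas": 2,  # Alluxio worker pods on existing nodes
        "tieredstore": {
            "levels": [
                # Cache hot model files in memory on the worker nodes.
                {"mediumtype": "MEM", "path": "/dev/shm", "quota": "20Gi",
                 "high": "0.95", "low": "0.7"}
            ]
        },
    },
}

for plural, body in (("datasets", dataset), ("alluxioruntimes", runtime)):
    api.create_namespaced_custom_object(
        group="data.fluid.io", version="v1alpha1",
        namespace="default", plural=plural, body=body,
    )
```

Once the Dataset is bound, Fluid exposes it as a PVC of the same name, so the Stable Diffusion WebUI pod mounts the cached models like any other volume.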
Storage Access Acceleration: Fluid components
- deployed on nodes / as serverless pods
- Alluxio Master Pod: runs as a serverless pod to avoid node unavailability
- Alluxio Worker Pods: deployed on existing nodes to increase node utilization; scaled up for high performance and scaled down to minimize costs (see the scaling sketch below)
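One way to drive the scale-up/scale-down behavior is to adjust the runtime's worker replica count; a sketch, assuming the hypothetical `sd-models` runtime from above (whether replicas are patched directly like this or driven by an autoscaler is a deployment choice):

```python
# Sketch: resize the Alluxio worker fleet by patching spec.replicas
# on the AlluxioRuntime custom resource.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

def scale_workers(replicas: int) -> None:
    """Scale Alluxio workers up for peak traffic, down to cut costs."""
    api.patch_namespaced_custom_object(
        group="data.fluid.io", version="v1alpha1",
        namespace="default", plural="alluxioruntimes", name="sd-models",
        body={"spec": {"replicas": replicas}},
    )

scale_workers(4)  # before an expected traffic spike
scale_workers(1)  # off-peak, to minimize cost
```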
Storage Access Acceleration: results
[Figure: time consumed for the first text2img request]

What's next:
- a faster, more stable storage runtime integrated with veTurboIO, an open-source, high-performance library for reading/writing models
- productization: more user-friendly
- validation at a larger scale
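The first-request number in the figure above can be reproduced with a simple probe; a sketch assuming the WebUI is started with `--api` so it exposes the `/sdapi/v1/txt2img` endpoint (the service URL and payload are placeholders):

```python
# Sketch: time the first text2img request against a freshly scaled pod.
# On a cold pod this includes image pull and model load, so it is the
# number that Image Cache and Fluid/Alluxio are meant to shrink.
import time
import requests

WEBUI_URL = "http://sd-webui.default.svc:7860"  # hypothetical service address

payload = {"prompt": "a photo of an astronaut riding a horse", "steps": 20}

start = time.perf_counter()
resp = requests.post(f"{WEBUI_URL}/sdapi/v1/txt2img", json=payload, timeout=600)
resp.raise_for_status()
elapsed = time.perf_counter() - start

print(f"first text2img request took {elapsed:.1f}s")
```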
Benchmark Test
Customer scenario: choosing the right card for inference for different models
Questions:
- What is the performance of the model?
- What kind of model is suitable for which cards?
- How to choose the best deployment configuration?
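Answering these questions starts with a per-card latency/throughput probe; below is a generic sketch with PyTorch. The model and input shape are placeholders to be swapped per model/card combination, and the batch-size sweep is one way to explore deployment configurations.

```python
# Sketch: measure per-batch latency and throughput of a model on one GPU.
import time
import torch

def benchmark(model, example_input, warmup=10, iters=50):
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):       # warm up kernels / autotuning
            model(example_input)
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            model(example_input)
        torch.cuda.synchronize()      # wait for queued GPU work to finish
    latency = (time.perf_counter() - start) / iters
    batch = example_input.shape[0]
    return latency * 1e3, batch / latency  # ms per batch, samples per second

# Placeholder model; substitute the model under evaluation.
model = torch.hub.load("pytorch/vision", "resnet50", weights=None).cuda()

for batch in (1, 8, 32):  # sweep batch sizes per card
    x = torch.randn(batch, 3, 224, 224, device="cuda")
    ms, tput = benchmark(model, x)
    print(f"batch={batch}: {ms:.2f} ms/iter, {tput:.1f} samples/s")
```

Running the same sweep across candidate cards gives the latency/throughput/cost table needed to match models to hardware.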