Stable Diffusion Inference Acceleration Techniques Explained

Stable Diffusion process

Text Encoder (CLIP Text) -> Image Information Creator (UNet + Scheduler) -> Image Decoder (Autoencoder decoder)
Token embeddings: 77 x 768
Processed image information tensor: 4 x 64 x 64
UNet steps: Step 1, Step 2, Step 3, ..., Step 50

ClipText for text encoding.
Input: text.
Output: 77 token embedding vectors, each with 768 dimensions.

UNet + Scheduler to gradually process/diffuse information in the information (latent) space.
Input: text embeddings and a starting multi-dimensional array made up of noise.
Output: a processed information array.

Autoencoder Decoder that paints the final image using the processed information array.
Input: the processed information array (dimensions: (4, 64, 64)).
Output: the resulting image (dimensions: (3, 512, 512), which are (red/green/blue, width, height)).

Stable Diffusion FLOPs

Starting latent state: sample Gaussian noise from a random seed
User prompt -> CLIP text encoder
Negative prompt: empty token if not provided
TimeStep: t
Scheduler: update/denoise the latent
Repeat T = 50 times
Final denoised latent: B x 4 x H x W
VAE Image Decoder
Output image: B x 3 x 8H x 8W

Tensor shapes for classifier-free guidance:
2B x 77 x 768, text embeddings
2B x 4 x H x W, latent
2B x 4 x H x W, predicted noise

Floating point operations, measured under B = 1, H = W = 64:
CLIP text encoder: 0.02714 x 10^12
U-Net, one step: 1.67952 x 10^12
U-Net, 50 steps: 83.976 x 10^12
VAE image decoder: 2.53933 x 10^12

TensorRT & Benchmark

Stable Diffusion TensorRT

Two ways to leverage SD with TensorRT:
1. Community Examples: https:/
2. Demo Diffusion: https:/

Stable Diffusion Inference benchmark

UNET x 50: OOTB + Myelin + fMHA + fMHCA + LayerNorm + GroupNorm + SplitGeLU + Seq2Spatial + Graph Optimization + Preview Feature 0805
CLIP: OOTB + Myelin + LayerNorm
VAE: OOTB + Myelin + GroupNorm

UNET x 50: 792 ms
CLIP: 4 ms
VAE: 19 ms
Pipeline Total: 815 ms

TensorRT 8.5.2.2, bs=1, wh=512x512

Structured Sparsity

Sparsity
2:4 FINE-GRAIN