Trustworthy Visual Generative Large Models Based on Physical Constraints
Zhu Siyu, Fudan University

Speaker
Zhu Siyu, Professor, Fudan University
Researcher, tenured full professor, and doctoral supervisor at the Artificial Intelligence Innovation and Incubation Institute, Fudan University. Zhu Siyu received his bachelor's degree from Zhejiang University and his PhD from the Hong Kong University of Science and Technology. During his PhD he co-founded the 3D vision company Altizure, which was later acquired by Apple. From 2017 to 2023 he was a director at the Alibaba Cloud AI Lab. Since 2023 he has been a researcher and doctoral supervisor at the Artificial Intelligence Innovation and Incubation Institute, Fudan University. His main research interests are video and 3D generative models, covering vision-based reconstruction, generation, understanding, and simulation of 3D content and video. He has published more than 60 papers at top conferences and journals in computer vision and machine learning, including CVPR, ICCV, ICLR, and TPAMI, and his work includes video generation models with notable industry impact such as Hallo, Champ, and AnimateAnything. He has ranked first in more than 40 international computer vision competitions and leaderboards.

Visual generative model
VAE: maximize the variational lower bound
[Figure: Input, Output]

Video generative methods
GAN: adversarial training
VAE: maximize the variational lower bound
Flow-based models: invertible transform of distributions
Diffusion models: gradually add Gaussian noise, then reverse the process

The field of video generation has seen rapid development, reaching several milestones.

Diffusion for visual generation (1)
Denoising Diffusion Probabilistic Models (DDPMs)

Diffusion for visual generation (2)
Stochastic Differential Equations (Score SDEs)

Key elements of visual diffusion models
Pixel diffusion (original input)
Latent space diffusion
U-Net
Transformer

Sora, breakthrough
Consistency: consistency in 3D rendering, long-range coherence, and object permanence.
High fidelity.
Surprising length: extended video length capability (Sora: 1 minute vs. previous systems: seconds).
Flexible resolution: generation of videos across various durations, aspect ratios, and resolutions.

Sora, key technologies
The DiT framework by Meta (2022.12) is designed for video processing.
Google's MAGViT (2022.12) focuses on video tokenization.
Google DeepMind introduced NaViT (2023.07) to support various resolutions and aspect ratios.
OpenAI's DALL-E 3 (2023.09) enhances video caption generation for improved conditioned video creation.

Modeling the physical world
We know that the real physical world is very complicated to model.
Probabilistic Bayesian inference; probabilistic graphical models.
Deterministic mathematical equations; physics-based simulation; control theory.

Key elements of a physical world
Given a Sora demo (the woman walking down a Tokyo street), the key elements of a physical world, in graphics terms:
Appearance
Geometry
Lighting
Motion & Animation
Audio

Modeling the physical world
CVPR: Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle
[Figure panels: Chick-Chicken, Espresso, Split-Cookie, Flame-Steak]

It is hard to model the physical world
In fact, the world is hard to model in a probabilistic way.
Sora resource consumption:
1 billion images;
1 million hours of video data;
10 trillion tokens after tokenizing images and videos.
Training with 5,000 A100s in parallel.

It is hard to model the physical world
Sora failure case in geometry and appearance.
Sora failure case in lighting.
Sora failure case in motion and animation.

It is hard to model the physical world
VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model
Geometric enhancement is still needed for multi-view images.
From a static aspect, SVD is able to model multi-view images.

It is hard to model the physical world
STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians
From a temporal aspect.

It is hard to model the physical world
Ilya Sutskever: compression is generalization.
The best lossless compression for a dataset is the best generalization for data outside the dataset.

Apply the deterministic conditions
Different representations of deterministic conditions in the physical world.
Far less data and far fewer parameters!
Geometry
Lighting
Motion & Animation

Apply the deterministic conditions
There are two ways to inject deterministic information.
[Figure labels: deterministic #1, deterministic #2]

Image Human Animation
Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance

Image Portrait Animation
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation

Dynamic Protein Structure Prediction
4D Diffusion for Dynamic Protein Structure Prediction with Reference Guided Temporal Alignment

Future work
Apply deterministic conditions to probabilistic diffusion.
Less data and fewer parameters!
Geometry
Lighting
Motion & Animation

Thanks
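
Supplementary sketch: DDPM forward noising
The slides describe diffusion models as gradually adding Gaussian noise and then reversing the process. The pure-Python sketch below illustrates only the forward (noising) step and its algebraic inversion; it is not code from the talk. The linear schedule values (1e-4 to 0.02 over 1000 steps) are an assumption taken from the common DDPM setup, and all function names are hypothetical. A real model would replace the known noise in `predict_x0` with the output of a trained denoising network.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed values)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I)."""
    signal = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [signal * x + noise_scale * e for x, e in zip(x0, noise)]
    return xt, noise


def predict_x0(xt, noise_estimate, t, alpha_bars):
    """Invert the forward step given an estimate of the added noise."""
    signal = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [(x - noise_scale * e) / signal for x, e in zip(xt, noise_estimate)]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    x0 = [0.5, -1.0, 0.25]  # toy "image" with three pixels
    xt, noise = q_sample(x0, 500, alpha_bars, rng)
    # With the true noise, the clean sample is recovered up to rounding.
    print(predict_x0(xt, noise, 500, alpha_bars))
```

Because `alpha_bar_t` shrinks toward zero as `t` grows, late time steps are almost pure noise, which is why the reverse process has to be learned rather than computed directly.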