Jun Zhu: Recent Advances in Diffusion Policy Learning (PDF, 46 pages)
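ML-Summit
www.cpp- www.ml-summit.org www.gosim.org www.pm-summit.org

Jun Zhu
IEEE Fellow; Deputy Dean of the Institute for Artificial Intelligence, Tsinghua University; Founder and Chief Scientist of Shengshu Technology

Jun Zhu is the Bosch AI Professor in the Department of Computer Science at Tsinghua University, an IEEE Fellow, Deputy Dean of the Tsinghua University Institute for Artificial Intelligence, director of the department's artificial intelligence laboratory (人智实验室), and Founder and Chief Scientist of Shengshu Technology. His research focuses on the foundational theory of machine learning and on efficient algorithms. His honors include the Qiushi Outstanding Youth Award of the China Association for Science and Technology, the Xplorer Prize, the CCF Natural Science Award (First Prize), the Wu Wenjun AI Natural Science Award (First Prize), and an ICLR Outstanding Paper Award. He has also been selected for a national high-level talent program, as a CCF Young Scientist, and as an MIT TR35 China Pioneer.

Talk title: Recent Advances in Diffusion Policy Learning

Diffusion Policies: Reinforcement Learning with Diffusion Models
Jun Zhu
Tsinghua-Bosch Joint Center for ML
Department of Computer Science and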
Technology, Tsinghua University

Offline RL: Data-driven; Open-loop RL

Open-loop RL leads to policy conservatism
[Figure: online RL vs. offline RL, with the behavior (dataset) distribution]
Online RL: estimation error can be corrected through the feedback loop.
Offline RL: value estimates are highly inaccurate for unseen (s, a) pairs, because there is no feedback.
Offline RL therefore requires a constrained policy optimization paradigm.

Behavior modeling for offline RL
The constrained policy optimization problem: [equation not recovered]
has one optimal analytic solution: [equation not recovered]
Resolving offline RL requires understanding the behavior distribution: generative modeling.

Diffusion Models for High-dim Data Generation: Image, 3D
Blessing of scale: large models are learned in a self-supervised way from a huge amount of unlabeled (multi-modal) data.
ProlificDreamer, NeurIPS 2023; CRM, ECCV 2024; DeepMesh, arXiv 2025
UniDiffuser, ICML 2023

Diffusion Models for High-dim Data Generation: Video
Vidu: the first high-performance video generator after Sora, released on April 27, 2024
Vidu: a Highly Consistent, Dynamic and Skilled Text-to-Video Generator with Diffusion Models, Bao et al., arXiv 2024
Vidu4D, Neu