November 26, 2023
Offline Reinforcement Learning with Reward-Free Datasets
Chongjie Zhang, Hao Hu (胡昊), Yiqin Yang (杨以钦)
Machine Intelligence Group

Intelligent Decision-Making/Control
Game AI, recommender systems, intelligent robots.

Reinforcement Learning: Opportunities and Challenges
- Success in artificial domains.
- Real-world challenges: online interaction is expensive and dangerous (healthcare, robotics, recommendation); sample complexity; transfer.

Data-Driven Solution: Offline RL
- Interaction cost → no interaction: learn from big data collected in past interactions.
- Sample complexity → train the policy for many epochs on the fixed dataset.
- Transfer → finetune by sampling from the target task, with only occasional interaction for more data.

Offline RL Setting
- Dataset: D = {(s_i, a_i, r_i, s'_i)}, collected by an unknown behavior policy π_β.
- Objective: max_π E_π[Σ_t γ^t r(s_t, a_t)].
- Problem setting: the policy is learned from a static dataset; interactions with the environment are not allowed.

Challenges of Offline RL
- Significant overestimation: extrapolation error + bootstrapping. An out-of-distribution action can look attractive ("go right to get higher!") because the Bellman backup Q(s, a) ← r(s, a) + γ max_{a'} Q(s', a') maxes over actions the dataset never covers, and bootstrapping propagates the resulting error.
- Reward-free datasets: reward-free data can be cheap, while a dataset labeled for a specific task can be expensive.

Outline
1. Offline RL with EVL (ICLR 2022).
2. Provable Data Sharing (ICLR 2023): online RL with reward-free data.
3. Behavior Extraction via Random Intentions (NeurIPS 2023).

First Challenge: Significant Overestimation in Offline RL
Xiaoteng Ma*, Yiqin Yang*, Hao Hu*, Jun Yang, Chongjie Zhang+, Qianchuan Zhao, Bin Liang, and Qihan Liu. Offline Reinforcement Learning with Value-based Episodic Memory. ICLR 2022.

Reasons for Overestimation in Offline RL
- Extrapolation error.
[Figure 2: true Q-value vs. estimated Q-value — OOD actions are overestimated in value estimation.]
Theorem 1: Given a deter
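The extrapolation-error-plus-bootstrapping failure mode can be reproduced in a few lines. Below is a minimal sketch, not taken from the slides: a hypothetical single-state MDP with a self-loop and actions {0, 1, 2}, where the static dataset only ever contains actions 0 and 1, so action 2 is out-of-distribution. Fitted Q-iteration with a linear function approximator extrapolates a high value for the OOD action, and the max in the Bellman backup plus bootstrapping inflates that error to a fixed point well above the true optimal value. The "in-support" variant, which backs up only over dataset actions, is an illustrative contrast in the spirit of in-sample methods, not the actual EVL algorithm.

```python
# Toy demo of overestimation in offline RL: extrapolation error + bootstrapping.
# Hypothetical setup: one state with a self-loop, actions {0, 1, 2};
# the offline dataset covers only actions 0 and 1, so a=2 is OOD.

GAMMA = 0.9
REWARDS = {0: 0.0, 1: 1.0, 2: 0.0}  # true rewards; the OOD action is worthless
DATASET = [0, 1]                     # actions present in the static dataset

def fit_linear(points):
    """Exact least-squares fit of Q(a) = w*a + b through two (a, target) pairs."""
    (a0, y0), (a1, y1) = points
    w = (y1 - y0) / (a1 - a0)
    return w, y0 - w * a0

def fitted_q_iteration(ood_backup, iters=200):
    w, b = 0.0, 0.0
    for _ in range(iters):
        q = lambda a: w * a + b
        # Bellman backup: ood_backup=True maxes over ALL actions (including
        # the OOD a=2, whose value is pure linear extrapolation);
        # ood_backup=False restricts the max to in-dataset actions.
        acts = [0, 1, 2] if ood_backup else DATASET
        boot = max(q(a) for a in acts)
        targets = [(a, REWARDS[a] + GAMMA * boot) for a in DATASET]
        w, b = fit_linear(targets)
    return lambda a: w * a + b

naive = fitted_q_iteration(ood_backup=True)
insupport = fitted_q_iteration(ood_backup=False)

true_v = 1.0 / (1.0 - GAMMA)  # always take a=1: V* = 10
print(f"true optimal value:        {true_v:.2f}")   # 10.00
print(f"naive max Q (OOD backup):  {max(naive(a) for a in [0, 1, 2]):.2f}")
print(f"in-support max Q:          {max(insupport(a) for a in DATASET):.2f}")
```

The naive backup converges to a max Q of 20, double the true optimal value of 10, because each iteration bootstraps from the extrapolated Q(2); the in-support backup recovers the correct value. This is the same mechanism, in miniature, that causes the significant overestimation discussed above.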