张崇洁_Semi-Supervsied Offline RL_watermark.pdf

Offline Reinforcement Learning with Reward-Free Datasets
November 26, 2023
Chongjie Zhang, Hu Hao (胡昊), Yiqin Yang (杨以钦)
Machine Intelligence Group

Intelligent Decision-Making/Control
- Game AI
- Recommender System
- Intelligent Robot

Reinforcement Learning's Opportunities and Challenges
- Success in artificial domains
- Real-world challenges
  - Online interaction is expensive and dangerous (healthcare, robotics, recommendation)
  - Sample complexity
  - Transfer
- Diagram labels: Data, Interaction

Data-Driven Solution: Offline RL
- Interaction cost, sample complexity, transfer
- Diagram labels: No Interaction, Finetune, Sample from Target Task
- Big data from past interactions
- Training policy with many epochs
- Occasional interaction for more data

Offline RL Setting
- Dataset: $\mathcal{D} = \{(s_i, a_i, r_i, s'_i)\}_{i=1}^{N}$
- Objective: $\max_{\pi} J(\pi) = \mathbb{E}_{\pi}\left[\sum_{t} \gamma^{t} r(s_t, a_t)\right]$
- Problem setting: the policy is learned from a static dataset collected by an unknown behavior policy. Interactions are not allowed.

Challenges of Offline RL
- Significant overestimation: extrapolation error + bootstrapping (figure annotation: "go right to get higher!")
- Reward-free dataset: reward-free datasets can be cheap, while a dataset for a specific task can be expensive.

Outline
1. Offline RL with EVL (ICLR 22)
2. Provable Data Sharing (ICLR 23)
3. Behavior Extraction via Random Intentions (NeurIPS 23)
- Diagram labels: Other Data, Online RL, Reward-free Data

First Challenge: Significant Overestimation in Offline RL
Xiaoteng Ma*, Yiqin Yang*, Hao Hu*, Jun Yang, Chongjie Zhang+, Qianchuan Zhao, Bin Liang, and Qihan Liu. Offline Reinforcement Learning with Value-based Episodic Memory. ICLR, 2021.

Reasons for Overestimation in Offline RL
- Extrapolation error
- Figure 2. OOD action in value estimation (curves: True Q-Value, Estimated Q-Value)
- Theorem 1¹: Given a deter
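
The slides name extrapolation error and bootstrapping as the sources of overestimation in offline RL. The toy sketch below illustrates that interaction under simplifying assumptions that are not taken from the slides: a single state, a true Q-value of 0 for every action, zero reward, and independent Gaussian noise standing in for extrapolation error. The function names `max_estimate_bias` and `bootstrapped_value` are hypothetical, and this is not the method of the cited paper.

```python
import random
import statistics


def max_estimate_bias(num_actions, noise_std, trials, seed=0):
    """Average of max_a Q_hat(s, a) when the true Q(s, a) is 0 for every action.

    Q_hat is modeled as true Q plus zero-mean Gaussian noise (an assumption).
    Any positive result is pure overestimation caused by the max operator.
    """
    rng = random.Random(seed)
    maxima = []
    for _ in range(trials):
        q_hat = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        maxima.append(max(q_hat))
    return statistics.mean(maxima)


def bootstrapped_value(num_actions, noise_std, gamma, backups, trials=2000, seed=0):
    """Repeat the backup V <- gamma * E[max_a (V + noise_a)] with zero reward.

    The true value stays 0, so whatever accumulates in V is estimation error
    that bootstrapping feeds back into the next target.
    """
    rng = random.Random(seed)
    value = 0.0
    for _ in range(backups):
        targets = [
            gamma * max(value + rng.gauss(0.0, noise_std) for _ in range(num_actions))
            for _ in range(trials)
        ]
        value = statistics.mean(targets)
    return value


if __name__ == "__main__":
    print("one action:   ", max_estimate_bias(1, 1.0, 5000))
    print("ten actions:  ", max_estimate_bias(10, 1.0, 5000))
    print("after backups:", bootstrapped_value(10, 1.0, 0.9, 20))
```

With a single action the max is unbiased, while with several noisy actions the max is biased upward, and repeated backups compound that bias. Restricting the max to actions that are well supported by the dataset is one common way to limit this effect.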
