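Modeling User Retention through Generative Flow Networks
by Shuchang, Jul 2024

Xiangyu Zhao*, City University of Hong Kong
Ziru Liu (刘子儒), City University of Hong Kong
Shuchang Liu, Kuaishou Technology
Zijian Zhang, City University of Hong Kong
Qingpeng Cai*, Kuaishou Technology
Lantao Hu, Kuaishou Technology
Peng Jiang*, Kuaishou Technology
Han Li, Kuaishou Technology
Bin Yang, Kuaishou Technology
Zhenghai Xue, Nanyang Technological University
2024, Beijing edition

Background and Motivation
Long-term value optimization in traditional recommender systems:
  The user interacts with the system many times, which is modeled as a Markov Decision Process (MDP)
[Figure: user-system interaction timeline. At request t the user sends a Request to the Recommender System, receives a Video List and gives an Immediate Response; the user opens/returns to the app, leaves it, and later returns; session i and session i+1 are separated by the return time.]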
  The user's interaction history and the current recommendation request form the user state (i.e., the state)
  The recommendation list is the action
  Long-term value objective: maximize the expected user reward over multiple future steps (i.e., discounted cumulative reward maximization)
Solution: reinforcement learning
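To make the objective concrete: for one user trajectory with per-request rewards r_0 to r_{T-1} and a discount factor γ, the discounted cumulative reward is G = Σ_t γ^t r_t. The sketch below computes the per-step returns G_t = r_t + γ G_{t+1}, which is the quantity an RL method tries to maximize in expectation. The function name and the default γ are illustrative assumptions, not values from the talk.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.9) -> List[float]:
    """Return G_t = sum_{k>=t} gamma^(k-t) * r_k for every step t.

    Hypothetical helper for illustration; the default gamma is an assumption.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each G_t reuses G_{t+1}: G_t = r_t + gamma * G_{t+1}
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    # Three recommendation requests with immediate rewards 1, 0, 2
    print(discounted_returns([1.0, 0.0, 2.0], gamma=0.5))  # [1.5, 1.0, 2.0]
```

A smaller γ makes the objective more myopic; as γ approaches 1, rewards from later requests count almost as much as the current one.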
User retention / return-visit optimization:
  Why it matters: retention is an effective measure of users' long-term expectations of, and approval of, the system, and it is directly tied to DAU
  Challenges:
    o Label delay: the return signal arrives hours to days later
    o Signal sparsity: each session yields only one label
    o Uncertainty: user behavior outside the app cannot be observed

Existing Approaches to Retention Optimization
RLUR [1]: an actor-critic framework
  Retention critic: estimates the future return-visit signal as the reward
  Immediate-feedback critic: assumes that the quality and surprise of each individual recommendation also accumulate into the reward and ultimately affect whether the user returns

[1] Cai, Qingpeng, et al. Reinforcing user retention in a billion scale short video recommender system. Companion Proceedings of the ACM Web Conference 2023.
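A rough sketch of the two-critic idea listed for RLUR: one critic tracks the sparse, delayed return-visit signal, another tracks the dense immediate feedback, and the actor receives a weighted combination of their advantages. This is not RLUR's implementation. The tabular TD(0) critics, the string states, the weights w_ret and w_imm, and every name below are hypothetical choices made for illustration.

```python
from collections import defaultdict
from collections.abc import Hashable
from typing import Dict


class TabularCritic:
    """Hypothetical TD(0) critic: a table of V(s) for one reward stream."""

    def __init__(self, gamma: float, lr: float = 0.1) -> None:
        self.gamma = gamma
        self.lr = lr
        self.values: Dict[Hashable, float] = defaultdict(float)

    def update(self, s: Hashable, r: float, s_next: Hashable, done: bool) -> float:
        # TD error: delta = r + gamma * V(s') - V(s); no bootstrap past a terminal step
        bootstrap = 0.0 if done else self.gamma * self.values[s_next]
        delta = r + bootstrap - self.values[s]
        # Move V(s) a small step toward the TD target
        self.values[s] += self.lr * delta
        return delta


def combined_advantage(retention: TabularCritic, immediate: TabularCritic,
                       s: Hashable, s_next: Hashable,
                       r_ret: float, r_imm: float, done: bool,
                       w_ret: float = 1.0, w_imm: float = 0.5) -> float:
    """Update both critics and return a weighted advantage for the actor.

    Assumptions: the actor would scale grad log pi(a|s) by this value, and
    the weights w_ret / w_imm are illustrative, not taken from RLUR.
    """
    adv_ret = retention.update(s, r_ret, s_next, done)  # sparse return-visit signal
    adv_imm = immediate.update(s, r_imm, s_next, done)  # dense per-request feedback
    return w_ret * adv_ret + w_imm * adv_imm


if __name__ == "__main__":
    retention = TabularCritic(gamma=0.9)
    immediate = TabularCritic(gamma=0.9)
    # Mid-session request: positive immediate feedback, no retention label yet
    print(combined_advantage(retention, immediate, "s0", "s1",
                             r_ret=0.0, r_imm=1.0, done=False))  # 0.5
    # Last request of the session: the return-visit label finally arrives
    print(combined_advantage(retention, immediate, "s1", "s2",
                             r_ret=1.0, r_imm=0.0, done=True))  # 1.0
```

Keeping the two signals in separate critics lets the dense immediate feedback drive learning between the sparse, delayed retention labels; how the two are actually weighted in RLUR is not specified in these slides.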