Causal Inference-Based Recommender Systems
Chen Gao, Tsinghua University, National Research Center for Information Science and Technology
https:/
2023 Causal Inference Online Summit, Forum on Recommendation and Causal Inference

Background
- Why is causal inference needed in recommender systems?
- Chen Gao et al. Causal inference in recommender systems: A survey and future directions. arXiv preprint arXiv:2208.12397, 2022.

Outline
- Disentangled learning for user interest and conformity
- Disentangled learning for long-term and short-term interests
- Debiasing in short-video recommendation

Disentangling User Interest and Conformity for Recommendation with Causal Embedding
Y. Zheng, Chen Gao, et al. Disentangling user interest and conformity for recommendation with causal embedding. In Proceedings of the Web Conference 2021, pp. 2980-2991.

Background
- What are the causes behind each user-item interaction? There are two main causes:
  - Interest: preference for the item itself (e.g., attributes such as tire, speed, ...).
  - Conformity: how users tend to follow other people (e.g., buying a best-seller because of its high sales).
- Goal: learn disentangled representations for interest and conformity.

Motivation
- Why learn disentangled representations? Causal recommendation under non-IID situations! (IID: independent and identically distributed.)
- Robustness: recommenders are trained and updated in real time, so training data and test data are not IID.
- Interpretability: improves user-friendliness and facilitates algorithm development.

Causal Recommendation
- Inverse Propensity Scoring (IPS) [1]: the propensity score is estimated from item popularity.
- Intuition: impose lower weights on popular items, and boost unpopular items.
- However, interest and popularity are bundled into one unified representation: the two factors remain entangled!

[1] Yang, L., Cui, Y., Xuan, Y., Wang, C., Belongie, S., & Estrin, D. (2018, September). Unbiased offline recommender evaluation for missing-not-at-random implicit feedback. In Proceedings of the 12th ACM Conference on Recommender Systems (pp. 279-287).
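As a minimal sketch of the popularity-based IPS reweighting described above: the smoothing exponent `gamma`, the weight clipping, and the pointwise cross-entropy objective are illustrative assumptions, not the exact estimator of [1].

```python
import numpy as np

def ips_weights(item_popularity, gamma=1.0, clip=100.0):
    """Inverse-propensity weights from item popularity.

    Popular items get lower weights; unpopular items are boosted.
    `gamma` (smoothing exponent) and `clip` (weight cap) are
    illustrative hyper-parameters, not values from the paper.
    """
    propensity = item_popularity / item_popularity.sum()   # crude propensity estimate
    weights = 1.0 / np.power(propensity, gamma)
    return np.minimum(weights / weights.mean(), clip)      # normalize and clip for stability

def ips_loss(scores, labels, item_ids, weights):
    """IPS-weighted pointwise loss over (user, item, label) interactions."""
    eps = 1e-8
    p = 1.0 / (1.0 + np.exp(-scores))                      # sigmoid
    ce = -(labels * np.log(p + eps) + (1 - labels) * np.log(1 - p + eps))
    return np.mean(weights[item_ids] * ce)                 # re-weight by inverse propensity
```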
Causal Recommendation
- Causal Embeddings (CausE) [1]: requires a large fraction of biased data and a small fraction of unbiased data.
- Performs two matrix factorizations (MF), on the biased and the unbiased data respectively, and imposes L1/L2 regularization between the two MFs.
- The learned representations are still entangled!

[1] Bonner, S., & Vasile, F. (2018, September). Causal embeddings for recommendation. In Proceedings of the 12th ACM Conference on Recommender Systems (pp. 104-112).

Challenges
- Variety of conformity: conformity depends on both users and items. One user's conformity varies across different items, and conformity towards one item varies across different users.
- Learning disentangled representations is intrinsically hard: only observational data is accessible, there is no ground truth for user interest, and an interaction can come from one or both factors.
- Careful designs are needed for combining the two factors to make recommendations.
Disentangling interest and conformity

Methodology: Our DICE Model
- Disentangling Interest and Conformity with Causal Embedding (DICE).
- Challenge 1: variety of conformity.
- Our proposal: adopt separate embeddings of interest and conformity for users and items.
- Benefit 1: embedding proximity in high-dimensional space can express the variety of conformity (challenge 1 addressed).
- Benefit 2: independent modeling of interest and conformity.

Methodology: Our DICE Model
- Challenge 2: learning disentangled representations is intrinsically hard.
- Our proposal: utilize the colliding effect from causal inference to obtain cause-specific data.
- Intuition: train interest/conformity embeddings with interactions that are caused by interest/conformity.

Methodology: Our DICE Model
- Challenge 3: aggregation of the two factors is complicated.
- Our proposal: leverage multi-task curriculum learning to combine the two causes.

Methodology: Our DICE Model
- Three components: causal embedding, disentangled representation learning, and multi-task curriculum learning.
[Figure: (a) causal graph; (b) causal embedding, with interest and conformity embeddings for user and item, combined by concatenation and trained with click, interest, conformity, and discrepancy losses.]

Methodology: Our DICE Model
- Causal graph and Structural Causal Model (SCM).

Methodology: Our DICE Model
- Causal embedding: separate embeddings for interest and conformity. User: $u^{(int)}, u^{(con)}$; item: $i^{(int)}, i^{(con)}$.
- Use inner products to compute matching scores: $s^{int}_{ui} = \langle u^{(int)}, i^{(int)} \rangle$ and $s^{con}_{ui} = \langle u^{(con)}, i^{(con)} \rangle$.
- Predict clicks by combining the two causes: $s^{click}_{ui} = s^{int}_{ui} + s^{con}_{ui}$.
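A minimal PyTorch-style sketch of the separate causal embeddings and matching scores described above; the class and variable names are illustrative, not the released implementation.

```python
import torch
import torch.nn as nn

class DICEEmbedding(nn.Module):
    """Separate interest/conformity embeddings and their matching scores."""

    def __init__(self, n_users, n_items, dim):
        super().__init__()
        self.user_int = nn.Embedding(n_users, dim)  # user interest embedding
        self.user_con = nn.Embedding(n_users, dim)  # user conformity embedding
        self.item_int = nn.Embedding(n_items, dim)  # item interest embedding
        self.item_con = nn.Embedding(n_items, dim)  # item conformity embedding

    def forward(self, u, i):
        s_int = (self.user_int(u) * self.item_int(i)).sum(-1)  # interest matching
        s_con = (self.user_con(u) * self.item_con(i)).sum(-1)  # conformity matching
        s_click = s_int + s_con  # equals the inner product of the concatenated embeddings
        return s_int, s_con, s_click
```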
Methodology: Our DICE Model
- Mining cause-specific data with causal inference: immorality and collider.
- In the immorality A -> C <- B, C is a collider.
- Colliding effect: A and B are independent, but A and B are NOT independent when conditioned on C.

Methodology: Our DICE Model
- Mining cause-specific data with causal inference. E.g., A: whether a student is talented; B: whether a student is hard-working; C: whether a student passes an exam.
- Bob passes the exam, and Bob is not talented: he is hard-working with high probability.
- Alice doesn't pass the exam, and Alice is talented: she is most likely not hard-working.
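A short simulation of this collider example; the probabilities are made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
talented = rng.random(n) < 0.5      # A
hardworking = rng.random(n) < 0.5   # B, independent of A
# C: passing is more likely with talent and/or hard work (made-up probabilities)
p_pass = 0.1 + 0.4 * talented + 0.4 * hardworking
passed = rng.random(n) < p_pass

# Unconditionally, A and B are independent:
print(hardworking[talented].mean(), hardworking[~talented].mean())  # both ~0.5
# Conditioned on the collider C, they become dependent:
print(hardworking[passed & ~talented].mean())   # well above 0.5: "Bob passed but is not talented"
print(hardworking[~passed & talented].mean())   # well below 0.5: "Alice is talented but failed"
```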
Methodology: Our DICE Model
- Mining cause-specific data with causal inference: the colliding effect can come to help! Click is the collider of interest and conformity.
- Use popularity as a proxy for conformity.
- A clicked item with low popularity implies high interest; an unclicked item with high popularity implies low interest.

Methodology: Our DICE Model
- Notation: $M^{int}$: interest matching probability matrix; $M^{con}$: conformity matching probability matrix.
- Case 1: a user clicks a popular item $i$ and doesn't click an unpopular item $j$. Conditioned on the click, we can conclude $M^{con}_{ui} > M^{con}_{uj}$ and $M^{click}_{ui} > M^{click}_{uj}$.
- Case 2: a user clicks an unpopular item $i$ and doesn't click a popular item $j$. Then $M^{int}_{ui} > M^{int}_{uj}$, $M^{con}_{ui} < M^{con}_{uj}$, and $M^{click}_{ui} > M^{click}_{uj}$.

Methodology: Our DICE Model
- $O$: the whole training set of triplets $(u, i, j)$: user, positive item, negative item.
- $O_1$: triplets whose negative samples are more popular than the positive samples; $O_2$: triplets whose negative samples are less popular than the positive samples.
- $O = O_1 \cup O_2$.
- Solution: train different embeddings with different cause-specific data.
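A minimal sketch of splitting BPR-style triplets into the cause-specific sets $O_1$ and $O_2$ by item popularity, as described above; function and variable names are illustrative.

```python
def split_cause_specific(triplets, popularity):
    """Partition (user, pos_item, neg_item) triplets into O1 and O2.

    O1: the negative item is MORE popular than the positive item
        (supports the interest inequality, with conformity reversed).
    O2: the negative item is LESS popular than the positive item
        (supports the conformity inequality).
    """
    O1, O2 = [], []
    for u, i, j in triplets:
        if popularity[j] > popularity[i]:
            O1.append((u, i, j))
        else:
            O2.append((u, i, j))
    return O1, O2
```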
Methodology: Our DICE Model
- Main task: estimating clicks with a BPR loss over the whole training set, $L_{click} = \sum_{(u,i,j) \in O_1 \cup O_2} \mathrm{BPR}\big(s^{click}_{ui}, s^{click}_{uj}\big)$.

Methodology: Our DICE Model
- Interest modeling: only use the interest embedding, trained on the interest-specific data, $L_{interest} = \sum_{(u,i,j) \in O_1} \mathrm{BPR}\big(s^{int}_{ui}, s^{int}_{uj}\big)$.

Methodology: Our DICE Model
- Conformity modeling: only use the conformity embedding. On $O_2$ the conformity inequality is positive, while on $O_1$ it is reversed: $L_{conformity} = \sum_{(u,i,j) \in O_2} \mathrm{BPR}\big(s^{con}_{ui}, s^{con}_{uj}\big) + \sum_{(u,i,j) \in O_1} \mathrm{BPR}\big(s^{con}_{uj}, s^{con}_{ui}\big)$.

Methodology: Our DICE Model
- Discrepancy task: direct supervision on disentanglement, pushing the interest embeddings $E^{(int)}$ away from the conformity embeddings $E^{(con)}$.
- L1-inv: maximize $\ell_1(E^{(int)}, E^{(con)})$; L2-inv: maximize $\ell_2(E^{(int)}, E^{(con)})$.
- Distance correlation: minimize $\mathrm{dCor}(E^{(int)}, E^{(con)}) = \dfrac{\mathrm{dCov}(E^{(int)}, E^{(con)})}{\sqrt{\mathrm{dVar}(E^{(int)})\,\mathrm{dVar}(E^{(con)})}}$.
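A self-contained sketch of the distance-correlation term above, using double-centered pairwise distance matrices (NumPy); the batching strategy of the actual training code is not shown.

```python
import numpy as np

def dcor(X, Y):
    """Distance correlation between two embedding matrices X, Y of shape (n, d)."""
    def centered_dist(Z):
        D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)  # pairwise distances
        return D - D.mean(0, keepdims=True) - D.mean(1, keepdims=True) + D.mean()
    A, B = centered_dist(X), centered_dist(Y)
    dcov2 = max((A * B).mean(), 0.0)              # squared distance covariance
    dvar_x, dvar_y = (A * A).mean(), (B * B).mean()
    return np.sqrt(dcov2 / (np.sqrt(dvar_x * dvar_y) + 1e-12))
```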
Methodology: Our DICE Model
- Multi-task learning: the click, interest, conformity, and discrepancy losses are optimized jointly.
- Popularity-based Negative Sampling with Margin (PNSM): given the popularity of the positive item, sample negative items whose popularity is either larger than it by a margin, or lower than it by a margin.
- Large margins: high confidence in the inequalities, easy samples. Small margins: low confidence, hard samples.
- Curriculum learning: an easy-to-hard strategy, decaying the margins by a factor of 0.9 after each epoch.
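A sketch of the PNSM sampler with the curriculum decay described above; the additive margins, initial margin values, and rejection-sampling loop are assumptions for illustration.

```python
import random

def pnsm_negative(pos_item, popularity, candidates, m_up, m_down):
    """Sample one negative under PNSM: clearly more popular (> p + m_up)
    or clearly less popular (< p - m_down) than the positive item.
    Additive margins are an illustrative assumption."""
    p = popularity[pos_item]
    for _ in range(1000):  # guard against no qualifying candidate
        neg = random.choice(candidates)
        if popularity[neg] > p + m_up or popularity[neg] < p - m_down:
            return neg
    return neg  # fall back to the last draw if the margins are too strict

# Curriculum: shrink the margins from easy to hard, decaying by 0.9 per epoch.
m_up, m_down = 40.0, 40.0  # illustrative initial margins
for epoch in range(10):
    # ... train one epoch, drawing negatives with pnsm_negative(...) ...
    m_up, m_down = 0.9 * m_up, 0.9 * m_down
```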
Experiments
- Datasets: Movielens-10M, Netflix.
- Evaluation: non-IID protocol (same as CausE [1]). Train: 60% normal + 10% intervened; validation: 10% intervened; test: 20% intervened.
- Metrics: Recall, Hit Ratio, NDCG.
- Recommendation models: MF [2], LightGCN [3].

[1] Bonner, S., & Vasile, F. (2018, September). Causal embeddings for recommendation. In Proceedings of the 12th ACM Conference on Recommender Systems (pp. 104-112).
[2] Rendle, S., Freudenthaler, C., Gantner, Z., & Schmidt-Thieme, L. (2012). BPR: Bayesian personalized ranking from implicit feedback. arXiv preprint arXiv:1205.2618.
[3] He, X., Deng, K., Wang, X., Li, Y., Zhang, Y., & Wang, M. (2020, July). LightGCN: Simplifying and powering graph convolution network for recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 639-648).
Experiments
- RQ1: How does our proposed DICE framework perform compared with state-of-the-art causal recommendation methods under non-IID circumstances?
- RQ2: Can the proposed DICE framework guarantee interpretability?
- RQ3: Can the proposed DICE framework guarantee robustness?

Experiments: Overall Comparison
- Observation: our proposed DICE framework outperforms the baselines with significant improvements on all metrics on both datasets.
- Observation: DICE is a highly general framework that can be combined with various recommendation models.

Experiments: Interpretability
- Conformity embeddings of items with different popularity form layers.
- Interest embeddings of items with different popularity are uniformly distributed in the space.
- Conformity embeddings largely capture conformity, and interest embeddings squeeze out conformity.
Experiments: Robustness
- Test data with different strengths of intervention.
- DICE is more robust than the IPS-based method under different levels of intervention.

Conclusion and Future Work
- We propose to learn disentangled representations of user interest and conformity for recommendation with the tools of causal inference.
- A general framework, DICE, is developed, which shows strong robustness and interpretability under non-IID situations.
- Future work: extend DICE to incorporate more features, and learn disentangled representations for finer-grained user interest, e.g., price preference and brand preference.
- Codes can be found at: https:/
Disentangling Long and Short-Term Interests for Recommendation
Y. Zheng, Chen Gao*, et al. Disentangling long and short-term interests for recommendation. In Proceedings of the ACM Web Conference 2022, pp. 2256-2267.

Background
- User interests are difficult to track in recommender systems. Stable: long-term interests; dynamic: short-term interests.
- Distinguishing between long- and short-term (LS-term) interests is critical for recommendation accuracy!
- LS-term modeling can achieve adaptive recommendation: capture long- and short-term interests separately, predict which interest currently drives the user, and recommend according to the current interests.
- E.g., continuously browsing the same category reflects short-term interests, while switching between different categories reflects long-term interests.

Motivation
- Goal: disentangle long- and short-term interests.
Existing solutions
- Long-term: collaborative filtering (CF), e.g., NCF [1], LightGCN [2]. A unified vector represents interests, and short-term interests are ignored.
- Short-term: sequential recommenders, e.g., GRU4REC [3], SASRec [4]. They learn from item sequences but forget long-term interests.
- Each line only captures one single aspect!

[1] He, X., Liao, L., Zhang, H., Nie, L., Hu, X., & Chua, T.S. (2017, April). Neural collaborative filtering. In WWW 2017.
[2] He, X., Deng, K., Wang, X., Li, Y., Zhang, Y., & Wang, M. (2020, July). LightGCN: Simplifying and powering graph convolution network for recommendation. In SIGIR 2020.
[3] Hidasi, B., Karatzoglou, A., Baltrunas, L., & Tikk, D. (2015). Session-based recommendations with recurrent neural networks. In ICLR 2016.
[4] Kang, W.C., & McAuley, J. (2018, November). Self-attentive sequential recommendation. In ICDM 2018.

Existing solutions
- Mixed methods: only a few, e.g., SLi-Rec [5], which directly combines CF and sequential methods.
- Insufficient supervision on the learned LS-term interests: the LS-term interests remain entangled!

[5] Yu, Z., Lian, J., Mahmoody, A., Liu, G., & Xie, X. (2019, August). Adaptive user modeling with long and short-term preferences for personalized recommendation. In IJCAI 2019.

Challenges
- LS-term interests reflect quite different aspects of user preferences. Long-term: overall preferences, stable; short-term: dynamic preferences that evolve rapidly.
- Lacking labeled data for LS-term interests: the collected data only contains implicit interactions, with no ground truth for LS-term interests.
- The importance of LS-term interests is uncertain: the contribution of the two kinds of interests varies across different user-item interactions.

Methodology: Our CLSR Model
- Contrastive Learning framework of Long and Short-term interests for Recommendation (CLSR).
- Challenge 1: different dynamics of LS-term interests.
- Our proposal: capture the two aspects with separate mechanisms. Long-term: time-invariant; short-term: evolves from the last timestamp.
Methodology: Our CLSR Model
- Separate encoders (challenge 1 addressed): on top of a shared embedding layer, a long-term encoder (attention pooling over the whole sequence) and a short-term encoder (RNN/GRU evolution followed by attention pooling).

Methodology: Our CLSR Model
- Challenge 2: lacking supervision on LS-term interests.
- Our proposal: self-supervise LS-term interests with contrastive learning.
- Construct proxy labels for LS-term interests from the interaction sequences themselves: mean pooling over the whole history serves as the long-term proxy, and mean pooling over the recent history serves as the short-term proxy.

Methodology: Our CLSR Model
- Contrastive loss (challenge 2 addressed):
- Force each interest representation to be more similar to its corresponding proxy than to the opposite one.
- Force each interest proxy to be more similar to its corresponding representation than to the opposite one.

Methodology: Our CLSR Model
- Challenge 3: variant importance of LS-term interests.
- Our proposal: fuse LS-term interests adaptively.
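A minimal sketch of the proxies and the pairwise contrastive objective just described (PyTorch-style); the cosine similarity, the BPR-like pairwise form, and all names are assumptions, and padding handling is omitted.

```python
import torch
import torch.nn.functional as F

def proxies(history_emb, recent_k):
    """Self-supervision proxies built from the interaction sequence itself.

    history_emb: (batch, seq_len, dim) embedded interaction history.
    Long-term proxy: mean pooling over the whole history.
    Short-term proxy: mean pooling over the most recent `recent_k` items.
    """
    p_long = history_emb.mean(dim=1)
    p_short = history_emb[:, -recent_k:, :].mean(dim=1)
    return p_long, p_short

def contrastive_loss(u_long, u_short, p_long, p_short):
    """Each representation should be closer to its own proxy than to the
    opposite one, and each proxy closer to its own representation."""
    sim = lambda a, b: F.cosine_similarity(a, b, dim=-1)
    bpr = lambda pos, neg: -F.logsigmoid(pos - neg).mean()
    return (bpr(sim(u_long, p_long), sim(u_long, p_short))
            + bpr(sim(u_short, p_short), sim(u_short, p_long))
            + bpr(sim(p_long, u_long), sim(p_long, u_short))
            + bpr(sim(p_short, u_short), sim(p_short, u_long)))
```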
- Predict the importance with an attention network, considering the historical items and the target item (challenge 3 addressed).
[Figure: embedding layer -> long-term encoder and short-term encoder (GRU); an attention network over the history context and the target item feeds the fusion predictor.]
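A sketch of the adaptive fusion: an attention network predicts a per-example weight from the two interests, the encoded history, and the target item. The scalar-gate form and the MLP sizes are assumptions.

```python
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    """Fuse long/short-term interests with a learned, per-example weight."""

    def __init__(self, dim, hidden=64):
        super().__init__()
        # Attention network: sees both interests, the history context and the target item.
        self.alpha_net = nn.Sequential(
            nn.Linear(4 * dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, u_long, u_short, context, target):
        alpha = self.alpha_net(torch.cat([u_long, u_short, context, target], dim=-1))
        return alpha * u_long + (1.0 - alpha) * u_short  # adaptive interest fusion
```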
Methodology: Our CLSR Model
- Interaction prediction: concatenate the fused user interest and the target item, then apply an MLP.
- Model training: negative log-likelihood for interaction prediction, jointly optimized with the contrastive loss.

Methodology: Our CLSR Model
- Summary of the framework: contrastive disentanglement, separate encoders, and adaptive fusion.

Experiments
- Datasets: Taobao, Kuaishou.
- Baselines: long-term: NCF [1], DIN [6], LightGCN [2]; short-term: Caser [7], GRU4REC [3], DIEN [8], SASRec [4], SURGE [9]; LS-term: SLi-Rec [5].
- Metrics: AUC, GAUC, MRR, NDCG.

[6] Zhou, G., Zhu, X., Song, C., Fan, Y., Zhu, H., Ma, X., ... & Gai, K. (2018, July). Deep interest network for click-through rate prediction. In KDD 2018.
[7] Tang, J., & Wang, K. (2018, February). Personalized top-n sequential recommendation via convolutional sequence embedding. In WSDM 2018.
[8] Zhou, G., Mou, N., Fan, Y., Pi, Q., Bian, W., Zhou, C., ... & Gai, K. (2019, July). Deep interest evolution network for click-through rate prediction. In AAAI 2019.
[9] Chang, J., Gao, C., Zheng, Y., Hui, Y., Niu, Y., Song, Y., ... & Li, Y. (2021, July). Sequential recommendation with graph neural networks. In SIGIR 2021.
Experiments
- RQ1: How does the proposed framework perform compared with state-of-the-art recommendation models?
- RQ2: Can CLSR achieve stronger disentanglement of LS-term interests than existing unsupervised baselines?
- RQ3: What is the effect of different components in CLSR?

Experiments: Overall Comparison (RQ1)
- Short-term models generally perform better than long-term models: it is critical to capture the sequential pattern of user interests.
- Joint modeling of LS-term interests does not always bring performance gains: it is insufficient to disentangle LS-term interests with no explicit supervision.
- Our proposed CLSR achieves the best performance against all baseline methods: disentangled modeling of LS-term interests can achieve significant improvements.

Experiments: Study on disentanglement of LS-term interests (RQ2)
- Comparison between using a single interest and both interests: CLSR outperforms SLi-Rec in all cases, and combining LS-term interests achieves better performance than using only one side.
- Performance of predicting different behaviors: CLSR scores much lower than SLi-Rec in all cases on this probe.
- Short-term interests are generally more important than long-term interests, which is consistent with RQ1.
- A high value for SLi-Rec indicates that its long-term interest representations contain much information about the undesired short-term interests, i.e., entanglement.
Experiments: Ablation and hyper-parameter study (RQ3)
- Contrastive learning: self-supervised contrastive learning is necessary for disentangling LS-term interests.
- Self-supervised contrastive learning can also improve the performance of DIEN: CLSR is a general framework for learning LS-term interests.
- Too large a loss weight for contrastive learning may contradict the main interaction prediction task.
- Adaptive fusion vs. fixed fusion: adaptive fusion outperforms all fixed values of the fusion weight. It is necessary to fuse LS-term interests adaptively, and the attention network successfully achieves this goal.

Conclusion and Future Work
- We propose contrastive learning of long- and short-term interests with self-supervision, which attains better overall performance compared with existing recommendation approaches.
- Experiments demonstrate that disentanglement of LS-term interests is critical for accurate recommendation.
- Future work: extend CLSR to include other designs of encoders and proxies for LS-term interests.
- Codes can be found at: https:/
DVR: Micro-Video Recommendation Optimizing Watch-Time-Gain under Duration Bias
Y. Zheng, Chen Gao*, et al. DVR: micro-video recommendation optimizing watch-time-gain under duration bias. In Proceedings of the 30th ACM International Conference on Multimedia, 2022, pp. 334-345.

Background
- Today's micro-video platforms, such as TikTok and Kuaishou, have been taking the majority of Internet traffic.
- The number of micro-videos uploaded per day is at the million scale, so recommendation is indispensable.
- Existing approaches largely inherit the traditional video recommendation setup [1].

[1] Covington, Paul, Jay Adams, and Emre Sargin. Deep neural networks for YouTube recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems, pp. 191-198. 2016.

Evaluation Metric
- How do we measure user satisfaction and activeness on a recommended micro-video? What target should we train the model to predict?
- Existing works usually use watch time or watch percentage: at training time the model predicts watch time, and at serving time micro-videos with larger predicted watch time/percentage are ranked higher.
- However, longer watch time does not necessarily indicate that the user is more interested in the micro-video: watch time is biased towards videos with long duration.

Motivation
- Watch time is inaccurate!
- Using watch time as the target recommends too many micro-videos that do not match user preference but have long duration.

Motivation
- Watch time is unfair!
- Different users upload micro-videos of different duration: long-video publishers (e.g., vlogs) vs. short-video publishers (e.g., short funny videos).
- Long-video publishers receive much more recommendation traffic than short-video publishers.

Research Questions
- RQ1: How to measure users' satisfaction and activeness towards micro-videos in an unbiased way?
- RQ2: How to learn unbiased user preferences on micro-videos of different duration and provide accurate recommendation?
- Challenges: micro-videos of different duration cannot be compared directly, and since the structural differences between recommendation models vary widely, the bias-alleviation design is supposed to be general and model-agnostic.
Data Analysis
- We adopt a real-world micro-video dataset: Wechat Channels.
- We select 7 representative and state-of-the-art models: LibFM [2], Wide&Deep [3], DeepFM [4], NFM [5], AFM [6], AutoInt [7], AFN [8], and train them to predict watch time.

[2] Rendle, Steffen. Factorization machines with libFM. ACM Transactions on Intelligent Systems and Technology (TIST) 3, no. 3 (2012): 1-22.
[3] Cheng, Heng-Tze, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson et al. Wide & deep learning for recommender systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, pp. 7-10. 2016.
[4] Guo, Huifeng, Ruiming Tang, Yunming Ye, Zhenguo Li, and Xiuqiang He. DeepFM: a factorization-machine based neural network for CTR prediction. arXiv preprint arXiv:1703.04247 (2017).
[5] He, Xiangnan, and Tat-Seng Chua. Neural factorization machines for sparse predictive analytics. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 355-364. 2017.
[6] Xiao, Jun, Hao Ye, Xiangnan He, Hanwang Zhang, Fei Wu, and Tat-Seng Chua. Attentional factorization machines: Learning the weight of feature interactions via attention networks. arXiv preprint arXiv:1708.04617 (2017).
[7] Song, Weiping, Chence Shi, Zhiping Xiao, Zhijian Duan, Yewen Xu, Ming Zhang, and Jian Tang. AutoInt: Automatic feature interaction learning via self-attentive neural networks. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 1161-1170. 2019.
[8] Cheng, Weiyu, Yanyan Shen, and Linpeng Huang. Adaptive factorization network: Learning adaptive-order feature interactions. In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 04, pp. 3609-3616. 2020.
Data Analysis
- Distribution shift of recommendation results: comparing predicted and ground-truth watch time of micro-videos in different duration bins (1 second per bin), the models tend to amplify the duration bias (larger slope) and recommend too many micro-videos with long duration.

Data Analysis
- Inaccurate recommendation due to the distribution shift.
- Bad Cases (BC): pairs where the ranking by predicted watch time contradicts the user's true preference, e.g., the user prefers video A and has little interest in B, yet B is ranked higher because its longer duration inflates its watch time.
Methodology: WTG
- The watch time can be used as a metric only when it is compared with other data points of micro-videos with similar duration.
- WTG (Watch-Time-Gain): the relative user engagement on a micro-video compared with the average engagement of all users on micro-videos with a similar duration.
- Divide all the micro-videos into equally wide bins according to their duration, and compare the watch time using data points within each bin, rather than across all bins.

Methodology: WTG
- WTG standardizes watch time within each bin, making it independent of micro-video duration.
- Micro-videos of different duration can thus be compared by WTG: a short video with a WTG of 1 is better than a long video with a WTG of 0.5, even though the original watch time of the long video might be longer.
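A minimal sketch of computing WTG by per-bin standardization, following the "standardize within each bin" description above; the use of pandas, the z-score form, and the guards for tiny bins are implementation assumptions.

```python
import pandas as pd

def add_wtg(df, bin_width=1.0):
    """Compute Watch-Time-Gain (WTG): standardize watch time within
    equally wide duration bins so it is independent of duration.

    df needs columns 'duration' (seconds) and 'watch_time' (seconds).
    """
    df = df.copy()
    df["bin"] = (df["duration"] // bin_width).astype(int)     # equally wide bins
    grp = df.groupby("bin")["watch_time"]
    mean = grp.transform("mean")
    std = grp.transform("std").fillna(1.0) + 1e-8             # guard singleton bins
    df["wtg"] = (df["watch_time"] - mean) / std
    return df

# A fully watched short video now gets a higher WTG than a half-watched
# long video, even if the latter's absolute watch time is longer.
```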
Methodology: WTG
- WTG distributes more uniformly across durations and does NOT favor long or short videos.
- WTG is an unbiased metric measuring the user engagement of one single data sample. How do we evaluate the performance of a list in top-k recommendation with WTG?
- WTG@k: the average of the ground-truth WTG of the top-k recommended micro-videos.
- DCWTG@k: additionally assign larger weights to higher positions.

Methodology: DVR
- An unbiased learning framework: Debiased Video Recommendation (DVR).
- It is model-agnostic and has no preset requirements for backbone models: any off-the-shelf recommendation model can be combined with the adversarial debiasing model.
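A sketch of the two list metrics described above; the DCG-style logarithmic position discount in DCWTG is an assumption, since the slide only says that higher positions receive larger weights.

```python
import numpy as np

def wtg_at_k(ranked_wtg, k):
    """WTG@k: mean ground-truth WTG of the top-k recommended micro-videos."""
    return float(np.mean(ranked_wtg[:k]))

def dcwtg_at_k(ranked_wtg, k):
    """DCWTG@k: position-discounted WTG. The 1/log2(pos + 1) discount is an
    assumed DCG-style weighting, not given explicitly on the slide."""
    positions = np.arange(1, k + 1)
    return float(np.sum(np.asarray(ranked_wtg[:k]) / np.log2(positions + 1)))
```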
Methodology: DVR
- Input features: user profiles, micro-video attributes, context features.
- Delete duration from the input features: duration is a shortcut in existing recommendation models due to duration bias, and removing it eliminates the bias fundamentally, at the source.

Methodology: DVR
- Prediction target: Watch-Time-Gain (WTG).
- Alternative solution (DVR-): predict watch time and then transform the predicted value to WTG. However, it is more difficult to predict watch time accurately due to the unbalanced distribution of watch time and duration.

Methodology: DVR
- Duration bias can still hide implicitly in the data, e.g., through correlation with other features.
- Adversarial learning: add an extra regression layer to predict duration from the estimated WTG, and train the backbone model to output WTG that is independent of duration.
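A sketch of the adversarial debiasing step just described; the gradient-reversal implementation and the loss weight `lam` are assumptions about how "train the backbone to output duration-independent WTG" could be realized.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; reversed gradient on the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return x
    @staticmethod
    def backward(ctx, g):
        return -g

class DVRHead(nn.Module):
    def __init__(self, backbone):
        super().__init__()
        self.backbone = backbone              # any off-the-shelf model -> WTG estimate
        self.duration_reg = nn.Linear(1, 1)   # adversary: regress duration from WTG

    def losses(self, features, wtg_true, duration_true, lam=0.1):
        wtg_pred = self.backbone(features)                      # shape (batch, 1)
        main = nn.functional.mse_loss(wtg_pred, wtg_true)       # fit the WTG target
        dur_pred = self.duration_reg(GradReverse.apply(wtg_pred))
        adv = nn.functional.mse_loss(dur_pred, duration_true)   # adversary fits duration;
        return main + lam * adv                                 # the reversed gradient pushes
                                                                # WTG to be duration-independent
```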
Methodology: DVR
- Algorithm: the overall training procedure of DVR.

Experiments
- Datasets: Wechat Channels, Kuaishou.
- Baselines: LibFM [2], Wide&Deep [3], DeepFM [4], NFM [5], AFM [6], AutoInt [7], AFN [8].
- Metrics: WTG, DCWTG, #BC.

Experiments
- Accuracy comparison between WTG and watch time (RQ1): we train two versions of each model, one using watch time as the target and one using WTG, rank videos according to the estimated watch time/WTG, and compare the average ground-truth watch time of the recommended top-k micro-videos.
- With WTG as the target, models can generate better recommendations of both long and short micro-videos.

Experiments
- Fairness comparison between WTG and watch time (RQ1): we again train two versions of each model (watch time vs. WTG as target), rank videos according to the estimated value, and plot a histogram of the duration of the recommended micro-videos.
- Models trained with watch time concentrate mainly on the long-duration side; using WTG leads to more balanced recommendation traffic.

Experiments
- Effectiveness of DVR (RQ2): we combine DVR with all seven backbone models.
- Existing models without any debiasing design achieve worse performance; DVR brings steady improvements in all cases.

Experiments
- Effectiveness of DVR (RQ2): ablation study on the three key debiasing designs. DD: delete duration from the input features; WTG: use WTG as the prediction target; ADV: adversarial learning.
- The three key designs eliminate duration bias from three different perspectives: the input (10%/50% improvements), the output (the largest improvements), and the model itself (10% improvements).

Conclusion and Future Work
- We conduct a large-scale analysis to show that duration bias leads to inaccurate and unfair recommendation of micro-videos.
- A new measurement of watch time on micro-videos, WTG, is proposed, which eliminates duration bias and can evaluate recommendation performance without favoring either long or short videos.
- A general framework, DVR, is further designed to help recommendation models learn unbiased user preferences.
- Future work: apply WTG and DVR in online systems.
- Codes can be found at: https:/

Summary
- Disentangled learning for user interest and conformity
- Disentangled learning for long-term and short-term interests
- Debiasing in short-video recommendation
https:/