DataFunSummit #2023
Towards Causal Representation Learning for Out-of-Distribution Generalization on Graphs
Yongqiang Chen, PhD student, The Chinese University of Hong Kong (CUHK) & Tencent AI Lab
with Yatao Bian, Yonggang Zhang, Kaiwen Zhou, Binghui Xie, Tongliang Liu, Bo Han, and James Cheng

Out-of-Distribution Generalization
Models trained with Empirical Risk Minimization (ERM) are often:
- prone to spurious correlations
- hardly able to generalize to OOD data

The goal of OOD generalization is to solve
$\min_{f:\mathcal{X}\to\mathcal{Y}} \max_{e\in\mathcal{E}_{\text{all}}} \mathcal{L}^e(f)$,
given only a subset of training environments/domains $\mathcal{E}_{\text{tr}} \subseteq \mathcal{E}_{\text{all}}$, where each $e \in \mathcal{E}_{\text{all}}$ corresponds to a dataset $\mathcal{D}^e$ and a loss $\mathcal{L}^e$.

Leveraging the Invariance Principle from causality, previous approaches aim to learn an invariant predictor
$\min_{f = w \circ \Phi} \sum_{e\in\mathcal{E}_{\text{tr}}} \mathcal{L}^e(w \circ \Phi), \quad \text{s.t. } w \in \arg\min_{\bar{w}} \mathcal{L}^e(\bar{w} \circ \Phi),\ \forall e \in \mathcal{E}_{\text{tr}},$
i.e., a predictor that is simultaneously optimal across different environments/domains. (Peters et al., 2015; Arjovsky et al., 2019; Bottou et al., 2021; Rosenfeld et al., 2021; Kamath et al., 2021; Ahuja et al., 2021)
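As a toy illustration of why the invariance constraint helps, the sketch below implements an IRMv1-style penalty (the squared gradient of each environment's risk with respect to a scalar multiplier on the classifier) in NumPy. All data, feature names, and dimensions here are hypothetical, not from the talk:

```python
import numpy as np

# Toy illustration of the invariant-predictor objective (hypothetical data):
# x1 is causal (stable relation to y), while x2 is spurious (its correlation
# with y flips sign across the two training environments).
rng = np.random.default_rng(0)

def make_env(spurious_sign, n=1000):
    x1 = rng.normal(size=n)
    y = x1 + 0.1 * rng.normal(size=n)
    x2 = spurious_sign * y + 0.1 * rng.normal(size=n)
    return np.stack([x1, x2], axis=1), y

envs = [make_env(+1.0), make_env(-1.0)]

def irm_penalty(w, X, y):
    # IRMv1-style penalty for squared loss: the squared gradient of the
    # per-environment risk w.r.t. a scalar multiplier s on the classifier,
    # evaluated at s = 1, i.e. (d/ds E[(s * Xw - y)^2] |_{s=1})^2.
    pred = X @ w
    grad = np.mean(2.0 * (pred - y) * pred)
    return grad ** 2

w_inv = np.array([1.0, 0.0])  # uses only the causal feature
w_spu = np.array([0.0, 1.0])  # uses only the spurious feature

pen = lambda w: sum(irm_penalty(w, X, y) for X, y in envs)
print(pen(w_inv) < pen(w_spu))
```

The causal predictor is near-optimal in every environment, so its penalty stays close to zero, while the spurious predictor's per-environment optimum shifts across environments and inflates the penalty, which is exactly what the constraint in the objective above penalizes.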
Out-of-Distribution Generalization on Graphs
(Knyazev et al., 2019; Hu et al., 2020; Koh et al., 2021; Gui et al., 2022; Chen et al., 2022)
A Graph Neural Network (GNN) makes predictions by taking both structure-level and attribute-level features into account.

OOD generalization on graphs is fundamentally more challenging than on Euclidean data, as distribution shifts can be:
- structure-level shifts
- attribute-level shifts
- mixtures of structure-level and attribute-level shifts
Out-of-Distribution Generalization on Graphs
(Knyazev et al., 2019; Hu et al., 2020; Koh et al., 2021; Gui et al., 2022; Chen et al., 2022)
Given the limitations of existing approaches:
- How can we define and capture the invariance on graphs?
- Can we train a GNN that is generalizable to OOD data?

CIGA: Causality Inspired Invariant Graph LeArning

A Short Summary of CIGA
Spotlight Presentation at NeurIPS 2022.
To appear at NeurIPS 2023; Spotlight Presentation at the ICLR'23 Domain Generalization workshop.
Out-of-Distribution Generalization on Graphs
Invariant graph representation learning aims to identify an invariant subgraph among graphs from different environments or domains (Wu et al., 2022ab; Miao et al., 2022; Chen et al., 2022): an extractor selects the invariant subgraph (e.g., the "House" motif shared across Environment #1 and Environment #2), and a classifier predicts the label from the extracted subgraph.

The "Free Lunch Dilemma" in OOD Generalization on Graphs
However, the environment or domain information is usually expensive to obtain for graph-structured data (Wu et al., 2022ab; Miao et al., 2022; Chen et al., 2022): with unknown environment labels, the extracted "invariant" subgraph may not be invariant at all.
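The extractor-classifier pipeline above can be sketched as a single forward pass. The one-layer aggregation, the weights, and the graph below are hypothetical stand-ins for trained GNNs, just to make the data flow concrete:

```python
import numpy as np

# A minimal NumPy sketch of the extractor-classifier pipeline (hypothetical
# weights; real implementations train both parts end to end as GNNs).
rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def extract_subgraph(A, X, W_mask):
    """Extractor: score each node and keep a soft mask over the graph."""
    H = A @ X                              # one round of neighbor aggregation
    return sigmoid(H @ W_mask).ravel()     # per-node "invariance" scores in (0, 1)

def classify(A, X, node_mask, W_cls):
    """Classifier: predict the label from the soft-masked graph."""
    Xm = X * node_mask[:, None]            # attenuate (spurious) low-score nodes
    g = (A @ Xm).mean(axis=0)              # graph-level readout
    return int((g @ W_cls).argmax())

# Toy graph: a 4-node ring with 3 node features; binary classification.
A = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], float)
X = rng.normal(size=(4, 3))
W_mask = rng.normal(size=(3, 1))
W_cls = rng.normal(size=(3, 2))

mask = extract_subgraph(A, X, W_mask)
label = classify(A, X, mask, W_cls)
print(mask.shape, label in (0, 1))
```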
Let us consider a data generative model with Partially Informative Invariant Features (Yu et al., 2021; Miao et al., 2022): the label (e.g., "House") is determined through the invariant correlation, while the spurious correlation varies across environments.

One line of works aims to generate new environments based on the extracted subgraphs (Wu et al., 2022ab; Liu et al., 2022): an extractor produces an "invariant" subgraph, and a generator synthesizes new environments (e.g., Environment #3) from it. But when the extracted subgraph is wrong, the generated environments can introduce even more severe biases!
Another line of works aims to infer environment labels for learning the underlying invariance (Li et al., 2022; Yang et al., 2022): an inference module recovers environment partitions (e.g., Environment #1 and #2 for class "House", or for class "Grid"). But what if the underlying invariant subgraph is REVERSED?

Impossibility Results for OOD Generalization on Graphs
OOD generalization on graphs is fundamentally more challenging than that on Euclidean data: it is fundamentally impossible to identify the underlying invariant subgraph without further inductive biases.

Failures of Environment Generation (Wu et al., 2022ab; Liu et al., 2022)
How can we address environment generation failures? For any spurious subgraph, there exist two underlying environments such that the spurious correlation varies.

Failures of Environment Inference (Li et al., 2022; Yang et al., 2022)
How can we address environment inference failures? For all environments, the spurious correlation is either uniformly stronger or uniformly weaker than the invariant correlation.

Invariant Graph Learning with Minimal Assumptions
Neither environment generation nor environment inference works without more assumptions (*see also ZIN: When and How to Learn Invariance by Environment Inference?). Feasible invariant graph learning instead assumes either:
- the invariant correlation is stronger: CIGA (Chen et al., 2022)
- the spurious correlation is stronger: DisC (Fan et al., 2022)
(Lin et al., 2022; Fan et al., 2022; Chen et al., 2022)
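The dichotomy between the two regimes can be illustrated with a toy simulation. All probabilities and variable names below are hypothetical; the point is only that whichever feature correlates more strongly with the label is the one a purely label-supervised learner tends to latch onto, so each method's assumption pins down one regime:

```python
import numpy as np

# Toy simulation of the assumption dichotomy (hypothetical probabilities):
# a feature equals the label with probability p_match.
rng = np.random.default_rng(0)

def label_correlation(p_match, n=10000):
    """Empirical accuracy of predicting y directly from the feature."""
    y = rng.integers(0, 2, n)
    feat = np.where(rng.random(n) < p_match, y, 1 - y)
    return np.mean(feat == y)

# CIGA's regime: the invariant correlation (0.9) dominates the spurious (0.7),
# so the label-predictive signal points at the invariant subgraph.
acc_inv, acc_spu = label_correlation(0.9), label_correlation(0.7)
ciga_regime_picks_invariant = acc_inv > acc_spu

# DisC's regime: the spurious correlation (0.9) dominates the invariant (0.7),
# so a biased model latches onto the spurious subgraph instead.
acc_inv, acc_spu = label_correlation(0.7), label_correlation(0.9)
disc_regime_picks_spurious = acc_spu > acc_inv

print(ciga_regime_picks_invariant and disc_regime_picks_spurious)
```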
GALA: invariant GrAph Learning Assistant
To begin with, we need to first understand the reasons for the failures of CIGA. [Figure: training data of "Cycle" and "House" graphs attached to spurious subgraphs G_n and G_p; proxy predictions vs. graph labels; supervised contrastive learning guided by graph labels.]

GALA improves the contrastive invariant subgraph extraction via an Environment Assistant. Consider a dataset dominated by spurious features:
Step 1. Obtain the Environment Assistant's predictions.
Step 2. Contrast samples from the positive and the negative graphs with the same class.

Proof-of-Concept Experiments
Given the same data generation process, and the aforementioned variation sufficiency and variation consistency assumptions, when the environment assistant model properly distinguishes the variations of the spurious subgraphs, GALA provably identifies the invariant subgraph for OOD generalization, under both stronger spurious correlations and stronger invariant correlations.
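The two steps above can be sketched schematically. The toy labels, the error pattern of the assistant, and the helper name `gala_pairs` are all hypothetical; the sketch only shows how assistant predictions induce the positive/negative split within each class:

```python
import numpy as np

# Schematic of GALA's two steps (hypothetical toy data and helper names).
# The "environment assistant" is, e.g., a biased model that latches onto
# spurious features, so it misclassifies exactly the graphs whose spurious
# part disagrees with the label.
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])          # ground-truth classes
assistant_preds = labels.copy()
assistant_preds[[1, 4, 6]] = 1 - labels[[1, 4, 6]]   # assistant errs on these graphs

def gala_pairs(labels, preds):
    """Step 1: split each class by assistant agreement; Step 2: pair the
    positive (assistant-fooled) and negative (assistant-fitted) graphs of
    the same class for contrastive learning."""
    pairs = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        pos = [i for i in idx if preds[i] != c]
        neg = [i for i in idx if preds[i] == c]
        pairs += [(p, n) for p in pos for n in neg]
    return pairs

pairs = gala_pairs(labels, assistant_preds)
# Each contrastive pair shares a class label but differs in the assistant's
# prediction, so what the pair has in common is the invariant subgraph.
print(len(pairs), all(labels[p] == labels[n] for p, n in pairs))
```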
Real-World Experiments
GALA consistently improves the OOD generalization performance under various real-world graph distribution shifts on a number of realistic graph benchmarks.

A Short Summary of GALA
To appear at NeurIPS 2023; Spotlight Presentation at the ICLR'23 Domain Generalization workshop. Workshop ver. (soon).
- We conducted a retrospective study on the faithfulness of the augmented environment information for OOD generalization on graphs.
- By showing the impossibility results, we developed a set of minimal assumptions for feasible invariant graph learning.
- We proposed GALA, a provably feasible approach under these assumptions.
- Extensive experiments with 11 datasets verified the superiority of GALA.

Ongoing work.

The Interpretable And (OOD) Generalizable GNN Architectures
Invariant graph representation learning adopts the interpretable GNN architecture that aims to identify an invariant subgraph among graphs from different environments or domains (Wu et al., 2022ab; Miao et al., 2022; Chen et al., 2022, 2023): an extractor $g$ performs soft subgraph extraction, assigning sampling weights to candidate subgraphs (e.g., 30%, 25%, 20%, 25% for a graph $G \in \mathcal{D}_{\text{tr}}$), and the classifier performs soft subgraph classification on the expected subgraph:
$\hat{G}_c = \mathbb{E}_{G_c \sim g(G)}[G_c], \qquad f_c(\hat{G}_c) = f_c(\mathbb{E}_{G_c \sim g(G)}[G_c]).$

Expressivity Issue of Interpretable GNNs
Computing the mutual information involving the "soft subgraph" is critical and can be formulated with the Subgraph Multilinear Extension (Wu et al., 2022ab; Miao et al., 2022; Chen et al., 2022, 2023):
$\mathbb{E}_{G_c \sim g(G)}[f_c(G_c)].$
Classifying the expected subgraph, $f_c(\mathbb{E}_{G_c \sim g(G)}[G_c])$, generally differs from this multilinear extension.

GMT: Graph Multilinear Network
GMT directly computes the Subgraph Multilinear Extension, $\mathbb{E}_{G_c \sim g(G)}[f_c(G_c)] = \text{GMT}$. This theoretical connection also motivates maximizing the expressivity of interpretable GNNs by more accurately approximating the Subgraph Multilinear Extension.
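The gap between the two quantities can be seen in a one-dimensional stand-in. The nonlinear "classifier" and the selection weights below are hypothetical, chosen only to make the Jensen-style gap visible:

```python
import numpy as np

# Hypothetical stand-in for the expressivity gap: evaluating a nonlinear
# classifier f_c at the *expected* soft subgraph, f_c(E[G_c]), generally
# differs from the Subgraph Multilinear Extension E[f_c(G_c)], which
# GMT-style methods approximate by sampling discrete subgraphs.
rng = np.random.default_rng(0)

f_c = lambda g: np.tanh(4 * (g.sum() - 0.5))  # nonlinear "graph classifier"

probs = np.array([0.3, 0.25, 0.2, 0.25])      # soft node-selection weights

# Soft-subgraph evaluation: plug the expected node participation into f_c.
soft_value = f_c(probs)

# Multilinear-extension estimate: average f_c over sampled binary subgraphs.
samples = (rng.random((20000, probs.size)) < probs).astype(float)
mc_value = np.mean([f_c(s) for s in samples])

print(abs(soft_value - mc_value) > 0.1)       # the two evaluations disagree
```

Because `f_c` is nonlinear, the expectation does not commute with the classifier, which is exactly the expressivity bottleneck of classifying the expected soft subgraph.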
Preliminary Experimental Results
With improved expressivity, GMT improves both interpretability and OOD generalizability.

Summary & Future Directions
- CIGA: causal representation learning on graphs is a promising paradigm for the challenging OOD generalization problem on graphs.
- GALA: causal representation learning requires proper modeling of the graph data generative process, as well as inductive biases on the latents.
- GMT: the architecture is also critical and can pose a bottleneck for learning graph causal representations.

Contact: yqchen@cse.cuhk.edu.hk
Thank you!