DataFunSummit #2023

Learning Substructure Invariance for Out-of-Distribution Molecular Representations
Presented by Nianzu Yang, PhD candidate, SJTU-ReThinkLab

Background: OoD
- Out-of-Distribution (OoD) Generalization: assume that there is a potential environment variable accounting for the distribution shift between the training and testing data.
- In general cases, the goal is to predict the target label given the associated input.
- Formulation: the notation covers the support of environments, the prediction model, and a loss function.
- The risk function under a given environment e: [equation]

Background: Invariant Learning
- Invariant learning is an emerging line of work for solving the OoD generalization problem.
- These methods propose to find an invariant predictor that uncovers invariant relationships between inputs and targets across all environments.
- The invariant predictor aims to learn an invariant representation satisfying the invariance principle.

Invariance Principle:
1) sufficiency: the representation shows sufficient predictive power for the target
2) invariance: the representation contributes to equal performance for the downstream tasks across all environments

Background: MRL
- Molecular Representation Learning (MRL) aims at embedding a molecule into a vector in latent space as a foundation model, on top of which the learned representations can be used for a variety of downstream tasks.
- A molecular graph can be represented by its node set, corresponding to the atoms constituting the molecule, and its edge set, corresponding to the chemical bonds.
- SMILES-based methods
- Structure-based methods

OoD Molecular Representation Learning
- OoD general formulation: [equation]
- OoD on MRL: [equation]

Motivating Examples
- Key observation: the (bio)chemical properties of a molecule are usually associated with a few privileged molecular substructures.
- Example: the shared hydroxy (-OH) / carboxy (-COOH) group is associated with good water solubility.

Environment Inference
Reasons for necessity:
- Manual specifications of the environments may be unavailable: labeling is time-consuming.
- Directly utilizing existing environment labels may be problematic: there are few molecules per environment on average.
A variational inference-based method:
- Introduce a variational distribution to approximate the target distribution.
- The learning objective: [equation]

Invariant Predictor
- Goal: minimize the expectation of risks from different environments known in the training data: [equation]
- From the perspective of information theory, the outputs of the molecule encoder and the final predictor are treated as distributions: [equation]
- The equivalent tractable objective in practical instantiation: [equation]

Theoretical Justification
- Theorem 1. Under the variational treatment, minimizing one term of the objective contributes to invariance: the learned representation shows equal performance for the downstream tasks across all environments.
- Theorem 2. Under the variational treatment, minimizing the other term of the objective corresponds to sufficiency: the learned representation shows sufficient predictive power for downstream tasks.
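As a rough illustration of the goal stated under Invariant Predictor, the sketch below groups per-sample losses by inferred environment label, averages them into per-environment risks, and combines the mean risk with a penalty on how much the risks differ. The variance penalty is a V-REx-style surrogate for the invariance condition and is an assumption here, since the exact objective on the slide is not preserved. The names `per_environment_risks`, `invariant_objective`, and `beta` are hypothetical.

```python
from collections import defaultdict
from statistics import fmean, pvariance


def per_environment_risks(losses, env_labels):
    """Average the per-sample losses within each (inferred) environment."""
    if len(losses) != len(env_labels):
        raise ValueError("losses and env_labels must have the same length")
    grouped = defaultdict(list)
    for loss, env in zip(losses, env_labels):
        grouped[env].append(loss)
    return {env: fmean(values) for env, values in grouped.items()}


def invariant_objective(losses, env_labels, beta=1.0):
    """Mean environment risk plus beta times the variance of the risks.

    Assumption: the variance penalty (V-REx style) stands in for the
    invariance term; it is not taken from the slides.
    """
    risks = list(per_environment_risks(losses, env_labels).values())
    return fmean(risks) + beta * pvariance(risks)


# Toy example: four molecules assigned to two inferred environments.
toy_losses = [0.2, 0.4, 0.9, 1.1]
toy_envs = [0, 0, 1, 1]
print(per_environment_risks(toy_losses, toy_envs))
print(invariant_objective(toy_losses, toy_envs, beta=0.5))
```

When every environment has the same risk, the penalty vanishes and the objective reduces to the plain average risk, which loosely mirrors the "equal performance across all environments" condition.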
Overview of MoleOOD
[Figure: (a) Environment Inference: environment classifier, conditional GNN; (b) Molecular Representation Learning: complete encoder, substructure encoder, decompose into substructures, query, attentive pooling, predictor]

A two-stage training strategy searches for the optimal parameters:
1) optimizing the environment-inference model: [equation]
2) optimizing the molecule encoder and the predictor: [equation]

Experiments on OGB benchmark
Table. ROC-AUC results on four datasets from the OGB benchmark.
- MoleOOD achieves consistent, significant improvements across four real-world datasets with different backbones (GCN, GIN and GraphSAGE).
- Our method can achieve up to 5.9% improvement.

Experiments on DrugOOD benchmark
Table. ROC-AUC results for six datasets from the DrugOOD benchmark.
- DrugOOD provides more diverse splitting indicators than OGB, including assay, scaffold and size.
- Except on IC50-size, our method outperforms all baselines across all datasets.
- Our method can achieve up to 3.9% improvement.

Ablation Study
Table. Ablation study on EC50-Assay/Scaffold/Size datasets.
- We analyze the contributions of different model components to the final performance.

Conclusion
- Proposes to leverage the invariance principle, which opens a new perspective for handling substructure-aware distribution shifts.
- Practical applicability for molecular OoD learning, where the manual specifications of the environments are often unavailable.
- Extensive experiments on ten public datasets demonstrate that our model yields consistent and significant improvements.

Combinatorial Drug Recommendation
- Accuracy is important!
- Safety is also important!

Overview of MoleRec
[figure]

MoleRec
Three components:
1. Patient Representation Module: encodes the longitudinal diagnosis and procedure information of patients.
2. Medication Representation Module: generates substructure-aware representations for drugs according to patients' different conditions.
3. Prediction Module: responsible for making prescriptions using only the substructure-aware representations of drugs.

Medication Representation Module
- Substructure Interaction Module (SIM): models high-order interactions among substructures.
- Substructure Relevancy Module (SRM): responsible for explicitly modeling the degree to which the treatment depends on each substructure.

Learning Objective
- Multi-Label Prediction Loss: [equation]
- DDI Loss: [equation]
- Combined Controllable Loss Function: [equation]

Weight Annealing for DDI Loss (WA)
[Figure: Linear Adjusting, WA]
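The loss formulas on the Learning Objective slide are not preserved, so the following is only a minimal sketch of how a multi-label prediction loss and a DDI loss might be combined under a controllable weight. Everything in it is an assumption made for illustration: binary cross-entropy as the multi-label loss, a pairwise-product DDI penalty over a hypothetical interaction matrix `ddi_adj`, and a simple linear rule for the weight, which is one possible reading of the "Linear Adjusting" label. The default values of `target_rate` and `slope` are hypothetical, and the annealing-based variant (WA) is not reproduced.

```python
import math


def multilabel_bce(probs, targets, eps=1e-12):
    """Mean binary cross-entropy over all candidate drugs."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)


def ddi_loss(probs, ddi_adj):
    """Sum of p_i * p_j over drug pairs flagged as interacting in ddi_adj."""
    n = len(probs)
    return sum(
        probs[i] * probs[j] * ddi_adj[i][j]
        for i in range(n)
        for j in range(i + 1, n)
    )


def linear_ddi_weight(ddi_rate, target_rate, slope):
    """Weight on the prediction loss: 1 while the DDI rate is within the
    target, then decreasing linearly toward 0 (assumed rule)."""
    if ddi_rate <= target_rate:
        return 1.0
    return max(0.0, 1.0 - (ddi_rate - target_rate) / slope)


def combined_loss(probs, targets, ddi_adj, ddi_rate,
                  target_rate=0.06, slope=0.05):
    """Weighted sum of prediction loss and DDI loss (hypothetical defaults)."""
    beta = linear_ddi_weight(ddi_rate, target_rate, slope)
    pred = multilabel_bce(probs, targets)
    return beta * pred + (1.0 - beta) * ddi_loss(probs, ddi_adj)


# Toy example: three drugs, where drugs 0 and 1 are flagged as interacting.
toy_adj = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
toy_probs = [0.9, 0.8, 0.1]
print(combined_loss(toy_probs, [1, 1, 0], toy_adj, ddi_rate=0.085))
```

With this rule the safety term only takes effect once the observed DDI rate exceeds the target, which is one way to trade accuracy against safety. A schedule based on annealing would replace `linear_ddi_weight` and leave the rest unchanged.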
Experiments
- Performance Comparison
- Ablation Study

Open Source
- MoleRec
- PyHealth

Conclusion
- We propose a novel substructure-aware attentive method to explicitly model drug information at the substructure level in medication combination recommendation.
- We introduce an adaptive weight adjusting approach based on annealing to handle the constrained optimization problem of drug recommendation considering both accuracy and safety criteria.
- Our method shows notable improvement in accuracy and safety over state-of-the-art methods on MIMIC-III, with ablation studies confirming the effectiveness of our two new techniques.

Thanks!