Generative Bayesian Networks for Augmentation of Molecular Data from Commercial Genetic Panels


Dillon Tracy, Jeff Sherman, Maayan Baron

SUMMARY

● We introduce a generative Bayesian network method that synthesizes annotated patient feature profiles from limited real-world molecular data covering a constrained set of genes, focusing on somatic mutations in lung and breast cancer.

Abstract #7373

● This approach addresses the challenges posed by tumor data that are widely available clinically yet molecularly sparse. It enhances the value of established real-world clinicogenomic datasets and may advance precision oncology through personalized treatment guidance, enriched data analysis, and novel biomarker identification.

BACKGROUND

● The number of genes on commercial NGS panels continues to increase over time, reflecting the discovery of more biomarkers in cancer research and the translation of these discoveries into clinical practice.

● Molecular sparsity is most severe for earlier assays, which covered fewer genes, resulting in real-world clinicogenomic databases that are rich in longitudinal clinical follow-up but restricted in their applicability to research pursuits such as biomarker discovery.


● We hypothesized that by modeling the joint distribution of both observed and unobserved molecular features in a large tumor cohort using a Bayesian network and Gibbs sampling, we could effectively infer and synthesize comprehensive mutational profiles for tumors with otherwise limited data from commercial NGS panels (Fig. 1).
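The hypothesis above can be illustrated with a minimal Gibbs sampling sketch over a toy binary Bayesian network. The four-gene network, the gene names (PANEL_A, PANEL_B, HIDDEN_X, HIDDEN_Y), and all probabilities are hypothetical and are not taken from the fitted model. The sketch only shows how genes observed on a panel can be held fixed while unobserved genes are resampled from their conditional distributions.

```python
import random

# Hypothetical toy network (not the fitted model). Each gene maps to
# (parents, table), where table[parent_states] = P(gene mutated | parents).
NETWORK = {
    "PANEL_A": ((), {(): 0.6}),
    "PANEL_B": ((), {(): 0.2}),
    "HIDDEN_X": (("PANEL_A", "PANEL_B"),
                 {(0, 0): 0.05, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.7}),
    "HIDDEN_Y": (("HIDDEN_X",), {(0,): 0.1, (1,): 0.8}),
}


def prob(gene, value, state):
    """P(gene = value | parent states read from `state`)."""
    parents, table = NETWORK[gene]
    p_mutated = table[tuple(state[p] for p in parents)]
    return p_mutated if value == 1 else 1.0 - p_mutated


def conditional_p1(gene, state):
    """P(gene = 1 | all other genes), using the gene's own table and the
    tables of its children (its Markov blanket)."""
    children = [g for g, (parents, _) in NETWORK.items() if gene in parents]
    weights = []
    for value in (0, 1):
        trial = dict(state, **{gene: value})
        weight = prob(gene, value, trial)
        for child in children:
            weight *= prob(child, trial[child], trial)
        weights.append(weight)
    return weights[1] / (weights[0] + weights[1])


def gibbs_profiles(observed, n_samples, burn_in=200, seed=0):
    """Draw full mutation profiles with the observed panel genes held fixed."""
    rng = random.Random(seed)
    hidden = [g for g in NETWORK if g not in observed]
    state = dict(observed)
    for gene in hidden:
        state[gene] = rng.randint(0, 1)  # arbitrary starting point
    samples = []
    for sweep in range(burn_in + n_samples):
        for gene in hidden:
            state[gene] = 1 if rng.random() < conditional_p1(gene, state) else 0
        if sweep >= burn_in:
            samples.append(dict(state))
    return samples
```

Calling `gibbs_profiles({"PANEL_A": 1, "PANEL_B": 0}, n_samples=5000)` would return 5000 complete profiles in which the two panel genes keep their observed values and the hidden genes vary from sample to sample. The burn-in length and seed are illustrative choices, not settings reported here.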

[Figure 1 label: plausible samples with a larger gene set]

Figure 2. A directed graph inferred from TCGA LUSC mutations, generated using findr [1]. Only highly connected nodes (degree above 15) are shown. Nodes are limited to 12 outgoing and 4 incoming links. The complete graph has 689 genes and 2644 edges. The fitted network model tabulates or predicts (depending on size) the mutation probability for each node, conditioned on its parents’ states. The model’s conditional probability table for the SOX10 vertex is shown.
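As a hedged illustration of the tabulation case mentioned in the caption, the sketch below estimates a conditional probability table for one node from binary mutation calls by counting. The function name `fit_cpt`, the row format, and the pseudocount smoothing are assumptions made for this example. The "predicts" case used for larger parent sets is not shown.

```python
from collections import defaultdict
from itertools import product


def fit_cpt(rows, child, parents, pseudocount=1.0):
    """Estimate P(child mutated | parent states) by counting.

    rows: list of dicts mapping gene name -> 0/1 mutation call (hypothetical format).
    Returns a dict keyed by a tuple of parent states.
    """
    counts = defaultdict(lambda: [0, 0])  # parent states -> [n_wildtype, n_mutated]
    for row in rows:
        key = tuple(row[p] for p in parents)
        counts[key][row[child]] += 1

    cpt = {}
    for key in product((0, 1), repeat=len(parents)):
        n_wildtype, n_mutated = counts[key]
        # The pseudocount keeps unseen parent configurations away from 0 or 1.
        cpt[key] = (n_mutated + pseudocount) / (n_wildtype + n_mutated + 2 * pseudocount)
    return cpt
```

The table grows as 2^k with the number of parents k, which is one plausible reason a model would switch from tabulation to prediction for nodes with many parents.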

Characterizing Drug Response Using Generated Patient Mutational Data

One application of this type of generative model is in downstream modeling and biomarker discovery. Using an internal drug response prediction model, we found that variations in augmented profiles typically induced only small perturbations in modeled drug response (Fig. 4). Interestingly, when outputs were discordant between limited and expanded actual gene panel inputs, synthetic data were more concordant with results from the expanded panels (Fig. 5). Moreover, synthetic data enables

parent  child  coeff
FLI1    SOX10  0.645
IRF2    SOX10  0.779
NFE2    SOX10  0.473
PTPRO   SOX10  0.779

(A) Marginal mutation probabilities by gene, TCGA LUSC cohort

GENIE-DFCI-004310-577

Figure 4. Consistency in drug sensitivity predictions using real and synthesized mutational profiles for fulvestrant in a BRCA patient. Fulvestrant drug response prediction scores were generated for a single BRCA patient using 5000 synthesized profiles (metapanel, 757g) generated from an actual 190-gene set. For comparison, the drug response model was also run on actual data from a 190-gene panel (panel, 190g) and a 757-gene panel (envelope, 757g). Predictions from real and synthesized profiles are highly consistent (higher AUC indicates increased resistance).

Patient GENIE-MSK-P-0000106-T01-IM3 | tamoxifen | Panel mutations in FLCN, TP53

Figure 5. Tamoxifen response predictions from synthetic and expanded gene panels are concordant. Tamoxifen response scores were obtained for an additional BRCA patient using synthetic 757-gene profiles (metapanel, 757g) generated from this patient's actual 190-gene mutation panel result. Predictions were discordant between the limited panel (panel, 190g; predicted sensitive) and the expanded actual panel (envelope, 757g; predicted resistant), highlighting the effect of larger gene panels on predictions. Notably, the insensitive peak in the predicted response distribution (metapanel, 757g) closely matched the actual 757g panel, supporting the robustness of our method. The solid line marks the sensitive/insensitive boundary for the binary classifier whose feature importance appears in the SHAP analysis (right). Positive and red SHAP values indicate REL mutations are linked to tamoxifen sensitivity.
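The concordance comparison described in the caption can be sketched as a simple calculation: classify each score against the decision boundary, then measure how often the synthetic profiles agree with the call from a reference panel. The function names, the boundary value, and the convention that scores at or above the boundary are called insensitive are assumptions for illustration. They do not describe the internal drug response model.

```python
def call(score, boundary):
    """Binary call from a response score (assumed: higher score = less sensitive)."""
    return "insensitive" if score >= boundary else "sensitive"


def concordance(synthetic_scores, reference_score, boundary):
    """Fraction of synthetic-profile calls that match the reference panel's call."""
    if not synthetic_scores:
        raise ValueError("synthetic_scores must not be empty")
    reference_call = call(reference_score, boundary)
    agree = sum(1 for s in synthetic_scores if call(s, boundary) == reference_call)
    return agree / len(synthetic_scores)
```

Computing this once against the limited panel's score and once against the expanded panel's score gives two fractions that can be compared directly, which is one way to quantify which real panel the synthetic profiles agree with more.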

  1. Wang L, Audenaert P, Michoel T. High-Dimensional Bayesian Network Inference From Systems Genetics Data Using Genetic Node Ordering. Front Genet. 2019 Dec 20;10:1196. doi: 10.3389/fgene.2019.01196. PMID: 31921278; PMCID: PMC6933017.
  2. Lee J, Choi MK, Song IS. Recent Advances in Doxorubicin Formulation to Enhance Pharmacokinetics and Tumor Targeting. Pharmaceuticals (Basel). 2023 May 29;16(6):802. doi: 10.3390/ph16060802. PMID: 37375753; PMCID: PMC10301446.
  4. Malash I, Mansour O, Gaafar R, et al. Her2/EGFR-PDGFR pathway aberrations associated with tamoxifen response in metastatic breast cancer patients. J Egypt Natl Canc Inst. 2022;34:31. doi: 10.1186/s43046-022-00132-5.
  5. Chouhan S, Singh S, Athavale D, Ramteke P, Vanuopadath M, Nair BG, Nair SS, Bhat MK. Sensitization of hepatocellular carcinoma cells towards doxorubicin and sorafenib is facilitated by glucose dependent alterations in reactive oxygen species, P-glycoprotein and DKK4. J Biosci. 2020;45:97. PMID: 32713860.
  6. Williams MM, Cook RS. Bcl-2 family proteins in breast development and cancer: could Mcl-1 targeting overcome therapeutic resistance? Oncotarget. 2015 Feb 28;6(6):3519-30. doi: 10.18632/oncotarget.2792. PMID: 25784482; PMCID: PMC4414133.
