EvAM-Tools

 




EvAM-Tools is an R package and Shiny web app that provides tools for evolutionary accumulation, or event accumulation, models. We use code from “Cancer Progression Models” (CPMs), but these models are not limited to cancer (the key idea is that events are gained one by one, but not lost). The R package is available from https://github.com/rdiaz02/EvAM-Tools.

This web interface provides a GUI to the package and focuses on allowing fast construction, manipulation, and exploration of CPM models, and making it easy to gain an intuitive understanding of what these methods infer from different data sets as well as what kind of data are to be expected under these models. You can analyze your data, create cross-sectional data from scratch (by giving genotype frequencies), or generate synthetic data under different CPMs. You can compare results from different methods/models, as well as experiment and understand the consequences of changes in the input data on the returned inferences. You can also examine how a given method performs when data have been generated under another (or its own) model. Additional examples of use are discussed in https://github.com/rdiaz02/EvAM-Tools#some-examples-of-use and in the “EvAM-Tools: examples” additional documentation file.

Funding: Supported by grant PID2019-111256RB-I00 funded by MCIN/AEI/10.13039/501100011033 and Comunidad de Madrid’s PEJ-2019-AI/BMD-13961 to R. Diaz-Uriarte.


 

A two-paragraph summary about cross-sectional data and CPMs


In cross-sectional data a single sample is obtained from each subject or patient. That single sample represents the “observed genotype” of, for example, the tumor of that patient. Genotype can refer to single point mutations, insertions, deletions, or any other genetic modification. In this app, as is often done by CPM software, we store cross-sectional data in a matrix, where rows are patients or subjects, and columns are genes; each entry is 1 if the event was observed in that subject and 0 if it was not.
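As an illustration of this layout, here is a minimal sketch in Python (the app itself is written in R). The gene names, the toy data, and the helper functions `genotype_label` and `genotype_counts` are all hypothetical and are not part of EvAM-Tools. The sketch only shows how a 0/1 matrix of subjects by genes collapses into genotype counts.

```python
from collections import Counter

# Hypothetical toy data: rows are subjects, columns are genes A, B, C.
GENES = ["A", "B", "C"]
DATA = [
    [0, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]


def genotype_label(row, genes=GENES):
    """Name a genotype by its mutated genes; 'WT' if no gene is mutated."""
    mutated = [g for g, x in zip(genes, row) if x == 1]
    return ", ".join(mutated) if mutated else "WT"


def genotype_counts(data, genes=GENES):
    """Count how many subjects show each observed genotype."""
    return Counter(genotype_label(row, genes) for row in data)


counts = genotype_counts(DATA)
print(counts)
```

Entering genotype frequencies manually amounts to specifying such counts directly, instead of the full matrix.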

Cancer progression models (CPMs) or, more generally, event accumulation models, use these cross-sectional data to try to infer restrictions in the order of accumulation of events; for example, that a mutation in gene B is always preceded by a mutation in gene A (maybe because mutating B when A is not mutated is lethal). Some cancer progression models, such as MHN, instead of modeling deterministic restrictions, model facilitating/inhibiting interactions between genes; for example, that having a mutation in gene A makes it very likely to gain a mutation in gene B. A longer explanation is provided in What CPMs are included in EvAM-Tools?, below, and many more details in EvAM-Tools: methods’ details and FAQ. Finally, note that we have talked about “genotype” and “mutation”, but CPMs have been used with non-genetic data too, hence our preference for the expression “event accumulation models”; as said above, the key idea is that events are gained one by one, but not lost, and that we can consider the different subjects/patients in the cross-sectional data as replicate evolutionary experiments or runs where all individuals are under the same constraints (e.g., genetic constraints if we are dealing with mutations).

  


How to use this web interface?



Web app: overview of workflow and functionality


The figure below provides an overview of the major workflows with the web app:


Overview EvAM-Tools web app

The web app encompasses, thus, different major functionalities and workflows, mainly:

  1. Inference of CPMs from user data uploaded from a file.

  2. Exploration of the inferences that different CPM methods yield from manually constructed synthetic data.

  3. Construction of CPM models (DAGs with their rates/probabilities and MHN models) and simulation of synthetic data from them.

    3.1. Examination of the consequences of different CPM models and their parameters on the simulated data.

    3.2. Analysis of data simulated under one model with methods that have different models (e.g., data simulated from CBN analyzed with OT and OncoBN).

    3.3. Analysis of data simulated under one model with manual modification of specific genotype frequencies prior to analyses (e.g., data simulated under CBN but where, prior to analysis, we remove all observations with the WT genotype and the genotype with all loci mutated).
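The manual modification mentioned in 3.3 can be pictured with a small Python sketch. The counts and the helper `drop_extreme_genotypes` are hypothetical; in the web app this is done by editing genotype counts in the User input tab, not with code.

```python
def drop_extreme_genotypes(counts):
    """Return a copy of `counts` without the WT genotype (no event present)
    and without the genotype that has every event present.

    Genotypes are tuples of 0/1, one position per gene.
    """
    return {g: n for g, n in counts.items() if 0 < sum(g) < len(g)}


# Hypothetical genotype counts, e.g. from data simulated under some CPM.
simulated = {(0, 0, 0): 40, (1, 0, 0): 30, (1, 1, 0): 20, (1, 1, 1): 10}
filtered = drop_extreme_genotypes(simulated)
print(filtered)
```

Analyzing `filtered` instead of `simulated` is one way to explore how sensitive a method is to the absence of those two genotypes.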

Furthermore, note that in all cases, when data are analyzed, in addition to returning the fitted models, the web app also returns the analysis of the CPMs in terms of their predictions such as predicted genotype frequencies and transition probabilities between genotypes.

The figure below highlights the different major functionalities and workflows, as numbered above, superimposed on the previous figure:


Overview EvAM-Tools web app, with main functionalities highlighted.

We explain now in more detail the functionality, options, input, and output, of the web app. Commented examples that illustrate each of those workflows are provided in the EvAM-Tools: examples additional documentation file.

  


User input


To start using the web app, go first to the User input tab (on top of the page). Here you can:  

  • Enter cross-sectional data directly by either:
    • Uploading a file.
    • Entering genotype frequencies manually.
       
  • Generate cross-sectional data from CPM models. Follow these steps:
    1. Specify the CPM model first. You can use:

      1.1. Models that use DAGs to specify restrictions: OT, OncoBN (in both its conjunctive and disjunctive versions), CBN and H-ESBCN (H-ESBCN allows you to model AND, OR, and XOR dependency relationships). You will specify the DAG and the rates (CBN, H-ESBCN)/conditional probabilities (OT, OncoBN) of events conditional on their parents.

      1.2. MHN, which models inhibiting/facilitating relationships between genes using baseline hazard rates and multiplicative effects between genes (specified in the log-Θ matrix).  

    2. Simulate data from the CPM model. In addition to the number of samples, you can specify the amount of observational noise (and, for OT and OncoBN, deviations from the model).
       

      Note that simulating data from CPMs allows you to get an intuitive feeling for what different CPM models and their parameters mean in terms of the genotype frequency data they produce.
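To give a feeling for what “simulating data from a CPM” means, here is a minimal Python sketch for a CBN-like model. It is not the code used by EvAM-Tools. The three-gene DAG, the rates, and the function names are hypothetical. We assume the usual CBN formulation: each event waits an exponentially distributed time after all of its parents have occurred, and the subject is observed at an exponentially distributed sampling time with rate 1. No observational noise is added.

```python
import random

# Hypothetical DAG: A has no parents, B requires A, C requires both A and B.
ORDER = ("A", "B", "C")  # a topological order of the DAG
PARENTS = {"A": [], "B": ["A"], "C": ["A", "B"]}
RATES = {"A": 2.0, "B": 1.0, "C": 0.5}  # hypothetical lambdas


def simulate_subject(rng):
    """Simulate the observed genotype (tuple of 0/1) of one subject."""
    sampling_time = rng.expovariate(1.0)
    event_time = {}
    for gene in ORDER:
        # The clock of an event starts once all of its parents have occurred.
        start = max((event_time[p] for p in PARENTS[gene]), default=0.0)
        event_time[gene] = start + rng.expovariate(RATES[gene])
    # An event is observed only if it occurred before the sample was taken.
    return tuple(int(event_time[g] < sampling_time) for g in ORDER)


def simulate(n, seed=1):
    """Simulate cross-sectional data for n subjects."""
    rng = random.Random(seed)
    return [simulate_subject(rng) for _ in range(n)]


samples = simulate(1000)
```

Because there is no noise in this sketch, no simulated subject has B without A, or C without both A and B; adding observational noise would break that pattern in some rows.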

   

Cross-sectional data that have been uploaded or simulated from CPM models can be further modified by altering genotype counts. Moreover, it is possible to specify cross-sectional data and DAG/MHN models with user-specified gene names. Finally, from the “User input” tab you can also save the cross-sectional data.

To make it easier to play with the tool, we provide predefined cross-sectional data sets under “Enter genotype frequencies manually”, as well as predefined DAG and MHN models (from which you can generate data by clicking on “Generate data from DAG [MHN model]”). You can also modify the predefined DAGs and MHNs before generating data.

  


Analyze data: Run evamtools


  1. Change, if you want, the options under “Advanced options and CPMs to use” (on the right of the screen). These options include what CPM methods to use as well as parameters of the methods.
  2. Click on “Run evamtools”.
  3. Results will be shown in the Results tab.

  


Results


The results include:  

  • The fitted CPMs themselves, including the DAGs with their rates/conditional probabilities (depending on the model) and the MHN log-Θ matrix.  

  • Predictions derived from the fitted models, including:  

    • Transition probabilities: conditional probability of transition to a genotype (obtained using competing exponentials from the transition rate matrix for all methods except OT and OncoBN). For OT and OncoBN this is actually an abuse of the untimed oncogenetic tree model; see the EvAM-Tools: methods’ details and FAQ for details.

    • Transition rates: for models that provide them (CBN, H-ESBCN, MHN) transition rates of the continuous-time Markov chain that models the transition from one genotype to another. This option is not available for OT and OncoBN, as these do not return rates.

    • Predicted genotype relative frequencies: the predicted genotype frequencies from the fitted models.

    • Sampled genotype counts: Counts, or absolute genotype frequencies obtained by generating a finite sample (of the size you chose) with the probabilities given by the predicted genotype frequencies. If you add noise, the sampled genotype counts include observational (e.g., genotyping) noise.
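The “competing exponentials” step mentioned for transition probabilities can be sketched as follows, in Python. The rates and the function name are hypothetical; the sketch only illustrates the general idea that, when several exponentially distributed transitions compete, the probability that a given one happens first is its rate divided by the sum of the rates leaving that genotype.

```python
def transition_probabilities(rates):
    """Convert transition rates into conditional transition probabilities.

    `rates` maps each source genotype to a dict {destination: rate}.
    Genotypes with no outgoing transitions get an empty dict.
    """
    probs = {}
    for source, outgoing in rates.items():
        total = sum(outgoing.values())
        if total > 0:
            probs[source] = {dest: r / total for dest, r in outgoing.items()}
        else:
            probs[source] = {}
    return probs


# Hypothetical transition rates between genotypes of a two-gene model.
RATES = {
    "WT": {"A": 3.0, "B": 1.0},
    "A": {"A, B": 2.0},
    "B": {"A, B": 0.5},
    "A, B": {},
}

probs = transition_probabilities(RATES)
print(probs["WT"])
```

In this toy example, from WT the next genotype is A with probability 0.75 and B with probability 0.25.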


 

The results are displayed using a combination of figures and tabular output. Specifically:

  • The first row of figures shows the fitted CPMs: DAGs with their rates/probabilities and MHN log-Θ matrix.

    • The edges of the DAGs are annotated with the lambda (CBN, H-ESBCN), weight (OT), or θ (OncoBN).

    • Remember: for DAGs, these are DAGs that have genes (not genotypes) as nodes. They represent the order restrictions of the events.

    • For MHN there is no DAG of restrictions; we show the fitted log-Θ matrix rounded to two decimal places. The diagonal entries are the log-baseline rates, and the off-diagonal entries are the logs of the multiplicative effects of the effector event (columns) on the affected event (rows).

    • You can represent the results of all the fitted models or only of a subset (select those using “CPMs to show”).     

  • The second row of figures shows the predictions derived from the fitted models. These same predictions are also displayed in tabular output on the bottom right. On the left side panel (“Customize the visualization”), you choose what predictions you want to display.   

  • The plots that show Transition probabilities and Transition rates (again, on the second row of figures) have genotypes (not genes) as nodes.

    • You can show, for these transition plots, only some of the most relevant paths; again, modify options under “Customize the visualization”.
    • These plots might include genotypes never observed in the sample; these are shown in light green.
    • For easier visualization, in very busy plots, instead of the Genotypes you might want to show the last gene (or event) mutated or gained; change this option under “Type of label”.
    • (As visualizing the acquisition of mutations in a complex network can be challenging, for the transition probabilities/rates plots we use the representation of the hypergraph transition graph from HyperTraPS — Greenbury et al., 2020. HyperTraPS: Inferring probabilistic patterns of trait acquisition in evolutionary and disease progression pathways. Cell systems, 10, 39–51, https://doi.org/10.1016/j.cels.2019.10.009)
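To help read the log-Θ matrix described above for MHN, here is a minimal Python sketch of how its entries combine into a rate. The two-gene matrix and the function `gain_rate` are hypothetical and are not EvAM-Tools code. We assume the usual MHN formulation: the rate of gaining an event is its baseline rate multiplied by the effects of all events already present, which on the log scale is a sum.

```python
import math

GENES = ["A", "B"]
# Hypothetical log-Theta: rows are affected events, columns are effector events.
LOG_THETA = [
    [-1.0, 0.0],   # A: log-baseline rate -1.0; B has no effect on A
    [2.0, -2.0],   # B: A multiplies the rate of B by exp(2); log-baseline -2.0
]


def gain_rate(i, genotype, log_theta=LOG_THETA):
    """Rate of gaining event i from `genotype` (tuple of 0/1, event i absent)."""
    if genotype[i] == 1:
        raise ValueError("event i is already present in this genotype")
    log_rate = log_theta[i][i] + sum(
        log_theta[i][j] for j, x in enumerate(genotype) if x == 1 and j != i
    )
    return math.exp(log_rate)


print(gain_rate(1, (0, 0)))  # B from WT: baseline only
print(gain_rate(1, (1, 0)))  # B once A is present: baseline times effect of A
```

A positive off-diagonal entry thus corresponds to a facilitating effect, a negative one to an inhibiting effect, and zero to no effect.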

To help interpret the results, we also show a histogram of the genotype counts of the analyzed data.

Finally, you can also download the tabular results, fitted models, and the analyzed data. To download figures, either use screen captures or use your web browser to download them (e.g., right click on a figure to obtain a menu with a “Save image as” entry —if you need higher resolution or original PDF images, you will need to use the R package itself).

  


Additional documentation


Additional documents are available from https://rdiaz02.github.io/EvAM-Tools .

For users of the web app, the most relevant are: EvAM-Tools: examples and EvAM-Tools: methods’ details and FAQ.

  


Example files for upload


In https://github.com/rdiaz02/EvAM-Tools/tree/main/examples_for_upload there are several files in CSV format ready to be used as examples for upload. The two files mentioned in the documentation are: ov2.csv and BRCA_ba_s.csv.

  


Session timeouts, RAM and elapsed time execution limits, aborting a run


  • Timeouts: Inactive connections will time out after 2 hours. The page will become gray, and if you refresh (e.g., F5 in most browsers) after this time, you will not get back your results, figures, etc., but will start another session.

  • RAM and time limits: Maximum RAM of any process is limited to 2 GB. Likewise, analyses will be aborted after 1.5 hours of elapsed time (elapsed, not CPU, as we parallelize the runs). If you want to use the Shiny app without these limits, install a local copy. (To modify the time limit, change the value of variable EVAM_MAX_ELAPSED, in the definition of function “server”, in file “server.R”. The RAM limit is imposed on the Docker containers we use; to remove it, run Docker without the memory limit.) Note: because of what we do to enforce these limits, running over limits might not be signalled by an explicit error, but rather by a graying out or a complete refresh of the session.

  • Aborting a run: Sometimes you might want to abort a run (e.g., you might have accidentally sent a run that will take a very long time). This is not possible if you run on our servers. What if you do not care about the long running, not-yet-finished run, and want to start a new one? If, from the same computer and browser, you open a new tab to https://iib.uam.es/evamtools it is very likely that the request will be served by the exact same session and Docker process as the previous run; thus, if R has not finished running, you would have to wait for the previous session to finish (and even connecting to a Shiny session might not work).

    To continue using EvAM-Tools you can try one or more of these:

    • Force a refresh or reload of the page (e.g., “Ctrl + Shift + r”, “Ctrl + F5”).
    • Close the browser, and open it again.
    • Start a new connection from a different web browser.
    • Start a new connection from an incognito session of the same web browser.
    • (Note: some of the above, with some browsers, can return a “Proxy error” message. This is due to timeouts: until the busy Shiny/R process is done, the connection to that very Shiny/R process might fail. Just use a different web browser.)
    • Use a different computer.

    None of those will abort the old process; that old process will eventually finish or be aborted. But you will be able to continue using EvAM-Tools. (That said, please make considerate use of this service: it is provided free of charge, so do not abuse it.)

  


How long does it take to run?


It depends on the number of genes or features and on the methods used. For six genes, if you use neither H-ESBCN nor MC-CBN, it should take about 20 seconds. If you do not use CBN either (i.e., if you only use MHN, OT, and OncoBN) it should run in less than 8 seconds. Model fitting itself is parallelized, but other parts of the program cannot be (e.g., displaying the final figures).

  


What CPMs are included in EvAM-Tools?


  • Oncogenetic Trees (OT): Restrictions in the accumulation of mutations (or events) are represented as a tree. Hence, a parent node can have many children, but children have a single parent. OTs are untimed (edge weights represent conditional probabilities of observing a given mutation, when the sample is taken, given the parents are observed).

  • Conjunctive Bayesian Networks (CBN): This model generalizes the tree-based restriction of OT to a directed acyclic graph (DAG). A node can have multiple parents, which denotes that all of the parents have to be present for the child to appear. Therefore, relationships are conjunctive (AND relationships between the parents). These are timed models, and the parameters of the models are rates given that all parents have been observed. We include both H-CBN and MC-CBN.

  • Hidden Extended Suppes-Bayes Causal Networks (H-ESBCN): Somewhat similar to CBN, but it includes automatic detection of logical formulas AND, OR, and XOR. H-ESBCN is used by its authors as part of Progression Models of Cancer Evolution (PMCE). Like CBN, it returns rates.

  • OncoBN: Similar to OT, in the sense of being an untimed oncogenetic model, but allows both AND (the conjunctive or CBN model) and OR relationships (the disjunctive or DBN model).

  • Mutual Hazard Networks (MHN): With MHN, dependencies are not deterministic, and events can make other events more likely (facilitating influence) or less likely (inhibiting influence). The fitted parameters are multiplicative hazards that represent how one event influences other events.
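The AND, OR, and XOR relationships used by the DAG-based models above can be pictured with a short Python sketch. The function `parents_satisfied` is hypothetical, not part of EvAM-Tools, and it reads XOR as “exactly one parent present”, which is an assumption of this sketch; see the methods’ documentation for the exact definitions used by each method.

```python
def parents_satisfied(relation, parent_states):
    """Return True if an event can occur given the state of its parents.

    `parent_states` is a list of 0/1 values, one per parent.
    An event without parents is always allowed.
    """
    if not parent_states:
        return True
    n_present = sum(parent_states)
    if relation == "AND":   # conjunctive: all parents needed (CBN)
        return n_present == len(parent_states)
    if relation == "OR":    # disjunctive: at least one parent needed
        return n_present >= 1
    if relation == "XOR":   # assumed here: exactly one parent present
        return n_present == 1
    raise ValueError(f"unknown relation: {relation}")


for relation in ("AND", "OR", "XOR"):
    print(relation, [parents_satisfied(relation, s)
                     for s in ([0, 0], [1, 0], [1, 1])])
```

With two parents, the three relationships differ only in how they treat the genotypes with one parent present and with both parents present.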

For details, please see the EvAM-Tools: methods’ details and FAQ.

 


Default options and default CPMs run


  • In the Shiny app, by default we run CBN, OT, OncoBN, and MHN. If you want to run H-ESBCN or MC-CBN, or not run some of the above methods, (de)select them under Advanced options and CPMs to use. (H-ESBCN and MC-CBN are not run by default, as they can take a long time.)
  • OncoBN can be run using a conjunctive or a disjunctive model. The default used in the Shiny app (and the evam function in the package) is the disjunctive model. You can use the conjunctive one by selecting it under Advanced options and CPMs to use, in OncoBN options, Model.
  • Most methods have other options that can be modified. Again, check Advanced options and CPMs to use.

References and related repositories


OT
  • Desper, R., Jiang, F., Kallioniemi, O. P., Moch, H., Papadimitriou, C. H., & Schäffer, A. A. (1999). Inferring tree models for oncogenesis from comparative genome hybridization data. J Comput Biol, 6(1), 37–51.

  • Szabo, A., & Boucher, K. M. (2008). Oncogenetic Trees. In W. Tan, & L. Hanin (Eds.), Handbook of Cancer Models with Applications (pp. 1–24). World Scientific.

  • Oncotree R package: https://CRAN.R-project.org/package=Oncotree

 

H-CBN and MC-CBN

 

MHN

 

H-ESBCN (PMCE)

 

OncoBN (DBN)

 

Conditional prediction of genotypes and probabilities of paths from CPMs

 


Where is the code? Terms of use. Citing. Copyright


The complete source code for the package and the Shiny app, as well as information about how to run the Shiny app locally, is available from https://github.com/rdiaz02/EvAM-Tools.

This app is free to use, but please cite it if you use it; see Citing EvAM-Tools. Confidentiality and security: if you have confidential data, you might prefer not to upload it here, and instead install the package locally.

 

Authors, contact and bug reports

Most of the files for this app (and the package) are copyright Ramon Diaz-Uriarte and Pablo Herrera-Nieto (and released under the Affero GPL v3 license —https://www.gnu.org/licenses/agpl-3.0.html) except some files for HESBCN, MHN, and the CBN code; see full details in https://github.com/rdiaz02/EvAM-Tools#copyright-and-origin-of-files.

For bug reports, please, submit them using the repository https://github.com/rdiaz02/EvAM-Tools.

 

Citing EvAM-Tools

If you use EvAM-Tools (the package or the web app), please cite the Bioinformatics paper:

In addition, if possible, also provide a link to the web app itself, https://iib.uam.es/evamtools (if you used the web app) or the code repository, https://github.com/rdiaz02/EvAM-Tools.

 



 


Cookies


We use cookies to keep “sticky sessions” to the pool of servers (load balanced using HAProxy). By using the app, you confirm you are OK with this.

evamtools R package version: 2.1.16
(See additional details for all options in the help of the evam and sample_evam functions available from the 'Package evamtools' help files in the Additional documentation).

Beware: MCCBN may take hours to run. H-ESBCN often takes longer than the remaining methods (except MCCBN) for small numbers of genes (5 or fewer). For 7 or more genes, CBN can be much slower than OT, OncoBN, or MHN (e.g., data analyzed in < 1 second by those three methods can take 45 seconds with CBN), and also often slower than H-ESBCN.

(Paths to the maximum/maxima and their probabilities. These are not part of the tabular or graphical output (because of their possibly huge number) but if requested are included in the result object you can download)

Generate a finite sample of genotypes according to the predicted frequencies of the model.
Number of genotypes to generate when generating a finite sample of genotypes according to the predicted frequencies of the model.
If > 0, the proportion of observations in the sampled matrix with error (for instance, genotyping error). This proportion of observations will have 0s flipped to 1s, and 1s flipped to 0s.
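The two sampling options above (finite sample and observational noise) can be sketched in Python as follows. This is not the code of the package. The predicted frequencies and function names are hypothetical, and reading “observations” as individual entries of the sampled 0/1 matrix, with an exact number of entries flipped, is an assumption of this sketch.

```python
import random


def sample_genotypes(predicted_freqs, n, rng):
    """Draw n genotypes (lists of 0/1) with the given predicted frequencies."""
    genotypes = list(predicted_freqs)
    weights = [predicted_freqs[g] for g in genotypes]
    return [list(g) for g in rng.choices(genotypes, weights=weights, k=n)]


def add_noise(matrix, proportion, rng):
    """Flip 0 <-> 1 in a given proportion of the entries of the matrix."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    n_flip = round(proportion * n_rows * n_cols)
    noisy = [row[:] for row in matrix]  # leave the input untouched
    for k in rng.sample(range(n_rows * n_cols), n_flip):
        r, c = divmod(k, n_cols)
        noisy[r][c] = 1 - noisy[r][c]
    return noisy


rng = random.Random(42)
# Hypothetical predicted genotype frequencies for two genes (A, B).
PREDICTED = {(0, 0): 0.5, (1, 0): 0.3, (1, 1): 0.2}
clean = sample_genotypes(PREDICTED, 100, rng)
noisy = add_noise(clean, 0.05, rng)
n_changed = sum(a != b for ra, rb in zip(clean, noisy) for a, b in zip(ra, rb))
print(n_changed)
```

With noise, genotypes that have zero predicted frequency (here, B without A) can show up in the sampled counts.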

MHN options

Lambda: penalty term in fitting algorithm. Default = 1/number of rows of data set. (Do not enter anything, unless you want to use a value different from the default).

OT options

For large models this may take quite some time

CBN options

Number of OMP threads; large numbers do not necessarily lead to faster computations.

H-ESBCN options

Number of MCMC iterations: Argument '-n | --number_samples' in the H-ESBCN C code. EvAM's web app default is 200000, larger than the original default of 100000. You might want to increase it to 500000 or 1000000 being aware that this will result in longer running times.

OncoBN options

Epsilon: Penalty term for mutations not conforming to the estimated network. Default is min(colMeans(data)/2). (Do not enter anything, unless you want to use a value different from the default).
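For reference, the two data-dependent defaults quoted in these option notes (MHN lambda as 1/number of rows, and OncoBN epsilon as min(colMeans(data)/2)) can be written out in a few lines of Python. The function names and the toy data are hypothetical; the package computes these values in R.

```python
def default_mhn_lambda(data):
    """Default MHN penalty: 1 / number of rows (subjects) of the data set."""
    return 1.0 / len(data)


def default_oncobn_epsilon(data):
    """Default OncoBN epsilon: half of the smallest column (gene) mean."""
    n_rows = len(data)
    n_cols = len(data[0])
    col_means = [sum(row[j] for row in data) / n_rows for j in range(n_cols)]
    return min(col_means) / 2


# Hypothetical data: 4 subjects, 3 genes with frequencies 0.75, 0.5, 0.25.
DATA = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]

print(default_mhn_lambda(DATA))      # 0.25
print(default_oncobn_epsilon(DATA))  # 0.125
```

Both defaults therefore change with the data: lambda shrinks as the number of subjects grows, and epsilon is driven by the rarest event.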

MCCBN options
