Skip to content
/ P2C2M2 Public

R package to test the multi-species coalescent model using trees estimated with starBEAST2 and to estimate false-positive error rates based on samples from the posterior predictive distributions

Notifications You must be signed in to change notification settings

arleyc/P2C2M2

Repository files navigation

P2C2M2

P2C2M2 provides functions to read default output from BEAST 2 (Bouckaert et al. 2019) and StarBEAST2 (Ogilvie et al. 2017) and conduct posterior predictive checks of coalescent models (Reid et al. 2014) based on the functions originally provided in the P2C2M package (Gruenstaeudl et al. 2016) to parse tree and xml files from BEAST and *BEAST. It also implements estimation of type-I (false-positive) error rates using pseudo-observed datasets (“pods”) sampled from the posterior predictive distribution.

Installation

You can install the development version of P2C2M2 from GitHub with:

# install.packages("devtools")
devtools::install_github("arleyc/P2C2M2")

Input files

In order to generate the tree files annotated with the required branch rate estimates, the XML file has to be modified by hand to include the following code before running BEAST 2:

#For strict clocks:

Insert the following:
substitutions=“true” branchratemodel=“@StrictClock.c:Locus_Name”
after id=“TreeWithMetaDataLogger.t:Locus_Name” and before spec=“beast.base.evolution.TreeWithMetaDataLogger”

#For UCLN or RLC clocks:

Insert the following:
substitutions=“true”
after id=“TreeWithMetaDataLogger.t:Locus_Name” and before spec=“beast.base.evolution.TreeWithMetaDataLogger”

Example

Basic example showing how to run the full pipeline with a single function using the provided dataset to test the multispecies coalescent model and to estimate false positive error rates per locus and across loci:

library(P2C2M2)

# The absolute path to the input directory is set
inPath <- system.file("extdata", "thomomys/", package="P2C2M2")

# The name of the xml-file generated by BEAUTi and located in 
# "inPath" is set
inFile <- "thomomys.xml"

# Posterior predictive simulations with a setting of 2 simulation 
# replicates are preformed
suppressWarnings(thomomys <- p2c2m2.complete(inPath, inFile, num.reps=2, error.rate = TRUE))
#> 
#> bindings, loci, and alignments read
#> 
#> locus 26 is done
#> 
#> locus 29 is done
#> 
#> locus 47 is done
#> 
#> locus 53 is done
#> 
#> locus 59 is done
#> 
#> locus 64 is done
#> 
#> locus 72 is done
#> 
#> gene trees done
#> 
#> species trees done
#> 
#> Calculating metrics for 3 trees of the posterior distribution
#> 
#> Calculating metrics for the trees of the post. predictive distribution
#> 
#> Calculating differences btw. the posterior and the post. predictive distribution
#> 
#> Calculating differences btw. 1000 pods and the post. predictive distribution to estimate error rates

thomomys$results$alpha0.01
#> $perGene
#>    LCWT[2]                NDC[2]               
#> 26 "-14.27 (±37.45) n.s." "-2 (±7.21) n.s."    
#> 29 "-20.41 (±38.11) n.s." "-2.33 (±10.41) n.s."
#> 47 "9.85 (±2.45) *"       "2.33 (±4.16) n.s."  
#> 53 "-20.49 (±2.29) *"     "5.33 (±2.08) *"     
#> 59 "-3.14 (±7.53) n.s."   "-2.33 (±3.06) n.s." 
#> 64 "-22.06 (±7.76) *"     "10 (±6) *"          
#> 72 "10.64 (±28.26) n.s."  "-8.67 (±7.23) *"    
#> 
#> $acrGenes
#>        LCWT[2]                NDC[2]               
#> Sum    "-43.29 (±93.14) n.s." "-0.57 (±20.15) n.s."
#> Mean   "-6.18 (±13.31) n.s."  "-0.08 (±2.88) n.s." 
#> Median "-14.21 (±18.06) n.s." "0.86 (±5.52) n.s."  
#> Mode   "8.04 (±26.61) n.s."   "-2.57 (±11.84) n.s."
#> CV     "0.8 (±2.36) n.s."     "-5.77 (±6.53) n.s." 
#> 
#> $perGene.error
#>    LCWT[2] NDC[2]
#> 26   0.657  0.678
#> 29   0.689  0.678
#> 47   0.689  0.665
#> 53   0.689  0.657
#> 59   0.654  0.678
#> 64   0.654  0.665
#> 72   0.689  0.678
#> 
#> $acrGenes.error
#>        LCWT[2] NDC[2]
#> Sum      0.657  0.665
#> Mean     0.657  0.665
#> Median   0.000  0.000
#> Mode     0.000  0.000
#> CV       0.657  0.665

thomomys$results$legend
#> [1] "Differences between the posterior and the posterior predictive distributions per locus and across loci. Each cell contains the following information in said order: mean, standard deviation, significance level. Error rates (if estimated with option error.rate=TRUE) are based on differences between the pods and the posterior predictive distributions. Codes in square brackets indicate the number of tails. Alpha values are automatically adjusted for the number of tails."

References

Bouckaert, R., Vaughan, T.G., Barido-Sottani, J., Duchêne, S., Fourment, M., Gavryushkina, A., Heled, J., Jones, G., Kühnert, D., De Maio, N., Matschiner, M., Mendes, F.K., Müller, N.F., Ogilvie, H.A., du Plessis, L., Popinga, A., Rambaut, A., Rasmussen, D., Siveroni, I., Suchard, M.A., Wu, C.H., Xie, D., Zhang, C., Stadler, T., Drummond, A.J. (2019) BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis. PLoS Computational Biology, 15(4), e1006650.

Gruenstaeudl, M., Reid, N.M., Wheeler, G.R., Carstens, B.C. (2016) Posterior predictive checks of coalescent models: P2C2M, an R package. Molecular Ecology Resources, 6, 193–205.

Ogilvie, H.A., Bouckaert, R.R., Drummond, A.J. (2017) StarBEAST2 brings faster species tree inference and accurate estimates of substitution rates. Molecular Biology and Evolution, 34(8), 2101-–2114.

Reid, N.M., Brown, J.M., Satler, J.D., Pelletier, T.A., McVay, J.D., Hird, S.M., Carstens, B.C. (2014) Poor fit to the multi-species coalescent model is widely detectable in empirical data. Systematic Biology, 63, 322–333.

Citation

Villamil, J., Morando, M., Avila, L.J., Lanna, F.M., Fonseca, E.M., Sites, Jr., J.W., Camargo, A. Revisiting the problem of Multispecies Coalescent Model fit with an example from a complete molecular phylogeny of the Liolaemus wiegmannii species group (Squamata: Liolaemidae), in review.

About

R package to test the multi-species coalescent model using trees estimated with starBEAST2 and to estimate false-positive error rates based on samples from the posterior predictive distributions

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published