Mattia-Colbertaldo/PhyloML

Phylogenetic Tree Simulation and Analysis

Author: Mattia Colbertaldo
Date: February 2024


Work in Progress

Description

This repository contains code and data for a thesis project focused on simulating phylogenetic trees under different models and inferring their parameters using machine learning (ML) techniques.

The main purpose of this thesis work is to replace the slow Maximum Likelihood Estimation (MLE) methods traditionally used for parameter inference with ML methods. A second goal is to build a model predictor that, given a phylogenetic tree, automatically identifies the model that generated it and infers its parameters.

Workflow

  1. Simulations:

    • The simulations.r file simulates phylogenetic trees under different models using functions from the diversitree package, then saves the trees to file for further analysis.
    • The ranges.r file derives the parameter ranges used for simulation by running MLE inference on phylogenetic trees from real-world data.
  2. Parameter Inference:

    • CDV_full_tree.py encodes the trees in the CDV representation, preparing them for ML analysis.
    • Summary_Statistics.py encodes the trees in the SS (summary statistics) representation.
    • The AllModels_SS.ipynb notebook reads the encoded trees, prepares the data, and trains a Convolutional Neural Network (CNN) to infer the parameters of the model. The same code works for all models in the diversitree package.
  3. Model Predictor:

    • The ModelPredictor.ipynb notebook trains different Neural Networks to infer the model given a tree. I suggest looking at the SS version, because it is always the most up to date.
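
To make the encoding step more concrete, here is a minimal sketch of turning a tree into a fixed-length feature vector. It is an illustration only: the `Node` class, the `summary_vector` function, and the four statistics chosen (tip count, tree height, mean branch length, Colless imbalance) are assumptions made for this example, not the statistics computed by Summary_Statistics.py.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List


@dataclass
class Node:
    """Hypothetical minimal tree node; branch_length is the edge to the parent."""
    branch_length: float = 0.0
    children: List["Node"] = field(default_factory=list)


def count_tips(node: Node) -> int:
    """Number of leaves in the subtree rooted at node."""
    if not node.children:
        return 1
    return sum(count_tips(child) for child in node.children)


def height(node: Node) -> float:
    """Longest path from node down to any tip (node's own branch excluded)."""
    if not node.children:
        return 0.0
    return max(child.branch_length + height(child) for child in node.children)


def branch_lengths(node: Node) -> List[float]:
    """All branch lengths below node, in pre-order."""
    lengths: List[float] = []
    for child in node.children:
        lengths.append(child.branch_length)
        lengths.extend(branch_lengths(child))
    return lengths


def colless(node: Node) -> int:
    """Colless imbalance: sum of |tips_left - tips_right| over internal nodes.

    Assumes a strictly binary tree.
    """
    if not node.children:
        return 0
    left, right = node.children
    imbalance = abs(count_tips(left) - count_tips(right))
    return imbalance + colless(left) + colless(right)


def summary_vector(root: Node) -> List[float]:
    """Illustrative feature vector: [tips, height, mean branch length, Colless]."""
    return [
        float(count_tips(root)),
        height(root),
        mean(branch_lengths(root)),
        float(colless(root)),
    ]


# The tree ((A:1,B:1):1,C:2) written with the hypothetical Node class.
example = Node(children=[Node(1.0, [Node(1.0), Node(1.0)]), Node(2.0)])
features = summary_vector(example)
```

A vector like `features` has the same length for every tree, which is what lets trees of different sizes be stacked into one training matrix for a neural network.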

Models

We explore various models for simulating phylogenetic trees, including:

  • BD
  • BiSSE
  • MuSSE
  • QuaSSE
  • GeoSSE
  • BiSSEness
  • ClaSSE

Additionally, we compare simulations under these models with simulations under the Constant Birth-Death model.
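
As a rough illustration of the Constant Birth-Death process, the sketch below tracks only the number of living lineages over time. It is a Python stand-in, not the repository's method: the actual simulations use the diversitree package in R and produce full trees. The function name `simulate_birth_death` and its parameters (`lam` for the speciation rate, `mu` for the extinction rate, `t_max` for the stopping time) are assumptions made for this example.

```python
import random
from typing import List, Optional, Tuple


def simulate_birth_death(
    lam: float, mu: float, t_max: float, seed: Optional[int] = None
) -> List[Tuple[float, int]]:
    """Simulate lineage counts under a constant-rate birth-death process.

    Starts from one lineage at time 0 and returns (time, lineage_count)
    pairs, one per event, stopping at t_max or at extinction.
    Assumes lam + mu > 0.
    """
    rng = random.Random(seed)
    t, n = 0.0, 1
    history = [(t, n)]
    while n > 0:
        # With n lineages, the waiting time to the next event is
        # exponential with total rate n * (lam + mu).
        t += rng.expovariate(n * (lam + mu))
        if t >= t_max:
            break
        # The event is a speciation with probability lam / (lam + mu),
        # otherwise an extinction.
        if rng.random() < lam / (lam + mu):
            n += 1
        else:
            n -= 1
        history.append((t, n))
    return history


# Example run with hypothetical rates; the seed only makes it repeatable.
trajectory = simulate_birth_death(lam=1.0, mu=0.5, t_max=3.0, seed=1)
```

Setting `mu=0.0` reduces this to a pure-birth process, in which the lineage count can only grow.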


Usage

Feel free to explore the code and data provided in this repository. For detailed instructions on running simulations, inferring parameters, and training model predictors, refer to the respective files and notebooks.


References

For further reading and understanding of the models, methods, and ML techniques used, please refer to the cited references.


If you have any questions or suggestions, don't hesitate to reach out. Happy exploring!
