Jaime Huerta-Cepas edited this page Jun 14, 2016 · 9 revisions

Main objective

  • To develop an ETE module to search for custom tree patterns in large collections of trees.

Resources

General Ideas

  • A pattern could be defined as a Newick structure in which rules and filters are encoded in a (hopefully) user-friendly vocabulary. For instance:
((sp=Hsa,sp=Pta,name=Hsa001),dups>1, name, H in species, )[dist>0.1*starter]
  • The matcher should be a Python object that is transparent to the user, so that querying can be done like:
TreeMatcher(pattern).find_matching_nodes([tree1, tree2, tree3])
  • The tree matcher object should allow for different types of searches. For instance,
TreeMatcher.is_match(node)
TreeMatcher.has_match(tree)
TreeMatcher.find_matching_nodes([tree1, tree2], matches_per_tree=1)
etc...
  • Find the most efficient way to search for matches: evaluate recursion, heuristic methods, etc. The matching algorithm should be able to search over thousands of trees.
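
The query interface listed above could be sketched roughly as follows. This is only an illustration under several assumptions: `Node` is a tiny stand-in for a real ETE tree, a pattern is itself a small tree whose nodes carry an optional `test` callable, and children are matched in any order by plain recursion. None of this is an existing ETE API, and the `sp` helper is hypothetical.

```python
from itertools import islice, permutations


class Node:
    """Tiny stand-in for a tree node (hypothetical, not the ETE class)."""

    def __init__(self, name="", children=None, **features):
        self.name = name
        self.children = list(children or [])
        self.features = features

    def traverse(self):
        """Yield this node and all of its descendants in preorder."""
        yield self
        for child in self.children:
            yield from child.traverse()


class TreeMatcher:
    """Match a pattern tree against target trees.

    Assumption: each pattern node may hold a `test` feature, a callable
    taking a target node and returning a bool. A pattern node without
    children accepts any target node that passes its test.
    """

    def __init__(self, pattern):
        self.pattern = pattern

    def is_match(self, node, pattern=None):
        """Return True if `node` satisfies the (sub)pattern."""
        if pattern is None:
            pattern = self.pattern
        test = pattern.features.get("test")
        if test is not None and not test(node):
            return False
        if not pattern.children:
            return True
        if len(pattern.children) != len(node.children):
            return False
        # Try every child ordering; fine for small fan-outs only.
        return any(
            all(self.is_match(c, p) for c, p in zip(perm, pattern.children))
            for perm in permutations(node.children)
        )

    def has_match(self, tree):
        """Return True if any node of `tree` matches the pattern."""
        return any(self.is_match(n) for n in tree.traverse())

    def find_matching_nodes(self, trees, matches_per_tree=None):
        """Yield matching nodes, at most `matches_per_tree` per tree."""
        for tree in trees:
            hits = (n for n in tree.traverse() if self.is_match(n))
            yield from islice(hits, matches_per_tree)


def sp(code):
    """Build a test that checks a node's `sp` feature (hypothetical helper)."""
    return lambda node: node.features.get("sp") == code


# Pattern: an internal node with one Hsa child and one Pta child.
pattern = Node(children=[Node(test=sp("Hsa")), Node(test=sp("Pta"))])
tree = Node("root", [
    Node("A", [Node("Pta001", sp="Pta"), Node("Hsa001", sp="Hsa")]),
    Node("Mmu001", sp="Mmu"),
])
matcher = TreeMatcher(pattern)
matches = [n.name for n in matcher.find_matching_nodes([tree])]
```

Trying all child permutations keeps the sketch short but grows factorially with the number of children, so a real implementation would need something smarter for multifurcated nodes.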

  • Develop a way to auto-generate patterns from a set of real trees. For example, find the commonalities within a group of trees and generate a pattern expression that can be used to find similar structures.

  • Tree patterns should allow common operators and have a basic language that permits user-defined functions and filters.

@ = target node
OR   || 
AND   &&
NOT   !
OPERATORS >= > < <= != == ~= IN
custom functions
@leaf
@size
@contains 
function arguments = {}
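
One possible way to prototype the operator vocabulary above is to rewrite an expression into Python and evaluate it against the target node. The sketch below assumes that `@name` calls a user-supplied function on the target node, that a bare `@` is the node itself, and that nodes expose `name`, `children` and `dist` attributes; these readings are guesses, not a settled syntax. The `~=` operator is left out, string literals containing `!` or `IN` are not handled, and `eval` is not a sandbox, so this is only suitable for trusted patterns.

```python
import re
from types import SimpleNamespace


def translate(expr):
    """Rewrite the pattern vocabulary into a Python expression string."""
    expr = expr.replace("||", " or ").replace("&&", " and ")
    expr = re.sub(r"!(?!=)", " not ", expr)      # `!` but not `!=`
    expr = re.sub(r"\bIN\b", " in ", expr)       # IN -> Python's `in`
    expr = re.sub(r"@(\w+)", r"\1(node)", expr)  # @leaf -> leaf(node)
    return expr.replace("@", "node").strip()     # bare @ -> target node


def make_filter(expr, functions):
    """Compile `expr` once and return a callable node -> bool."""
    code = compile(translate(expr), "<pattern>", "eval")

    def check(node):
        env = {"__builtins__": {}, "node": node, **functions}
        return bool(eval(code, env))

    return check


# Hypothetical custom functions exposed to patterns.
def leaf(node):
    return not node.children


def size(node):
    return 1 if leaf(node) else sum(size(c) for c in node.children)


functions = {"leaf": leaf, "size": size}

hsa = SimpleNamespace(name="Hsa001", children=[], dist=0.2)
pta = SimpleNamespace(name="Pta001", children=[], dist=0.05)
inner = SimpleNamespace(name="", children=[hsa, pta], dist=0.3)

is_long_human_leaf = make_filter(
    "@leaf && !(@.dist <= 0.1) && 'Hsa' IN @.name", functions
)
is_clade = make_filter("!@leaf && @size >= 2", functions)
```

Here `is_long_human_leaf(hsa)` is true while `is_long_human_leaf(pta)` is false, and `is_clade(inner)` is true. A dedicated parser would be safer than string rewriting once the grammar is fixed.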
  • Develop a visualization layout to compare trees and patterns

  • Implement parallel tree matching to scan large collections of trees (e.g., with multiprocessing)
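
A minimal sketch of how a scan could be spread over worker processes with the standard `multiprocessing` module is shown below. The per-tree job here is a deliberately simple placeholder (it checks whether any leaf name in a Newick string starts with a given prefix); in practice it would parse the tree and run the matcher. The names `match_tree` and `scan` are illustrative only.

```python
import re
from multiprocessing import Pool

# Leaf names follow "(" or "," in a Newick string; internal names follow ")".
LEAF = re.compile(r"[(,]\s*([^(),:;]+)")


def match_tree(job):
    """Placeholder job: does any leaf name start with `prefix`?

    Must live at module level so worker processes can pickle it.
    """
    index, newick, prefix = job
    leaves = LEAF.findall(newick)
    return index, any(name.startswith(prefix) for name in leaves)


def scan(newicks, prefix, processes=1, chunksize=100):
    """Return the indices of the trees that match, in input order."""
    jobs = [(i, nw, prefix) for i, nw in enumerate(newicks)]
    if processes == 1:
        # Serial fallback, handy for debugging and small collections.
        results = map(match_tree, jobs)
    else:
        with Pool(processes) as pool:
            # Chunking reduces inter-process overhead on many small trees.
            results = pool.map(match_tree, jobs, chunksize)
    return [i for i, hit in results if hit]


newicks = [
    "((Hsa001:0.1,Pta001:0.2)A:0.3,Mmu001:0.4);",
    "(Mmu002,Rno001);",
    "(Dme001,(Hsa002,Cel001));",
]
```

With `processes=1` the call `scan(newicks, "Hsa")` selects the first and third trees. When using more than one process, the call should be made from under an `if __name__ == "__main__":` guard, because some platforms start workers by re-importing the main module.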
