This repository has been archived by the owner on Mar 10, 2023. It is now read-only.

Update to sourmash 4.0 #130

Open
olgabot opened this issue Mar 9, 2021 · 0 comments
Labels
enhancement New feature or request

Comments

olgabot (Collaborator) commented Mar 9, 2021

Is your feature request related to a problem? Please describe

The newest Sourmash 4.0 release is much faster (many operations moved from Python to Rust), and includes a new sourmash sketch command that allows for separately making DNA and protein sketches. This is super helpful as the parameters for protein sketches vs DNA sketches are different. Additionally, it now has native support for amino acid k-mer sizes! E.g. I'd like to do:

  • DNA nucleotide k-mer size=21
  • Protein amino acid k-mer size=10
  • Dayhoff amino acid k-mer size=17
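
With separate sketch commands, the three parameter sets above might be expressed roughly as follows. This is a minimal sketch, not the pipeline's implementation: it only builds the argument lists and does not run sourmash. The file names and the `sketch_command` helper are hypothetical, and the `-p` parameter strings follow the format described in the sourmash sketch documentation. If the input is nucleotide reads rather than protein sequences, `sourmash sketch translate` would likely be the relevant subcommand instead of `sourmash sketch protein`.

```python
# Desired sketch parameters, one entry per alphabet:
# (sourmash sketch subcommand, value passed to -p).
SKETCH_PARAMS = {
    "dna": ("dna", "k=21"),
    "protein": ("protein", "k=10"),
    "dayhoff": ("protein", "dayhoff,k=17"),
}


def sketch_command(alphabet, input_path, output_path):
    """Build the argument list for one `sourmash sketch` call (hypothetical helper)."""
    subcommand, params = SKETCH_PARAMS[alphabet]
    return [
        "sourmash", "sketch", subcommand,
        "-p", params,
        "-o", output_path,
        input_path,
    ]


# Hypothetical input and output file names, for illustration only.
commands = [
    sketch_command("dna", "reads.fasta", "reads.dna.sig"),
    sketch_command("protein", "peptides.fasta", "peptides.protein.sig"),
    sketch_command("dayhoff", "peptides.fasta", "peptides.dayhoff.sig"),
]

for cmd in commands:
    print(" ".join(cmd))
```

Each alphabet gets its own k-mer size here, instead of every alphabet being sketched at every k-size.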

Right now, this has to be done all in one command, and all alphabets get k-merized at each k-size, which doesn't really make sense. Dayhoff with nucleotide k=21 has far too low information content to be usable. While DNA has 4^21 options at ksize=21, Dayhoff is an amino acid alphabet, so the k-mer size is really 21/3 = 7 amino acids. That gives only 6^7 possible Dayhoff k-mers, and 6^7 << 4^21, which isn't enough information to distinguish between cell types. It's basically random at that point.
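
To put numbers on this, here is a quick back-of-the-envelope calculation. It assumes the standard alphabet sizes (4 nucleotides, 20 amino acids, 6 Dayhoff classes) and that 21 nucleotides translate to 7 amino acids. The `kmer_space` helper is just for illustration.

```python
# Number of letters in each alphabet. Dayhoff groups the 20 amino acids into 6 classes.
ALPHABET_SIZE = {"dna": 4, "protein": 20, "dayhoff": 6}


def kmer_space(alphabet, ksize):
    """Number of distinct k-mers possible for an alphabet at a given k-mer size."""
    return ALPHABET_SIZE[alphabet] ** ksize


dna_21 = kmer_space("dna", 21)               # nucleotide k=21
dayhoff_7 = kmer_space("dayhoff", 21 // 3)   # Dayhoff when forced to nucleotide k=21 (7 amino acids)
protein_10 = kmer_space("protein", 10)       # requested protein k=10
dayhoff_17 = kmer_space("dayhoff", 17)       # requested Dayhoff k=17

for name, size in [
    ("DNA k=21", dna_21),
    ("Dayhoff k=7", dayhoff_7),
    ("Protein k=10", protein_10),
    ("Dayhoff k=17", dayhoff_17),
]:
    print(f"{name:>13}: {size:.2e} possible k-mers")
```

Dayhoff at 7 amino acids has about 2.8e5 possible k-mers versus about 4.4e12 for DNA at k=21, while protein k=10 and Dayhoff k=17 both land around 1e13, the same order of magnitude as DNA k=21.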

https://github.com/dib-lab/sourmash/blob/5e66db91e62353de2b79f23cd198ef6f5c5544d1/doc/sourmash-sketch.md

Describe the solution you'd like

Add Sourmash 4.0: https://anaconda.org/bioconda/sourmash
(released ~1 week ago)

Describe alternatives you've considered

Could stay with current Sourmash but this is the future!!

Additional context

NA

olgabot added the enhancement label Mar 9, 2021