sourmash is a tool for biological sequence analysis and comparisons.
betterplot
is a sourmash plugin that provides improved plotting/viz
and cluster examination for sourmash-based sketch comparisons.
sourmash compare
and
sourmash plot
produce basic distance matrix plots that are useful for comparing and
visualizing the relationships between dozens to hundreds of
genomes. And this is one of the most popular use cases for sourmash!
However, the visualization can be improved a lot beyond the basic viz
that sourmash plot
produces. There are a lot of only slightly more
complicated use cases for comparing, clustering, and visualizing many
genomes!
And this plugin exists to explore some of these use cases!
Specific goals:
- provide a variety of plotting and exploration commands that can be used with sourmash tools;
- provide both command-line functionality and functions that can be imported and used in Jupyter notebooks;
- (maybe) explore other backends than matplotlib;
and who knows what else??
pip install sourmash_plugin_betterplot
See the examples below.
The command lines below are executable in the examples/
subdirectory
of the repository after installing the plugin.
Compare 3 sketches, and cluster.
This command:
sourmash compare sketches/{2,47,63}.sig.zip -o 3sketches.cmp
--labels-to 3sketches.cmp.labels_to.csv
sourmash scripts plot2 3sketches.cmp 3sketches.cmp.labels_to.csv \
-o examples/plot2.3sketches.cmp.png
produces this plot:
Compare 3 sketches, cluster, and show a cut point.
This command:
sourmash compare sketches/{2,47,63}.sig.zip -o 3sketches.cmp
--labels-to 3sketches.cmp.labels_to.csv
sourmash scripts plot2 3sketches.cmp 3sketches.cmp.labels_to_csv \
-o examples/plot2.cut.3sketches.cmp.png \
--cut-point=1.2
produces this plot:
Compare 10 sketches, cluster, and use a cut point to extract
multiple clusters. Use --dendrogram-only
to plot just the dendrogram.
This command:
sourmash compare sketches/{2,47,48,49,51,52,53,59,60}.sig.zip \
-o 10sketches.cmp \
--labels-to 10sketches.cmp.labels_to.csv
sourmash scripts plot2 10sketches.cmp 10sketches.cmp.labels_to.csv \
-o plot2.cut.dendro.10sketches.cmp.png \
--cut-point=1.35 --cluster-out --dendrogram-only
produces this plot:
as well as a set of 6 clusters to 10sketches.cmp.*.csv
.
Use MDS to display a comparison.
These commands:
sourmash compare sketches/{2,47,48,49,51,52,53,59,60}.sig.zip \
-o 10sketches.cmp \
--labels-to 10sketches.cmp.labels_to.csv
sourmash scripts mds 10sketches.cmp 10sketches.cmp.labels_to.csv \
-o mds.10sketches.cmp.png \
-C 10sketches-categories.csv
Use MDS to display a sparse comparison created using the
branchwater plugin's
pairwise
command. The output of pairwise
is distinct from the
sourmash compare
output: pairwise
produces a sparse CSV file that
contains just the matches above threshold, while sourmash compare
produces a dense numpy matrix.
These commands:
sourmash sig cat sketches/{2,47,48,49,51,52,53,59,60}.sig.zip \
-o 10sketches.sig.zip
sourmash scripts pairwise 10sketches.sig.zip -o 10sketches.pairwise.csv
sourmash scripts mds 10sketches.cmp \
-o mds.10sketches.cmp.png \
-C 10sketches-categories.csv
These commands:
# build pairwise
sourmash sig cat sketches/{2,47,48,49,51,52,53,59,60}.sig.zip \
-o 10sketches.sig.zip
sourmash scripts pairwise 10sketches.sig.zip -o 10sketches.pairwise.csv
# convert pairwise
sourmash scripts pairwise_to_compare 10sketches.pairwise.csv \
-o 10sketches.pairwise.cmp --write-all \
--labels-to 10sketches.pairwise.cmp.labels_to.csv
# plot!
sourmash scripts plot2 10sketches.pairwise.cmp \
10sketches.pairwise.cmp.labels_to.csv \
-o plot2.pairwise.10sketches.cmp.png
produce this plot:
The
seaborn
clustermap
offers some nice visualization options.
These commands:
sourmash compare sketches/{2,47,48,49,51,52,53,59,60}.sig.zip \
-o 10sketches.cmp \
--labels-to 10sketches.cmp.labels_to.csv
sourmash scripts plot3 10sketches.cmp 10sketches.cmp.labels_to.csv \
-o plot3.10sketches.cmp.png -C 10sketches-categories.csv
produce this plot:
These commands:
sourmash sig cat sketches/{2,47,48,49,51,52,53,59,60}.sig.zip \
-o 10sketches.sig.zip
sourmash scripts manysearch 10sketches.sig.zip \
sketches/shew21.sig.zip -o 10sketches.manysearch.csv
sourmash scripts clustermap1 10sketches.manysearch.csv \
-o clustermap1.10sketches.png
-u containment -R 10sketches-categories.csv
produce:
We suggest filing issues in the main sourmash issue tracker as that receives more attention!
betterplot
is developed at
https://github.com/sourmash-bio/sourmash_plugin_betterplot.
See environment.yml
for the dependencies needed to develop betterplot
.
Run:
make examples
to run the examples.
For now, the examples serve as the tests; eventually we will add unit tests.
Bump version number in pyproject.toml
and push.
Make a new release on github.
Then pull, and:
python -m build
followed by twine upload dist/...
.
CTB May 2024