Data interoperability between other MinHash implementations #3

Open
edawson opened this issue Jul 5, 2016 · 1 comment

edawson commented Jul 5, 2016

As noted by Titus in an issue, MinHash sketches should be stable and cross-compatible between programs, and it makes sense to follow that convention. To this end, rkmh should output either Mash's Cap'n Proto schema or a text file akin to sourmash's YAML signatures.
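
For illustration only, here is a rough Python sketch of what a cross-compatible text record might need to carry: the hashes plus the parameters another tool needs to interpret them. The field names (`name`, `ksize`, `seed`, `hash_function`, `hashes`) and the `"example-hash"` label are assumptions made up for this example; they are not Mash's or sourmash's actual schema, nor rkmh's output format.

```python
import json


def make_record(name, ksize, seed, hash_function, hashes):
    """Bundle a sketch with the parameters needed to interpret it.

    All field names here are hypothetical, chosen for illustration.
    """
    return {
        "name": name,
        "ksize": ksize,
        "seed": seed,
        "hash_function": hash_function,
        "hashes": sorted(hashes),
    }


def comparable(a, b):
    """Two sketches are only comparable if they were built the same way."""
    return all(a[key] == b[key] for key in ("ksize", "seed", "hash_function"))


ref = make_record("ref1", ksize=16, seed=42, hash_function="example-hash", hashes=[9, 3, 7])
read = make_record("read1", ksize=16, seed=42, hash_function="example-hash", hashes=[3, 8, 9])
other = make_record("read2", ksize=21, seed=42, hash_function="example-hash", hashes=[1, 2, 3])

# A plain-text serialization that another program could parse.
text = json.dumps(ref)
```

The point of storing `ksize`, `seed`, and the hash function alongside the hashes is that a reader can refuse to compare sketches built with different settings, as `comparable(ref, other)` does here.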

edawson commented Aug 12, 2016

Commit b9224 partially solves this, but there are still a few items left:

  1. JSON output is ordered alphabetically, which may or may not be okay.
  2. There is no way (at least from the CLI) to change hash seeds, hash functions, alphabet, whether to use non-canonical bases, or the number of hash bits.
  3. Reads/refs are currently emitted individually, which differs from Mash/sourmash, which collate sets of reads/refs into sketches. I guess there should be a switch for this à la `mash -i`?
  4. There is no error handling in `rkmh hash` yet, and there are probably still some performance issues. I have no idea whether the structures json.hpp provides are thread-safe, nor have I tested with multiple threads.
  5. There's no way to read the sketches back in, so they're essentially useless at the moment (except for debugging).
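
Items 1, 3, and 5 can be sketched roughly as follows. This is Python for brevity rather than rkmh's own code, and the helper names (`collate`, `dump_sketch`, `load_sketch`) and the record fields are hypothetical. It assumes bottom-k sketches that all use the same hash function and seed, and that each input sketch holds at least `sketch_size` hashes (or every hash of its sequence).

```python
import json


def collate(sketches, sketch_size):
    """Merge per-sequence bottom-k sketches into a single sketch.

    Takes the union of all hashes and keeps the smallest `sketch_size`.
    """
    merged = set()
    for hashes in sketches:
        merged.update(hashes)
    return sorted(merged)[:sketch_size]


def dump_sketch(record):
    """Serialize with alphabetically sorted keys so output is stable."""
    return json.dumps(record, sort_keys=True)


def load_sketch(text):
    """Read a sketch back in, rejecting records that lack required fields."""
    record = json.loads(text)
    missing = {"name", "hashes"} - record.keys()
    if missing:
        raise ValueError(f"sketch is missing fields: {sorted(missing)}")
    return record


per_read = [[1, 5, 9], [2, 5, 7]]
record = {"name": "sample", "hashes": collate(per_read, sketch_size=3)}
restored = load_sketch(dump_sketch(record))
```

Since JSON objects are unordered by specification, alphabetical key order should not matter to a conforming parser; sorting mainly keeps the output stable for diffing.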
