Multi-genome reference and query FASTAs for many-to-many queries #123

Open
ryneches opened this issue Oct 3, 2023 · 0 comments

When comparing many small genomes, it is often impractical to create an individual file for each genome. For example, IMG/VR v4.1 contains 5,576,197 viral genomes. Creating that many files is infeasible on most file systems, particularly in HPC environments, where network file systems like NFS and Lustre are usually deployed.

For this situation, how about supplying a single FASTA file containing all contigs, plus a query.txt and a reference.txt that assign contigs to genomes, structured something like this?

{genome_name}\t{contig_1},{contig_2},{contig_3}....\n
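
To make the proposal concrete, here is a minimal Python sketch of how a mapping file in that format could be combined with a single multi-contig FASTA. It is only an illustration, not this project's code: the function names (`parse_genome_map`, `read_fasta`, `group_by_genome`) are hypothetical, and it assumes that contig names in the mapping match the first word of each FASTA header and that each contig belongs to one genome.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_genome_map(lines: Iterable[str]) -> Dict[str, str]:
    """Map each contig ID to its genome from '{genome}\\t{c1},{c2},...' lines."""
    contig_to_genome: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        genome, contigs = line.split("\t", 1)
        for contig in contigs.split(","):
            contig = contig.strip()
            if contig:
                contig_to_genome[contig] = genome
    return contig_to_genome


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (contig_id, sequence); the ID is the first word of the header."""
    name = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def group_by_genome(
    fasta_lines: Iterable[str], map_lines: Iterable[str]
) -> Dict[str, List[Tuple[str, str]]]:
    """Group contigs by genome; contigs absent from the mapping are ignored."""
    contig_to_genome = parse_genome_map(map_lines)
    genomes: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for contig, seq in read_fasta(fasta_lines):
        genome = contig_to_genome.get(contig)
        if genome is not None:
            genomes[genome].append((contig, seq))
    return dict(genomes)
```

Open file handles iterate line by line, so `group_by_genome(open("all_contigs.fa"), open("query.txt"))` would work directly (the FASTA filename here is made up). This sketch holds every sequence in memory, which would not suit a collection the size of IMG/VR; a real implementation would presumably stream or index the FASTA instead.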