Having issues to create a NR db with taxonomy info #824

Open
ahof1704 opened this issue Mar 15, 2024 · 4 comments

Comments

@ahof1704

I would like to run a sequence similarity search against only the Homo sapiens sequences in the NR database. For that, I downloaded the database as follows:

mmseqs databases NR nr tmp

Then I attempted to filter for the taxon I want (human, taxid 9606):

mmseqs filtertaxseqdb nr nr_human --taxon-list 9606

However, I am getting the following error:
nr_mapping is empty. Rerun createtaxdb to recreate taxonomy mapping.

It is unclear to me what recreating the taxonomy mapping involves, or what the createtaxdb command should look like. I would appreciate any help with that.
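For reference, the MMseqs2 wiki describes attaching taxonomy to an existing sequence database with `createtaxdb`, passing an NCBI taxonomy dump (`--ncbi-tax-dump`) and a file that maps sequence identifiers to taxids (`--tax-mapping-file`). A minimal sketch, assuming such a mapping file already exists; the function name `rebuild_taxdb` and the file name `nr.taxidmapping` are hypothetical:

```shell
#!/bin/sh
# Hypothetical wrapper around createtaxdb for an existing MMseqs2 sequence DB.
# Arguments (assumptions for illustration):
#   $1 = MMseqs2 sequence DB (e.g. nr)
#   $2 = directory holding the extracted NCBI taxdump (names.dmp, nodes.dmp, ...)
#   $3 = whitespace-separated "identifier taxid" mapping file
#   $4 = temporary directory (defaults to tmp)
rebuild_taxdb() {
  db="$1"; taxdump="$2"; mapping="$3"; tmpdir="${4:-tmp}"
  # Refuse to run with an empty mapping file, since it cannot assign any taxids.
  if [ ! -s "$mapping" ]; then
    echo "mapping file '$mapping' is missing or empty" >&2
    return 1
  fi
  mmseqs createtaxdb "$db" "$tmpdir" \
    --ncbi-tax-dump "$taxdump" \
    --tax-mapping-file "$mapping"
}

# Hypothetical usage (nr.taxidmapping is a placeholder name):
# rebuild_taxdb nr /root/mmseqs2_db/taxonomy nr.taxidmapping tmp
# mmseqs filtertaxseqdb nr nr_human --taxon-list 9606
```

The empty-file guard is there because an empty mapping is exactly the symptom reported above; it fails early instead of producing another empty `nr_mapping`.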

Thanks
Antonio

@ahof1704
Author

Hi,

I have tried following the steps described on the wiki to create the mapping for NR.

I have downloaded the NCBI taxonomy dump and confirmed that the taxonomy folder is in place:

ls -lh /root/mmseqs2_db/taxonomy/
Permissions Size User Date Modified Name
drwxr-sr-x     - root 25 Mar 13:35  .ipynb_checkpoints/
.rw-rw-r--   20M 9019 12 Mar 21:27  citations.dmp
.rw-rw-r--  4.7M 9019 12 Mar 21:25  delnodes.dmp
.rw-rw-r--   452 9019 12 Mar 21:20  division.dmp
.rw-rw-r--   16k 9019 12 Mar 21:27  gc.prt
.rw-rw-r--  4.9k 9019 12 Mar 21:20  gencode.dmp
.rw-rw-r--  3.9M 9019 12 Mar 21:25  images.dmp
.rw-rw-r--  1.4M 9019 12 Mar 21:25  merged.dmp
.rw-rw-r--  244M 9019 12 Mar 21:27  names.dmp
.rw-rw-r--  194M 9019 12 Mar 21:27  nodes.dmp
.rw-rw----  3.1k 4544 27 Apr  2023  readme.txt
.rw-rw-r--   65M root 12 Mar 21:28  taxdump.tar.gz
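The listing includes `names.dmp`, `nodes.dmp`, `merged.dmp` and `delnodes.dmp`, which, as far as I know, are the taxdump files MMseqs2 reads via `--ncbi-tax-dump`. A quick check for that assumed set of files; the function name `check_taxdump` is hypothetical:

```shell
#!/bin/sh
# Hypothetical sanity check: does a directory contain the taxdump files that
# --ncbi-tax-dump is assumed to need? Prints each missing or empty file and
# returns non-zero if any are missing.
check_taxdump() {
  dir="$1"
  missing=0
  for name in names.dmp nodes.dmp merged.dmp delnodes.dmp; do
    if [ ! -s "$dir/$name" ]; then
      echo "missing or empty: $dir/$name"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
# check_taxdump /root/mmseqs2_db/taxonomy && echo "taxdump looks complete"
```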

But when attempting to extract the FASTA file and the taxid mapping with blastdbcmd, I get the following error:

cd /root/mmseqs2_db
blastdbcmd -db nr -entry all > nr.fna
BLAST Database error: No alias or index file found for nucleotide database [nr] in search path [/root/mmseqs2_db::]
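For context, the wiki step extracts both the FASTA and an `accession taxid` table from a BLAST-formatted database. A sketch of that step as a function, assuming a BLAST protein database named `nr` (for example its `nr.pal` alias file plus volumes) is in the working directory; `-dbtype prot` is added so blastdbcmd does not have to guess the database type, and the function name `extract_blast_nr` is hypothetical:

```shell
#!/bin/sh
# Hypothetical wrapper for the extraction step. Assumes $1 names a BLAST-formatted
# protein database (blastdbcmd cannot read MMseqs2 database files) and $2 is the
# FASTA output path; the taxid table is written next to it.
extract_blast_nr() {
  db="$1"; out="$2"
  # Every sequence as FASTA, later input for `mmseqs createdb`.
  # The shell creates "$out" before blastdbcmd runs, so remove it on failure.
  blastdbcmd -db "$db" -dbtype prot -entry all > "$out" \
    || { rm -f "$out"; return 1; }
  # One "accession taxid" line per sequence (%a = accession, %T = taxid)
  blastdbcmd -db "$db" -dbtype prot -entry all -outfmt "%a %T" > "$out.taxidmapping" \
    || { rm -f "$out.taxidmapping"; return 1; }
}

# Usage: extract_blast_nr nr nr.fna
```

The cleanup on failure matters because a plain `> nr.fna` redirect creates the file even when blastdbcmd exits with an error, which leaves behind a 0-byte `nr.fna` like the one in the listing.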

I have confirmed that the nr files are present in that path:

ls -lh /root/mmseqs2_db/nr*
Permissions Size User Date Modified Name
.rw-rw-r--   13G root 15 Mar 16:54  /root/mmseqs2_db/nr
.rw-rw-r--     4 root 15 Mar 16:54  /root/mmseqs2_db/nr.dbtype
.rw-r--r--     0 root 25 Mar 13:46  /root/mmseqs2_db/nr.fna
.rw-rw-r--  779M root 15 Mar 16:54  /root/mmseqs2_db/nr.index
.rw-rw-r--  790M root 15 Mar 16:55  /root/mmseqs2_db/nr.lookup
.rw-rw-r--     8 root 15 Mar 16:52  /root/mmseqs2_db/nr.source
.rw-rw-r--    11 root 15 Mar 17:03  /root/mmseqs2_db/nr.version
.rw-rw-r--  4.0G root 15 Mar 16:52  /root/mmseqs2_db/nr_h
.rw-rw-r--     4 root 15 Mar 16:52  /root/mmseqs2_db/nr_h.dbtype
.rw-rw-r--  748M root 15 Mar 16:55  /root/mmseqs2_db/nr_h.index
.rw-rw-r--     0 root 15 Mar 16:55  /root/mmseqs2_db/nr_mapping
.rw-rw-r--  708M root 15 Mar 16:55  /root/mmseqs2_db/nr_taxonomy

nr.fna is still empty. I am not sure whether this step is required to create nr_mapping (which is also empty). I would appreciate any help in getting the taxonomy information for the NR database.
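The files in the listing (`nr`, `nr.dbtype`, `nr.index`, `nr_h`, and so on) make up the MMseqs2 database created by `mmseqs databases`, not a BLAST database, so blastdbcmd finds no BLAST alias or index file under that name; that likely explains the error, and the wiki's blastdbcmd step only applies when starting from a BLAST-formatted NR. A rough way to tell the two apart by file name; the function name `db_format` and the exact suffixes checked are assumptions:

```shell
#!/bin/sh
# Hypothetical helper: guess whether a path prefix is a BLAST-formatted protein
# database or an MMseqs2 database, based only on which files exist.
db_format() {
  db="$1"
  if [ -e "$db.pal" ] || [ -e "$db.pin" ]; then
    # Protein alias file (multi-volume) or index file (single volume)
    echo "blast"
  elif [ -e "$db.dbtype" ] && [ -e "$db.index" ]; then
    # MMseqs2 databases carry .dbtype and .index side files
    echo "mmseqs"
  else
    echo "unknown"
  fi
}

# Usage: db_format /root/mmseqs2_db/nr
```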

Thanks!

@ahof1704
Author

ahof1704 commented Apr 1, 2024

Hi, I would really appreciate some help with this. Thanks!

@milot-mirdita
Member

I remember blastdbcmd having issues; however, I don't remember what was wrong.

We use a different workflow to assign taxids to NR:

The download part for the accession2taxid files:

downloadFile "https://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz" "${TMP_PATH}/nr.gz"
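This workflow pairs `nr.gz` with NCBI accession2taxid files. Assuming the usual layout of those files (a header line, then tab-separated `accession`, `accession.version`, `taxid` and `gi` columns), a minimal sketch of reducing them to the two-column `identifier taxid` form used with `--tax-mapping-file` in the wiki example. Keying on `accession.version` is an assumption: it has to match however the identifiers appear in `nr.lookup`. The function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical converter: NCBI *.accession2taxid -> "accession.version<TAB>taxid".
# Assumed input columns: accession, accession.version, taxid, gi (tab-separated),
# with one header line per file.
accession2taxid_to_mapping() {
  # FNR > 1 skips the header of every input file, not only the first one.
  awk -F '\t' 'FNR > 1 { print $2 "\t" $3 }' "$@"
}

# Usage (paths are placeholders):
# accession2taxid_to_mapping /path/to/*.accession2taxid > nr.taxidmapping
```

Using `FNR` rather than `NR` is what lets several accession2taxid files be passed in one call without their header lines leaking into the output.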

@ahof1704
Author

Sorry, I'm not sure I follow. Should I be doing something differently from what I described above to filter NR by taxonomy?
