Randomly download genomes from NCBI RefSeq and Genbank

bielasilva/mock_ncbi_download

Mock Genomes NCBI Download

Usage: mock_ncbi_down.py [-h] -d DIRECTORY_DOWNLOAD -c {archaea,bacteria,fungi,protozoa,viral} [-r]

  • -d/--directory_download -> Directory where the files will be saved
  • -c/--databases -> Database to download from (one of archaea, bacteria, fungi, protozoa, viral)
  • -r -> Restricts the program to downloading from RefSeq only
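The options above can be sketched with `argparse`. This is a reconstruction from the usage line only, not the script's actual source: the `refseq_only` attribute name and the help strings are assumptions, and `-c` is assumed to take a single value because the usage line shows one choice group.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented command line (a sketch)."""
    parser = argparse.ArgumentParser(prog="mock_ncbi_down.py")
    parser.add_argument(
        "-d", "--directory_download", required=True,
        help="Directory where the files will be saved",
    )
    parser.add_argument(
        "-c", "--databases", required=True,
        choices=["archaea", "bacteria", "fungi", "protozoa", "viral"],
        help="Database to download from",
    )
    # The README documents only the short flag; `refseq_only` is a
    # hypothetical attribute name chosen for this sketch.
    parser.add_argument(
        "-r", dest="refseq_only", action="store_true",
        help="Only download from RefSeq",
    )
    return parser


# Parse an explicit argument list so the example runs without a real CLI call.
args = build_parser().parse_args(["-d", "genomes", "-c", "bacteria", "-r"])
print(args.directory_download, args.databases, args.refseq_only)
```

The equivalent shell invocation would be `mock_ncbi_down.py -d genomes -c bacteria -r`.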

This script randomly selects genomes from RefSeq and GenBank based on the following criteria:

  1. RefSeq
  • It is a Complete Genome;
  • It has not already been downloaded.
  2. GenBank
  • It is a Complete Genome;
  • Its species is not in RefSeq;
  • It was not rejected from RefSeq;
  • It has not already been downloaded.
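The selection criteria can be illustrated with a small filter over assembly records. This is a minimal sketch, not the script's actual code. It assumes each record is a dict keyed by column names from NCBI's `assembly_summary` files (`assembly_accession`, `species_taxid`, `assembly_level`, `excluded_from_refseq`), that an empty or `na` value in `excluded_from_refseq` means the assembly was not rejected, and that "species is not in RefSeq" is checked against every RefSeq record of the same database. The accessions and taxids below are made up for the example.

```python
def refseq_candidates(rows, downloaded):
    """Keep RefSeq records that are complete genomes and not yet downloaded."""
    return [
        r for r in rows
        if r["assembly_level"] == "Complete Genome"
        and r["assembly_accession"] not in downloaded
    ]


def genbank_candidates(rows, refseq_rows, downloaded):
    """Keep GenBank records that meet the four GenBank criteria."""
    # Assumption: a species counts as "in RefSeq" if any RefSeq record has it.
    refseq_species = {r["species_taxid"] for r in refseq_rows}
    return [
        r for r in rows
        if r["assembly_level"] == "Complete Genome"
        and r["species_taxid"] not in refseq_species
        # Assumption: blank or "na" means the assembly was not rejected.
        and r.get("excluded_from_refseq", "") in ("", "na")
        and r["assembly_accession"] not in downloaded
    ]


# Hypothetical records for illustration only.
refseq_rows = [
    {"assembly_accession": "GCF_000000001.1", "species_taxid": "100",
     "assembly_level": "Complete Genome"},
    {"assembly_accession": "GCF_000000002.1", "species_taxid": "200",
     "assembly_level": "Scaffold"},
]
genbank_rows = [
    {"assembly_accession": "GCA_000000003.1", "species_taxid": "100",
     "assembly_level": "Complete Genome", "excluded_from_refseq": ""},
    {"assembly_accession": "GCA_000000004.1", "species_taxid": "300",
     "assembly_level": "Complete Genome", "excluded_from_refseq": ""},
    {"assembly_accession": "GCA_000000005.1", "species_taxid": "400",
     "assembly_level": "Complete Genome",
     "excluded_from_refseq": "derived from metagenome"},
    {"assembly_accession": "GCA_000000006.1", "species_taxid": "500",
     "assembly_level": "Contig", "excluded_from_refseq": ""},
    {"assembly_accession": "GCA_000000007.1", "species_taxid": "600",
     "assembly_level": "Complete Genome", "excluded_from_refseq": "na"},
]
downloaded = {"GCA_000000007.1"}

refseq_ok = refseq_candidates(refseq_rows, downloaded)
genbank_ok = genbank_candidates(genbank_rows, refseq_rows, downloaded)
print([r["assembly_accession"] for r in refseq_ok])
print([r["assembly_accession"] for r in genbank_ok])
```

In this toy data only one record survives on each side: the other GenBank records fail on species overlap, rejection status, assembly level, or prior download, respectively.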

The script downloads the following number of genomes from each database:

Database   RefSeq   GenBank
Archaea    15       2 *
Bacteria   15       5
Fungi      5        5
Protozoa   2        2
Viral      10       10

* Only 2 genomes in the GenBank Archaea database met these criteria, so the number of downloads was set to 2.
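The per-database quotas and the random selection step might be expressed as follows. This is a sketch under stated assumptions rather than the script's implementation: the `DOWNLOAD_COUNTS` mapping and the `pick_genomes` helper are hypothetical names, and capping the sample at the number of available candidates is one way to handle a shortfall like the GenBank Archaea case.

```python
import random

# Quotas copied from the table above; the dict layout is an assumption.
DOWNLOAD_COUNTS = {
    "archaea": {"refseq": 15, "genbank": 2},
    "bacteria": {"refseq": 15, "genbank": 5},
    "fungi": {"refseq": 5, "genbank": 5},
    "protozoa": {"refseq": 2, "genbank": 2},
    "viral": {"refseq": 10, "genbank": 10},
}


def pick_genomes(candidates, database, source, rng=None):
    """Randomly pick up to the quota of genomes, without repeats."""
    rng = rng or random.Random()
    wanted = DOWNLOAD_COUNTS[database][source]
    # Never ask for more items than exist, or random.sample raises ValueError.
    k = min(wanted, len(candidates))
    return rng.sample(list(candidates), k)


# Hypothetical candidate accessions for illustration only.
candidates = [f"GCA_{i:09d}.1" for i in range(1, 21)]
picked = pick_genomes(candidates, "bacteria", "genbank", random.Random(42))
print(len(picked))
```

Passing a seeded `random.Random` makes a selection reproducible, which can help when the same mock community needs to be rebuilt later.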
