Some simple tools in Python for MDU.
Use mdu-merge-ngs-lanes to correctly merge the lanes from an Illumina run into a single FASTQ.
Get help:
mdu-merge-ngs-lanes --help
Basic usage:
mdu-merge-ngs-lanes -i /path/to/fastq_folder -o /path/to/output > cmd.sh
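Because the output is redirected to cmd.sh, the tool appears to print shell commands rather than merge files itself. The sketch below is a hypothetical illustration of lane merging, not the tool's implementation. It assumes bcl2fastq-style file names (<sample>_S<n>_L<lane>_R<read>_001.fastq.gz), groups files by sample and read, orders them by lane, and emits one cat command per group. Concatenating gzipped FASTQ files with cat gives a valid multi-member gzip file. The helper merge_commands and the output name <sample>_<read>.fastq.gz are assumptions.

```python
import re
import shlex
from collections import defaultdict
from pathlib import PurePosixPath

# Assumed bcl2fastq-style naming: <sample>_S<n>_L<lane>_R<read>_001.fastq.gz
ILLUMINA_NAME = re.compile(
    r"^(?P<sample>.+)_S\d+_L(?P<lane>\d{3})_(?P<read>R[12])_001\.fastq\.gz$"
)


def merge_commands(fastq_paths, outdir):
    """Return one `cat` command per (sample, read), with lanes in order.

    Hypothetical helper; mdu-merge-ngs-lanes may group and name files differently.
    """
    groups = defaultdict(list)
    for path in fastq_paths:
        match = ILLUMINA_NAME.match(PurePosixPath(path).name)
        if match is None:
            continue  # skip files that do not follow the assumed naming scheme
        groups[(match["sample"], match["read"])].append((match["lane"], path))

    commands = []
    for (sample, read), lanes in sorted(groups.items()):
        # Sorting (lane, path) tuples puts L001 before L002, and so on.
        inputs = " ".join(shlex.quote(path) for _, path in sorted(lanes))
        output = PurePosixPath(outdir) / f"{sample}_{read}.fastq.gz"
        commands.append(f"cat {inputs} > {shlex.quote(str(output))}")
    return commands


if __name__ == "__main__":
    files = [
        "/path/to/fastq_folder/sample1_S1_L002_R1_001.fastq.gz",
        "/path/to/fastq_folder/sample1_S1_L001_R1_001.fastq.gz",
    ]
    for command in merge_commands(files, "/path/to/output"):
        print(command)
```

Emitting commands instead of running them mirrors the "> cmd.sh" pattern: the script can be inspected before anything is written.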
Advanced usage:
You can split the output into multiple subfolders of the output folder by adding --subfolder to the command line. The option can be used multiple times and takes two space-separated values: path and regex. The path gives the name of the subfolder in the output folder, and the regular expression determines which samples go in that subfolder.
For instance, the command below will put samples starting with NTC into a subfolder called ntc, while all other samples will be added to a subfolder called data.
mdu-merge-ngs-lanes -i /path/to/fastq -o /path/to/output --subfolder 'data' '(?!NTC).*' --subfolder 'ntc' '(?<=NTC).*' > cmd.sh
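How the two --subfolder values are applied internally is not described here. The sketch below shows one plausible reading, as a hypothetical illustration rather than the tool's code: each (path, regex) pair is tried in the order given, the regex is matched at the start of the sample name with Python's re.match, and the first match decides the subfolder. The helper assign_subfolder and the anchored patterns are assumptions.

```python
import re

# Hypothetical (path, regex) pairs, one per --subfolder option, tried in order.
SUBFOLDERS = [
    ("ntc", r"NTC"),       # sample names that start with NTC
    ("data", r"(?!NTC)"),  # any sample name that does not start with NTC
]


def assign_subfolder(sample_name, subfolders=SUBFOLDERS):
    """Return the first subfolder whose regex matches at the start of the name."""
    for path, regex in subfolders:
        if re.match(regex, sample_name):  # re.match only tries position 0
            return path
    return None  # no subfolder claims this sample


for name in ["NTC_01", "sample1"]:
    print(name, "->", assign_subfolder(name))
```

With re.match, a lookbehind such as (?<=NTC) can never succeed at the start of a string, which is why this sketch uses the plain prefix NTC; with re.search the same patterns behave differently, so check how mdu-merge-ngs-lanes applies them before relying on either form.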
Use mdu-sra-uploads to upload FASTQ data to NCBI SRA.
It requires a file with two tab-separated columns, the MDU ID and the AUSMDUID, one isolate per line. For example:
mdu1\tausmdu1
mdu2\tausmdu2
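A minimal sketch of reading that file with Python's csv module, assuming exactly two tab-separated columns per line and ignoring blank lines. The helper read_isolates is hypothetical, and mdu-sra-uploads may validate the file differently.

```python
import csv
import io


def read_isolates(handle):
    """Parse '<MDU ID><TAB><AUSMDUID>' lines into (mdu_id, ausmdu_id) pairs."""
    isolates = []
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for line_number, row in enumerate(reader, start=1):
        if not row:
            continue  # ignore blank lines
        if len(row) != 2:
            raise ValueError(
                f"line {line_number}: expected 2 tab-separated columns, got {len(row)}"
            )
        mdu_id, ausmdu_id = (field.strip() for field in row)
        isolates.append((mdu_id, ausmdu_id))
    return isolates


example = io.StringIO("mdu1\tausmdu1\nmdu2\tausmdu2\n")
print(read_isolates(example))
```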
Getting help:
mdu-sra-uploads --help
Usage: mdu-sra-upload [OPTIONS] ISOLATES
Options:
-f, --folder TEXT Folder on NCBI to upload. Used to find the reads
when submitting via the SRA portal. [default:
mdu]
-r, --reads-folder TEXT Where reads are located (uses MDU_READS env
variable if available).
-k, --ascp-key TEXT Path to ascp ssh upload key (uses ASCP_UPLOAD_KEY
env variable if available). This can be obtained
from the SRA Submission Portal.
-s, --sra-subfolder TEXT SRA subfolder owned by you where data will be copied
to (uses SRA_SUBFOLDER env variable if available).
--help Show this message and exit.
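The upload key and SRA subfolder both come from the Aspera command line instructions in the SRA Submission Portal. As a rough, hypothetical sketch of how --ascp-key, --sra-subfolder and --folder might combine, the function below builds (but does not run) an ascp command of the general shape shown in NCBI's instructions, sending files to uploads/<sra-subfolder>/<folder> on upload.ncbi.nlm.nih.gov. The flag set and destination layout are assumptions based on NCBI's published instructions, not on this tool's source; build_ascp_command is a hypothetical helper.

```python
import shlex


def build_ascp_command(files, ascp_key, sra_subfolder, folder="mdu"):
    """Build an ascp argument list for uploading files; nothing is executed.

    Assumed to follow NCBI's Aspera command line instructions; `folder`
    defaults to "mdu" like the --folder option above.
    """
    if not files:
        raise ValueError("no files to upload")
    destination = f"subasp@upload.ncbi.nlm.nih.gov:uploads/{sra_subfolder}/{folder}"
    return [
        "ascp",
        "-i", ascp_key,  # private upload key from the SRA Submission Portal
        "-QT",           # fair transfer policy, no encryption
        "-l100m",        # cap the transfer rate at 100 Mbps
        "-k1",           # resume partial transfers (file attribute check)
        "-d",            # create the destination folder if it does not exist
        *files,
        destination,
    ]


cmd = build_ascp_command(
    ["/path/for/upload/sample1_R1.fastq.gz"],
    "/path/to/aspera.openssh",
    "john.doe@doe.industries.com_qEWo9",
)
print(shlex.join(cmd))
```

Returning an argument list rather than a string keeps paths with spaces safe if the list is later passed to subprocess.run.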
Basic usage:
cd /path/for/upload
# copy isolates.txt into this folder
mdu-sra-uploads isolates.txt
# when completing the submission in the SRA portal, search for pre-uploaded files in the folder called mdu (or the name given with --folder)
Environment variables:
MDU_READS: full path to where FASTQ data is stored.
ASCP_UPLOAD_KEY: full path to where your Aspera NCBI upload key is located (obtain one from the SRA Submission Portal under the Aspera command line instructions).
SRA_FOLDER: path to your folder at SRA. This is usually your email address plus an "_" and some random alphanumeric characters (e.g., john.doe@doe.industries.com_qEWo9), and can be obtained from the SRA Submission Portal under the Aspera command line instructions.
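The help text says each option falls back to an environment variable when it is not given on the command line. A minimal stdlib sketch of that precedence (command-line value first, then the environment variable, otherwise an error) follows; resolve_setting is a hypothetical helper, and the paths are placeholders.

```python
import os


def resolve_setting(cli_value, env_var, environ=None):
    """Return the command-line value if given, else the environment variable.

    Hypothetical helper; raises ValueError when neither is set.
    """
    if cli_value:
        return cli_value
    environ = os.environ if environ is None else environ
    value = environ.get(env_var)
    if value:
        return value
    raise ValueError(f"set {env_var} or pass the corresponding option")


env = {"MDU_READS": "/data/reads"}
print(resolve_setting(None, "MDU_READS", env))          # falls back to MDU_READS
print(resolve_setting("/tmp/reads", "MDU_READS", env))  # command line wins
```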
To develop in the same environment, use Vagrant and VirtualBox:
vagrant up
vagrant ssh
Once you are logged in to the VM, the shared folder is at /vagrant.