Virulence and Antibiotic Resistance Genes database in Region Östergötland

Repository contents

Raw (unprocessed) virulence and antibiotic resistance (VAR) sequence files

raw/Diagnostic_genes.fa: Antibiotic and resistance genes file
raw/Diagnostic_genes_phenotypes.tsv: Tsv file containing phenotypes of the genes listed in raw/Diagnostic_genes.fa

Processed (intermediary) VAR sequence files

proc/diagnostic_genes.fa: Antibiotic and resistance genes file
proc/phenotypes.tsv: Tsv file containing phenotypes related with genes
proc/non-coding.txt: A list of sequence names from raw/Diagnostic_genes.fa estimated by Ariba to be non-coding when running ariba prepareref command on them
proc/*.{html,md}: Html and markdown files of Jupyter notebooks. These are here just for archiving purposes.

The processed intermediary files were produced with jupyter notebooks in bin-directory inside conda environment defined by environment.yml.

The final VAR sequence files

final/coding.fa: According to Ariba estimated coding sequences augmented by phenotype information in the fasta headers
final/non-coding.fa: According to Ariba estimated non-coding sequences augmented by phenotype information in the fasta headers
final/coding_non-coding.fa: A combination of the two above

The phenotype information is appended in the fasta headers preceded by ||| in order to make it more machine readable.

Jupyter notebooks used to create final VAR sequences from the intermediary files (see 2. Processed (intermediary) VAR sequence files above)

bin/add_phenos_to_fasta.ipynb: This Jupyter notebook appends corresponding phenotype data to fasta headers. This makes the phenotype data more accessible in downstream steps.
bin/gather_seqs.ipynb: This Jupyter notebook reads a list of sequence identifiers from proc/non-coding.txt and collects those sequences from final/coding_non-coding.fa into final/non-coding.fa file as well as the ones left over to final/coding.fa.

Makefile

Makefile: This runs most of what needs to be run.

Environment definition file

environment.yml defines the environment inside which jupyter notebooks can be run and what the environment is for the binder instance that can be launched by clicking the badge in the main README.md file.

How to add new genes?

Clone the repository in the path of your choosing:

git clone https://github.com/Genomic-Medicine-Linkoping/var-genes-ro.git

Create conda environment required for running the jupyter notebooks Navigate first to the cloned var-genes-ro directory and then run the following command:

conda env create -f environment.yml

Add the newest fasta and phenotypes files to raw directory It is important that the fasta file is named Diagnostic_genes.fa and the phenotypes file Diagnostic_genes_phenotypes.tsv. The phenotypes file should be in tsv format.
Create final files using make Run the following make command. This command prepares gene and phenotypes files and creates the final fasta database files for use with e.g. JASEN pipeline.

make

Note 1: This database is used at the university hospital of Linköping (Region Östergötland), Sweden.

Note 2: It is strongly recommended to perform verification of these sequences before taking them in to use in your own clinical setting.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bin

bin

final

final

proc

proc

raw

raw

.gitattributes

.gitattributes

.gitignore

.gitignore

LICENSE

LICENSE

Makefile

Makefile

README.md

README.md

environment.yml

environment.yml

Repository files navigation

Virulence and Antibiotic Resistance Genes database in Region Östergötland

Repository contents

How to add new genes?

About

Releases 3

Packages

Contributors 2

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 105 Commits
bin		bin
final		final
proc		proc
raw		raw
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
environment.yml		environment.yml

License

Genomic-Medicine-Linkoping/var-genes-ro

Folders and files

Latest commit

History

Repository files navigation

Virulence and Antibiotic Resistance Genes database in Region Östergötland

Repository contents

How to add new genes?

About

Topics

Resources

License

Stars

Watchers

Forks

Languages