Kleborate

Kleborate is a tool to screen genome assemblies of Klebsiella pneumoniae and the Klebsiella pneumoniae species complex (KpSC) for:

  • MLST sequence type
  • species (e.g. K. pneumoniae, K. quasipneumoniae, K. variicola, etc.)
  • ICEKp associated virulence loci: yersiniabactin (ybt), colibactin (clb)
  • virulence plasmid associated loci: salmochelin (iro), aerobactin (iuc), hypermucoidy (rmpA, rmpA2)
  • antimicrobial resistance genes, including quinolone resistance SNPs and colistin resistance truncations
  • K (capsule) and O antigen (LPS) serotype prediction, via wzi alleles and Kaptive

For Klebsiella outside of the KpSC, Kleborate will accurately determine the species and will report the presence of any accessory genes detected (AMR, virulence, K & O types). However, species-focused markers (mutational resistance, MLST) will not be reported.

To learn more about the taxonomy and population genomics of Klebsiella pneumoniae and the species complex, and what is known so far about the distribution of AMR, virulence and K types in the Klebsiella pneumoniae population, see Wyres, Lam & Holt, 2020, Nature Reviews Microbiology.

Citing Kleborate

A manuscript describing the Kleborate software in full is currently in preparation. (Note that the BLAST logic has been checked in the light of this article describing a common misconception regarding the BLAST parameter -max_target_seqs.)

In the meantime, if you use Kleborate, please cite the component schemes that you report:

Yersiniabactin and colibactin (ICEKp): Lam, MMC. et al. Genetic diversity, mobilisation and spread of the yersiniabactin-encoding mobile element ICEKp in Klebsiella pneumoniae populations. Microbial Genomics (2018).

Aerobactin and salmochelin: Lam, MMC. et al. Tracking key virulence loci encoding aerobactin and salmochelin siderophore synthesis in Klebsiella pneumoniae. Genome Medicine (2018).

Kaptive for capsule (K) serotyping: Wyres, KL. et al. Identification of Klebsiella capsule synthesis loci from whole genome data. Microbial Genomics (2016).

Kaptive for O antigen (LPS) serotyping: Wick, RR. et al. Kaptive Web: user-friendly capsule and lipopolysaccharide serotype prediction for Klebsiella genomes. Journal of Clinical Microbiology (2018).

Background

Klebsiella pneumoniae (Kp) is a commensal bacterium that causes opportunistic infections in hospitals. K. pneumoniae are intrinsically resistant to ampicillin and frequently acquire additional antimicrobial resistances through horizontal gene transfer and chromosomal mutations. A handful of hypervirulent lineages are also recognised, which encode a constellation of acquired virulence factors and can cause invasive disease outside the hospital setting. Evidence is now mounting that other K. pneumoniae strains carrying one or more of these acquired factors – including siderophores (yersiniabactin, salmochelin and aerobactin), regulators of hypermucoidy (rmpA/rmpA2 genes) and/or the genotoxin colibactin – can also be highly pathogenic and cause more severe disease both inside and outside hospitals. Capsule (K) and LPS (O) antigen variation is also of great interest to the research community due to its importance in host-pathogen and phage interactions, and thus potential relevance to alternative control measures such as vaccines, immunotherapy and phage therapy. K. pneumoniae has six close relatives (species and subspecies) known as the K. pneumoniae species complex (KpSC); these are difficult to distinguish from one another in clinical labs using biotyping or MALDI-TOF and are often confused for K. pneumoniae sensu stricto.

To make it easier to extract clinically relevant genotyping information on K. pneumoniae and the species complex using genome data, we have developed Kleborate, a genomic surveillance tool designed to (a) accurately identify species and sequence types, and (b) identify the key acquired genetic features for which there is strong evidence of association with either antibiotic resistance or hypervirulence in K. pneumoniae sensu stricto. While many generic tools can be used to identify sequence types or resistance determinants from bacterial genomes, we hope that this organism-specific tool will help avoid many of the common confusions faced by people working with K. pneumoniae genomes and also facilitate monitoring for the convergence of antibiotic resistance with the hypervirulence-associated factors noted above.

Requirements

Software requirements:

  • Python 3 (v3.5 or later should work)
  • setuptools (required to install Kleborate)
    • To install: pip install setuptools
  • Biopython
    • To install: pip install biopython
  • BLAST+
    • Version 2.7.1 or later is needed, as earlier versions have a bug with the culling_limit parameter and/or tblastx results.
    • We test Kleborate on BLAST+ v2.7.1. Later versions will probably also work but stick to v2.7.1 if you want to play it safe.
  • Mash
    • You can download a pre-compiled version from the Mash releases page (both Mac and Linux binaries are available) and copy the executable somewhere into your PATH (e.g. /usr/local/bin).
    • Alternatively, you can install Mash on a Mac with Homebrew: brew install mash

Input files: Kleborate takes Klebsiella genome assemblies (either completed or draft) in FASTA format (can be gzipped). If you have unassembled reads, try assembling them with our Unicycler assembler, which works well on Illumina or hybrid Illumina + Nanopore/PacBio reads.

Installation

Kleborate can be installed to your system for easy usage:

git clone --recursive https://github.com/katholt/Kleborate.git
cd Kleborate
python setup.py install
kleborate -h

Alternatively, you can clone and run Kleborate without installation directly from its source directory:

git clone --recursive https://github.com/katholt/Kleborate.git
Kleborate/kleborate-runner.py -h

See examples below to test out your installation on some public genome data. And if you'd like to thoroughly check that everything works as intended, you can also run this repo's automated tests after installation.

Note that Kleborate depends on a git submodule (Kaptive), which is why --recursive is required when cloning. If you update your local copy of Kleborate using git pull, you should also run git submodule update to ensure that the Kaptive submodule is up to date as well.

Updating the MLST database

Each Kleborate release includes a copy of the K. pneumoniae species complex MLST database to screen against. The version included is current at the time of the release. However, the K. pneumoniae species complex BIGSdb is continually updated with new STs, so Kleborate users may wish to update their copy of Kleborate regularly with the latest MLST database.

The MLST database is made up of 2 files, which are located in the Kleborate/kleborate/data directory:

  • Klebsiella_pneumoniae.fasta (allele sequences)
  • kpneumoniae.txt (sequence type definitions)

A Python 3 script to download the latest versions of these 2 files is provided in the Kleborate/scripts directory. The downloaded files can then be copied into the Kleborate/kleborate/data directory:

cd Kleborate/scripts
python getmlst.py --species "Klebsiella pneumoniae"
mv Klebsiella_pneumoniae.fasta ../kleborate/data
mv kpneumoniae.txt ../kleborate/data

Basic usage

Screen some genomes for MLST and virulence loci:
kleborate -o results.txt -a *.fasta

Also screen for resistance genes:
kleborate --resistance -o results.txt -a *.fasta

Turn on all of Kleborate's optional screens (resistance genes and both K and O loci):
kleborate --all -o results.txt -a *.fasta

Screen everything in a set of gzipped assemblies:
kleborate --all -o results.txt -a *.fasta.gz

Full usage

usage: kleborate -a ASSEMBLIES [ASSEMBLIES ...] [-r] [-s] [--kaptive_k]
                 [--kaptive_o] [-k] [--all] [-o OUTFILE]
                 [--kaptive_k_outfile KAPTIVE_K_OUTFILE]
                 [--kaptive_o_outfile KAPTIVE_O_OUTFILE] [-h] [--version]

Kleborate: a tool for characterising virulence and resistance in Klebsiella

Required arguments:
  -a ASSEMBLIES [ASSEMBLIES ...], --assemblies ASSEMBLIES [ASSEMBLIES ...]
                        FASTA file(s) for assemblies, can be gzipped (.gz)

Screening options:
  -r, --resistance      Turn on resistance genes screening (default: no
                        resistance gene screening)
  --kaptive_k           Turn on Kaptive screening of K loci (default: do not
                        run Kaptive for K loci)
  --kaptive_o           Turn on Kaptive screening of O loci (default: do not
                        run Kaptive for O loci)
  -k, --kaptive         Equivalent to --kaptive_k --kaptive_o
  --all                 Equivalent to --resistance --kaptive

Output options:
  -o OUTFILE, --outfile OUTFILE
                        File for detailed output (default:
                        Kleborate_results.txt)
  --kaptive_k_outfile KAPTIVE_K_OUTFILE
                        File for full Kaptive K locus output (default: do not
                        save Kaptive K locus results to separate file)
  --kaptive_o_outfile KAPTIVE_O_OUTFILE
                        File for full Kaptive O locus output (default: do not
                        save Kaptive O locus results to separate file)

Help:
  -h, --help            Show this help message and exit
  --version             Show program's version number and exit
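If you call Kleborate from a Python pipeline, it can help to build the argument list in one place. The following is a minimal sketch, not part of Kleborate itself: the function name `build_kleborate_command` is hypothetical, and only flags shown in the usage text above are used.

```python
import shlex
from typing import List, Sequence


def build_kleborate_command(
    assemblies: Sequence[str],
    outfile: str = "Kleborate_results.txt",
    resistance: bool = False,
    kaptive: bool = False,
) -> List[str]:
    """Build a kleborate argument list (hypothetical helper, documented flags only)."""
    if not assemblies:
        raise ValueError("at least one assembly is required (-a)")
    cmd = ["kleborate", "-o", outfile]
    if resistance and kaptive:
        cmd.append("--all")  # documented as equivalent to --resistance --kaptive
    elif resistance:
        cmd.append("--resistance")
    elif kaptive:
        cmd.append("--kaptive")
    # -a accepts one or more files, so it goes last to keep the other flags unambiguous.
    cmd.append("-a")
    cmd.extend(assemblies)
    return cmd


if __name__ == "__main__":
    command = build_kleborate_command(
        ["sample1.fasta", "sample2.fasta.gz"], "results.txt", resistance=True
    )
    print(" ".join(shlex.quote(part) for part in command))
    # To run it for real: subprocess.run(command, check=True)
```

Passing the list to `subprocess.run` (rather than a shell string) avoids quoting problems with unusual file names.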

Screening details

Assembly quality metrics

The quality and completeness of Kleborate results depend on the quality of the input genome assemblies. We provide some basic assembly statistics (contig count, N50, largest contig size, detection of ambiguous bases) to help users understand their Kleborate results in the context of assembly quality, but we recommend users conduct more comprehensive QC themselves before running Kleborate (e.g. screen for contamination, etc.).
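For readers unfamiliar with these statistics, the sketch below shows one common way to compute them from a list of contig sequences. It is an illustration only and is not taken from Kleborate's code; in particular, treating any character other than A, C, G or T as ambiguous is an assumption, and `assembly_stats` is a hypothetical name.

```python
from typing import Dict, List


def assembly_stats(contigs: List[str]) -> Dict[str, object]:
    """Return contig count, N50, largest contig and an ambiguous-base flag."""
    lengths = sorted((len(seq) for seq in contigs), reverse=True)
    total = sum(lengths)

    # N50: length of the contig at which the running total (longest first)
    # reaches at least half of the assembly size.
    n50 = 0
    running = 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break

    # Assumption: anything other than A/C/G/T counts as ambiguous.
    ambiguous = any(base not in "ACGT" for seq in contigs for base in seq.upper())

    return {
        "contig_count": len(lengths),
        "N50": n50,
        "largest_contig": lengths[0] if lengths else 0,
        "ambiguous_bases": ambiguous,
    }


if __name__ == "__main__":
    print(assembly_stats(["ACGTACGT", "ACGTAC", "ACN"]))
```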

Klebsiella species

Kleborate will attempt to identify the species of each input assembly. It does this by comparing the assembly using Mash to a curated set of Klebsiella assemblies from NCBI and reporting the species of the closest match. Kleborate considers a Mash distance of ≤ 0.01 to be a strong species match. A distance of > 0.01 and ≤ 0.03 is a weak match and might indicate that your sample is a novel lineage or a hybrid between multiple Klebsiella species.
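The distance thresholds above can be expressed as a small function. This is a sketch of the stated cut-offs only; the label "none" for distances above 0.03 is a placeholder chosen for illustration, since the text does not say how such cases are reported.

```python
def classify_species_match(mash_distance: float) -> str:
    """Map a Mash distance to the match strength described in the text."""
    if mash_distance < 0:
        raise ValueError("Mash distance cannot be negative")
    if mash_distance <= 0.01:
        return "strong"
    if mash_distance <= 0.03:
        return "weak"
    return "none"  # hypothetical label; the text does not cover this case


if __name__ == "__main__":
    for distance in (0.004, 0.02, 0.08):
        print(distance, classify_species_match(distance))
```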

Here is an annotated tree of the reference assemblies, made by mashtree:

Klebsiella species tree

Kleborate is designed for detailed genotyping of the well-studied K. pneumoniae species complex (KpSC) labelled on the tree, which includes the seven species listed in the table below. These were previously considered as phylogroups within K. pneumoniae. We've included the phylogroup numbers in the table below to allow backwards compatibility, but these are not reported in the Kleborate output. See this review for an overview of the complex.

| Species | Kp phylogroup [a] | Kp phylogroup (alternative) [b] | Reference |
| --- | --- | --- | --- |
| K. pneumoniae | Kp1 | KpI | Brenner, D.J. 1979 Int J Syst Evol Microbiol 29: 38-41 |
| K. quasipneumoniae subsp quasipneumoniae | Kp2 | KpIIa | Brisse et al. 2014 Int J Syst Evol Microbiol 64:3146-52 |
| K. quasipneumoniae subsp similipneumoniae | Kp4 | KpIIb | Brisse et al. 2014 Int J Syst Evol Microbiol 64:3146-52 |
| K. variicola subsp variicola | Kp3 | KpIII | Rosenblueth et al. 2004 Syst Appl Microbiol 27:27-35 (described as subsp variicola in this paper) |
| K. variicola subsp tropica | Kp5 | - | Rodrigues et al. 2019 Res Microbiol S0923-2508:30019-1 (described as subsp tropicalensis in this paper) |
| K. quasivariicola | Kp6 | - | Long et al. 2017 Genome Announc 5: e01057-17 |
| K. africana | Kp7 | - | Rodrigues et al. 2019 Res Microbiol S0923-2508:30019-1 (described as africanensis in this paper) |

[a] Kp phylogroup numbers as described in Rodrigues et al. 2019.

[b] Alternative (older) Kp phylogroup numbers as described in Brisse et al. 2001 and Fevre et al. 2005, prior to the identification of K. variicola subsp tropica, K. quasivariicola and K. africana.

More distant Klebsiella species (oxytoca, michiganensis, grimontii and aerogenes) will be accurately identified by Kleborate, although please note that the diversity and relevance of K. pneumoniae virulence factors in these species is not yet well understood.

Kleborate will also yield reliable species identifications across the family Enterobacteriaceae, as different species sometimes end up in Klebsiella collections. These names are again assigned based on the clades in a mashtree, but were not as carefully curated as the Klebsiella species (so take them with a grain of salt).

MLST

Genomes identified by Kleborate as belonging to the K. pneumoniae species complex are then subjected to multi-locus sequence typing (MLST) using the 7-locus scheme described at the K. pneumoniae BIGSdb hosted at the Pasteur Institute. Note that this scheme is not specific to K. pneumoniae sensu stricto but covers the whole K. pneumoniae species complex. A copy of the MLST alleles and ST definitions is stored in the data directory of this repository. See above for instructions on how to update the MLST database in your copy of Kleborate.

Notes on Kleborate's MLST calls:

  • Kleborate makes an effort to report the closest matching ST if a precise match is not found.
  • Imprecise allele matches are indicated with a *.
  • Imprecise ST calls are indicated with -nLV, where n indicates the number of loci that disagree with the ST reported. So 258-1LV indicates a single-locus variant (SLV) of ST258, i.e. 6/7 loci match ST258.
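When post-processing results, the -nLV suffix can be split from the ST number with a regular expression. The sketch below assumes calls look like the examples in this document (`258-1LV`, `ST258`, `18-1LV`); `STCall` and `parse_st_call` are hypothetical names, and other formats Kleborate may emit are not handled.

```python
import re
from typing import NamedTuple


class STCall(NamedTuple):
    st: int               # sequence type number of the closest match
    mismatched_loci: int  # 0 for a precise call, n for an -nLV call


# Optional "ST" prefix, the ST number, then an optional "-nLV" suffix.
_ST_PATTERN = re.compile(r"^(?:ST)?(?P<st>\d+)(?:-(?P<n>\d+)LV)?$")


def parse_st_call(call: str) -> STCall:
    """Split an ST call such as '258-1LV' into its ST and locus-variant count."""
    match = _ST_PATTERN.match(call.strip())
    if match is None:
        raise ValueError("unrecognised ST call: {!r}".format(call))
    return STCall(st=int(match.group("st")), mismatched_loci=int(match.group("n") or 0))


if __name__ == "__main__":
    print(parse_st_call("258-1LV"))
    print(parse_st_call("ST23"))
```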

Note that allele definitions for ST1047 and ST1078 were changed in the MLST database in February 2018, and these new allele combinations are incorporated in Kleborate since v0.4.0. This is highly unusual and other allele and ST assignments should be stable across versions.

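| allele | ST1047 old | ST1047 current | ST1078 old | ST1078 current |
| --- | --- | --- | --- | --- |
| gapA | 10 | 2 | 16 | 4 |
| infB | 20 | 1 | 18 | 5 |
| mdh | 1 | 2 | 1 | 1 |
| pgi | 1 | 20 | 76 | 3 |
| phoE | 9 | 7 | 47 | 12 |
| rpoB | 11 | 1 | 1 | 4 |
| tonB | 14 | 4 | 124 | 46 |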

Acquired virulence loci

Kleborate examines four key acquired virulence gene clusters that contribute to hypervirulence in K. pneumoniae: the siderophores yersiniabactin (ybt), aerobactin (iuc) and salmochelin (iro), and the genotoxin colibactin (clb). (We also screen for the hypermucoidy genes rmpA and rmpA2, details below).

  • For each of these loci, Kleborate will call a sequence type using the same logic as the MLST described above, using the locus-specific schemes defined in the BIGSdb.
  • Kleborate will also report the lineage associated with the virulence sequence types, as outlined below and detailed in the corresponding papers (for yersiniabactin, we also report the predicted ICEKp structure based on the ybt lineage assignment).
  • If the locus is not detected, Kleborate reports the ST as 0 and the lineage as -.

Yersiniabactin and colibactin (primarily mobilised by ICEKp)

We recently explored the diversity of the K. pneumoniae integrative conjugative element (ICEKp), which mobilises the yersiniabactin locus ybt, using genomic analysis of a diverse set of 2498 Klebsiella genomes (see this paper). Overall, we found ybt in about a third of all K. pneumoniae genomes and clb in about 14%. We identified 17 distinct lineages of ybt (see figure) embedded within 14 structural variants of ICEKp that can integrate at any of four tRNA-Asn sites in the chromosome. Three of the 17 ybt lineages were associated with three lineages of colibactin, with which they are co-located in the same ICE structure designated ICEKp10. One ICE structure (ICEKp1) carries the salmochelin synthesis locus iro and rmpA hypermucoidy gene in addition to ybt (lineage 2). Additionally, we identified a lineage of ybt that is plasmid-encoded, representing a new mechanism for ybt dispersal in K. pneumoniae populations. Based on this analysis, we developed an MLST-style approach for assigning yersiniabactin sequence types (YbST) and colibactin sequence types (CbST), which is implemented in Kleborate. Annotated reference sequences for each ICEKp variant are included in the data directory of this repository.

ICEKp is occasionally found in other species within the KpSC, and even in other genera of Enterobacteriaceae (see paper). However, most of the known variation included in the database is derived from K. pneumoniae.

ybt tree

Aerobactin and salmochelin (primarily mobilised by virulence plasmids)

We further explored the genetic diversity of the aerobactin (iuc) and salmochelin (iro) loci among a dataset of 2733 Klebsiella genomes (see this paper). We identified five iro and six iuc lineages (see figure), each of which was associated with a specific location within K. pneumoniae genomes. The most common lineages were iuc1 and iro1, which are found together on the virulence plasmid KpVP-1 (typified by pK2044 or pLVPK common to the hypervirulent clones ST23, ST86, etc.). iuc2 and iro2 lineages were associated with the alternative virulence plasmid KpVP-2 (typified by Kp52.145 plasmid II from the K2 ST66 lab strain known as Kp52.145 or B5055). iuc5 and iro5 originate from E. coli and are carried (often together) on E. coli plasmids that can transfer to K. pneumoniae. The lineages iuc2A, iuc3 and iro4 were associated with other novel plasmids that had not been previously described in K. pneumoniae, but sequences for which are included in the paper. In addition, we found that the salmochelin locus present in ICEKp1 constitutes its own lineage iro3, and the aerobactin locus present in the chromosome of ST67 K. pneumoniae subsp rhinoscleromatis strains constitutes its own lineage iuc4. Based on this analysis, we developed an MLST-style approach for assigning aerobactin sequence types (AbST) and salmochelin sequence types (SmST), which is implemented in Kleborate.

iuc and iro trees

Please note that the aerobactin iuc and salmochelin iro lineage names have been updated between Kleborate version 0.2.0 and 0.3.0 to match the nomenclature used in the paper. The AbST and SmST allele numbers are unchanged. Lineage name re-assignments are:

| v0.2.0 | v0.3.0 | location (see paper for details) |
| --- | --- | --- |
| iuc 2 | iuc 1 | KpVP-1 (e.g. pLVPK) |
| iuc 3B | iuc 2 | KpVP-2 (e.g. Kp52.145 plasmid II) |
| iuc 3A | iuc 2A | other plasmids |
| iuc 4 | iuc 3 | other plasmids |
| iuc 5 | iuc 4 | rhinoscleromatis chromosome |
| iuc 1 | iuc 5 | E. coli variant |
| iro 3 | iro 1 | KpVP-1 (e.g. pLVPK) |
| iro 4 | iro 2 | KpVP-2 |
| iro 5 | iro 3 | ICEKp1 |
| iro 2 | iro 4 | Enterobacter variant |
| iro 1 | iro 5 | E. coli variant |

Hypermucoidy genes

Kleborate screens for alleles of the rmpA and rmpA2 genes which can result in a hypermucoid phenotype by upregulating capsule production.

  • The two genes share ~83% nucleotide identity so are easily distinguished, and are reported in separate columns.
  • Alleles for each gene are sourced from the BIGSdb. For rmpA, we have also mapped these alleles to the various known locations for rmpA in Klebsiella (i.e. major virulence plasmids KpVP-1 and KpVP-2; other virulence plasmids simply designated as VP; ICEKp1 and the chromosome in rhinoscleromatis).
  • Unique (non-overlapping) nucleotide BLAST hits with >95% identity and >50% coverage are reported. Note multiple hits to the same gene are reported if found (e.g. the NTUH-K2044 genome carries rmpA in the virulence plasmid and also in ICEKp1, which is reported in the rmpA column as rmpA_11(ICEKp1),rmpA_2(KpVP-1)).
  • Truncations in the rmpA and rmpA2 genes are expressed as a percentage of the amino acid length from the start codon, e.g. rmpA_5-54% indicates the RmpA protein is truncated after 54% length of the intact amino acid sequence. These truncations appear to be common, due to insertions and deletions within a poly-G tract, and almost certainly result in loss of protein function.
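The rmpA and rmpA2 columns pack several pieces of information into each entry. The sketch below parses entries shaped like those shown in this document (`rmpA_11(ICEKp1),rmpA_2(KpVP-1)`, `rmpA_5-54%`, `rmpA2_4*-50%`). The field order in the pattern (allele, optional `*`, optional location, optional truncation) and the reading of `*` as an imprecise allele match are assumptions inferred from those examples, and `parse_rmp_column` is a hypothetical helper.

```python
import re
from typing import Dict, List, Optional, Union

Hit = Dict[str, Union[str, int, bool, None]]

# gene _ allele [*] [(location)] [-NN%]  -- field order assumed from the examples.
_RMP_PATTERN = re.compile(
    r"^(?P<gene>rmpA2?)_(?P<allele>\d+)(?P<inexact>\*)?"
    r"(?:\((?P<location>[^)]+)\))?(?:-(?P<pct>\d+)%)?$"
)


def parse_rmp_column(value: str) -> List[Hit]:
    """Parse a comma-separated rmpA/rmpA2 column value into one dict per hit."""
    if value.strip() in ("", "-"):
        return []  # "-" means no hit was reported
    hits: List[Hit] = []
    for entry in value.split(","):
        match = _RMP_PATTERN.match(entry.strip())
        if match is None:
            raise ValueError("unrecognised entry: {!r}".format(entry))
        pct: Optional[str] = match.group("pct")
        hits.append({
            "gene": match.group("gene"),
            "allele": int(match.group("allele")),
            "inexact": match.group("inexact") is not None,
            "location": match.group("location"),
            "truncated_pct": int(pct) if pct is not None else None,
        })
    return hits


if __name__ == "__main__":
    for hit in parse_rmp_column("rmpA_11(ICEKp1),rmpA_2(KpVP-1)"):
        print(hit)
```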

Antimicrobial resistance determinants

By using the --resistance option, Kleborate will screen for acquired resistance genes and some chromosomal mutations for which there is good evidence of association with drug resistance.

Acquired AMR genes

Kleborate screens input genomes against the ARG-Annot database of acquired resistance genes (updated version from SRST2), which includes allelic variants. It attempts to report the best matching variant for each locus in the genome:

  • Exact nucleotide matches are reported with no further annotation (e.g. "TEM-15").
  • If no exact nucleotide match is found, Kleborate searches for an exact amino acid match, and will report this with a "^" symbol (e.g. "TEM-15^" indicates an exact match to the TEM-15 protein sequence but with 1 or more nucleotide differences). If no exact amino acid match is found, the closest nucleotide match is reported with "*" symbol (e.g. "TEM-30*" indicates no precise nucleotide or amino acid match is found, but the closest nucleotide match is to TEM-30).
  • If the length of match is less than the length of the reported allele (i.e. a partial match), this is indicated with ?.
  • Note that KpSC carry a core beta-lactamase gene (SHV in K. pneumoniae, LEN in K. variicola, OKP in K. quasipneumoniae) that confers clinically significant resistance to ampicillin. As these are present in all genomes, non-ESBL alleles of these genes are not included in the count of acquired resistance genes or drug classes.
  • ESBL alleles of SHV are almost always carried on plasmids (in addition to the intrinsic narrow-spectrum SHV/LEN/OKP allele in the chromosome). However it is possible to have a mutation in a chromosomal SHV gene that gives a match to an ESBL allele, which would be reported in the ESBL column and counted as an acquired gene (and it is very hard to tell the difference without manual exploration of the genetic context). See this paper for more information.
  • Note that oqxAB and fosA are also core genes in K. pneumoniae and don't confer clinical resistance to fluoroquinolones or fosfomycin, hence Kleborate does not report them.
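The annotation symbols described above can be decoded mechanically when post-processing results. This is a minimal sketch that assumes the symbols always appear as a suffix on the allele name, as in the examples in this document; `parse_amr_call` and the match labels are hypothetical.

```python
from typing import Dict, Union


def parse_amr_call(call: str) -> Dict[str, Union[str, bool]]:
    """Split an AMR call such as 'TEM-15^' into allele name and match type."""
    name = call.rstrip("^*?")
    suffix = call[len(name):]
    if "^" in suffix:
        match_type = "exact_protein"       # exact amino acid match
    elif "*" in suffix:
        match_type = "closest_nucleotide"  # no exact match; closest allele
    else:
        match_type = "exact_nucleotide"
    return {
        "allele": name,
        "match": match_type,
        "partial": "?" in suffix,          # match shorter than the allele
    }


if __name__ == "__main__":
    for call in ("TEM-15", "TEM-15^", "TEM-30*", "DfrA12?"):
        print(call, parse_amr_call(call))
```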

Chromosomal mutations associated with AMR

Using the --resistance option also turns on screening for chromosomal mutations for which there is strong evidence of an association with clinical resistance in KpSC (note these are ONLY reported if the genome was recognised as part of the KpSC):

  • Fluoroquinolone resistance SNPs: GyrA 83 & 87 and ParC 80 & 84. These appear in the 'Flq' column along with acquired qnr genes.
  • Colistin resistance due to truncation or loss of core genes MgrB or PmrB. If these genes are missing or truncated, this information will be reported in the 'Col' column along with acquired mcr genes (truncations are expressed as % amino acid length from the start codon). Note if MgrB and PmrB are present and not truncated then nothing about them will be reported in the 'Col' column.
  • OmpK35 and OmpK36 truncations and point mutations shown to result in reduced susceptibility to beta-lactams. This information will be reported in the 'Omp' column (truncations are expressed as % amino acid length from the start codon). Note if these core genes are present and not truncated then nothing about them will be reported in the 'Omp' column. The specific effect of OmpK mutations on drug susceptibility depends on multiple factors including what combinations of OmpK35 and OmpK36 alleles are present and what beta-lactamase genes are present (this is why we report them in their own column separate to Bla genes). See e.g. this paper and this one for more information on OmpK genes and drug resistance.

Note these do not count towards acquired resistance gene counts, but do count towards drug classes (with the exception of Omp mutations, whose spectrum of effects depends on the presence of acquired beta-lactamases and thus their impact on specific beta-lactam drug classes is hard to predict).

Reporting of AMR determinants by drug class

All resistance results (both for the gene screen and mutation screen) are grouped by drug class (according to the ARG-Annot DB), with beta-lactamases broken down into Lahey classes, as follows:

  • AGly (aminoglycosides)
  • Bla (beta-lactamases)
  • Bla_broad (broad spectrum beta-lactamases)
  • Bla_broad_inhR (broad spectrum beta-lactamases with resistance to beta-lactamase inhibitors)
  • Bla_Carb (carbapenemase)
  • Bla_ESBL (extended spectrum beta-lactamases)
  • Bla_ESBL_inhR (extended spectrum beta-lactamases with resistance to beta-lactamase inhibitors)
  • Fcyn (fosfomycin)
  • Flq (fluoroquinolones)
  • Gly (glycopeptides)
  • MLS (macrolides)
  • Ntmdz (nitroimidazole, e.g. metronidazole)
  • Phe (phenicols)
  • Rif (rifampin)
  • Sul (sulfonamides)
  • Tet (tetracyclines)
  • Tmt (trimethoprim)
  • Tgc (tigecycline)

Note there is a separate column 'Omp' reporting known resistance-related mutations in the OmpK35 and OmpK36 osmoporins. See above for details.

Note that Kleborate reports resistance results for all antimicrobial classes with confidently attributable resistance mechanisms in KpSC. Not all of these are actually used clinically for treatment of KpSC infections (e.g. Ntmdz, MLS, Rif) but they are still reported as the presence of acquired resistance determinants to these classes is of interest to researchers for other reasons (e.g. these genes can be useful markers of MGEs and MGE spread; there is potential for use of these drugs against other organisms to select for KpSC in co-infected patients or in the environment). For an overview of antimicrobial resistance and consensus definitions of multidrug resistance (MDR), extreme drug resistance (XDR) and pan drug resistance in Enterobacteriaceae, see Magiorakos 2012.

Scores and counts

Kleborate outputs a simple categorical virulence score, and if resistance screening is enabled, an antimicrobial resistance score as well. These scores provide a rough categorisation of the strains to facilitate monitoring resistance-virulence convergence:

  • The virulence score ranges from 0 to 5:
    • 0 = none of the acquired virulence loci (i.e. negative for all of yersiniabactin, colibactin, aerobactin)
    • 1 = yersiniabactin
    • 2 = yersiniabactin and colibactin (or colibactin only)
    • 3 = aerobactin (without yersiniabactin or colibactin)
    • 4 = aerobactin with yersiniabactin (without colibactin)
    • 5 = yersiniabactin, colibactin and aerobactin

Note salmochelin (iro) is not explicitly considered in the virulence score, for simplicity. Salmochelin typically appears alongside aerobactin on the Kp virulence plasmids, and so presence of aerobactin (score of 3-5) generally implies presence of salmochelin. However we prioritise aerobactin in the calculation of the score, as aerobactin is specifically associated with growth in blood and is a stronger predictor of the hypervirulence phenotype (see this review). Salmochelin is also occasionally present with ybt, in the ICEKp variant ICEKp1, but this will still score 1.
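The virulence score categories can be written as a short decision function. This is a sketch of the list above rather than Kleborate's own code. One case is not covered by the list (colibactin plus aerobactin without yersiniabactin); scoring it as 5 here is an assumption.

```python
def virulence_score(ybt: bool, clb: bool, iuc: bool) -> int:
    """Virulence score from presence of yersiniabactin, colibactin and aerobactin."""
    if iuc and clb:
        return 5  # assumption: also used when ybt is absent, a case the list omits
    if iuc and ybt:
        return 4
    if iuc:
        return 3
    if clb:
        return 2  # colibactin with or without yersiniabactin
    if ybt:
        return 1
    return 0


if __name__ == "__main__":
    # ybt and iuc present, clb absent
    print(virulence_score(ybt=True, clb=False, iuc=True))
```

Salmochelin is deliberately not an input, in line with the note above.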

  • The resistance score ranges from 0 to 3:
    • 0 = no ESBL, no carbapenemase (regardless of colistin resistance)
    • 1 = ESBL, no carbapenemase (regardless of colistin resistance)
    • 2 = Carbapenemase without colistin resistance (regardless of ESBL genes or OmpK mutations)
    • 3 = Carbapenemase with colistin resistance (regardless of ESBL genes or OmpK mutations)
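The resistance score follows the same pattern. As a sketch of the four categories listed above (the function name and boolean inputs are illustrative, not Kleborate's interface):

```python
def resistance_score(esbl: bool, carbapenemase: bool, colistin: bool) -> int:
    """Resistance score from ESBL, carbapenemase and colistin resistance calls."""
    if carbapenemase:
        # ESBL genes and OmpK mutations are ignored once a carbapenemase is present.
        return 3 if colistin else 2
    # Without a carbapenemase, colistin resistance does not change the score.
    return 1 if esbl else 0


if __name__ == "__main__":
    print(resistance_score(esbl=True, carbapenemase=True, colistin=True))
```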

When resistance screening is enabled, Kleborate also quantifies how many acquired resistance genes are present and how many drug classes (in addition to the intrinsic Bla/ampicillin phenotype) have at least one resistance determinant detected. A few things to note:

  • The presence of resistance mutations, and non-ESBL forms of core genes SHV/LEN/OKP, do not contribute to the resistance gene count.
  • Mutations do contribute to the drug class count, e.g. fluoroquinolone resistance will be counted if a GyrA mutation is encountered regardless of whether or not an acquired quinolone resistance (qnr) gene is also present. The exception is Omp mutations, which do not contribute to the drug class count as their effect depends on the strain background and the presence of acquired beta-lactamase enzymes; hence this information is provided in a separate column, and interpretation is left to the user (see above).
  • Note that since a drug class can have multiple resistance determinants, the gene count is typically higher than the class count.
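To make the counting rules concrete, here is a rough sketch that tallies genes and classes from the resistance columns of one result row. It rests on several assumptions that may not match Kleborate's implementation: entries within a column are separated by `;`, chromosomal mutations are recognised by the prefixes `GyrA-`, `ParC-`, `MgrB-` and `PmrB-`, core SHV/LEN/OKP alleles are recognised by name when they appear outside the ESBL columns, and every remaining column counts as one class.

```python
from typing import Dict, Tuple

MUTATION_PREFIXES = ("GyrA-", "ParC-", "MgrB-", "PmrB-")  # assumed naming
CORE_BLA_PREFIXES = ("SHV", "LEN", "OKP")
ESBL_COLUMNS = ("Bla_ESBL", "Bla_ESBL_inhR")
EXCLUDED_COLUMNS = ("Omp",)  # Omp mutations count toward neither total


def count_resistance(row: Dict[str, str]) -> Tuple[int, int]:
    """Return (gene_count, class_count) for a dict of resistance columns only."""
    gene_count = 0
    class_count = 0
    for column, value in row.items():
        if column in EXCLUDED_COLUMNS or value.strip() in ("", "-"):
            continue
        class_has_hit = False
        for entry in value.split(";"):
            entry = entry.strip()
            if not entry:
                continue
            if entry.startswith(MUTATION_PREFIXES):
                class_has_hit = True  # mutations count toward classes only
                continue
            if entry.startswith(CORE_BLA_PREFIXES) and column not in ESBL_COLUMNS:
                continue  # intrinsic, non-ESBL core beta-lactamase
            gene_count += 1
            class_has_hit = True
        if class_has_hit:
            class_count += 1
    return gene_count, class_count


if __name__ == "__main__":
    example = {
        "AGly": "Aac3-IId^;AadA2^;StrA^",
        "Flq": "GyrA-83I;ParC-80I",
        "Omp": "OmpK35-25%",
        "Bla": "SHV-11",
        "Bla_Carb": "KPC-2",
        "Tet": "-",
    }
    print(count_resistance(example))
```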

Serotype prediction

Basic capsule prediction with wzi allele typing

By default, Kleborate will report the closest match amongst the wzi alleles in the BIGSdb. This is a marker of capsule locus (KL) type, which is highly predictive of capsule (K) serotype. Although there is not a 1-1 relationship between wzi allele and KL/K type, there is a strong correlation (see Wyres et al, MGen 2016 and Brisse et al, J Clin Micro 2013). Note the wzi database is populated with alleles from the Klebsiella pneumoniae species complex and is not reliable for other species.

The wzi allele can provide a handy way of spotting the virulence-associated types (wzi1=K1, wzi2=K2, wzi5=K5); or spotting capsule switching within clones, e.g. you can tell which ST258 lineage you have from the wzi type (wzi154: the main lineage II; wzi29: recombinant lineage I; others: probably other recombinant lineages). But users who are particularly interested in predicting serotype should switch on Kaptive (--kaptive) as described below.

Capsule (K) and O antigen (LPS) serotype prediction using Kaptive

You can optionally turn on capsule and O antigen typing using the dedicated capsule typing tool Kaptive. Note that the Kaptive database comprises O and K loci characterised in the Klebsiella pneumoniae species complex (see Wyres et al, MGen 2016 for K loci, Wick et al, J Clin Micro 2018 for O loci). These loci are sometimes also found in other Klebsiella species but you should expect many novel loci outside the KpSC that will not be detected here.

  • --kaptive_k turns on Kaptive screening of the K locus
  • --kaptive_o turns on Kaptive screening of the O locus
  • --kaptive turns on both (is equivalent to --kaptive_k --kaptive_o)

Note that running Kaptive will significantly increase the runtime for Kleborate (>1 minute extra per genome), but provide much more detailed information about the K and/or O loci and their genes.

If Kaptive is switched on, the Kleborate report will include a column indicating the Kaptive match confidence for the reported best-matching K or O locus (see here for a description of the logic). We recommend reporting only K and O loci with a confidence level of "Good" or better. Calls with confidence level "Low" or "None" should be considered carefully as they may result from assembly problems (fragmentation) or novel variation in the K/O locus. If you think you have found a novel K or O locus and would like us to add it to the Kaptive database, please get in touch or post an issue in the Kaptive GitHub repo.
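The "Good or better" recommendation can be applied as a simple filter. In the sketch below, the levels above "Good" ("High", "Very high", "Perfect") are taken from the Kaptive documentation rather than from this text, so check them against the Kaptive version you run; `is_reportable` is a hypothetical helper.

```python
# Ordered from least to most confident. Levels above "Good" are an assumption
# based on Kaptive's documentation; only Good, Low and None are named here.
CONFIDENCE_ORDER = ["None", "Low", "Good", "High", "Very high", "Perfect"]


def is_reportable(confidence: str, minimum: str = "Good") -> bool:
    """True if a Kaptive confidence level is at least `minimum`."""
    # list.index raises ValueError for an unknown level, which is intentional.
    return CONFIDENCE_ORDER.index(confidence) >= CONFIDENCE_ORDER.index(minimum)


if __name__ == "__main__":
    for level in CONFIDENCE_ORDER:
        print(level, is_reportable(level))
```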

Example output

Test commands

Run these commands to test out Kleborate using some of the test data provided in the /test directory of this repository:

# 1) basic genotyping (no resistance typing; K serotype prediction using wzi allele only)
kleborate -o results.txt -a Kleborate/test/sequences/GCF_002248955.1.fna.gz Kleborate/test/sequences/GCF_003095495.1.fna.gz Kleborate/test/sequences/GCF_000009885.1.fna.gz Kleborate/test/sequences/GCF_900501255.1.fna.gz Kleborate/test/sequences/GCF_000019565.1.fna.gz Kleborate/test/sequences/GCF_000492415.1.fna.gz Kleborate/test/sequences/GCF_000492795.1.fna.gz

# 2) with resistance typing (K serotype prediction using wzi allele only)
kleborate -o results_res.txt --resistance -a Kleborate/test/sequences/GCF_002248955.1.fna.gz Kleborate/test/sequences/GCF_003095495.1.fna.gz Kleborate/test/sequences/GCF_000009885.1.fna.gz Kleborate/test/sequences/GCF_900501255.1.fna.gz Kleborate/test/sequences/GCF_000019565.1.fna.gz Kleborate/test/sequences/GCF_000492415.1.fna.gz Kleborate/test/sequences/GCF_000492795.1.fna.gz

# 3) with resistance typing & full K/O serotype prediction using Kaptive (slower)
kleborate -o results_res_kaptive.txt --all -a Kleborate/test/sequences/GCF_002248955.1.fna.gz Kleborate/test/sequences/GCF_003095495.1.fna.gz Kleborate/test/sequences/GCF_000009885.1.fna.gz Kleborate/test/sequences/GCF_900501255.1.fna.gz Kleborate/test/sequences/GCF_000019565.1.fna.gz Kleborate/test/sequences/GCF_000492415.1.fna.gz Kleborate/test/sequences/GCF_000492795.1.fna.gz
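Because the three test commands differ only in the output file and the optional flags, it may be convenient to assemble them programmatically. The sketch below builds the argument list in Python; build_kleborate_command is a hypothetical helper, and only the -o, -a, --resistance and --all options shown above are used.

```python
import shlex


def build_kleborate_command(output, assemblies, extra_flags=()):
    """Return the kleborate argument list for the given assemblies.

    Hypothetical helper: it only arranges the options used in the test
    commands above and does not run anything.
    """
    if not assemblies:
        raise ValueError("at least one assembly is required")
    return ["kleborate", "-o", output, *extra_flags, "-a", *assemblies]


# Two of the test assemblies, to keep the printed command short.
assemblies = [
    "Kleborate/test/sequences/GCF_002248955.1.fna.gz",
    "Kleborate/test/sequences/GCF_003095495.1.fna.gz",
]
cmd = build_kleborate_command("results_res.txt", assemblies, ["--resistance"])
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) to launch the run, provided kleborate is installed and on your PATH.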

Concise results (stdout)

These are the concise Kleborate results that are printed to the terminal, for example 1:

strain species ST virulence_score Yersiniabactin YbST Colibactin CbST Aerobactin AbST Salmochelin SmST rmpA rmpA2 wzi K_locus
GCF_002248955.1 Klebsiella pneumoniae ST15 0 - 0 - 0 - 0 - 0 - - wzi29 KL106
GCF_003095495.1 Klebsiella pneumoniae ST258 0 - 0 - 0 - 0 - 0 - - wzi154 KL107
GCF_000009885.1 Klebsiella pneumoniae ST23 4 ybt 2; ICEKp1 326 - 0 iuc 1 1 iro 3 18-1LV rmpA_11(ICEKp1),rmpA_2(KpVP-1) rmpA2_3-47% wzi1 KL1
GCF_900501255.1 Klebsiella pneumoniae ST86 3 - 0 - 0 iuc 1 1 iro 1 1 rmpA_2(KpVP-1) rmpA2_4*-50% wzi2 KL2 (KL30)
GCF_000019565.1 Klebsiella variicola subsp. variicola ST146 0 - 0 - 0 - 0 - 0 - - wzi159 KL30
GCF_000492415.1 Klebsiella quasipneumoniae subsp. quasipneumoniae ST1437 0 - 0 - 0 - 0 - 0 - - wzi185 KL46
GCF_000492795.1 Klebsiella quasipneumoniae subsp. similipneumoniae ST1435 0 - 0 - 0 - 0 - 0 - - wzi183 KL21

For example 2 (i.e. with resistance typing turned on):

strain species ST virulence_score resistance_score Yersiniabactin YbST Colibactin CbST Aerobactin AbST Salmochelin SmST rmpA rmpA2 wzi K_locus AGly Col Fcyn Flq Gly MLS Ntmdz Phe Rif Sul Tet Tgc Tmt Omp Bla Bla_Carb Bla_ESBL Bla_ESBL_inhR Bla_broad Bla_broad_inhR
GCF_002248955.1 Klebsiella pneumoniae ST15 0 0 - 0 - 0 - 0 - 0 - - wzi29 KL106 Aac3-IId^ Mcr3-1* - GyrA-83F;GyrA-87A;ParC-80I - - - CatA1^ - - TetA - - - SHV-28^ - - - - -
GCF_003095495.1 Klebsiella pneumoniae ST258 0 3 - 0 - 0 - 0 - 0 - - wzi154 KL107 Aac3-IId^;AadA2^;Aph3-Ia^;RmtB;Sat-2A;StrA^;StrB MgrB-62%;PmrB-36% - GyrA-83I;ParC-80I - Erm42*;MphA - CatA1^ - SulI;SulII TetG - DfrA12? OmpK35-25%;OmpK36GD TEM-1D^ KPC-2 CTX-M-14 - SHV-11 -
GCF_000009885.1 Klebsiella pneumoniae ST23 4 0 ybt 2; ICEKp1 326 - 0 iuc 1 1 iro 3 18-1LV rmpA_11(ICEKp1),rmpA_2(KpVP-1) rmpA2_3-47% wzi1 KL1 - - - - - - - - - - - - - - - - - - SHV-11^ -
GCF_900501255.1 Klebsiella pneumoniae ST86 3 0 - 0 - 0 iuc 1 1 iro 1 1 rmpA_2(KpVP-1) rmpA2_4*-50% wzi2 KL2 (KL30) - - - - - - - - - - - - - - SHV-187* - - - - -
GCF_000019565.1 Klebsiella variicola subsp. variicola ST146 0 0 - 0 - 0 - 0 - 0 - - wzi159 KL30 - - - - - - - - - - - - - - LEN-24*;LEN-24* - - - - -
GCF_000492415.1 Klebsiella quasipneumoniae subsp. quasipneumoniae ST1437 0 0 - 0 - 0 - 0 - 0 - - wzi185 KL46 Aac6-Ib;StrA*;StrB* - - - - - - CatA2* - SulII - - DfrA14 - - - - - OKP-A-3* -
GCF_000492795.1 Klebsiella quasipneumoniae subsp. similipneumoniae ST1435 0 0 - 0 - 0 - 0 - 0 - - wzi183 KL21 - - - - - - - - - - - - - - - - - - OKP-B-7* -

For example 3 (i.e. with resistance typing & Kaptive serotyping turned on):

strain species ST virulence_score resistance_score Yersiniabactin YbST Colibactin CbST Aerobactin AbST Salmochelin SmST rmpA rmpA2 wzi K_locus K_locus_confidence O_locus O_locus_confidence AGly Col Fcyn Flq Gly MLS Ntmdz Phe Rif Sul Tet Tgc Tmt Omp Bla Bla_Carb Bla_ESBL Bla_ESBL_inhR Bla_broad Bla_broad_inhR
GCF_002248955.1 Klebsiella pneumoniae ST15 0 0 - 0 - 0 - 0 - 0 - - wzi29 KL107 None O1/O2v2 Very high Aac3-IId^ Mcr3-1* - GyrA-83F;GyrA-87A;ParC-80I - - - CatA1^ - - TetA - - - SHV-28^ - - - - -
GCF_003095495.1 Klebsiella pneumoniae ST258 0 3 - 0 - 0 - 0 - 0 - - wzi154 KL107 Good O2v2 Good Aac3-IId^;AadA2^;Aph3-Ia^;RmtB;Sat-2A;StrA^;StrB MgrB-62%;PmrB-36% - GyrA-83I;ParC-80I - Erm42*;MphA - CatA1^ - SulI;SulII TetG - DfrA12? OmpK35-25%;OmpK36GD TEM-1D^ KPC-2 CTX-M-14 - SHV-11 -
GCF_000009885.1 Klebsiella pneumoniae ST23 4 0 ybt 2; ICEKp1 326 - 0 iuc 1 1 iro 3 18-1LV rmpA_11(ICEKp1),rmpA_2(KpVP-1) rmpA2_3-47% wzi1 KL1 Perfect O1v2 Very high - - - - - - - - - - - - - - - - - - SHV-11^ -
GCF_900501255.1 Klebsiella pneumoniae ST86 3 0 - 0 - 0 iuc 1 1 iro 1 1 rmpA_2(KpVP-1) rmpA2_4*-50% wzi2 KL2 Very high O1v1 Very high - - - - - - - - - - - - - - SHV-187* - - - - -
GCF_000019565.1 Klebsiella variicola subsp. variicola ST146 0 0 - 0 - 0 - 0 - 0 - - wzi159 KL30 Very high O3/O3a Very high - - - - - - - - - - - - - - LEN-24*;LEN-24* - - - - -
GCF_000492415.1 Klebsiella quasipneumoniae subsp. quasipneumoniae ST1437 0 0 - 0 - 0 - 0 - 0 - - wzi185 KL46 Low O3/O3a Very high Aac6-Ib;StrA*;StrB* - - - - - - CatA2* - SulII - - DfrA14 - - - - - OKP-A-3* -
GCF_000492795.1 Klebsiella quasipneumoniae subsp. similipneumoniae ST1435 0 0 - 0 - 0 - 0 - 0 - - wzi183 KL21 Very high O12 Very high - - - - - - - - - - - - - - - - - - OKP-B-7* -

Full results (file)

Here are the full Kleborate results (including assembly quality metrics, and allele calls for all genes in the five MLST schemes) for example 3, written to results_res_kaptive.txt:

strain species species_match contig_count N50 largest_contig ambiguous_bases ST virulence_score resistance_score num_resistance_classes num_resistance_genes Yersiniabactin YbST Colibactin CbST Aerobactin AbST Salmochelin SmST rmpA rmpA2 wzi K_locus K_locus_problems K_locus_confidence K_locus_identity K_locus_missing_genes O_locus O_locus_problems O_locus_confidence O_locus_identity O_locus_missing_genes Chr_ST gapA infB mdh pgi phoE rpoB tonB ybtS ybtX ybtQ ybtP ybtA irp2 irp1 ybtU ybtT ybtE fyuA clbA clbB clbC clbD clbE clbF clbG clbH clbI clbL clbM clbN clbO clbP clbQ iucA iucB iucC iucD iutA iroB iroC iroD iroN AGly Col Fcyn Flq Gly MLS Ntmdz Phe Rif Sul Tet Tgc Tmt Omp Bla Bla_Carb Bla_ESBL Bla_ESBL_inhR Bla_broad Bla_broad_inhR
GCF_002248955.1 Klebsiella pneumoniae strong 73 194261 362142 no ST15 0 0 5 4 - 0 - 0 - 0 - 0 - - wzi29 KL107 ?-+ None 87.89% KL107_05_wzb,KL107_06_wzc,KL107_07_wbaP,KL107_08,KL107_09,KL107_10,KL107_12,KL107_13,KL107_14,KL107_15 O1/O2v2 none Very high 98.52% ST15 1 1 1 1 1 1 1 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Aac3-IId^ Mcr3-1* - GyrA-83F;GyrA-87A;ParC-80I - - - CatA1^ - - TetA - - - SHV-28^ - - - - -
GCF_003095495.1 Klebsiella pneumoniae strong 676 16918 71716 no ST258 0 3 11 17 - 0 - 0 - 0 - 0 - - wzi154 KL107 ? Good 99.94% O2v2 ? Good 98.38% ST258 3 3 1 1 1 1 79 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Aac3-IId^;AadA2^;Aph3-Ia^;RmtB;Sat-2A;StrA^;StrB MgrB-62%;PmrB-36% - GyrA-83I;ParC-80I - Erm42*;MphA - CatA1^ - SulI;SulII TetG - DfrA12? OmpK35-25%;OmpK36GD TEM-1D^ KPC-2 CTX-M-14 - SHV-11 -
GCF_000009885.1 Klebsiella pneumoniae strong 2 5248520 5248520 no ST23 4 0 1 0 ybt 2; ICEKp1 326 - 0 iuc 1 1 iro 3 18-1LV rmpA_11(ICEKp1),rmpA_2(KpVP-1) rmpA2_3-47% wzi1 KL1 none Perfect 100.00% O1v2 none Very high 99.13% ST23 2 1 1 1 9 4 12 9 7 9 6 5 1 1 6 7 7 6 - - - - - - - - - - - - - - - 1 1 1 1 1 21 2 19 5 - - - - - - - - - - - - - - - - - - SHV-11^ -
GCF_900501255.1 Klebsiella pneumoniae strong 134 303226 623663 no ST86 3 0 0 0 - 0 - 0 iuc 1 1 iro 1 1 rmpA_2(KpVP-1) rmpA2_4*-50% wzi2 KL2 none Very high 99.94% O1v1 none Very high 98.46% ST86 9 4 2 1 1 1 27 - - - - - - - - - - - - - - - - - - - - - - - - - - 1 1 1 1 1 1 1 1 1 - - - - - - - - - - - - - - SHV-187* - - - - -
GCF_000019565.1 Klebsiella variicola subsp. variicola strong 3 5641239 5641239 no ST146 0 0 0 0 - 0 - 0 - 0 - 0 - - wzi159 KL30 * Very high 95.40% O3/O3a none Very high 98.72% ST146 16 24 30 27 36 22 55 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - LEN-24*;LEN-24* - - - - -
GCF_000492415.1 Klebsiella quasipneumoniae subsp. quasipneumoniae strong 10 5263297 5263297 yes ST1437 0 0 5 6 - 0 - 0 - 0 - 0 - - wzi185 KL46 ?+* Low 98.02% O3/O3a none Very high 95.96% ST1437 17 19 69 39 185 21 238 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Aac6-Ib;StrA*;StrB* - - - - - - CatA2* - SulII - - DfrA14 - - - - - OKP-A-3* -
GCF_000492795.1 Klebsiella quasipneumoniae subsp. similipneumoniae strong 2 5142035 5142035 yes ST1435 0 0 1 0 - 0 - 0 - 0 - 0 - - wzi183 KL21 * Very high 96.81% O12 none Very high 99.01% ST1435 18 88 128 116 11 99 237 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - OKP-B-7* -
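The results file is easiest to work with programmatically. Below is a minimal sketch of reading it in Python, assuming the file is tab-delimited with the header row shown above; load_results and split_calls are illustrative names, and the example rows are cut down to a handful of columns.

```python
import csv
import io


def load_results(handle):
    """Read a tab-delimited Kleborate results table into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def split_calls(cell):
    """Split a ';'-separated resistance cell into individual calls.

    A lone '-' means nothing was detected. Intended for the resistance columns
    only: other columns (e.g. Yersiniabactin) may use ';' differently.
    """
    cell = cell.strip()
    if cell in ("", "-"):
        return []
    return cell.split(";")


# Two rows reduced to a few columns (values from the example output above).
example = io.StringIO(
    "strain\tST\tresistance_score\tAGly\tBla_Carb\n"
    "GCF_003095495.1\tST258\t3\tAac3-IId^;AadA2^;Aph3-Ia^;RmtB;Sat-2A;StrA^;StrB\tKPC-2\n"
    "GCF_000009885.1\tST23\t0\t-\t-\n"
)
for row in load_results(example):
    print(row["strain"], row["ST"], len(split_calls(row["AGly"])), split_calls(row["Bla_Carb"]))
```

For a real run, replace the in-memory example with open("results_res_kaptive.txt", newline="").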

Code testing

Unit tests are available in the /test directory of this repository.

Visualising outputs in microreact

A helper script to assist users in formatting their Kleborate results for viewing in Microreact is provided in the /scripts directory of this repository.

Typing from Illumina reads

If you don't have good quality assemblies, MLST assignment for the chromosomal & virulence locus schemes can also be achieved directly from K. pneumoniae reads using SRST2:

  • Download the YbST, CbST, AbST, SmST allele sequences and profile tables from the data directory in this repository.
  • Install SRST2 if you don't already have it (git clone https://github.com/katholt/srst2).
  • Run SRST2, setting --mlst_db and --mlst_definitions to point to the allele sequences and profile table for the scheme you want to type (e.g. YbST or CbST).

Note that currently you can only run SRST2 with one MLST scheme at a time, so in order to type the chromosomal MLST, YbST and CbST you will need to run three separate commands:

srst2 --input_pe reads_1.fastq.gz reads_2.fastq.gz --output YbST --log --mlst_db ybt_alleles.fasta --mlst_definitions YbST_profiles.txt
srst2 --input_pe reads_1.fastq.gz reads_2.fastq.gz --output CbST --log --mlst_db clb_alleles.fasta --mlst_definitions CbST_profiles.txt
srst2 --input_pe reads_1.fastq.gz reads_2.fastq.gz --output Klebs --log --mlst_db Klebsiella_pneumoniae.fasta --mlst_definitions kpneumoniae.txt
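Since each scheme needs its own SRST2 run, the commands can be generated in a loop. The following is a sketch only: srst2_commands is a hypothetical helper, it uses just the options shown in the commands above, and the file names are the ones from those commands. Further schemes (for example the chromosomal MLST scheme) can be added to the dictionary in the same way.

```python
import shlex


def srst2_commands(reads_1, reads_2, schemes):
    """Build one srst2 argument list per MLST scheme.

    `schemes` maps an output prefix to (allele FASTA, profile table).
    """
    commands = []
    for output_prefix, (alleles, profiles) in schemes.items():
        commands.append([
            "srst2",
            "--input_pe", reads_1, reads_2,
            "--output", output_prefix,
            "--log",
            "--mlst_db", alleles,
            "--mlst_definitions", profiles,
        ])
    return commands


schemes = {
    "YbST": ("ybt_alleles.fasta", "YbST_profiles.txt"),
    "CbST": ("clb_alleles.fasta", "CbST_profiles.txt"),
}
for cmd in srst2_commands("reads_1.fastq.gz", "reads_2.fastq.gz", schemes):
    print(shlex.join(cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) once SRST2 is installed.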

Contact us

Kleborate is under active development, alongside many other Klebs genomic analysis tools and projects in progress.

Please get in touch via the GitHub issues tracker if you have any issues, questions or ideas.

For more on our lab, including other software, see http://holtlab.net

License

GNU General Public License, version 3
