
Releases: metagenome-atlas/atlas

All in the sample table

11 Oct 08:24

What's Changed

  • QC reads and assemblies are now written to the sample.tsv from the start. This should fix errors caused by partial writes to the sample.tsv #695
  • This also allows you to add external assemblies.
  • Singleton reads are no longer used throughout the pipeline.
  • This changes the default paths for QC reads and assemblies:
    assemblies: Assembly/fasta/{sample}.fasta
    reads: QC/reads/{sample}_{fraction}.fastq.gz

Seamless update: if you update atlas and continue working on an old project, your old files will be copied to the new locations, or the paths defined in the sample.tsv will be used.
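The path logic above can be sketched as: build the default paths from the templates, and keep any path already present in the sample table (for example an external assembly). This is a minimal illustration, not atlas's code. The column names `Assembly` and `Reads_QC_<fraction>`, the fraction names `R1`/`R2`, and both helper functions are hypothetical.

```python
# Hypothetical sketch: column names and helpers are illustrative only,
# not the real atlas sample.tsv schema or implementation.
ASSEMBLY_TEMPLATE = "Assembly/fasta/{sample}.fasta"
READS_TEMPLATE = "QC/reads/{sample}_{fraction}.fastq.gz"


def default_paths(sample, fractions):
    """Return the default file paths for one sample."""
    paths = {"Assembly": ASSEMBLY_TEMPLATE.format(sample=sample)}
    for fraction in fractions:
        paths[f"Reads_QC_{fraction}"] = READS_TEMPLATE.format(
            sample=sample, fraction=fraction
        )
    return paths


def resolve_paths(row, sample, fractions):
    """Prefer non-empty paths already in the sample-table row; otherwise use defaults."""
    resolved = default_paths(sample, fractions)
    for column, value in row.items():
        if column in resolved and value:
            resolved[column] = value
    return resolved


# An external assembly listed in the table wins over the default location.
paths = resolve_paths({"Assembly": "external/s1.fna"}, "s1", ["R1", "R2"])
```

Here `paths["Assembly"]` keeps the external file, while the reads fall back to `QC/reads/s1_R1.fastq.gz` and `QC/reads/s1_R2.fastq.gz`.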

Co-binning

17 Aug 15:10
5caa6ca

Co-binning with sub-groups

#683

In this new version, Atlas uses co-abundance binning by default.
While binning each sample individually is faster, co-abundance binning quantifies the coverage of contigs across multiple samples and thereby captures how contigs co-vary, which provides valuable information for grouping them into genomes.
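To illustrate the co-variation signal: contigs from the same genome tend to rise and fall together across samples, so their coverage profiles correlate. The sketch below computes a Pearson correlation on made-up coverage values. It only shows the intuition; binners such as vamb and metabat use their own, more elaborate models, and the contig names and numbers are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long coverage profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical coverage of three contigs across four samples.
coverage = {
    "contig_A": [10.0, 2.0, 30.0, 5.0],
    "contig_B": [20.0, 4.1, 59.0, 10.2],  # roughly 2x contig_A in every sample
    "contig_C": [3.0, 40.0, 1.0, 25.0],   # opposite pattern to contig_A
}

r_ab = pearson(coverage["contig_A"], coverage["contig_B"])  # close to 1
r_ac = pearson(coverage["contig_A"], coverage["contig_C"])  # negative
```

With a single sample each profile has only one value and no correlation can be computed, which is why this signal is only available when binning across several samples.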

See also my blog post

Starting with version 2.18, atlas places every sample in a single BinGroup and defaults to vamb as the binner unless there are very few samples. For fewer than 8 samples, metabat is the default binner.

The defaults work well unless you have many samples (>150); in that case atlas warns you to put your samples in more than one BinGroup.
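The default rules described above fit in a few lines. This is a hedged sketch of the decision logic, not the atlas implementation; the function name and the BinGroup label `"All"` are hypothetical.

```python
import warnings


def default_binning_setup(samples, max_group_size=150):
    """Sketch of the v2.18 defaults: one shared BinGroup, vamb unless < 8 samples.

    Hypothetical helper; the group label "All" is illustrative.
    """
    n = len(samples)
    binner = "metabat" if n < 8 else "vamb"
    if n > max_group_size:
        # Mirrors the advice in the release notes, not atlas's exact message.
        warnings.warn(
            f"{n} samples in one BinGroup; consider splitting them into several BinGroups."
        )
    bin_groups = {sample: "All" for sample in samples}
    return binner, bin_groups


binner, groups = default_binning_setup([f"sample{i}" for i in range(10)])
```

With 10 samples this returns `"vamb"` and places every sample in the same group.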

Note

Previously, each sample was put in its own BinGroup, optimized for single-sample binning.
Running vamb in those versions would consider all samples, regardless of their BinGroup.
Hence, updating to v2.18 might cause errors when using a sample.tsv file from an older Atlas version.
You can resolve this by assigning a unique BinGroup to each sample.

Link to documentation

Full Changelog: v2.17.2...v2.18.0

v2.17.2

21 Jul 12:21
cdd2581

Fixes

Use skani for genome clustering

15 Jun 13:11
da97853

Skani

The tool Skani claims to be better and faster than the combination of Mash + FastANI used by dRep.
I implemented Skani for species clustering.
We now do the species clustering in the atlas run binning step,
so the binning report shows the number of dereplicated species. This allows you to run different binners before choosing the one to use for genome annotation.
The file storage was also improved: all important files are in Binning/{binner}/.

My custom species clustering performs the following steps:

  1. Pre-cluster genomes with single linkage at 92.5% ANI.
  2. Re-calibrate checkm2 results:
     • If a minority of genomes in a pre-cluster use a different translation table, they are removed.
     • If some genomes of a pre-cluster don't use the specialized completeness model, we re-calibrate completeness to the minimum value.
       This ensures that a poor genome evaluated with the general model is not preferred over a better genome evaluated with the specific model.
       See also https://silask.github.io/post/better_genomes/ Section 2.
     • Drop genomes that no longer meet the filter criteria after re-calibration.
  3. Cluster genomes with an ANI threshold (default 95%).
  4. Select the best genome of each cluster as representative, based on the quality score Completeness - 5 x Contamination.
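Steps 1, 3 and 4 can be sketched with a union-find single-linkage clustering over pairwise ANI values and a max over the quality score. This is a minimal sketch under stated assumptions: the ANI values are given as a dict of genome pairs, the final 95% clustering is assumed to be single linkage as well, the checkm2 re-calibration of step 2 is omitted, and all names are hypothetical rather than taken from atlas.

```python
from dataclasses import dataclass


@dataclass
class Genome:
    name: str
    completeness: float
    contamination: float

    @property
    def quality(self):
        # Quality score used to pick representatives: Completeness - 5 x Contamination
        return self.completeness - 5 * self.contamination


def single_linkage(names, ani, threshold):
    """Group genomes linked by any pair with ANI >= threshold (union-find)."""
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for (a, b), value in ani.items():
        if a in parent and b in parent and value >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return list(clusters.values())


def cluster_species(genomes, ani, pre_threshold=92.5, species_threshold=95.0):
    """Return one representative genome name per species cluster."""
    by_name = {g.name: g for g in genomes}
    representatives = []
    for pre_cluster in single_linkage(list(by_name), ani, pre_threshold):
        # Step 2 (checkm2 re-calibration and re-filtering) is omitted in this sketch.
        for species in single_linkage(pre_cluster, ani, species_threshold):
            representatives.append(max(species, key=lambda n: by_name[n].quality))
    return representatives


genomes = [Genome("g1", 90, 2), Genome("g2", 95, 5), Genome("g3", 80, 1)]
ani = {("g1", "g2"): 96.0, ("g2", "g3"): 93.0}
reps = cluster_species(genomes, ani)
```

All three genomes fall into one pre-cluster at 92.5%, but only g1 and g2 share a species at 95%. g1 (score 80) beats g2 (score 70), so the representatives are g1 and g3.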

New Contributors

Full Changelog: v2.16.3...v2.17.0

GTDB v8

17 May 13:35

Save GTDB v8 in its own download folder. Thanks to @strejcem.

V2.16

12 May 13:54
af8322b

What's Changed

New Contributors

Full Changelog: v2.15.2...v2.16.1

v2.15.2

04 May 14:56

What's Changed

  • Annotate the gene catalog with KEGG and CAZy using DRAM
  • You can now turn off GUNC

Full Changelog: v2.15.1...v2.15.2

GUNC'n'More

13 Apr 22:42

What's Changed

  • Use GUNC
  • New folder organisation: the main output files of binning are in the new folder Binning
  • Use the HDF5 format for gene catalogs. This allows efficient storage of, and selective access to, large count and coverage matrices from the gene catalog (see the docs for how to load them). #621
  • SemiBin v1.5 by @SilasK in #622

Use checkM2

03 Feb 13:56
c0b97a7

What's Changed

Thank you @trickovicmatija for your help.

Full Changelog: v2.13.1...v2.14.0

V2.13

25 Nov 13:04

What's Changed

  • Use minimap for mapping to contigs, the gene catalog and genomes in #569 #577
  • Filter genomes myself in #568
    The filter function is defined in the config file:
    genome_filter_criteria: "(Completeness-5*Contamination >50 ) & (Length_scaffolds >=50000) & (Ambigious_bases <1e6) & (N50 > 5*1e3) & (N_scaffolds < 1e3)"

The genome filtering is similar to that used in other publications in the field, e.g. GTDB. One difference is that genomes with completeness around 50% and contamination around 10% are excluded, whereas dRep with default parameters would include them.
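To see why such a borderline genome is excluded, here is the default `genome_filter_criteria` transcribed into plain Python. The transcription and the genome statistics are illustrative assumptions; the dict keys mirror the column names in the config expression, including its spelling `Ambigious_bases`.

```python
def passes_filter(stats):
    """Plain-Python transcription of the default genome_filter_criteria string."""
    return (
        (stats["Completeness"] - 5 * stats["Contamination"] > 50)
        and stats["Length_scaffolds"] >= 50_000
        and stats["Ambigious_bases"] < 1e6
        and stats["N50"] > 5 * 1e3
        and stats["N_scaffolds"] < 1e3
    )


# Made-up statistics: only completeness and contamination differ.
borderline = {
    "Completeness": 50,
    "Contamination": 10,
    "Length_scaffolds": 2_000_000,
    "Ambigious_bases": 0,
    "N50": 20_000,
    "N_scaffolds": 150,
}
good = dict(borderline, Completeness=95, Contamination=2)
```

For the borderline genome the quality term is 50 - 5 x 10 = 0, far below the threshold of 50, so it is rejected even though its assembly statistics are fine. The genome with 95% completeness and 2% contamination scores 85 and passes.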

  • Use dRep again in #579
    We saw better performance with dRep, which now also scales to ~1K samples.
  • Use the new DRAM version 1.4 in #564

Full Changelog: v2.12.0...v2.13.0