Releases: jtamames/SqueezeMeta
v1.6.3
- Conda installations will now prioritize conda binaries instead of the vendored ones in some cases. This will hopefully fix certain issues in which SqueezeMeta was failing on certain distributions/versions.
- `test_install.pl` now performs additional tests to check that binaries can be executed in the current environment.
- Increased speed and reduced memory usage in step 10 (read counting).
- Fixed an error in which projects created with the sequential mode would fail to restart. Note that each sample still has to be restarted individually.
- Fixed an error in which step 16 (DAStool bin merging) would be attempted even if the `--nobins` flag was provided.
- SQMtools: fixed an error in `exportPathways` when the requested KEGG map had only arrows.
- SQMtools: fixed an error in which figures would not be generated properly when `count='percent'` was selected if any sample had 0 reads (as could happen when analyzing subsets).
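To illustrate the kind of check described for `test_install.pl` above, here is a minimal Python sketch. It is not the actual Perl implementation: the helper name `can_execute`, the `--version` probe and the timeout are assumptions made for this example, and some tools may need a different probe argument.

```python
import shutil
import subprocess
import sys


def can_execute(command, args=("--version",), timeout=10):
    """Return True if `command` is found and exits successfully when run.

    Hypothetical helper: the probe arguments and timeout are illustrative.
    """
    path = shutil.which(command)
    if path is None:
        return False  # not installed, not on PATH, or not executable
    try:
        result = subprocess.run(
            [path, *args],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # e.g. wrong architecture, missing interpreter, or a hung process
        return False
    return result.returncode == 0


# The running Python interpreter should always be executable here.
python_ok = can_execute(sys.executable)
# A made-up name that should not exist on any system.
missing_ok = can_execute("no-such-binary-hypothetical")
print(python_ok, missing_ok)
```

Actually running the binary, rather than only checking that the file exists, is what catches binaries that are present but fail to start in the current environment.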
v1.6.2post3
- Update SPAdes to 3.15.5 so it works with python 3.10
v1.6.2post2
- Upgrade to python 3.10 and improve conda packaging, hopefully fix #705 and be more future-proof
v1.6.2post1
- Fix an issue in which pysam was not properly installed when installing SqueezeMeta through conda
v1.6.2
New features
- Added `spades-base` as a possible assembler for SqueezeMeta. This will make SqueezeMeta call SPAdes with no additional flags. Flags for SPAdes can then be customized by the user by passing `--assembly_options "EXTRA OPTIONS"` when calling SqueezeMeta. More information can be found in the ReadMe and the PDF manual.
- Added the utility script `sqm2zip.py`, which packs the essential files from a SqueezeMeta project into a single zip file.
- SQMtools: `loadSQM` can now load a project directly from a zip file created by `sqm2zip.py` (the syntax would be `loadSQM("/path/to/my_project.zip")`).
- SQMtools: SQMtools is now available in CRAN and can be installed with `install.packages("SQMtools")` in Windows, Mac and Linux computers.
- These changes are meant to allow users to easily transfer their data from their clusters/workstations to their personal computers and explore their results there.
- SQMtools: `mostAbundant` and `mostVariable` now accept the argument `bycol = TRUE`, which will make these functions operate on columns rather than rows.
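The idea of packing a project into one zip file can be sketched in Python as follows. This is not the code of `sqm2zip.py`: which files count as "essential" is not stated in these notes, so the suffix filter, the function name `pack_files` and the toy project layout are all placeholders chosen for the example.

```python
import tempfile
import zipfile
from pathlib import Path


def pack_files(project_dir, zip_path, keep_suffixes=(".txt", ".tsv")):
    """Zip the files under project_dir whose suffix is in keep_suffixes.

    Hypothetical filter: the real selection of essential files is not shown here.
    Returns the archive names that were written.
    """
    project_dir = Path(project_dir)
    packed = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(project_dir.rglob("*")):
            if path.is_file() and path.suffix in keep_suffixes:
                # Keep the project folder name as the top-level entry in the zip.
                arcname = path.relative_to(project_dir.parent)
                zf.write(path, arcname=arcname)
                packed.append(arcname.as_posix())
    return packed


# Toy project: one small table (kept) and one large-style binary file (skipped).
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "my_project"
    (project / "results").mkdir(parents=True)
    (project / "results" / "table.tsv").write_text("a\tb\n")
    (project / "results" / "reads.bam").write_bytes(b"\x00")
    packed = pack_files(project, Path(tmp) / "my_project.zip")
    print(packed)
```

Leaving out bulky intermediate files is what makes the archive small enough to move from a cluster to a personal computer.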
Minor changes / bugfixes
- We now use coverage variances in addition to average contig coverages when calling metabat2, which should improve the quality of the resulting bins.
- Mapping results are now stored as BAM files instead of SAM files, which should reduce disk usage.
Known issues / Other announcements
- The `make_databases.pl` script may spend a lot of time in the "Creating SQLite databases" step. We have included a patch to improve this, but it still happens inconsistently (taking a few hours in some systems, and several days in others). Having a lot (1-2 TB) of free disk space may help. `download_databases.pl` should be considered the preferred way of quickly getting reasonably up-to-date databases.
- We are discontinuing official support for CentOS7, as its default libraries are too outdated now. We plan on supporting SqueezeMeta in Debian, WSL2-Ubuntu and (hopefully) CentOS Upstream in the not so distant future.
v1.6.1post1
- Fix for yesterday's release, which did not include all the intended features.
v1.6.1
New features
- Added the `seqvec2fasta` function to `SQMtools`. It will print a named vector containing sequences (such as the ones used to store contig and ORF sequences in `SQM$contigs$seqs` and `SQM$orfs$seqs`) as a single fasta-formatted string.
- The `make_databases.pl`, `download_databases.pl` and `configure_nodb.pl` scripts now perform more error checking after each database creation step, and will call `test_install.pl` before finishing. This should help detect the instances in which database creation was unsuccessful, e.g. due to a failed download.
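The conversion performed by `seqvec2fasta` (named sequences in, one fasta-formatted string out) can be sketched in Python. This is an illustration of the idea only, not the R function: the name `seqs_to_fasta` is hypothetical, a dict stands in for R's named vector, and writing each sequence on a single unwrapped line is an assumption.

```python
def seqs_to_fasta(seqs):
    """Turn a mapping of sequence name -> sequence into one FASTA-formatted string.

    Hypothetical helper: each record is a '>name' header line followed by the
    sequence on a single line (no line wrapping).
    """
    return "".join(f">{name}\n{seq}\n" for name, seq in seqs.items())


# Toy input with made-up contig names.
fasta = seqs_to_fasta({"contig_1": "ACGT", "contig_2": "GGCC"})
print(fasta, end="")
```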
Minor changes / bugfixes
- Fixed a bug in `remap.pl`.
- Fixed a bug introduced in v1.6.0 in which trimmomatic was not being called even when the `--cleaning` flag was provided.
- Fixed a bug in which single reads were causing problems during assembly.
- Fixed a bug in which `cover.pl` was using the system's perl interpreter instead of the one in the user environment.
- Improved SQL queries in `make_databases.pl` to hopefully speed up database creation.
- Fixed an issue in which mothur dependencies were not correctly fulfilled by conda.
- Fixed an issue in which restarting a sequential project failed at step 4.
- Fixed several minor issues with the restart mode.
- Fixed `remove_duplicate_markers.pl` so it works in the new binning structure.
- Fixed an issue in which SPAdes was using only 400G of memory even if more was available in the system.
- `engine="data.table"` and `tax_mode="prokfilter"` are now the default options in `loadSQM`.
- Fixed an issue in which `subsetSamples` corrupted the binning information, making it impossible to further subset the resulting object.
- The PDF SQMtools manual is back. Future availability will depend on whether I can keep getting R's clunky LaTeX interface to produce PDFs in which the tables are rendered correctly.
Known issues
- The `make_databases.pl` script may spend a lot of time in the "Creating SQLite databases" step. We have included a patch to improve this, but it still happens inconsistently (taking a few hours in some systems, and several days in others). Having a lot (1-2 TB) of free disk space may help. `download_databases.pl` should be considered the preferred way of quickly getting reasonably up-to-date databases.
v1.6.0 - One egg for many baskets
New features
- The script `restart.pl` has been removed. Project restart is now achieved by calling `SqueezeMeta.pl --restart -p <project_name>`. The flags `-step <STEP> --force-overwrite` can be added to this call in order to restart the pipeline from a specific step.
- Users can now control whether the source of bin taxonomy is the LCA algorithm from SqueezeMeta, or the taxonomic assignment performed by CheckM. This can be controlled with the flag `-taxbinmode`. Options are `s` (SqueezeMeta only, default), `c` (CheckM), `s+c` (SqueezeMeta, missing ranks will be completed with CheckM taxonomy when possible) or `c+s` (CheckM, missing ranks will be completed with SqueezeMeta taxonomy when possible).
- Users can now control the minimum percentage of genes from the same taxon needed in order to taxonomically annotate a contig. This can be done with the flag `-consensus`.
- `sqm_longreads.pl` will now consider partial hits completely contained inside a long read as valid hits. Before, partial hits were only considered valid if they occurred at the beginning or end of the reads. This has a noticeable impact on the annotation percentages. The old behaviour can be reinstated with the flags `-n` or `-nopartialhits`.
- `sqm2pavian.pl` now works with results from `sqm_reads.pl` and `sqm_longreads.pl`.
- Added the option `--filter` to `sqm_mapper.pl`. When this flag is present, the script will filter a set of input sequences, returning only the ones that did not map to the reference.
- SQMtools: SQM objects now track the length, abundance, mapped bases, coverage and coverage per million reads of bins. The corresponding matrices can be found under the `SQM$bins` list. When running `subsetContigs`, these values will be updated taking into consideration only the contigs from each bin that were selected.
- SQMtools: added the `subsetSamples` function to generate subsetted SQM objects containing only the requested samples.
- SQMtools: added the `plotBins` function to generate barcharts with the distribution of bins across samples.
- SQMtools: unmapped reads for functions are no longer tracked, since it led to inconsistent results in some cases (see #442). This also affects the tables generated by `sqm2tables.py`.
- SQMtools: added the `mostVariable` function, which will return the most variable rows (based on their coefficient of variation) from a data.frame or matrix. The interface is otherwise similar to the `mostAbundant` function.
- SQMtools: SQM objects now track the coverage per million reads of orfs, contigs, bins and functions. Each can be accessed inside the corresponding list under the `cpm` name. `"cpm"` is also a valid `count` option for `plotFunctions` and `plotBins`.
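As an illustration of ranking rows by their coefficient of variation (standard deviation divided by the mean), which is the criterion named for `mostVariable` above, here is a small Python sketch. It is not the SQMtools implementation: the function name `most_variable_rows`, the use of the sample standard deviation, the skipping of rows with a zero mean and the toy abundance values are all assumptions made for this example.

```python
from statistics import mean, stdev


def most_variable_rows(table, n=2):
    """Return the names of the n rows with the highest coefficient of variation.

    table maps a row name to its list of values across samples.
    Hypothetical helper: rows with fewer than two values or a zero mean are
    skipped, since the coefficient of variation is undefined for them.
    """
    cv = {}
    for name, values in table.items():
        if len(values) < 2:
            continue
        m = mean(values)
        if m == 0:
            continue
        cv[name] = stdev(values) / abs(m)
    # Highest coefficient of variation first.
    return sorted(cv, key=cv.get, reverse=True)[:n]


# Made-up abundances for three functions across three samples.
abundances = {
    "K00001": [10, 10, 10],   # constant across samples
    "K00002": [1, 10, 100],   # spans two orders of magnitude
    "K00003": [5, 10, 15],    # moderate spread
}
top = most_variable_rows(abundances, n=2)
print(top)
```

Dividing by the mean lets low-abundance and high-abundance rows be compared on the same scale, which a plain standard deviation would not do.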
Minor changes / bugfixes
- SQMtools will from now on follow the same version numbers as the corresponding SqueezeMeta releases.
- Updated DIAMOND version to 2.0.15.
- Fixed a bug when adding taxonomic assignments to bins, in which a lack of consensus in a high level prevented looking for consensus at deeper levels.
- Fixed a bug in which `data.table` could make `DAStool` crash if it was called with a very high number of threads.
- Fixed a bug in which both reads of a pair were counted as mapped even if only one of them actually mapped to the reference. This had little impact in real datasets, but is corrected now.
- Fixed a bug in which custom arguments passed to bowtie2 with `-mapping_options` conflicted in some cases with the `--very-sensitive-local` option that we use by default when calling bowtie2. `--very-sensitive-local` is now skipped when the user provides custom arguments to bowtie2.
- Fixed an uncommon issue in which contigs could end up being assigned to more than one bin after restarting the pipeline.
- Fixed a bug in `sqm_longreads.pl` when using several input files from the same sample.
- `loadSQM` now removes redundant info from the orfs and contigs tables when loading a project into `SQMtools`, resulting in less memory usage.
- Fixed a bug in which loading a project with `loadSQM` could randomly cause an error.
- We no longer provide a PDF manual for SQMtools. The documentation for each function can still be accessed from the R terminal or RStudio.
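The new bowtie2 behaviour described above (apply the default preset only when the user gives no custom arguments) can be sketched in Python. This is an illustration of the rule, not SqueezeMeta's own Perl code: the function name `bowtie2_extra_args` and the way the option string is split are assumptions made for this example.

```python
import shlex

# Preset that the notes above say is used by default when calling bowtie2.
DEFAULT_PRESET = "--very-sensitive-local"


def bowtie2_extra_args(mapping_options=None):
    """Return the extra bowtie2 arguments for a given user option string.

    Hypothetical helper: if the user supplied any custom options they are used
    as-is and the default preset is skipped, so the two can never conflict.
    """
    if mapping_options and mapping_options.strip():
        return shlex.split(mapping_options)
    return [DEFAULT_PRESET]


default_args = bowtie2_extra_args()
custom_args = bowtie2_extra_args("--local -N 1")
print(default_args, custom_args)
```

Replacing the preset instead of appending to it avoids the conflict at the cost of making the user responsible for the whole set of alignment options.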
Compatibility Changes
- Results generated by previous versions of SqueezeMeta will not load into SQMtools 1.6.0 (which corresponds to SqueezeMeta release 1.6.0). Running `19.getcontigs.pl /path/to/project` will make a project generated with SqueezeMeta v1.5 compatible with the new version of SQMtools.
v1.5.2
Minor changes / bugfixes
- Fixed a bug in consensus taxonomy search during binning, in which a bin could get assigned to a low taxonomic rank even if there was no consensus at higher taxonomic ranks.
- Updated DIAMOND version to 2.0.14. This should get rid of several cases in which search against the nr database resulted in out of memory errors.
- Fixed a typo in the PDF manual in which Figure 6 was missing