Releases · bcgsc/NanoSim

13 Apr 07:04

SaberHQ

v3.1.0

23911b6

v3.1.0 Latest

This release contains several major bug fixes and new features, as outlined below.

General changes:

updated requirements.txt
- added missing package names (fixes #135)
- updated package version (fixes #159, #120, and #131)
fixed bug where head/tail lengths are calculated without considering the strand of the alignment
fixed bug where sequence IDs in _aligned_error_profile do not match those in _aligned_reads.fastq (fixes #151)
set default file compression level to 1 (previously level 6)
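
The trade-off behind the lower default compression level can be illustrated with Python's standard `gzip` module. This is only a sketch: the release notes do not say which library NanoSim uses for compression, and the `gzip_bytes` helper and the FASTQ-like payload are hypothetical.

```python
import gzip
import io


def gzip_bytes(data: bytes, level: int) -> bytes:
    """Compress data in memory at the given gzip level (1 = fastest, 9 = smallest)."""
    buffer = io.BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode="wb", compresslevel=level) as handle:
        handle.write(data)
    return buffer.getvalue()


# Hypothetical FASTQ-like payload, repeated so the compressor has real work to do.
payload = b"@read_1\nACGTACGTTTGACA\n+\nIIIIHHHHGGGGFF\n" * 2000

fast = gzip_bytes(payload, 1)   # level 1: the new default mentioned above
small = gzip_bytes(payload, 6)  # level 6: the previous default

# Print the raw size and both compressed sizes for comparison.
print(len(payload), len(fast), len(small))
```

Level 1 generally trades a somewhat larger file for less CPU time, which matters when many intermediate files are written.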

genome mode:

fixed bug where the -c option crashes

transcriptome mode:

new options for read_analysis.py:
- -c detect chimeric reads
- -q quantify transcript expression
- -n normalize expression values by transcript length
new expression quantification algorithm based on abundance estimation in metagenome mode
fixed bug where identical read lengths are simulated for the same transcript (fixes #155; thanks Haoran Li)
fixed bug where transcripts without an ENS name prefix cannot be simulated, which may result in an infinite loop (fixes #112, #156)
optimized various parts of simulation (see #150, #158)
fixed bug where head/tail lengths are calculated without considering genome alignments in addition to transcriptome alignments (fixes #136)
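
As an illustration of what normalizing expression values by transcript length (the -n option above) can mean, here is a minimal sketch. The exact formula NanoSim uses is not stated in these notes; the function name, the input dictionaries, and the per-million scaling are assumptions made for the example.

```python
def normalize_by_length(read_counts, transcript_lengths):
    """Scale read counts by transcript length, then rescale to sum to one million.

    Hypothetical helper: illustrates length normalization in general,
    not NanoSim's actual implementation.
    """
    # Reads per base: longer transcripts attract more reads at equal expression.
    rates = {tid: read_counts[tid] / transcript_lengths[tid] for tid in read_counts}
    total = sum(rates.values())
    if total == 0:
        return {tid: 0.0 for tid in rates}
    return {tid: rate / total * 1e6 for tid, rate in rates.items()}


# Two hypothetical transcripts with equal read counts but different lengths.
counts = {"tx_a": 100, "tx_b": 100}
lengths = {"tx_a": 1000, "tx_b": 2000}
normalized = normalize_by_length(counts, lengths)
print(normalized)
```

With equal read counts, the shorter transcript ends up with twice the normalized value of the longer one, which is the effect length normalization is meant to capture.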

metagenome mode:

the option --dna_type_list is not required when reference genomes are streamed from RefSeq

17 Sep 21:15

cheny19

v3.0.2

f8970b3

v3.0.2

This release is the version used to generate the Meta-NanoSim manuscript.

Changes include:

Update README.md to include more information about dependencies and installation instructions.
Bug fix for the pysam get cigar string function.
Bug fix for simulated read length output.
Included the option for EM base-level abundance quantification without chimeric reads detection.

24 Jun 19:00

cheny19

V3.0.1

fedc3a3

v3.0.1

In this release, we are introducing support for compressed files and have fixed a few bugs, as follows:

NanoSim now supports reading .gz sequence files and bam files. When processing intermediate files, it saves bam files instead of sam files to reduce disk usage.
Every subprocess is re-seeded before running, to avoid repeated random sequences in simulated reads.
Fixed a bug affecting lognormal distribution simulation and the -max_len feature (#118).
Bug fix for read_analysis.py genome mode (#123).
Added clarification to the README file about external programs needed to run NanoSim, including GenomeTools (gt) which is required to work with gtf/gff files for Intron Retention analysis.
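
The re-seeding fix listed above addresses a common pitfall: worker processes that inherit the same random state produce identical "random" sequences. The sketch below shows one way to give each worker its own seed. It is not NanoSim's code; the function names and the `base_seed + index` scheme are assumptions for illustration.

```python
import random
from multiprocessing import Pool

BASES = "ACGT"


def simulate_chunk(args):
    """Generate n_reads random sequences using a generator private to this worker."""
    seed, n_reads, read_len = args
    # A private Random instance does not depend on state inherited from the parent.
    rng = random.Random(seed)
    return ["".join(rng.choice(BASES) for _ in range(read_len)) for _ in range(n_reads)]


def run_parallel(base_seed, n_workers, n_reads, read_len):
    """Run one chunk per worker, each with a distinct seed (hypothetical scheme)."""
    jobs = [(base_seed + i, n_reads, read_len) for i in range(n_workers)]
    with Pool(n_workers) as pool:
        return pool.map(simulate_chunk, jobs)
```

When using this pattern, call `run_parallel` from under an `if __name__ == "__main__":` guard so that it also works on platforms that start workers by spawning a fresh interpreter.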

19 Apr 18:39

cheny19

v3.0.0

6839a03

V3.0.0

Official release of version 3.0.0

Major improvements from previous beta version:

Quantification of metagenome abundance levels using EM algorithm
Quantification mode now includes metagenome abundance level estimation, and its parameters have changed slightly.
requirements.txt now includes the joblib library, and version numbers have been removed so that users can install the latest version of each package for the best compatibility.
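
For readers unfamiliar with EM-based abundance estimation, the following is a minimal, read-level sketch of the general idea: reads that align to several genomes are split among them in proportion to the current abundance estimates, and the estimates are then updated. It is not NanoSim's implementation (which these notes describe only briefly); the function name and input format are assumptions.

```python
def em_abundance(read_hits, n_iter=100):
    """Estimate relative abundances from reads that may map to several genomes.

    read_hits: list of lists; each inner list holds the genome IDs one read aligns to.
    Hypothetical, simplified sketch of an expectation-maximization loop.
    """
    reads = [hits for hits in read_hits if hits]  # ignore reads with no alignment
    genomes = sorted({g for hits in reads for g in hits})
    if not genomes:
        return {}
    abundance = {g: 1.0 / len(genomes) for g in genomes}  # uniform starting point
    for _ in range(n_iter):
        counts = dict.fromkeys(genomes, 0.0)
        # E-step: assign each read fractionally, weighted by current abundances.
        for hits in reads:
            total = sum(abundance[g] for g in hits)
            if total == 0:
                continue
            for g in hits:
                counts[g] += abundance[g] / total
        # M-step: new abundance is the share of (fractional) reads per genome.
        assigned = sum(counts.values())
        if assigned == 0:
            break
        abundance = {g: counts[g] / assigned for g in genomes}
    return abundance


# Hypothetical input: 6 reads unique to A, 2 unique to B, 2 ambiguous.
hits = [["A"]] * 6 + [["B"]] * 2 + [["A", "B"]] * 2
estimate = em_abundance(hits)
print(estimate)
```

In this toy input the ambiguous reads are resolved mostly in favor of the genome that already has more uniquely mapped reads.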

Minor changes:

human_NA12878_cDNA_Bham1_guppy model is re-trained.
README is updated with more info on input files

19 Nov 14:23

cheny19

V3.0.0-beta

699d412

NanoSim v3.0.0 pre-release Pre-release

Here we are announcing the NanoSim v3.0.0 pre-release; we will make it an official release once the manuscript is published. Please note that the attached tarball doesn't contain any pre-trained models, so it downloads much faster.

In this release, NanoSim is able to simulate metagenomes with variable abundance profiles.

Key features include:

Quantify species abundance level, which is not readily available in existing abundance quantification tools
Simulate multiple samples in one batch
Simulate chimeric reads in metagenome mode and genome mode
Simulate variance in abundance, so that simulated abundance deviates from the expected value

Bug fixes and small improvements:

Fixed a bug in fastq simulation that led to a discrepancy between quality score length and sequence length
Changed the way of importing model files, allowing better compatibility
Re-trained all the models to be compatible with the model importing
Added 2 more pre-trained models for metagenome datasets
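
The fastq bug fixed above concerned quality strings whose length differed from the sequence length. A quick way to check a simulated file for this class of problem is sketched below. The helper name is hypothetical, and the code assumes simple four-line FASTQ records without wrapped sequences.

```python
def mismatched_records(lines):
    """Return the names of FASTQ records whose quality and sequence lengths differ.

    Assumes each record occupies exactly four lines: header, sequence, '+', quality.
    """
    bad = []
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        if len(seq) != len(qual):
            bad.append(header[1:])  # drop the leading '@'
    return bad


# Hypothetical records: the second one has a quality string that is too short.
example = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGT", "+", "III"]
print(mismatched_records(example))
```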

09 Jun 04:48

cheny19

v2.6.0

a640384

V2.6.0

In this release, there's a key update in the simulation stage. NanoSim is capable of simulating fastq files now! We characterized a few datasets and used truncated log-normal distribution models to simulate the base quality of unaligned reads, matched bases, and erroneous bases, separately for genome and transcriptome reads.

Most of the changes in this release are for the simulation stage.
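
To make the idea of a truncated log-normal quality model concrete, here is a small sketch that draws base qualities by rejection sampling and encodes them as a Phred+33 string. The parameter values (`mu`, `sigma`, and the bounds) are made up for the example and are not taken from NanoSim's trained models; rejection sampling is one simple way to truncate and may differ from the method NanoSim uses.

```python
import random


def sample_truncated_lognormal(rng, mu, sigma, low, high, max_tries=10000):
    """Draw from a log-normal distribution, rejecting values outside [low, high]."""
    for _ in range(max_tries):
        value = rng.lognormvariate(mu, sigma)
        if low <= value <= high:
            return value
    raise RuntimeError("no sample fell inside the truncation bounds")


def simulate_qualities(rng, length, mu, sigma, low=1, high=40):
    """Return a Phred+33 quality string of the given length (hypothetical helper)."""
    scores = [round(sample_truncated_lognormal(rng, mu, sigma, low, high))
              for _ in range(length)]
    return "".join(chr(score + 33) for score in scores)


rng = random.Random(0)
# mu and sigma are illustrative values only.
quality_string = simulate_qualities(rng, 50, mu=2.5, sigma=0.5)
print(quality_string)
```

In a scheme like the one described above, separate parameter sets would be fitted for unaligned reads, matched bases, and erroneous bases.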

Other features:

Perfect reads can have poly(A) tails now.
Read files and error profiles for unaligned reads are separated from aligned reads now.

Bug fixes:

Fixed minor bugs in IR modeling, eliminating the exon extraction biases and the read orientation problem
Reversed the strand information in simulation, which was previously opposite to the real orientation
Solved occasional crashes when simulating unaligned reads
Fixed the reversed head/tail length for reads from the negative strand in transcriptome simulation
Added missing file in pre-trained model
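
The head/tail fix listed above follows from how aligners report reverse-strand reads: in SAM output, the sequence and CIGAR string of a reverse-strand alignment are given relative to the reverse complement, so the leftmost clipped bases correspond to the end of the original read. The sketch below shows the swap. The function names are hypothetical, and only soft clips (`S`) at the very start and end of the CIGAR string are considered.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clip_lengths(cigar):
    """Return (left, right) soft-clip lengths from a CIGAR string."""
    ops = CIGAR_OP.findall(cigar)
    left = int(ops[0][0]) if ops and ops[0][1] == "S" else 0
    right = int(ops[-1][0]) if len(ops) > 1 and ops[-1][1] == "S" else 0
    return left, right


def head_tail_lengths(cigar, is_reverse):
    """Return (head, tail) lengths in the orientation of the original read."""
    left, right = clip_lengths(cigar)
    # On the reverse strand, the left clip in alignment coordinates is the read's tail.
    return (right, left) if is_reverse else (left, right)


print(head_tail_lengths("12S80M5S", is_reverse=False))
print(head_tail_lengths("12S80M5S", is_reverse=True))
```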

24 Jan 01:49

cheny19

v2.5.1

528a78e

NanoSim v2.5.1

To download only the source scripts, please go to url

Features

In this release, we provide 6 pre-trained models, including 3 RNA-seq models, which are human direct RNA, cDNA, mouse cDNA, and 3 DNA-seq models, which are human DNA dataset basecalled by 3 different basecalling algorithms. These models can be downloaded from pre-trained_models.
We also provide a few more homopolymer models to mimic different basecalling algorithms under different sequencing settings (direct RNA or cDNA 1D2 or cDNA 1D).
We implemented -r, --read_type in transcriptome simulation mode. Because the read type (directRNA, cDNA 1D2, or cDNA 1D) affects the homopolymer bias, users can specify which one the simulation should be based on.

Minor bug fixes

Fixed a minor bug in the detect_ir module in the characterization stage.
Python no longer spawns extra processes when only one processor is requested.
The correct help information for transcriptome simulation is now displayed when wrong parameters are passed in.

23 Dec 23:35

cheny19

v2.5.0

5f8d7f9

v2.5.0

In this release, we implemented a few new features and resolved a few bugs.

New features:

Multiprocessing in the simulation stage. Based on our experience, 4 to 12 processors balance well between runtime and memory usage for simulating 1 million reads. Memory usage increases roughly linearly with the number of processors due to the nature of Python multiprocessing. As a rough estimate, it takes less than 5 GB of memory to simulate the human transcriptome with 4 processors.
Homopolymer simulation. For this parameter, we provide three options, each targeting one basecaller: Albacore, Guppy, and Guppy + flipflop model
Simulate aligned reads first, and then unaligned reads. These two types of reads are stored in separate files for better user experience.

Bug fixes:

Fixed retained intron / deleted exon problem in error calculation
Fixed index out of range bug in the simulation stage

18 Jul 21:51

SaberHQ

v2.4-beta

f900404

Simulating transcriptome ONT reads Pre-release

This is a pre-release version which is now capable of simulating both genomic and transcriptomic (cDNA and directRNA) ONT reads with improved performance. Users may run the pipeline in "genome" or "transcriptome" mode. The transcriptome mode also models features of the library preparation protocols used, including intron retention events in cDNA and direct RNA reads. Further, it profiles transcript expression patterns.

We provided a very comprehensive README file for more information on how to run the pipeline in both modes.

Users who have tried Trans-NanoSim before can now rely on this version to simulate transcriptome ONT reads.

Major updates since pre-release v2.3-beta:

Added an optional flag (--uracil) to convert the thymine (T) bases to uracil (U) in the output fasta format. It is helpful if you are dealing with direct RNA reads.
Fixed a bug related to input file requirements when using --no_model_ir (see #63).
Increased simulation speed substantially when IR modelling is disabled (--no_model_ir); it now runs 5 times faster. We also removed redundant code to improve the overall performance of the pipeline.
As for "Perfect" reads (--perfect), we are now considering expression profiles when simulating them. Therefore your "perfect" error-free reads are going to follow your desired expression levels as well.
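
The effect of the --uracil flag described above, converting thymine to uracil in the output sequences, can be sketched in a couple of lines. This is an illustration of the transformation only, not NanoSim's code; the function name is hypothetical, and preserving lowercase bases is an assumption.

```python
# Translation table mapping T to U while preserving case.
_T_TO_U = str.maketrans("Tt", "Uu")


def to_uracil(sequence):
    """Return the sequence with every thymine replaced by uracil."""
    return sequence.translate(_T_TO_U)


print(to_uracil("ACGTTGCA"))
```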

Please keep using the pipeline and share your thoughts on it. Cheers!

17 Jun 23:31

SaberHQ

v2.3-beta

6f44514

Simulating transcriptome ONT reads Pre-release

NOTE: Please do not use this release as it has an input requirement bug.

We provided a very comprehensive README file for more information on how to run the pipeline in both modes.

Users who have tried Trans-NanoSim before can now rely on this version to simulate transcriptome ONT reads.

This version has been tested on Python 2.7 and Python 3.6 with the latest compatible packages respectively.
