Stuck at simulation stage #193

kainth-amoldeep · 2023-05-31T17:25:55Z

Hi, I am trying to use NanoSim 3.0.0. (downloaded source code from (https://github.com/bcgsc/NanoSim/releases/tag/v3.0.0)) for simulating ONT reads. I created my own profile using: ./read_analysis.py transcriptome -i ont_run.fq -rg ucsc_hg38.fasta -rt hg38_transcriptome_gff.fa -annot hg38_transcriptome.gff3 -t 38 . It seems to run fine with the following output:

2023-05-30 15:21:23: Read pre-process and unaligned reads analysis
2023-05-30 15:21:39: Alignment with minimap2 to reference transcriptome
2023-05-30 15:24:33: Alignment with minimap2 to reference genome
2023-05-30 16:00:19: Processing transcriptome alignment file: sam
2023-05-30 16:00:44: Processing genome alignment file: sam
2023-05-30 16:01:14: Parse the annotation file (GTF/GFF3)
2023-05-30 16:01:47: Modeling Intron Retention
2023-05-30 16:01:47: Reading intron coordinates from GFF file
2023-05-30 16:02:30: Read primary genome alignment for each read
2023-05-30 16:07:03: Read primary transcriptome alignment for each read
2023-05-30 16:09:49: Calculating probabilities for each intron retention event
2023-05-30 16:16:23: Aligned reads analysis
2023-05-30 16:16:40: Computing KDEds processed >> 1215001
2023-05-30 16:16:40: Computing 2D KDE for transcriptome ref length
2023-05-30 16:16:41: Computing KDE for aligned region
2023-05-30 16:16:41: Computing KDE for aligned reads
2023-05-30 16:16:42: Computing KDE for unaligned length
2023-05-30 16:16:42: Computing KDE for ht ratio
2023-05-30 16:16:43: Unaligned reads analysis
2023-05-30 16:16:43: match and error models
2023-05-30 16:19:13: Model fitting
2023-05-30 16:19:18: Mismatch fitting start
Mismatch parameters: 0.10299333509559308 0.787357964367962 0.2182370542984321 Residual 6.509615946814762e-05
2023-05-30 16:19:48: Mismatch fitting done
2023-05-30 16:19:48: Insertion fitting start
WARNING! Insertion parameters may not be optimal! 0.8374719781176181 0.9192989587536269 0.008942175282069826 0.9880211696155594 Residual 0.0030656834500610852
2023-05-30 16:21:14: Insertion fitting done
2023-05-30 16:21:14: Deletion fitting start
WARNING! Deletion parameters may not be optimal! 0.9569706479182295 1.032499061802694 0.2866297773531245 0.8752291450180834 Residual 0.0005721754355394459
2023-05-30 16:22:00: Deletion fitting done
2023-05-30 16:22:00: Finished!

But when I try to run simulator using ./simulator.py transcriptome -rt hg38_transcriptome_gff.fa -rg ucsc_hg38.fasta -c training -e expression_abundance.tsv -n 20000 -max 10000 -min 200 -b guppy -r cDNA_1D --fastq -o simulated_1D it gets stuck at:

2023-05-31 09:48:19: Read in reference
2023-05-31 09:48:21: Read in expression profile
2023-05-31 09:48:21: Read in reference genome and create .fai index file
2023-05-31 09:48:21: Read in IR markov model
2023-05-31 09:48:21: Read in GFF3 annotation file
2023-05-31 09:48:41: Read error profile
2023-05-31 09:48:41: Read KDF of unaligned reads
2023-05-31 09:48:41: Read KDF of aligned reads
2023-05-31 09:48:41: Read chimeric simulation information
2023-05-31 09:48:41: Start simulation of aligned reads

I have ensured that my expression_abundance.tsv file have transcript_id with "ENS".
I would really appreciate if you can help me with this. Thanks.

The text was updated successfully, but these errors were encountered:

kmnip · 2023-06-01T04:16:58Z

I think it has to do with these settings: -n 20000 -max 10000 -min 200
It sounds like a similar issue in metagenome mode: #184

@SaberHQ Possibly a bug in transcriptome mode?

SaberHQ · 2023-06-01T19:06:37Z

Thanks for reporting this @kainth-amoldeep

@kmnip I do not think it is related to the infinite loop bug you mentioned (#184) because, in the transcriptome mode, NanoSim first selects a reference transcriptome based on the expression profiles and then based on the 2 dimensional KDE distributions, it picks an aligned length for the simulation. Therefore, max and min lengths are not used. They are only used in generating unaligned reads I believe.

That being said, dear @kainth-amoldeep would you mind trying again not using the min and max parameters and see if it works or not? In addition, you said you ensured that the transcript ids in the expression file start with "ENS". Can you also double-check if they are the same in your reference transcriptome file?

Did you calculate the expression profiles using the module provided by NanoSim? I mean did you run the NanoSim -quantify module?

kmnip · 2023-06-02T05:30:59Z

@SaberHQ Good points!

@kainth-amoldeep I suggest you can also try our development code on the master branch. We have had a number of bugfixes since v3.0.0.

kainth-amoldeep · 2023-06-09T22:16:14Z

Hi,
Thanks for your comments and suggestions. I tried without min max and it was still stuck. @SaberHQ I made an expression abundance file on the basis of short read sequencing; may something has gone wrong in that in terms of nomenclature. However, I was able to simulate reads on the basis of a pretrained model. So, for now, it serves my purpose. Thanks a lot again.

kmnip added the bug label Jun 1, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stuck at simulation stage #193

Stuck at simulation stage #193

kainth-amoldeep commented May 31, 2023

kmnip commented Jun 1, 2023 •

edited

SaberHQ commented Jun 1, 2023

kmnip commented Jun 2, 2023

kainth-amoldeep commented Jun 9, 2023

Stuck at simulation stage #193

Stuck at simulation stage #193

Comments

kainth-amoldeep commented May 31, 2023

kmnip commented Jun 1, 2023 • edited

SaberHQ commented Jun 1, 2023

kmnip commented Jun 2, 2023

kainth-amoldeep commented Jun 9, 2023

kmnip commented Jun 1, 2023 •

edited