
Nanosim hangs in the middle #184

Open
aastha-batta opened this issue Feb 17, 2023 · 18 comments

@aastha-batta

OK, so we are using a pre-trained model for metagenome simulations. With a standard read count, two samples are generated successfully. However, when we change the number of species included in the sample, the process hangs partway through and never finishes. The read counts are 100 and 1000, which is not large for NanoSim to simulate, so I do not understand why this is happening.
On top of that, if we make any small change, the process gets stuck, and it does not run successfully again until we restart the terminal.

@kmnip
Collaborator

kmnip commented Feb 17, 2023

Can you please report the exact command?
And are you using the code from the master branch?

@aastha-batta
Author

This is the command we are using:

`simulator.py metagenome -gl metagenome_list.tsv -a abundance.tsv -dl dna_type.tsv -c /home/devuser/omega_id/nanosim/NanoSim/pre-trained_models/metagenome_ERR3152364_Even/training -o bacteria --fastq -b guppy -t 10 --abun_var -0.5 0.5 --chimeric --max_len 1700`

@aastha-batta
Author

And yes, we are using the code from the master branch.

@kmnip
Collaborator

kmnip commented Feb 19, 2023

As a test, you could try taking out these options: `--abun_var -0.5 0.5` and `--max_len 1700`

@rameezmj

rameezmj commented Feb 20, 2023

I ran the command after removing --max_len 1700, and the process ran fine. Of the nine species DNA sequences provided as input, eight are no longer than 1700 bp. The remaining one is a whole-genome sequence in a single FASTA, about 4,500,000 bp long. When I removed that single long sequence (again without --max_len 1700), the process did not complete, as described in the original issue. What does this mean? How can I use NanoSim to simulate short reads (<1700 bp)?

@SaberHQ
Member

SaberHQ commented Feb 25, 2023

Hi @aastha-batta Do you get any sort of error messages in those cases, or is the script stuck with no output? I suspect this might also be related to issue #185

We will look into it. Thanks for reporting this.

@SaberHQ
Member

SaberHQ commented Feb 25, 2023

> Not only this if we are making any small change the process gets stuck and until we restart the terminal it does not run successfully.

Would you also kindly provide more information on this, @aastha-batta? What do you mean by making "any small change"?

@aastha-batta
Author

Hello @SaberHQ. Basically, I tried changing the number of entries in the genome list (and in the corresponding lists) to see whether simulator.py generates results successfully every time I run it. It does not. I have yet to try this with a self-trained model, and I could not pinpoint the exact reason it gets stuck, especially when we use the --max_reads flag.

I went through issue #185 and re-read the code after reading the comment posted there. If the length variable in extract_reads is indeed derived from --max_len and the sequence lengths passed in the inputs (genome_list), then note that we use --max_len 1700, which is larger than the sequence lengths present in the genome_list I passed as input, and I think extract_read handles that condition well.

I also re-ran the command from my previous comment:

`simulator.py metagenome -gl metagenome_list.tsv -a abundance.tsv -dl dna_type.tsv -c /home/devuser/omega_id/nanosim/NanoSim/pre-trained_models/metagenome_ERR3152364_Even/training -o bacteria --fastq -b guppy -t 10 --abun_var -0.5 0.5 --chimeric --max_len 1700`

If I run this in a loop (5 times), it has a success rate of about 40%, i.e. it completes successfully twice and the other times I have to interrupt it. This is unusual behavior; the process should not stop so abruptly if it completes once or twice. Also, the list includes a whole-genome sequence about 4,500,000 bp long, and if we remove it the simulation never completes at all, even when I remove the --max_len flag.

I previously theorized that this anomaly comes up because we are using an old pre-trained model; could there be a compatibility issue with this model? I timed some of the functions in the script, and there is no single step where it gets stuck, so I really do not know what is happening. The problem reported in issue #185 may be something, but I do not think it is related to my issue.

Kindly let me know whether my hunch is correct, whether there is something else I should check, or whether there is a function in simulator.py I should run separately to record its behavior. Thank you.

@aastha-batta
Author

Also, @SaberHQ, I was using the nanosim package from the bioconda conda channel. Since it has not been updated to include the most recent commits, it uses an older version of simulator.py and generates error profiles incorrectly.

@danpal96

danpal96 commented Mar 8, 2023

I have been trying to understand the simulator.py script in metagenome mode and when it gets stuck in an infinite loop.

Let's say that I have genomes with a maximum length of 1000 and I specify `--max_len 1500`. In the function extract_read,
if the length given to the function is >1000, this code:

if not s or length > max(seq_len[s].values()):
    s = random.choice(list(seq_len.keys()))

will change the species (variable s) but it will never find a species with the desired length.
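
To make the failure concrete, here is a minimal runnable sketch of that resampling behavior. The seq_len layout, species names, and iteration cap are hypothetical illustrations, not code taken from simulator.py:

```python
import random

# Hypothetical data in the shape extract_read works with:
# species -> {sequence name: length}; the longest genome here is 1000 bp.
seq_len = {
    "speciesA": {"chr1": 800},
    "speciesB": {"chr1": 1000},
}
length = 1500  # e.g. a --max_len larger than every input genome

s = None
attempts = 0
while not s or length > max(seq_len[s].values()):
    s = random.choice(list(seq_len.keys()))
    attempts += 1
    if attempts >= 10_000:  # safety cap added for this demo; the real loop has none
        break

# No species can ever satisfy the condition, so only the cap stops the loop.
print(attempts)  # 10000
```

Without the added cap, the loop keeps resampling species forever, which would match the observed hang.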

Another case is the one I reported in #185, when length == max(seq_len[s].values()).

Now let's say that I have genomes with a maximum length of 1000 and I specify `--max_len 999`. In theory this shouldn't get stuck, but when mutations are added in the code below, the list of lengths (the variable seg_length_list in this case) gets modified; the lengths can then exceed 1000, and the process gets stuck:

for each_ref in ref_length_list:
    middle, middle_ref, error_dict, error_count = \
        error_list(each_ref, match_markov_model, match_ht_list, error_par, trans_error_pr, fastq)
    total += middle
    seg_length_list.append(middle_ref)
    seg_error_dict_list.append(error_dict)
    seg_error_count_list.append(error_count)
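
This lengthening effect can be shown with a toy stand-in for error_list(). The shift values below are invented for illustration and are not NanoSim's actual error model:

```python
genome_max = 1000                  # longest genome available
ref_length_list = [999] * 5        # segment lengths sampled with --max_len 999
error_shifts = [-3, 0, 2, 5, -1]   # hypothetical per-segment changes from errors

# Toy stand-in for error_list(): each segment's reference span shifts by a few
# bases once insertion/deletion errors are applied.
seg_length_list = [r + d for r, d in zip(ref_length_list, error_shifts)]
print(seg_length_list)  # [996, 999, 1001, 1004, 998]

# Two adjusted lengths now exceed genome_max, so a later lookup for a
# sequence of that length can never be satisfied by any genome.
print([l for l in seg_length_list if l > genome_max])  # [1001, 1004]
```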

In my tests, if I specify a --max_len that is at least 100 less than the minimum genome length, it never gets stuck in an infinite loop.

@ahfitzpa
Copy link

I am experiencing similar issues. I use metagenome mode to simulate 24 mixed-composition samples (norovirus). I input amplicons of 570 bp with a log-normal abundance distribution and two to ten genotypes per sample. I have tried running this with and without maximum and minimum lengths, with and without --abun_var, and with both the even and log metagenome training files. I have attached sample input files in the zip folder. Since all my input sequences are 570/579 bp, could that be creating the hang issue?

This is what I run on an HPC cluster:

cd simulated_sequencing

module load nanosim/3.1.0
source activate nanosim

simulator.py metagenome \
    -gl genome_list.tsv \
    -a abun.tsv \
    -dl dna_type.tsv \
    --chimeric \
    -b guppy \
    --fastq \
    --seed 10032 \
    -t 10 \
    -c NanoSim/pre-trained_models/metagenome_ERR3152364_Even/training

conda deactivate
module unload nanosim/3.1.0
The simulation begins but hangs and never completes. Depending on the combination of conditions (lengths, training files), sometimes nothing is written to any of the output files. In the example provided, a few error profiles and fastq outputs were generated.

example_simulation.zip

Thanks for your help with this matter. I would really like to apply NanoSim to this case, so any advice is appreciated.

@SaberHQ
Member

SaberHQ commented Mar 13, 2023

Thank you @ahfitzpa and everyone else for reporting this. I am sorry to hear that you had problems running the tool.

My colleagues and I will look at it and fix it. You can expect to hear back from us within three weeks.

@kmnip @cheny19 @theottlo

@SaberHQ SaberHQ added the bug label Mar 13, 2023
@kmnip
Collaborator

kmnip commented Mar 18, 2023

> if I specify --max_len of 100 less than the minimum genome length it never gets stuck in an infinite loop.

This works because the ~100 nt buffer accounts for the inserted error bases (and possibly the head/tail regions).

I think @danpal96 is right about the buggy parts of the code.

@kmnip
Collaborator

kmnip commented Mar 31, 2023

Hi @danpal96, @ahfitzpa , @aastha-batta , and @rameezmj

I have created pull request #189. Can you please check whether my changes would resolve the hanging issues on your end?

@danpal96

danpal96 commented May 4, 2023

@kmnip I ran some small tests and it didn't get stuck; I will do more tests when I have more time. Sorry for the late response; for some reason GitHub is not notifying me of the mentions.

@kmnip
Collaborator

kmnip commented May 5, 2023

@danpal96 No worries, we appreciate your help!

@aastha-batta
Author

Hi @kmnip Yes, we will look into the changes. I am sorry I was unable to reply earlier. Will let you know as soon as possible.

@SaberHQ
Member

SaberHQ commented May 16, 2023

> Hi @danpal96, @ahfitzpa, @aastha-batta, and @rameezmj
>
> I have created pull request #189. Can you please check whether my changes would resolve the hanging issues on your end?

Thanks, @kmnip. This pull request has also been tested and merged into the master branch.
