
hangs at start of simulation #179

Open
omarkr8 opened this issue Dec 29, 2022 · 5 comments

omarkr8 commented Dec 29, 2022

Hi,

I'm getting stuck at the start of simulations. I'm running the metagenome simulation.
For example, it prints:
2022-12-28 23:26:25: Read in seq1
2022-12-28 23:26:25: Read in seq2
2022-12-28 23:26:25: Read in abundance profile
2022-12-28 23:26:25: Read error profile
2022-12-28 23:26:25: Read KDF of unaligned reads
2022-12-28 23:26:25: Read KDF of aligned reads
2022-12-28 23:26:25: Read chimeric simulation information
2022-12-28 23:26:25: Simulating sample sample0
2022-12-28 23:26:25: Start simulation of aligned reads

It just stays there. The output files are created but remain empty.
The strange thing is that, across many attempts with the same command, it did sort of work once.
That one time, data was produced and the pipeline seemed to progress. It went through sample0 and sample1, but when it started simulating sample2, it got stuck.

Now I'm finding that it gets stuck even at sample0.
I will keep tinkering to see whether it's a resource issue on my end.

omarkr8 (author) commented Dec 29, 2022

Okay, so I had a run complete successfully. Perhaps I was asking for too many reads?
My metagenome simulation was initially set up for 4 samples with 2k, 4k, 8k, and 16k reads, and it would get stuck throughout.
After changing it to 500, 1k, 2k, and 4k reads, I got it to run to completion. I will do a bit more testing to see whether the read numbers really caused the hang.

omarkr8 closed this as completed Jan 2, 2023
omarkr8 reopened this Jan 2, 2023
omarkr8 (author) commented Jan 4, 2023

So now I can consistently run NanoSim.
The problem might be related to the min and max length of the reads I'm trying to simulate.

My target region is short (500 bp), and NanoSim seems to hang if I try to make reads with little variation in length, e.g. 490-500 bp. The smallest range that works for me is a 20 bp difference.
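One possible explanation is offered here only as a hypothesis, not as NanoSim's actual code. Suppose the simulator draws each read length from a fitted length distribution and keeps redrawing until the length falls inside [min, max]. If so, a narrow window far from the bulk of the distribution rejects almost every draw, and the run looks hung. The sketch below assumes a log-normal length model with made-up parameters (median about 8,000 bp). All names (`acceptance_probability`, `sample_length`, `MU`, `SIGMA`) are hypothetical. It shows how the expected number of draws per accepted read grows as the window shrinks.

```python
import math
import random


def lognormal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of a log-normal distribution, computed from the standard normal CDF."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def acceptance_probability(min_len: int, max_len: int, mu: float, sigma: float) -> float:
    """Approximate chance that a single draw lands inside [min_len, max_len]."""
    return lognormal_cdf(max_len, mu, sigma) - lognormal_cdf(min_len, mu, sigma)


def sample_length(min_len: int, max_len: int, mu: float, sigma: float,
                  rng: random.Random, max_tries: int = 1_000_000) -> tuple:
    """Rejection sampling: redraw until the length falls inside the window.

    Returns (length, number_of_draws). Gives up after max_tries so that this
    sketch itself cannot hang; a loop without such a cap could spin for a very long time.
    """
    for tries in range(1, max_tries + 1):
        length = int(rng.lognormvariate(mu, sigma))
        if min_len <= length <= max_len:
            return length, tries
    raise RuntimeError(f"no length in [{min_len}, {max_len}] after {max_tries} draws")


# Hypothetical length model (NOT fitted to any real data): median about 8,000 bp.
MU, SIGMA = math.log(8000), 0.8

for lo, hi in [(490, 500), (100, 1000), (1000, 20000)]:
    p = acceptance_probability(lo, hi, MU, SIGMA)
    print(f"[{lo}, {hi}] bp: P(accept) = {p:.2e}, expected draws per read = {1 / p:,.0f}")

length, tries = sample_length(100, 1000, MU, SIGMA, random.Random(0))
print(f"accepted a {length} bp read after {tries} draws")
```

With these made-up parameters, the 490-500 bp window needs tens of thousands of draws per accepted read. Thousands of reads per sample would then mean a very large number of redraws. A length model trained on reads that already sit near 500 bp would shift the distribution toward the window and raise the acceptance rate.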

So, a question: if I'm trying to simulate perfect reads, why not make them length-perfect too, i.e. basically exact copies of the references? Or is there a simpler way to get reads that are perfect in both sequence and length?

@aastha-batta

OK, so we are using a pretrained model for metagenome simulations. With a standard read count, 2 samples are generated. However, when we change the number of species included in the sample, the process hangs partway through and does not finish. The read counts are 100 and 1000, which is not very large for NanoSim, so I don't understand why this is happening.

@aastha-batta

@omarkr8 do you happen to have any idea or solution for this?

SaberHQ (member) commented Feb 25, 2023

So now I can consistently run NanoSim. The problem might be related to the min and max length of the reads I'm trying to simulate.

Nice to hear that you were able to run it consistently now. A couple of questions before we can help you better:

Did you use the pre-trained models or train your own model? Could you provide the exact command you used? And lastly, could you confirm that you used the master branch version?

My target region is short (500 bp), and NanoSim seems to hang if I try to make reads with little variation in length, e.g. 490-500 bp. The smallest range that works for me is a 20 bp difference.

Thanks for sharing this. I also suggest training your own model and then using it to generate reads. With this approach, the length distribution in your trained model would be around the range you mentioned, so the lengths of the synthetic reads would fall in the same range. This would also address your question about "perfect" reads.

It would also be nice to hear @cheny19's thoughts on this min/max issue.

So, a question: if I'm trying to simulate perfect reads, why not make them length-perfect too, i.e. basically exact copies of the references? Or is there a simpler way to get reads that are perfect in both sequence and length?

The --perfect option in simulator.py lets you generate reads without any introduced errors. The length distribution is still derived from the training samples, though.
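For the "exact copies of the references" case, a fixed read length with zero errors can be produced outside NanoSim altogether. The following is a minimal stand-alone sketch, not part of NanoSim. It assumes a plain FASTA reference and a hypothetical fixed read length. It slices random windows of exactly `read_len` bases from the forward strand of each reference long enough to hold one. `read_fasta` and `exact_reads` are hypothetical helper names.

```python
import random
from typing import Dict, Iterator, List, Tuple


def read_fasta(lines: List[str]) -> Dict[str, str]:
    """Parse FASTA text (already split into lines) into {name: sequence}."""
    seqs: Dict[str, List[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # keep only the ID, drop the description
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line.upper())
    return {n: "".join(parts) for n, parts in seqs.items()}


def exact_reads(refs: Dict[str, str], read_len: int, n_reads: int,
                rng: random.Random) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) pairs that are exact forward-strand substrings."""
    eligible = [n for n, s in refs.items() if len(s) >= read_len]
    if not eligible:
        raise ValueError(f"no reference sequence is at least {read_len} bp long")
    for i in range(n_reads):
        ref = rng.choice(eligible)
        start = rng.randint(0, len(refs[ref]) - read_len)  # randint is inclusive
        yield f"{ref}_{start}_perfect_{i}", refs[ref][start:start + read_len]


# Usage with a hypothetical 600 bp reference and 500 bp reads:
refs = read_fasta([">target some description", "ACGT" * 150])
for read_id, seq in exact_reads(refs, read_len=500, n_reads=3, rng=random.Random(42)):
    print(f">{read_id}\n{seq}")
```

This trades realism for control. Every read is an error-free, fixed-length copy, which is what the question above asks for, but it has none of the length or error characteristics that a trained NanoSim model provides.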
