IndexError: list index out of range #202
Hey @weishwu

It looks like the IndexError happens when NanoSim is trying to calculate the ratio of the head sequence length to the head+tail sequence length. This step is needed to add the head and tail regions to the generated reads. The "IndexError: list index out of range" was a bug related to setting

Can you try the latest committed version instead of the released version and see if it works? Besides, I see you asked for 20k reads to be simulated. Can you confirm how many reads were simulated?
Hi @SaberHQ

Traceback (most recent call last):
File "/home/wgallego/mambaforge/envs/nanosim/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/home/wgallego/mambaforge/envs/nanosim/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/home/wgallego/mambaforge/envs/nanosim/bin/simulator.py", line 1294, in simulation_aligned_genome
head_vs_ht_ratio = head_vs_ht_ratio_list[each_read]
IndexError: list index out of range

I installed NanoSim in a new env using mamba.
I'm using the latest model available for download on GitHub, and I requested 22519449 reads to be simulated. Here is the input command:

simulator.py genome -rg ../germ_hap_1.fasta -c ./human_giab_hg002_sub1M_kitv14_dorado/hg002_nanosim_sub1M -t 15 -n 22519449

I have counted the generated reads and they seem to correspond (maybe missing one):

$ wc -l simulated_aligned_reads.fasta
44941612 simulated_aligned_reads.fasta
$ wc -l simulated_unaligned_reads.fasta
97284 simulated_unaligned_reads.fasta
# Obtained lines:
44941612 + 97284 = 45038896
# Expected lines:
22519449 * 2 = 45038898

Here is the full output:

simulator.py genome -rg ../germ_hap_1.fasta -c /mnt/trcanmed/wgallego/simul/nanosim_models/human_giab_hg002_sub1M_kitv14_dorado/hg002_nanosim_sub1M -t 15 -n 22519449
running the code with following parameters:
ref_g ../germ_hap_1.fasta
model_prefix /mnt/trcanmed/wgallego/simul/nanosim_models/human_giab_hg002_sub1M_kitv14_dorado/hg002_nanosim_sub1M
out simulated
number [22519449]
perfect False
kmer_bias None
basecaller None
dna_type linear
strandness None
sd_len None
median_len None
max_len inf
min_len 50
fastq False
chimeric False
num_threads 15
2024-06-03 18:12:19: /home/wgallego/mambaforge/envs/nanosim/bin/simulator.py genome -rg ../germ_hap_1.fasta -c /mnt/trcanmed/wgallego/simul/nanosim_models/human_giab_hg002_sub1M_kitv14_dorado/hg002_nanosim_sub1M -t 15 -n 22519449
2024-06-03 18:12:19: Read in reference
2024-06-03 18:12:48: Read error profile
2024-06-03 18:12:48: Read KDF of unaligned reads
/home/wgallego/mambaforge/envs/nanosim/lib/python3.7/site-packages/sklearn/base.py:318: UserWarning: Trying to unpickle estimator KernelDensity from version 0.23.2 when using version 0.22.1. This might lead to breaking code or invalid results. Use at your own risk.
UserWarning)
2024-06-03 18:12:49: Read KDF of aligned reads
2024-06-03 18:12:51: Read chimeric simulation information
2024-06-03 18:12:51: Start simulation of aligned reads
Process Process-10:: Number of reads simulated >> 22400001
Traceback (most recent call last):
File "/home/wgallego/mambaforge/envs/nanosim/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/home/wgallego/mambaforge/envs/nanosim/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/home/wgallego/mambaforge/envs/nanosim/bin/simulator.py", line 1294, in simulation_aligned_genome
head_vs_ht_ratio = head_vs_ht_ratio_list[each_read]
IndexError: list index out of range
2024-06-03 23:47:13: Number of reads simulated >> 22470001
2024-06-04 03:31:30: Start simulation of random reads
2024-06-04 03:32:48: Number of reads simulated >> 22510001
2024-06-04 03:33:13: Finished!
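
The line-count check above assumes every read takes exactly two lines (one header, one single-line sequence). A minimal sketch of a more direct check is to count the FASTA header lines instead. The function names `count_fasta_records` and `missing_reads` are hypothetical helpers written for this illustration and are not part of NanoSim. The file names and the requested read count are the ones from the output above.

```python
from pathlib import Path


def count_fasta_records(path):
    """Count FASTA records by counting header lines (lines starting with '>')."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def missing_reads(requested, fasta_paths):
    """Return how many of the requested reads are absent from the FASTA files.

    Files that do not exist are skipped and contribute zero reads.
    """
    simulated = sum(
        count_fasta_records(p) for p in fasta_paths if Path(p).exists()
    )
    return requested - simulated


if __name__ == "__main__":
    # File names and read count taken from the run reported above.
    files = ["simulated_aligned_reads.fasta", "simulated_unaligned_reads.fasta"]
    print("reads missing:", missing_reads(22519449, files))
```

A result of 0 would mean every requested read was written. A positive number gives the count of reads lost, for example when a worker process stopped early.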
Hi @waltergallegog! Thanks for your interest in using NanoSim and for reporting this issue. It is hard for me to trace the issue without running and testing it on my end and without access to the exact reference genome. However, from the traceback reported here, the error occurs at the following line:
My best guess is that the error is related to that filtering stage. However, I need to run some benchmarks to narrow it down and confirm that. Unfortunately, I am busy until the end of June, but I will have time to take a look next month and will keep you updated. In the meantime, please try simulating a small number of reads and see whether the error still occurs.
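
To make the suspected failure mode concrete: the traceback shows a per-read list being indexed by a read counter. The sketch below is not NanoSim's code. `pick_ratios` and the sample values are hypothetical and only illustrate the general pattern: if a filter shortens a list while the loop still runs over the original read count, the same IndexError appears, and a length check up front turns it into a clearer message.

```python
def pick_ratios(ratios, number_of_reads):
    """Return one ratio per read, failing early if the list is too short."""
    if len(ratios) < number_of_reads:
        raise ValueError(
            f"need {number_of_reads} ratios but only {len(ratios)} are available"
        )
    return [ratios[i] for i in range(number_of_reads)]


# Hypothetical data: five ratios drawn, one removed by a range filter.
drawn = [0.2, 0.5, 1.3, 0.7, 0.9]
kept = [r for r in drawn if 0.0 <= r <= 1.0]  # four values remain

try:
    kept[4]  # unguarded indexing that still assumes five entries
except IndexError as err:
    print("unguarded:", err)

try:
    pick_ratios(kept, 5)
except ValueError as err:
    print("guarded:", err)
```

Whether NanoSim's filtering actually behaves this way still needs to be confirmed against the simulator source.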
Hi @SaberHQ, thanks for the feedback. Regarding the version, I'm using the one installed with mamba (v3.1.0), which from what I see is outdated with respect to the latest GitHub commit, so I will update NanoSim and test again.
My command lines:
log:
Is this error "IndexError: list index out of range" ignorable?