
nextpolish2.py never stop #108

Open
yangfangyuan0102 opened this issue Feb 27, 2023 · 6 comments

Comments

@yangfangyuan0102

yangfangyuan0102 commented Feb 27, 2023

Describe the bug
Hi, Dear Author,
After mapping all ONT reads, I ran nextpolish2.py, but the run never finishes, even after all CPU and memory activity has stopped.
It does output some polished contigs, but the output is incomplete; it appears to get stuck on one contig. This is probably not a problem with my genome assembly, as I have tried several genomes.
I'm sure my steps are correct, because the same procedure worked fine on my previous machine (AMD 5900X CPU).

Operating system
Ubuntu 22.10
CPU: Intel 13900K. Its cores are heterogeneous (performance and efficiency cores); does that matter?

GCC
I have reinstalled NextPolish 1.4.1 using conda.
Python 3.10

Thanks very much

@moold
Member

moold commented Feb 27, 2023

Try killing the main task and rerunning.

@yangfangyuan0102
Author

Try to kill the main task, and rerun.

Hi, Dr. Hu,
Thanks for your quick reply. Rerunning nextpolish2.py skips the contigs that were already corrected, but it gets stuck at the same place as before; the task "sleeps" again. The modification time of the output file was refreshed, but the file size did not change.

@moold
Member

moold commented Feb 27, 2023

You may have encountered a bug. Could you extract the unfinished sequence and its corresponding BAM file and send them to me, so that I can reproduce the bug and fix it?

@yangfangyuan0102
Author

yangfangyuan0102 commented Feb 27, 2023

The ONT reads were first corrected with Illumina reads, and I used the corrected reads as input for NextDenovo and for polishing. Could this be the cause?

Please download these files within 7 days:
Link: https://pan.baidu.com/s/1xKnUyq-2tmtRskofy-4u3w

@SalvadorGJ

Hi,

I may have the same problem with nextpolish2.py. I made my own custom alignment using minimap2 and used samtools to filter and index the BAM. I'm running on 32 cores, trying to polish a huge 26 Gb genome using ONT reads, with a window size of 100 M.

    ls `pwd`/${bam_file} > ${bam_file}.fofn
    nextpolish2.py -g ${genome_file} -l ${bam_file}.fofn -r ont \\
        -p ${task.cpus} -w ${window_size}M -sp False -o ${file_name}.nextpolish_polished.no-splitting.fa

The process output a FASTA file of 20 Gb after 4 hours, but kept running for up to 12 hours, and the size of the output never changed.

[70014 INFO] 2023-06-28 11:35:06 Corrected step options:
[70014 INFO] 2023-06-28 11:35:06 
split:                        0
auto:                         True
block:                        None
process:                      32
read_type:                    1
block_index:                  all
uppercase:                    False
window:                       100000000
alignment_score_ratio:        0.8
alignment_identity_ratio:     0.8
bam_list:                     purgedClipped.ntLink_round1_k56.w1000.a1.gap_fill.sorted.merged.bam.fofn
genome:                       Amex6.0-purgedClipped_contigs.k56.w1000.z1000.ntLink_round1_k56.w1000.a1.gap_fill.fa
out:                          Amex6.0-purgedClipped_contigs.k56.w1000.z1000.ntLink_round1_k56.w1000.a1.gap_fill.nextpolish_polished.no-splitting.fa

Additionally, the log has this warning repeated many times:

[70014 WARNING] 2023-06-28 11:37:14 Adjust -p from 32 to 32, -w from 100000000 to 5000000, logical CPUs:152, available RAM:~1416G, use -a to disable automatic adjustment.
[109706 INFO] 2023-06-28 11:41:34 Start a corrected worker in 109706 from parent 70014
python: ctg_cns.c:2787: find_sup_alns: Assertion `i != sup_aln->i' failed.
python: ctg_cns.c:2787: find_sup_alns: Assertion `i != sup_aln->i' failed.
[W::hts_idx_load2] The index file is older than the data file: /scratch-cbe/users/salvador.gonzales/1_AmexGenomeAnnotation/0_AmexGenomeUpgrade/2_HiC_Scaffolding/97/ffdd9a7e1b0ee1aa080c55b2a6353c/purgedClipped.ntLink_round1_k56.w1000.a1.gap_fill.sorted.merged.bam.csi
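Regarding the hts_idx_load2 warning: one likely fix (assuming samtools is available; the filename is the BAM listed in the fofn above) is to regenerate the index after any filtering step, so that it is newer than the BAM it indexes:

```shell
# Rebuild the CSI index so its timestamp is newer than the BAM;
# -c requests a CSI index (needed for contigs longer than 512 Mb),
# -@ 8 uses 8 extra threads.
samtools index -c -@ 8 purgedClipped.ntLink_round1_k56.w1000.a1.gap_fill.sorted.merged.bam
```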

@yangfangyuan0102 were you able to solve the issue?

Kind regards,
Salvador

@yangfangyuan0102
Author

@SalvadorGJ

Hi, I still don't know how this happened. My tips: make sure the genome was assembled from the same raw ONT reads you use for polishing; that is, ideally neither the genome nor the ONT reads receive additional modification before polishing. Also make sure the raw ONT reads are correctly mapped to the genome, e.g. using minimap2 -x map-ont. I suspect that extra processing of the BAM files can also cause problems, since you mentioned that you "filter the BAM".
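For reference, a minimal mapping-and-polishing sketch along those lines (filenames and thread counts are placeholders; the nextpolish2.py flags are the ones used earlier in this thread):

```shell
# Map raw ONT reads to the assembly and produce a coordinate-sorted BAM.
minimap2 -ax map-ont -t 32 genome.fa ont_reads.fq.gz \
    | samtools sort -@ 8 -o ont.sorted.bam
# CSI index (-c) supports contigs longer than 512 Mb.
samtools index -c ont.sorted.bam

# nextpolish2.py reads the BAM path(s) from a file-of-filenames.
ls `pwd`/ont.sorted.bam > ont.bam.fofn
nextpolish2.py -g genome.fa -l ont.bam.fofn -r ont -p 32 -o genome.polished.fa
```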
Best wishes.
