Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Division by zero when trying to build trees? #5

Open
mrmrwinter opened this issue May 18, 2022 · 12 comments
Open

Division by zero when trying to build trees? #5

mrmrwinter opened this issue May 18, 2022 · 12 comments

Comments

@mrmrwinter
Copy link

Hi, when running SubPhaser i get the following error:

Traceback (most recent call last):
  File "/home/531734/.conda/envs/SubPhaser/bin/subphaser", line 33, in <module>
    sys.exit(load_entry_point('subphaser==1.2.5', 'console_scripts', 'subphaser')())
  File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.5-py3.8.egg/subphaser/__main__.py", line 779, in main
    pipeline.run()
  File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.5-py3.8.egg/subphaser/__main__.py", line 516, in run
    ltr_bedlines, enrich_ltr_bedlines = self.step_ltr(d_kmers) if not self.disable_ltr else ([],[])
  File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.5-py3.8.egg/subphaser/__main__.py", line 615, in step_ltr
    d_files = tree.build(job_args=job_args)
  File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.5-py3.8.egg/subphaser/LTR.py", line 210, in build
    ncpus = [max(1, int(self.ncpu*v/tprop)) for v in prop]
  File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.5-py3.8.egg/subphaser/LTR.py", line 210, in <listcomp>
    ncpus = [max(1, int(self.ncpu*v/tprop)) for v in prop]
ZeroDivisionError: division by zero

I believe this could be because only one scaffold was identified as a subgenome. Does this sound possible?

Many thanks

Mike

@zhangrengang
Copy link
Owner

Yes, it is because preceding subgenome assignment is failed. Can you post the clustering heatmap?

@zhangrengang
Copy link
Owner

zhangrengang commented May 18, 2022

@mrmrwinter The immediate cause should be that there is no subgenome-specific LTR for building tree. You can skip this step by -disable_ltrtree. And check the results according to the Supplementary Material:

On the clustering heatmap (Fig. 2B) and PCA plot (Fig. 2C), a subgenome is defined as well-phased if it has clearly distinguishable patterns of both differential k-mers and homeologous chromosomes, indicating that each subgenome shares subgenome-specific features as expected. If the subgenomes are not well-phased, the following analyses are meaningless and should be ignored.

@mrmrwinter
Copy link
Author

Hi @zhangrengang, thanks for the fast reply. I had a look at the subgenome assignment results and only one scaffold was assigned to subgenome 2.

image

I double checked the content of this scaffold and it is definitely not a contaminant or particularly difficult region - it is around 120 Kbp and contains a ribosomal subunit. When i remove this scaffold from the assembly and from the configureation file i get the following error:

multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.5-py3.8.egg/subphaser/Jellyfish.py", line 641, in _filter_kmer
    return kmer, [c/l for c,l in zip(counts, lengths)], tot
  File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.5-py3.8.egg/subphaser/Jellyfish.py", line 641, in <listcomp>
    return kmer, [c/l for c,l in zip(counts, lengths)], tot
ZeroDivisionError: division by zero
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/531734/.conda/envs/SubPhaser/bin/subphaser", line 33, in <module>
    sys.exit(load_entry_point('subphaser==1.2.5', 'console_scripts', 'subphaser')())
  File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.5-py3.8.egg/subphaser/__main__.py", line 779, in main
    pipeline.run()
  File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.5-py3.8.egg/subphaser/__main__.py", line 412, in run
    d_mat = dumps.filter(d_mat, lengths, self.sgs, outfig=histfig, #d_targets=d_targets, 
  File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.5-py3.8.egg/subphaser/Jellyfish.py", line 487, in filter
    for kmer, freqs, tot_freq in pool_func(_filter_kmer, args, self.ncpu, 
  File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/site-packages/subphaser-1.2.5-py3.8.egg/subphaser/RunCmdsMP.py", line 346, in pool_func
    for returned in pool_map(func, iterable, **kargs):
  File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/531734/.conda/envs/SubPhaser/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
ZeroDivisionError: division by zero

I really hope i can get this working as the software looks very promising.

Many thanks

@zhangrengang
Copy link
Owner

@mrmrwinter How about the plot *.kmer.mat.pdf? Your genome may be not a neoallopolyploid. Do you have a dot plot to show the homoeologs?
By the way, you do not need to remove the seqence from the assembly.

@mrmrwinter
Copy link
Author

@zhangrengang Here is the plot you requested:
image

Though definitely allopolyploid, im not sure how recently the hybridisation would have been, but certainly nearer the half a million year mark than last month. This species is also a mitotic parthenogen, with potential for homeologous recombination. I do not have a dotplot showing the dotplots but i have CDS homology data shown in the circos plot below, and links in the chromatin contact map between suspected pairings.

image

image

Hope this helps.

@zhangrengang
Copy link
Owner

@mrmrwinter According to the heatmap and PCA plot, Subphaser is failed to your assembly.
From the Hi-C map, there are many assembly errors. Some contigs are mis-joined into one chromosome and some contigs that belong to the same chromosome are placed into different chromosomes. These massive errors will disrupt SubPhaser. You need to reassembly your chromosomes firstly.
And the homoelogous relationships in the circos plot are not clear (maybe due to the assembly errors). After reassembly, if it is still not clear, you may need to make a dot plot (to show clear relationships) and Ks plot (to show divergent time) using MCSCAN and/or wgdi. According to the Hi-C map, it is probablely a neopolyploid.

@mrmrwinter
Copy link
Author

Thankyou for your feedback. The HiC scaffolding was performed by a third party company so i had little control over the process. I will get in touch with them and see what has happened. What things indicate assembly errors? This is their second attempt at Hi-c scaffolding.

@zhangrengang
Copy link
Owner

360截图17060223645074
@mrmrwinter Three cases as shown in arrows: the green indicates clear boundary between chromosoms that should be split; the yellow indicates contacts that should be joined; and the blue indicates signals that may be homoelogs.

@mrmrwinter
Copy link
Author

@zhangrengang thankyou for taking the time out to address this for me. I had put my faith in the third party but some things have clearly been overlooked. I will fix these things and rerun my analyses. Hopefully it will clear up the multiple hits in the CDS pairings, and allow SubPhaser to run correctly. Please could you recommend any resources for learning to interpret contact maps? I read all i could but still did not feel i could confidently callout features like misassemblies etc.

Again thankyou @zhangrengang!

@zhangrengang
Copy link
Owner

@mrmrwinter You can have a look on this paper:

Dudchenko O, Batra S S, Omer A D et. al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds [J]. Science, 2017, 356 (6333): 92

And if possible, make a practice on Hi-C assembly.

@mrmrwinter
Copy link
Author

Many thanks @zhangrengang.
I will hopefully be in touch when ive fixed these things and have SUbPhaser running!

@zhangrengang
Copy link
Owner

zhangrengang commented May 18, 2022

@mrmrwinter OK. Looking forward to your feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants