Interpretation of (maybe) hybrid genomes - multiple species #140

Open
Sofieagerbaek opened this issue Feb 9, 2024 · 1 comment
Labels
0.2.5 Double-hung Smudgeplot done with the 0.2.5 Double-hung with curtains version genomescope_included smudgeplot_included if smudgeplot was posted with the question / problem

Comments

@Sofieagerbaek

Hiya!

First of all, thanks for a great tool. I have mainly used it to double-check my assembly sizes, as the high content of repetitive regions has made assembly a bit tricky and I need to ensure the assemblies are haploid.

I have a few different species of the fungal genus Malassezia, and while determining the ploidy is not critical to my research, I found these results pretty interesting. Maybe you do as well, and can give a hint of what they might suggest.
I am not very experienced in genomics, so I am sorry if this is perhaps just my lack of understanding the subject!

Current literature has described some hybrid strains in this genus, which are diploid, but the species that I have in my research are not as well described (and not believed to be hybrids), and they are still supposed to be mainly haploid.

The initial GenomeScope (k=21) for one M. globosa sample; the genome size should be ~8.7 Mb:
image
And the accompanying smudge:
image

I tried to capture that first peak as part of the model rather than as error, as you did in the Saccharomyces tutorial:

gs_mglob2

And the smudgeplot for this coverage estimate:

8744_n131_smudgeplot

I am a bit confused about the smudges being so far to the left?

For comparison, here are some more plots for other species in the same genus (I haven't lowered the posterior coverage on these ones).
M. restricta, should be 7.2 Mb:
image

image

Another M. globosa:
image

image

Those two look very similar - so it can't really be a mistake..

And a hybrid strain (M. furfur), ~14 Mb:
image

image

All of these samples are isolates, and the species are 'potentially' capable of sexual reproduction, but no one has observed it (yet), so current consensus is only asexual propagation..

Please let me know what you think. I got the 'share this weird smudgeplot on github' message quite a few times these last few days while running all of my fungal reads, so I thought I should take up the offer :))

-Sofie

@KamilSJaron
Owner

Hi Sofie, I do like to see all the strange smudgeplots.

Sequencing strains/isolates, and especially fungi, quite often generates data without super satisfactory explanations. We had a similar case in a bdelloid rotifer (https://kamilsjaron.github.io/peculiar-genomic-observations/unknown/2020/01/rotifer.html) - we never properly explained that peak!!!

That initial peak is a bit puzzling. To me it almost looks like there is genetic variation in the sample, but with only very few haplotypes. In that case, the coverage of the first peak divided by the 2n coverage would be the frequency of the least common haplotype, and the 2n peak would be all the k-mers that are homozygous across all haplotypes. This is just a wild guess, but there is only so much I can guess from these plots.
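To make the arithmetic of that guess concrete, here is a minimal sketch. The function name and the coverage values are hypothetical and chosen only for illustration; they are not read from the plots in this issue.

```python
def minor_haplotype_frequency(first_peak_cov: float, shared_peak_cov: float) -> float:
    """Estimate the frequency of the least common haplotype.

    first_peak_cov:  coverage of the low-coverage peak (k-mers unique to the rare haplotype)
    shared_peak_cov: coverage of the main peak (k-mers shared by all haplotypes)
    Both arguments are assumptions of this sketch, not outputs of any tool.
    """
    if shared_peak_cov <= 0:
        raise ValueError("shared_peak_cov must be positive")
    return first_peak_cov / shared_peak_cov


# Hypothetical example: a 20x peak next to a 400x main peak
freq = minor_haplotype_frequency(20.0, 400.0)
print(f"{freq:.2%}")  # prints 5.00%
```

Under this reading, a rare haplotype present in about 5% of the sequenced cells would contribute its private k-mers at about 5% of the main coverage.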

The reason why you see the smudges all the way to the left is because k-mers from the peak on the left pair up with the core genomic k-mers (those that have a few hundred x cov), and that creates lots of pairs that have very, very small ratios. If you increase your L, you will definitely erase that smudge.
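A toy sketch of why such pairs get very small ratios, and why raising the lower coverage threshold L removes them. This is not smudgeplot's actual implementation; the helper names and coverage values are hypothetical.

```python
def minor_ratio(cov_a: int, cov_b: int) -> float:
    """Normalized minor coverage of a k-mer pair: minor / (cov_a + cov_b)."""
    return min(cov_a, cov_b) / (cov_a + cov_b)


def filter_pairs(pairs, lower):
    """Keep only pairs in which both k-mers have coverage >= lower (the role of L here)."""
    return [(a, b) for a, b in pairs if min(a, b) >= lower]


# Hypothetical pairs: two low-peak k-mers paired with core-genome k-mers,
# plus one balanced pair from the core genome itself.
pairs = [(15, 300), (150, 150), (12, 310)]

ratios = [round(minor_ratio(a, b), 3) for a, b in pairs]
print(ratios)                    # [0.048, 0.5, 0.037]
print(filter_pairs(pairs, 50))   # [(150, 150)]
```

The low-peak pairs sit at ratios well below 0.1, far from the balanced pair at 0.5, and none of them survive once the threshold is above the coverage of the low peak.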

That is in fact what I usually advise people when they show me smudgeplots like that, but looking at the k-mer spectra, I am not so sure they are errors. If they were errors, how come they form a distinct peak? I would imagine them to be ... together with all the other errors that have only a few x cov. That's why I think there is something else going on.

The second guess would be contamination, but then it would not pair up with the core genome. Unless the contamination was something incredibly closely related, but not the same. That would be a very, very, very unfortunate coincidence for you.

I hope this helps. I just skimmed through all the graphs (paternity leave, very limited time to support people these days). Let me know if you have more questions (or if you figure it out).

@KamilSJaron KamilSJaron added smudgeplot_included if smudgeplot was posted with the question / problem genomescope_included 0.2.5 Double-hung Smudgeplot done with the 0.2.5 Double-hung with curtains version labels Feb 11, 2024