Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

population-specific recombination rates? #2127

Open
gtsambos opened this issue Nov 29, 2022 · 10 comments
Open

population-specific recombination rates? #2127

gtsambos opened this issue Nov 29, 2022 · 10 comments

Comments

@gtsambos
Copy link
Member

gtsambos commented Nov 29, 2022

Milan and Marion have inquired (in a private communication) about implementing population-specific recombination rates in msprime. I now have a few more technical thoughts about it, and I'll stick them here so there's a record for the future.

(I note also that it's not possible to do this in SLiM at present -- @bhaller, am I right here? Have you thought about this before, and if so, are there any more caveats we should be aware of?)

  • If implemented, I think this would need to be a feature of the discrete-time Wright-Fisher simulator only. (Under a classic Hudson coalescent, you can't guarantee that the recombination event actually happens in a given population, since you aren't necessarily simulating any segments at the time of the recombination event itself. I think?)

To see how recombination is currently implemented, have a look at the dtwf_recombine method in algorithms.py (here), and also at the dtwf_generate_breakpoint method it references.
As far as I can tell, here are the essential steps:

  1. dtwf_generate_breakpoint uses the recombination map to generate physical position for the breakpoints (here and at a few later stages too). This is continually updated, and functions as something like a "distance to the next recombination breakpoint".
  2. we iterate through all of the extant segments in a big while loop (here) to place recombination breakpoints inside these segments (here) or between them (here)

Here's how we might extend this to multiple populations:

  1. allow the input to be a list of RateMaps, one for each population in the specified demography
  2. generate initial breakpoints for each population separately (here) So instead of k, we'd have k[i] for each population i.
  3. in the big while loop, check which population segment x is from. If x.population == i, we'd use and update k[i] in place of k.

This might still have some little problems -- for instance, what if we need to insert a recombination breakpoint between two segments from different populations? However, I think this shouldn't be too big a deal unless you have lots of migration and admixture (in which case, this model of recombination is a bit silly anyway)

@jeromekelleher
Copy link
Member

jeromekelleher commented Nov 29, 2022

I think this general strategy would work for the DTWF. It would be much more difficult (or indeed not make sense at all as you point out) in the Hudson coalescent because we have a global Fenwick tree for recombinations.

See previous related discussions:

@bhaller
Copy link

bhaller commented Nov 29, 2022

Hi @gtsambos,

(I note also that it's not possible to do this in SLiM at present -- @bhaller, am I right here? Have you thought about this before, and if so, are there any more caveats we should be aware of?)

It isn't specifically supported in SLiM, in the sense that there is no specific facility for setting one recombination rate for one subpopulation and a different rate for a different subpopulation. (I'm curious why this is desirable – what does it mean biologically? Is one population subject to some environmental chemical or stressor that increases their recombination rate? Who wants to model this and why?) But it is straightforward to do in SLiM even though there is no specific support. You would (a) set the recombination rate to the highest rate that you would want any individual to experience, and then (b) write a recombination() callback that would, for individuals in a given subpopulation, discard some of the recombination breakpoints that had been drawn by SLiM, keeping/discarding each based upon a binomial draw. Alternatively, you could simply draw the recombination breakpoints yourself, based upon whatever logic you want, and use addRecombinant() to generate your offspring with your chosen breakpoints. Either of those strategies would work fine, I think. The only built-in variation in recombination rates among individuals that SLiM specifically supports is different recombination rates for males versus females.

@gtsambos
Copy link
Member Author

gtsambos commented Dec 2, 2022

Thanks for this detailed reply @bhaller, and sorry that I'm coming back to it so late! It's great that there is more 'manual' way to do this in SLiM -- I might ping the people who asked us about this, just in case this is a viable option for them.

I'm curious why this is desirable – what does it mean biologically? Is one population subject to some environmental chemical or stressor that increases their recombination rate? Who wants to model this and why?

It's not something I'm super well versed on myself, but my understanding is that there's quite a bit of observed evidence for the evolution of recombination rates over time, both within and across species. I'm not sure how well-understood the biological causes of these changes are, except perhaps in rare cases like PRDM9, and ofc in these cases it's more that population membership is just a rough proxy for some relevant causal allele. However, allowing different simulated 'populations' to use different recombination rates may be a simple first step that helps us model this more realistically (regardless of what the underlying 'cause' is) ✨

@gtsambos
Copy link
Member Author

gtsambos commented Dec 2, 2022

so basically -- if it's observed that recombination rates vary between sets of organisms, then having some way of modelling that will make the simulated patterns of variation a bit more realistic. And that's regardless of what the downstream application of those simulations is.

@bhaller
Copy link

bhaller commented Dec 3, 2022

so basically -- if it's observed that recombination rates vary between sets of organisms, then having some way of modelling that will make the simulated patterns of variation a bit more realistic. And that's regardless of what the downstream application of those simulations is.

I see. Well, the same approach in SLiM would allow you to actually model the evolution of recombination rates, without the approximation of just making the rate different in one subpopulation than in another subpopulation. You could specifically check for a particular mutation in a given individual, and if the mutation is present, use a different recombination rate to generate a gamete coming from that individual. Indeed, you could model a whole set of QTLs that together determine the phenotype of the recombination rate for a given individual's gamete genesis. All of that sort of thing is actually quite trivial to do in SLiM. :->

@jeromekelleher
Copy link
Member

I think this does the trick for a simple three population split model with no migration and different recombination rates:

import msprime
import tskit

split_time = 200
demography = msprime.Demography()
demography.add_population(name="A", initial_size=10_000)
demography.add_population(name="B", initial_size=5_000)
demography.add_population(name="C", initial_size=1_000)
demography.add_population_split(time=split_time, derived=["A", "B"], ancestral="C")

L = 100_000
seed = 1236
tsA = msprime.sim_ancestry(
    {"A": 3},
    demography=demography,
    recombination_rate=1e-8,
    sequence_length=L,
    random_seed=seed,
    end_time=split_time,
)

tsB = msprime.sim_ancestry(
    {"B": 3},
    demography=demography,
    recombination_rate=5e-8,
    sequence_length=L,
    # NOTE probably important we use different seeds here. For real
    # sims just leave out the seed entirely, just setting it here
    # for reproducibility.
    random_seed=seed + 1,
    end_time=split_time,
)

print(tsA.draw_text())
print(tsB.draw_text())

tablesA = tsA.dump_tables()
tablesB = tsB.dump_tables()

tablesB.edges.child += tsA.num_nodes
tablesB.edges.parent += tsA.num_nodes
tablesB.nodes.individual += tsA.num_individuals

individuals_dict = tablesB.individuals.asdict()
del individuals_dict["metadata_schema"]
tablesA.individuals.append_columns(**individuals_dict)
nodes_dict = tablesB.nodes.asdict()
del nodes_dict["metadata_schema"]
tablesA.nodes.append_columns(**nodes_dict)
edges_dict = tablesB.edges.asdict()
del edges_dict["metadata_schema"]
tablesA.edges.append_columns(**edges_dict)
tablesA.sort()
tablesA.build_index()
ts_joint = tablesA.tree_sequence()

tsC = msprime.sim_ancestry(
    initial_state=ts_joint,
    demography=demography,
    recombination_rate=2.5e-8,
    random_seed=seed + 2,
)

print(tsC)

We should probably improve the API in tskit for "appending" a table to another, but this looks OK otherwise?

@gtsambos
Copy link
Member Author

gtsambos commented Dec 9, 2022

nice, thanks Jerome! I didn't realise it was possible to modify columns of (lone) tables with stuff like this:

tablesB.edges.child += tsA.num_nodes

and yes, I agree that the bit at the end where the tables are converted in and out of dicts is likely to be the most confusing. It would also be nice to have the nodes relabelled so they conform to the usual time ordering. I'm happy to look at doing that properly, but hopefully this gives Milan and Marion something to start with in the meantime.

@gtsambos
Copy link
Member Author

gtsambos commented Dec 9, 2022

also, users doing this^ requires them to have a fairly deep understanding of the table structures

@jeromekelleher
Copy link
Member

It would also be nice to have the nodes relabelled so they conform to the usual time ordering

Could run simplify on it, that would rejigger things in the expected way.

also, users doing this^ requires them to have a fairly deep understanding of the table structures

Yes, we should have some functionality to make this easier either here or in tskit.

@millanek
Copy link

This is perfect, a big thank you @jeromekelleher and @gtsambos

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants