Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Demography.from_demes(): ancestral population sizes and migration rates missing #2255

Open
simonharnqvist opened this issue Feb 2, 2024 · 6 comments

Comments

@simonharnqvist
Copy link

Hi,

I'm having issues converting my Demes graphs to msprime Demography objects; both population sizes and migration rates are missing unexpectedly. My Demes graphs match what I think I'm trying to specify, so it looks to me like a bug in msprime.Demography.from_demes() rather than in Demes.

I first encountered this problem with msprime 1.2.0, and upgraded to 1.3.0 in a fresh environment, with the exact same behaviour. The versions of key packages for this example are:

demes                     0.2.3              pyhd8ed1ab_0    conda-forge
demesdraw                 0.4.0              pyhd8ed1ab_0    conda-forge
msprime                   1.3.0           py312h0767fef_0    conda-forge
python                    3.12.1          hdf0ec26_1_cpython    conda-forge
tskit                     0.5.6           py312hf635c46_0    conda-forge

Missing population sizes

I've specified a Demes graph with three current demes (a, b, c) and their ancestors (ab_anc and abc_anc):

b = demes.Builder(time_units="generations")
b.add_deme("abc_anc", epochs=[{"start_size":1e6, "end_time":1e5}])
b.add_deme("ab_anc", ancestors=["abc_anc"], epochs=[{"start_size":1e6,  "end_time":1e4}])
b.add_deme("a", ancestors=["ab_anc"],  epochs=[{"start_size":1e6}])
b.add_deme("b", ancestors=["ab_anc"],  epochs=[{"start_size":1e6}])
b.add_deme("c", ancestors=["abc_anc"], epochs=[{"start_size":1e6}])
graph = b.resolve()

The graph looks like what I was trying to create:
image

And indeed population sizes are what we specified:

[(pop_idx, pop.epochs[0].start_size) for pop_idx, pop in enumerate(graph.demes)]
[(0, 1000000.0),
 (1, 1000000.0),
 (2, 1000000.0),
 (3, 1000000.0),
 (4, 1000000.0)]

But when converting to an msprime Demography, population sizes are not preserved for the ancestral populations:

msprime.Demography.from_demes(graph)

image

This is not just a table formatting error either:

[Population(initial_size=0, growth_rate=0, name='abc_anc', description='', extra_metadata={}, default_sampling_time=100000.0, initially_active=False, id=0),
 Population(initial_size=0, growth_rate=0, name='ab_anc', description='', extra_metadata={}, default_sampling_time=10000.0, initially_active=False, id=1),
 Population(initial_size=1000000.0, growth_rate=0, name='a', description='', extra_metadata={}, default_sampling_time=0.0, initially_active=True, id=2),
 Population(initial_size=1000000.0, growth_rate=0, name='b', description='', extra_metadata={}, default_sampling_time=0.0, initially_active=True, id=3),
 Population(initial_size=1000000.0, growth_rate=0, name='c', description='', extra_metadata={}, default_sampling_time=0.0, initially_active=True, id=4)]

I tried also specifying "end_size"...

b = demes.Builder(time_units="generations")
b.add_deme("abc_anc", epochs=[{"start_size":1e6, "end_time":1e5}])
b.add_deme("ab_anc", ancestors=["abc_anc"], epochs=[{"start_size":1e6,  "end_time":1e4}])
b.add_deme("a", ancestors=["ab_anc"],  epochs=[{"start_size":1e6, "end_size":1e6}])
b.add_deme("b", ancestors=["ab_anc"],  epochs=[{"start_size":1e6, "end_size":1e6}])
b.add_deme("c", ancestors=["abc_anc"], epochs=[{"start_size":1e6, "end_size":1e6}])
    
graph2 = b.resolve()

...but that made no difference.

Missing migration rates
Adding some migration to the demes specification:

b = demes.Builder(time_units="generations")
b.add_deme("abc_anc", epochs=[{"start_size":1e6, "end_time":1e5}])
b.add_deme("ab_anc", ancestors=["abc_anc"], epochs=[{"start_size":1e6,  "end_time":1e4}])
b.add_deme("a", ancestors=["ab_anc"],  epochs=[{"start_size":1e6}])
b.add_deme("b", ancestors=["ab_anc"],  epochs=[{"start_size":1e6}])
b.add_deme("c", ancestors=["abc_anc"], epochs=[{"start_size":1e6}])
b.add_migration(rate=1e-5, source="a", dest="b")
b.add_migration(rate=1e-5, source="b", dest="a")
b.add_migration(rate=1e-5, source="c", dest="ab_anc")
    
graph3 = b.resolve()

demesdraw.tubes(graph3)

Again, demes builds the graph that I intended to specify:
image

But the c -> ab_anc migration rate is missing:
image

@simonharnqvist simonharnqvist changed the title Demography.from_demes(): parameter values missing Demography.from_demes(): ancestral population sizes and migration rates missing Feb 2, 2024
@apragsdale
Copy link
Contributor

Hi @simonharnqvist -- thanks for opening the issue. I believe everything is working as expected, and those missing population sizes and migration rates will appear as the populations come into existence.

If we specify demog = msprime.Demography.from_demes(graph3), and then look at demog.events, we should see

[PopulationParametersChange(time=10000.0, initial_size=1000000.0, growth_rate=None, population='ab_anc'),
 PopulationSplit(time=10000.0, derived=['a', 'b'], ancestral='ab_anc'),
 MigrationRateChange(time=10000.0, rate=0, source='b', dest='a'),
 MigrationRateChange(time=10000.0, rate=0, source='a', dest='b'),
 MigrationRateChange(time=10000.0, rate=1e-05, source='ab_anc', dest='c'),
 PopulationParametersChange(time=100000.0, initial_size=1000000.0, growth_rate=None, population='abc_anc'),
 PopulationSplit(time=100000.0, derived=['c', 'ab_anc'], ancestral='abc_anc'),
 MigrationRateChange(time=100000.0, rate=0, source='ab_anc', dest='c')]

For example, population ab_anc is initialized at time zero with population size 0, since it won't exist until the split event involving populations a and b. At that time (10000), the population size is set the split event takes place. The appropriate migration rates are also turned on and off in the next few entries in the demog.events list (noting that the meaning of "source" and "dest" are reversed, since msprime looks backwards in time).

Hope this helps explain the behavior, but please let us know if there seem to still be issues or something else could be clarified.

@jeromekelleher
Copy link
Member

I think @apragsdale is correct here, and it's just a quirk of how msprime represents these models. The 'debug()' method should resolve these problems - hopefully this section of the docs will clarify:
https://tskit.dev/msprime/docs/stable/demography.html#debugging-tools

Does that help?

@simonharnqvist
Copy link
Author

Thanks @apragsdale and @jeromekelleher for getting back to me so quickly. You are (of course) right that the population sizes aren't lost (and clearly I could have checked for that very easily - thanks for your patience).

Unless there is a logic to why models specified directly as a Demography object are displayed differently to those generated from Demes, can I suggest that the representations are standardised at some point to both show the full Demography table with all the parameter values? I find it very helpful to have the whole model represented, as it is when creating an msprime-native Demography.

That's an enhancement suggestion, however - so I'll close this bug report now. Thanks both!

@jeromekelleher
Copy link
Member

What was your resolution here @simonharnqvist - does the debug() output show what you expect? I'm not sure there's a difference between demes and non-demes models, I think it's probably just how msprime does things generally. Maybe this is something we need to be clearer about in the docs?

@simonharnqvist
Copy link
Author

@jeromekelleher - both .events and .debug() shows that msprime indeed creates the model I want

I'm not sure I've understood your point correctly here - this is not the behaviour if I specify a model directly in msprime. Let's create the first model without migration directly as an msprime Demography:

msprime_demography = msprime.Demography()
msprime_demography.add_population(initial_size=1e6, name="abc_anc")
msprime_demography.add_population(initial_size=1e6, name="ab_anc")
msprime_demography.add_population(initial_size=1e6, name="a")
msprime_demography.add_population(initial_size=1e6, name="b")
msprime_demography.add_population(initial_size=1e6, name="c")
msprime_demography.add_population_split(time=1e6, ancestral="abc_anc", derived=["ab_anc", "c"])
msprime_demography.add_population_split(time=1e5, ancestral="ab_anc", derived=["a", "b"])
msprime_demography.sort_events()

Here the representation of the Demography object includes all the population sizes (which IMO is far less confusing):
image

If I convert this to a Demes graph and then back to a Demography...

demesgraph = msprime_demography.to_demes()
msprime.Demography.from_demes(demesgraph)

... the representation now loses its population sizes:
image

If this is intended or expected behaviour, then explaning this in the docs would be very helpful - and perhaps clarifying to users whether this would have any effects on downstream simulations.

@jeromekelleher
Copy link
Member

Ah thanks - that is confusing! I suspect there may be some slight semantic differences here in the Demes model vs msprime. Let's keep this open to see if we can do any better.

@jeromekelleher jeromekelleher reopened this Feb 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants