Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to simulate "above" local roots #2157

Open
jeromekelleher opened this issue Feb 21, 2023 · 8 comments
Open

Add option to simulate "above" local roots #2157

jeromekelleher opened this issue Feb 21, 2023 · 8 comments
Milestone

Comments

@jeromekelleher
Copy link
Member

A natural addition to being able to keep a record of all genomes that we simulate through (i.e. keeping unary #2128 ) is to keep a record of the nodes that we go through after a local root has been reached.

This is simple enough to do, we just stop snipping out sections of the ancestry that have fully coalesced and change the stopping condition of the simulation.

cc @sgravel

@jeromekelleher jeromekelleher added this to the 1.3.0 milestone Feb 21, 2023
@jeromekelleher
Copy link
Member Author

What should we call it? stop_on_coalescence or something?

@sgravel
Copy link

sgravel commented Feb 21, 2023

stop_at_MRCA ?

@sgravel
Copy link

sgravel commented Feb 21, 2023

Perhaps simulate_above_MRCA would be less ambiguous.

Would need to make sure to catch people giving no termination criterion, though.

@jeromekelleher
Copy link
Member Author

How about stop_at_local_mrca=True as the default? Keeping the overall stopping condition as all intervals have reached a local MRCA seems fine. The only case when we wouldn't want to do this is when we have a fixed pedigree, and I think it would be unusual for us to have full coalescence before leaving the pedigree in most situations?

But, perhaps we need to think about local and global stopping conditions explicitly?

@sgravel
Copy link

sgravel commented Feb 21, 2023

stop_at_local_mrca is quite clear.
I'd rather be explicit about global stopping conditions. Say if someone runs a test simulation on a short span of the genome, the global stopping may be unexpected. We've had cases where we want to simulate ancestry to a specific time in the past in a DTWF model (for example to look at contributions from a given ancestor).

@jeromekelleher
Copy link
Member Author

OK, sounds good to me. Let's call it stop_at_local_mrca so, and think about the global stopping condition separately (I'll open an issue)

@jeromekelleher
Copy link
Member Author

                    min_overlap = 2
                    if not self.stop_at_local_mrca:
                        min_overlap = 0

                    # Update the number of extant segments.
                    if self.S[left] == min_overlap:
                        self.S[left] = 0
                        right = self.S.succ_key(left)
                    else:
                        right = left
                        while right < r_max and self.S[right] != min_overlap:
                            self.S[right] -= 1
                            right = self.S.succ_key(right)
                        alpha = self.alloc_segment(
                            left=left,
                            right=right,
                            node=u,
                            population=population_index,
                            label=label,
                        )

@GertjanBisschop and I had a look at this, and we think something like this snippet in the core merge_x_ancestors functions will do the trick.

Actually ,maybe this is more generally useful. What if we define a parameter min_local_lineages which tells us when to stop tracking the ancestry of segments along the genome? By default this would be 1 (stop when the region has coalesced). We can set it to 0 to say "never stop simulating local regions" (you can't hit zero), or if we set it to 2 (say) this would stop simulating each region when there are only two lineages left.

I'm not sure how useful the last idea is, but worth thinking about?

@jeromekelleher
Copy link
Member Author

For reference, the current code looks like this:

                    if self.S[left] == 2:
                        self.S[left] = 0 
                        right = self.S.succ_key(left)
                    else:
                        right = left
                        while right < r_max and self.S[right] != 2:
                            self.S[right] -= 1
                            right = self.S.succ_key(right)
                        alpha = self.alloc_segment(
                            left=left,
                            right=right,
                            node=u,
                            population=population_index,
                            label=label,
                        )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants