DRAGEN Joint Detection DO NOT MERGE #8616

Open
wants to merge 51 commits into
base: master

davidbenjamin
Contributor

Initial implementation of DRAGEN joint detection. Functional equivalence with respect to DRAGEN joint detection is actually worse than before due to many outstanding questions.

We currently have no idea how joint detection is supposed to interact with BQD and FRD. I have tried a few guesses and none have worked (see below for functional equivalence results of the particular guess used in this PR). The interplay of joint detection with BQD and FRD is complicated for several reasons.

Naively, one would simply define the BQD and FRD likelihoods on entire haplotypes rather than alleles at one locus. Unresolved difficulties with this include:

  • BQD and FRD are defined with respect to one particular variant position. How would we define them for a haplotype that has no particular locus?
  • BQD involves the base qualities at one particular variant locus. How would this be defined for an entire haplotype?
  • The above is especially thorny for haplotypes that exhibit multiple variants.
  • The FRD prior is only defined for individual events, not haplotypes.
  • The BQD and FRD models use reads that overlap a variant site, but it is not clear how to use reads that only partially intersect a haplotype.
  • BQD and FRD likelihoods are only defined for homozygous haplotypes, but heterozygous combinations of haplotypes contribute to homozygous genotypes at all loci where the distinct haplotypes agree.
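
The last point can be illustrated with a toy sketch. The haplotype names, loci, and dictionary representation below are hypothetical and are not taken from the GATK code; the sketch only shows that a pair of distinct haplotypes can still imply a homozygous genotype at any locus where the two carry the same allele.

```python
from itertools import combinations_with_replacement

# Hypothetical haplotypes: each maps a locus (position) to the allele it carries.
haplotypes = {
    "H0": {100: "A", 105: "G"},
    "H1": {100: "T", 105: "G"},
    "H2": {100: "T", 105: "C"},
}

def locus_genotypes(hap_a, hap_b):
    """Per-locus diploid genotype implied by a pair of haplotypes."""
    return {locus: tuple(sorted((hap_a[locus], hap_b[locus]))) for locus in hap_a}

def homozygous_loci(hap_a, hap_b):
    """Loci at which the haplotype pair yields a homozygous genotype."""
    genotypes = locus_genotypes(hap_a, hap_b)
    return sorted(locus for locus, (a1, a2) in genotypes.items() if a1 == a2)

# Enumerate every unordered haplotype pair, including homozygous haplotype pairs.
for name_a, name_b in combinations_with_replacement(sorted(haplotypes), 2):
    print(name_a, name_b, homozygous_loci(haplotypes[name_a], haplotypes[name_b]))
```

In this toy data the heterozygous pair H0/H1 is homozygous at locus 105, so a model defined only for homozygous haplotype pairs would miss its contribution to the homozygous genotype there.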

Clearly, generalizing BQD and FRD to entire haplotypes is not straightforward. Nor does it suffice to produce "raw" genotype likelihoods using the joint detection approach and then apply BQD and FRD on variant loci afterwards. Some difficulties with this include:

  • BQD and FRD require the read-allele likelihoods matrix. Where are these likelihoods supposed to come from? From the unrigorous pre-joint-detection "marginalization", in which each allele is assigned the maximum likelihood over all haplotypes supporting that allele? From some read-allele likelihoods matrix derived from the read-haplotype likelihoods matrix?
  • The drawbacks of the faulty "marginalization" actually become more severe with joint detection since genotyping multiple alleles together in a single determined span produces more haplotypes, which in turn increases the risk of the read-allele likelihoods cherry-picking from too many different haplotypes for different reads.
  • The BQD and FRD models produce likelihoods on an absolute scale that is only meaningful relative to genotyping likelihoods from the pre-joint-detection approach. They do not inherently "play nicely" with the posterior probabilities produced by joint detection.
  • BQD and FRD as currently implemented in the GATK modify likelihoods before applying a prior, whereas joint detection yields posterior probabilities. Are we supposed to somehow un-apply the prior to joint detection likelihoods, apply BQD and FRD, then re-apply the prior? It is not clear.
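
As a rough illustration of the max-based "marginalization" mentioned above, here is a minimal sketch. The data layout, the function name `marginalize_by_max`, and the numbers are all hypothetical and do not reflect the GATK implementation; the point is only to show how different reads can draw their allele likelihood from different haplotypes.

```python
# Hypothetical log10 read-haplotype likelihoods: read_hap_loglik[read][haplotype].
read_hap_loglik = {
    "read1": {"H0": -1.0, "H1": -5.0, "H2": -0.5},
    "read2": {"H0": -4.0, "H1": -0.2, "H2": -6.0},
}

# Hypothetical mapping from each allele to the haplotypes that carry it.
allele_to_haps = {"ref": ["H0"], "alt": ["H1", "H2"]}

def marginalize_by_max(read_hap_loglik, allele_to_haps):
    """For each read and allele, take the best likelihood over supporting haplotypes.

    Returns result[read][allele] = (log-likelihood, haplotype that supplied it).
    """
    result = {}
    for read, hap_ll in read_hap_loglik.items():
        result[read] = {}
        for allele, haps in allele_to_haps.items():
            best = max(haps, key=lambda h: hap_ll[h])
            result[read][allele] = (hap_ll[best], best)
    return result

read_allele = marginalize_by_max(read_hap_loglik, allele_to_haps)
for read, per_allele in read_allele.items():
    print(read, per_allele)
```

Here read1 takes its "alt" likelihood from H2 while read2 takes it from H1, so the resulting read-allele matrix does not correspond to any single haplotype. With more haplotypes per allele, as joint detection tends to produce, there are more opportunities for this kind of cherry-picking.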

…etermined and undetermined, but tests are failing since I haven't updated genotyping
…he pre-joint detection PD haplotypes with just one determined event
… in the QUALS maybe from GL to PL conversion
@davidbenjamin
Contributor Author

davidbenjamin commented Dec 11, 2023

Here is the functional equivalence report for this branch. It is slightly less functionally equivalent than the GATK master branch.
jd-report.pdf

@davidbenjamin
Contributor Author

@eitanbanks @droazen @jamesemery @ldgauthier Shelving this pending guidance on implementation.
