Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BIOIN-1570] Fixed edge case in variant annotation #8810

Conversation

ilyasoifer
Copy link
Collaborator

@ilyasoifer ilyasoifer commented May 3, 2024

Fixes #8788. Crash in calculating some annotations when the variant is very near the edge of a contig

@ilyasoifer ilyasoifer marked this pull request as draft May 3, 2024 16:49
@ilyasoifer ilyasoifer marked this pull request as ready for review May 3, 2024 18:18
@gokalpcelik
Copy link
Contributor

I'Il excuse myself out. Sorry for my late reply. I am not very much fond of this code to provide proper review. James will be back from vacation soon I guess.

@ilyasoifer
Copy link
Collaborator Author

I'Il excuse myself out. Sorry for my late reply. I am not very much fond of this code to provide proper review. James will be back from vacation soon I guess.

Sounds good!

@droazen droazen self-requested a review May 13, 2024 16:45
@droazen droazen self-assigned this May 13, 2024
Copy link
Collaborator

@jamesemery jamesemery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any conception of how often this should happen? This requires indexing errors for indels that are close to the edge of the reference context? Should that ever be a problem?

@@ -250,7 +250,11 @@ protected void isHmerIndel(final VariantContext vc, final LocalContext localCont
// get byte before and after
final byte before = getReferenceNucleotide(localContext, vc.getStart() - 1);
final byte[] after = getReferenceHmerPlus(localContext, vc.getEnd() + 1, MOTIF_SIZE);

if (after.length==0){
logger.warn("Failed to get hmer from reference, isHmerIndel and RightMotif annotations will not be calculated. " +
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to log many many times for every variant. We have the concept of a OneShotLogger in GATK that wraps the logging code and skips reporting if this error has already come up before. Wrap these messages in that at the very least so it doesn't inundate the user with millions of error lines.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. I actually used the logger on purpose (there is a oneshotlogger in this file already) because I thought that I would like the user to see that they are trying to call variants really close to the edge and not get this warning buried in the log. I don't expect to get millions messages like this (there are not that many edges of the contigs in a reasonable genome).
If this is convincing, let me know and I will revert this change.

logger.warn("Failed to get hmer from reference, isHmerIndel and RightMotif annotations will not be calculated. " +
"Start index: " + vc.getEnd() + 1 + " Reference length: " + localContext.ref.getBases().length);
return;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm looking inside of establishReadGroupFlowOrder() and it looks like it tries to disable annotation if it can't establish what the flow order is for the local context (which is explicitly the case for the user that brought up this issue)... Can you disentangle what is supposed to happen for this annotation class in the event that no-flow order can be established? The annotation requires a flow order in order to determine what kind of indel it is no?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a method called isActualFlowOrderRequired in this abstract class that returns false, the annotations that the user enabled all inherit from that abstract class and do not require flow order. The only annotation that does require it is CycleSkipStatus.

@ilyasoifer
Copy link
Collaborator Author

ilyasoifer commented May 29, 2024

@jamesemery - thanks for the review. This is obviously very rare issue that should not happen on hg38 chromosomes at all (there are long stretches of Ns on the edges), but can probably happen if one has a genome with short contigs where they want to call variants, like happened in #8788 (there was a contig of length 333 or so).

In any case, it is probably a bug worth fixing, no harm that I can envision.

@ilyasoifer
Copy link
Collaborator Author

@jamesemery - could you take a look?

Copy link
Collaborator

@jamesemery jamesemery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not fully understand what was being fixed in the previous review pass...

I still want the guard of one-shot loggers but i think you should update the messages to point out that it is likely a result of the variant context being too close to the edge of the reveference... i'm not sure its worth the trouble to do the ideal thing, which is only to warn once, per reference contig. Just update the messages to make it clear "motif not found, some info about it, probably you are too close to the reference Y annotation will not be computed at the edges of a contig" so its crystal clear what is going on. Otherwise this is good.

@@ -69,7 +69,25 @@ public Object[][] getTestData() {
// ins hmer indel
"TATCT C ATTGACCAA", "CA",
"ins", "1", "2", "A", "ATCTC", "TTGAC", "0.3", "NA", "h-indel"
},
{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add to these comments a little more context for the nature of this error and what those are testing?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comments

try {
Utils.validIndex(index, bases.length);
} catch (IllegalArgumentException e) {
flowMissingOneShotLogger.warn("Failed to get hmer from reference. Start index: " + index + " Reference length: " + bases.length);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this also tied to being toward the end of the reference? You aren't asking the reference context for its actual length but the lenght of the bases view which is arbitrary and going to be deeply confusing to users. This error message should either try to give the actual reference name and index or give a clue as to why this could happen since that bases.length view is confusing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, could not find an easy way to get the length of the reference, localContext.ref.getContigLength is somehow hidden

@ilyasoifer
Copy link
Collaborator Author

@jamesemery - thanks for the useful comments, good catch about the reference length. Back to you

Copy link
Collaborator

@jamesemery jamesemery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These comments look good

@ilyasoifer ilyasoifer merged commit d4744f7 into broadinstitute:master Jun 3, 2024
17 checks passed
@ilyasoifer ilyasoifer deleted the ilyasoifer/BIOIN-1570-fix-bug-flow-annotations branch June 3, 2024 15:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Prevent users enabling annotations with mismatching data type (flow etc)
4 participants