GATK VariantsToTable: QUAL="." in input becomes QUAL="-10.0" in output #8748

Open
friedav opened this issue Mar 22, 2024 · 0 comments

friedav commented Mar 22, 2024

Bug Report

Affected tool(s) or class(es)

gatk VariantsToTable

Affected version(s)

The Genome Analysis Toolkit (GATK) v4.5.0.0

Description

Hi,

When running the VariantsToTable tool on a VCF file that contains no genotype data and in which QUAL is missing (i.e. .), the QUAL column in the output table contains -10.0.

In an attempt to understand this unexpected behavior, I traced it back to the getPhredScaledQual() function in
https://github.com/samtools/htsjdk/blob/master/src/main/java/htsjdk/variant/variantcontext/CommonInfo.java,
where (getLog10PError() * -10) + 0.0 returns -10.0, so getLog10PError() appears to be returning 1.
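
To make the arithmetic concrete, here is a minimal Python sketch of the expression quoted above. It is not htsjdk code: the constant name NO_LOG10_PERROR and its value 1.0 are assumptions inferred from the observed output, and qual_for_table() is a hypothetical helper showing how a guard on that value could emit "." instead of a number.

```python
# Illustrative Python mirror of the expression quoted from getPhredScaledQual().
# Assumption: a missing QUAL is represented internally by the value 1.0.
NO_LOG10_PERROR = 1.0  # assumed placeholder meaning "no QUAL available"


def phred_scaled_qual(log10_p_error: float) -> float:
    # Same arithmetic as the quoted expression; adding 0.0 turns -0.0 into 0.0.
    return (log10_p_error * -10) + 0.0


def qual_for_table(log10_p_error: float) -> str:
    # Hypothetical guard: report "." when the placeholder value is seen.
    if log10_p_error == NO_LOG10_PERROR:
        return "."
    return str(phred_scaled_qual(log10_p_error))


print(phred_scaled_qual(NO_LOG10_PERROR))  # the unguarded conversion
print(qual_for_table(NO_LOG10_PERROR))     # the guarded conversion
print(qual_for_table(-3.0))                # a real error probability of 1e-3
```

Under these assumptions the unguarded conversion reproduces the -10.0 seen in the table, while the guarded one keeps the missing value as ".".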

This is some example input and output:

$ # columns in input VCF: no genotype data
$ zcat variants_chr2.vcf.gz  | grep '#CHROM'
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO

$ # in input VCF, QUAL is "."
$ zcat variants_chr2.vcf.gz  | grep -v '##' | cut -f1-8 | head
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
2       10797   rs28888107      C       T       .       PASS    IMPUTED;AF=0.0127291;MAF=0.0127291;AVG_CS=0.998805;R2=0.925789
2       11336   rs113656530     C       G       .       PASS    IMPUTED;AF=0.0282261;MAF=0.0282261;AVG_CS=0.997855;R2=0.940762
2       11357   rs111385029     A       G       .       PASS    IMPUTED;AF=0.0282287;MAF=0.0282287;AVG_CS=0.997853;R2=0.940673
2       11486   rs73138514      A       G       .       PASS    IMPUTED;AF=0.0283085;MAF=0.0283085;AVG_CS=0.997773;R2=0.938685
2       11594   rs114792740     G       T       .       PASS    IMPUTED;AF=0.0183994;MAF=0.0183994;AVG_CS=0.99201;R2=0.654262
2       11607   rs73138516      T       C       .       PASS    IMPUTED;AF=0.0282278;MAF=0.0282278;AVG_CS=0.997854;R2=0.940712
2       11834   rs73910134      A       G       .       PASS    IMPUTED;AF=0.0282808;MAF=0.0282808;AVG_CS=0.9978;R2=0.938963
2       11842   rs13390778      C       G       .       PASS    IMPUTED;AF=0.0282808;MAF=0.0282808;AVG_CS=0.9978;R2=0.938963
2       11944   rs10172629      C       T       .       PASS    IMPUTED;AF=0.0124383;MAF=0.0124383;AVG_CS=0.998878;R2=0.928775

$ gatk VariantsToTable -V variants_chr2.vcf.gz  -O variants_chr2.tsv

$ # in output table, QUAL is "-10.0"
$ head variants_chr2.tsv
CHROM   POS     ID      REF     ALT     QUAL    FILTER  AF      MAF     AVG_CS  R2      ER2     IMPUTED TYPED
2       10797   rs28888107      C       T       -10.0   PASS    0.0127291       0.0127291       0.998805        0.925789        NA      true    NA
2       11336   rs113656530     C       G       -10.0   PASS    0.0282261       0.0282261       0.997855        0.940762        NA      true    NA
2       11357   rs111385029     A       G       -10.0   PASS    0.0282287       0.0282287       0.997853        0.940673        NA      true    NA
2       11486   rs73138514      A       G       -10.0   PASS    0.0283085       0.0283085       0.997773        0.938685        NA      true    NA
2       11594   rs114792740     G       T       -10.0   PASS    0.0183994       0.0183994       0.99201 0.654262        NA      true    NA
2       11607   rs73138516      T       C       -10.0   PASS    0.0282278       0.0282278       0.997854        0.940712        NA      true    NA
2       11834   rs73910134      A       G       -10.0   PASS    0.0282808       0.0282808       0.9978  0.938963        NA      true    NA
2       11842   rs13390778      C       G       -10.0   PASS    0.0282808       0.0282808       0.9978  0.938963        NA      true    NA
2       11944   rs10172629      C       T       -10.0   PASS    0.0124383       0.0124383       0.998878        0.928775        NA      true    NA

Expected behavior

If QUAL = . in input and no genotype data are present, QUAL = . in output

Actual behavior

If QUAL = . in input, QUAL = -10.0 in output
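
As a possible stopgap, the table could be post-processed to restore the missing-value marker. The sketch below is only an illustration and not part of GATK: mask_missing_qual() is a hypothetical helper that assumes a tab-separated table with a header row containing a QUAL column, as in the example output above. Since Phred-scaled qualities are non-negative, a QUAL of -10.0 should not collide with a real score.

```python
from typing import Iterable, Iterator


def mask_missing_qual(lines: Iterable[str], sentinel: str = "-10.0") -> Iterator[str]:
    """Yield table rows with the sentinel in the QUAL column replaced by '.'.

    Hypothetical helper: expects tab-separated lines, the first being a header
    that contains a QUAL column. Rows are yielded without trailing newlines.
    """
    it = iter(lines)
    header_line = next(it, None)
    if header_line is None:
        return  # empty input, nothing to yield
    header = header_line.rstrip("\n").split("\t")
    qual_idx = header.index("QUAL")  # raises ValueError if QUAL is absent
    yield "\t".join(header)
    for line in it:
        fields = line.rstrip("\n").split("\t")
        if fields[qual_idx] == sentinel:
            fields[qual_idx] = "."
        yield "\t".join(fields)


example = [
    "CHROM\tPOS\tQUAL\tFILTER\n",
    "2\t10797\t-10.0\tPASS\n",
    "2\t11336\t30.0\tPASS\n",
]
for row in mask_missing_qual(example):
    print(row)
```

For a real file, the lines of variants_chr2.tsv could be passed in from an open file handle and the yielded rows written to a new file, each followed by a newline.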

In case this is the expected behavior and not a bug, I would appreciate some insight into why that is.

Thanks,
Friederike
