CHIPMIX NA value #36

iagooteroc · 2022-02-11T11:58:46Z

Hello again. We are using VerifyBamID successfully and are very happy with it. Now I'm wondering why the CHIPMIX values are NA, given that it is stated to be "NA if the external genotype is unavailable". I'm using --SVDPrefix $(VERIFY_BAM_ID_HOME)/resource/1000g.phase3.10k.b38.vcf.gz.dat, so the external genotype should be available, right?
Thank you again for your time. This is an example of our output:

#SEQ_ID            RG  CHIP_ID  #SNPS  #READS  AVG_DP   FREEMIX      FREELK1  FREELK0  FREE_RH  FREE_RA  CHIPMIX  CHIPLK1  CHIPLK0  CHIP_RH  CHIP_RA  DPREF  RDPHET  RDPALT
XXXXXXXXX-XXX_XXX  NA  NA       10000  329131  32.9131  1.61423e-05  -82725   -132873  NA       NA       NA       NA       NA       NA       NA       NA     NA      NA

EDIT:
I also have two more questions:
Would you expect the FREEMIX value to be higher due to the ONT error rate?
How do you think the program will perform with tumours? What is the expected impact of CN aberrations on the statistics?
Thank you.

Griffan · 2022-02-20T03:59:10Z

Hi, @iagooteroc
The CHIPMIX field is present in the final report for backward compatibility with VB1.
VB2 itself doesn't have an option to accept genotype information for the intended sample. Instead, it estimates the genotype from the sequencing data (i.e., "free" of genotyping data, hence FREEMIX). If you do have matched chip-based genotyping data and are considering using it, you could try VB1.
The genotype information you saw in the resource file comes from the samples in the reference panel (not the intended sample or the contaminating sample) and serves to estimate the allele frequency at each marker.
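
For anyone consuming this report in a script, here is a minimal Python sketch of reading the row posted above. It assumes the report is whitespace-delimited with one header line and one data row, as in the example; the function name `parse_report` is made up for illustration and is not part of VerifyBamID. It simply treats "NA" as missing, so only the FREE* columns are used downstream.

```python
# Header and row copied from the example output in this issue.
HEADER = ("#SEQ_ID RG CHIP_ID #SNPS #READS AVG_DP FREEMIX FREELK1 FREELK0 "
          "FREE_RH FREE_RA CHIPMIX CHIPLK1 CHIPLK0 CHIP_RH CHIP_RA DPREF RDPHET RDPALT")
ROW = ("XXXXXXXXX-XXX_XXX NA NA 10000 329131 32.9131 1.61423e-05 -82725 -132873 "
       "NA NA NA NA NA NA NA NA NA NA")


def parse_report(header_line, row_line):
    """Map column names to raw string values; 'NA' becomes None.

    Hypothetical helper: assumes whitespace-delimited columns.
    """
    names = [name.lstrip("#") for name in header_line.split()]
    values = row_line.split()
    if len(names) != len(values):
        raise ValueError("header and row have different column counts")
    return {n: (None if v == "NA" else v) for n, v in zip(names, values)}


record = parse_report(HEADER, ROW)
freemix = float(record["FREEMIX"])
chipmix = record["CHIPMIX"]

print(f"FREEMIX = {freemix:g}")
print("CHIPMIX =", "not computed (NA)" if chipmix is None else chipmix)
```

With VB2 the CHIP* columns would be expected to stay NA regardless of the resource files, so a script like this should key off FREEMIX only.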

Would you expect FREEMIX value to be higher due to the ONT error rate?

I don't have a direct answer to this question. Both the sequencing errors and the alignment errors of ONT reads will be very different from those of short reads (e.g., different aligner parameter configurations could change whether an error is called as an indel or as a SNP, and the indel errors that dominate in ONT reads are rare in Illumina reads). This could lead to either a loss or a gain of heterozygosity, depending on various conditions. If possible, choose markers in regions that ONT reads can pass through easily.

How do you think the program will perform with tumours? What is the expected impact of CN aberrations on the statistics?

VB2 is implemented under the assumption of a diploid genome. If you can detect and skip the regions with CN aberrations, the assumption will still hold.
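
As a rough illustration of the "skip the CN aberration regions" idea, the Python sketch below drops markers that fall inside known aberrant intervals. Everything here is an assumption for illustration: the function name, the `(chrom, pos)` marker tuples, the half-open `(chrom, start, end)` intervals, and the toy coordinates. It does not show how a filtered marker set would be supplied to VB2.

```python
from collections import defaultdict


def markers_outside_regions(markers, cn_regions):
    """Return markers (chrom, pos) that lie in no CN-aberrant region.

    Regions are (chrom, start, end) and treated as half-open [start, end).
    Hypothetical helper, not part of VerifyBamID.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in cn_regions:
        by_chrom[chrom].append((start, end))

    kept = []
    for chrom, pos in markers:
        regions = by_chrom.get(chrom, ())
        if not any(start <= pos < end for start, end in regions):
            kept.append((chrom, pos))
    return kept


# Toy data, invented for the example.
markers = [("chr1", 1000), ("chr1", 5000), ("chr2", 300)]
cn_regions = [("chr1", 4000, 6000)]

kept = markers_outside_regions(markers, cn_regions)
print(kept)
```

The linear scan per marker is fine for a few thousand markers and a handful of regions; for larger inputs an interval tree or sorted sweep would be the usual choice.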

Fan
