
CHIPMIX NA value #36

Open
iagooteroc opened this issue Feb 11, 2022 · 1 comment

Comments

@iagooteroc

iagooteroc commented Feb 11, 2022

Hello again. We are successfully using VerifyBamID and are very happy with it. Now I'm wondering why the CHIPMIX values are NA, given that the field is stated to be "NA if the external genotype is unavailable". I'm using --SVDPrefix $(VERIFY_BAM_ID_HOME)/resource/1000g.phase3.10k.b38.vcf.gz.dat, so the external genotype should be available, right?
Thank you again for your time. This is an example of our output:

#SEQ_ID            RG  CHIP_ID  #SNPS  #READS  AVG_DP   FREEMIX      FREELK1  FREELK0  FREE_RH  FREE_RA  CHIPMIX  CHIPLK1  CHIPLK0  CHIP_RH  CHIP_RA  DPREF  RDPHET  RDPALT
XXXXXXXXX-XXX_XXX  NA  NA       10000  329131  32.9131  1.61423e-05  -82725   -132873  NA       NA       NA       NA       NA       NA       NA       NA     NA      NA
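To see at a glance which columns of a report like this are NA, a small parser helps. This is a minimal sketch, assuming the report has one header line starting with `#` and one data line, with whitespace-separated fields; the function names `parse_selfsm` and `na_fields` are hypothetical helpers, not part of VerifyBamID.

```python
# Minimal sketch: read a VerifyBamID-style report (header + one data line)
# and list the columns whose value is "NA". Assumes whitespace-separated fields.

def parse_selfsm(text: str) -> dict:
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].lstrip("#").split()  # drop the leading '#' of the header line
    values = lines[1].split()
    if len(header) != len(values):
        raise ValueError("header and data line have different field counts")
    return dict(zip(header, values))


def na_fields(record: dict) -> list:
    return [name for name, value in record.items() if value == "NA"]


report = """\
#SEQ_ID  RG  CHIP_ID  #SNPS  #READS  AVG_DP  FREEMIX  FREELK1  FREELK0  FREE_RH  FREE_RA  CHIPMIX  CHIPLK1  CHIPLK0  CHIP_RH  CHIP_RA  DPREF  RDPHET  RDPALT
XXXXXXXXX-XXX_XXX  NA  NA  10000  329131  32.9131  1.61423e-05  -82725  -132873  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
"""

record = parse_selfsm(report)
print(record["FREEMIX"])      # 1.61423e-05
print(na_fields(record))      # every CHIP_* column is among them
```

Only the FREE* columns and the depth/read summaries carry values here; all CHIP* columns are NA.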

EDIT:
I also have two more questions:
Would you expect the FREEMIX value to be higher due to the ONT error rate?
How do you think the program will perform with tumours? What is the expected impact of CN aberrations on the statistics?
Thank you.

@Griffan
Owner

Griffan commented Feb 20, 2022

Hi, @iagooteroc
The presence of CHIPMIX field in the final report is for backward compatibility with VB1.
VB2 itself doesn't have an option to accept genotype information for the intended sample. Instead, it estimates the genotype from the sequencing data (i.e. "free" of genotyping data, hence FREEMIX). If you do have matched chip-based genotyping data and are considering using it, you could try VB1.
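To make the "free" estimation concrete, here is a minimal sketch of a simplified mixture likelihood in the spirit of FREEMIX: the genotypes of both the intended and the contaminating sample are unknown and summed out, using only a per-marker allele frequency `p`. It assumes Hardy-Weinberg genotype priors, a fixed sequencing error rate `err`, and a grid search over the mixing fraction `alpha`. It is not VB2's actual model (VB2 additionally accounts for ancestry using the reference-panel SVD) and all function names are hypothetical.

```python
import math
import random


def genotype_prior(p):
    """Hardy-Weinberg probabilities of carrying 0, 1 or 2 alt alleles."""
    q = 1.0 - p
    return (q * q, 2.0 * p * q, p * p)


def log_sum_exp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def marker_loglik(alpha, p, n_ref, n_alt, err=0.01):
    """Log-likelihood of one marker's read counts for mixing fraction alpha.

    Sums over the unknown genotype g of the intended sample and h of the
    contaminating sample; no external genotype is needed.
    """
    prior = genotype_prior(p)
    terms = []
    for g in range(3):
        for h in range(3):
            w = prior[g] * prior[h]
            if w == 0.0:
                continue
            frac = (1.0 - alpha) * g / 2.0 + alpha * h / 2.0  # expected alt fraction
            p_alt = frac * (1.0 - err) + (1.0 - frac) * err   # after sequencing error
            terms.append(math.log(w) + n_alt * math.log(p_alt) + n_ref * math.log(1.0 - p_alt))
    return log_sum_exp(terms)


def estimate_freemix(markers, step=0.01, max_alpha=0.3, err=0.01):
    """Grid-search MLE of alpha; markers are (p, n_ref, n_alt) tuples."""
    grid = [i * step for i in range(int(round(max_alpha / step)) + 1)]
    return max(grid, key=lambda a: sum(marker_loglik(a, p, r, x, err) for p, r, x in markers))


def simulate(n_markers, depth, alpha, err=0.01, seed=1):
    """Synthetic read counts drawn from the same simplified model."""
    rng = random.Random(seed)
    markers = []
    for _ in range(n_markers):
        p = rng.uniform(0.05, 0.95)
        g = sum(rng.random() < p for _ in range(2))  # intended sample genotype
        h = sum(rng.random() < p for _ in range(2))  # contaminant genotype
        n_alt = 0
        for _ in range(depth):
            source = h if rng.random() < alpha else g  # read comes from contaminant w.p. alpha
            is_alt = rng.random() < source / 2.0
            if rng.random() < err:
                is_alt = not is_alt
            n_alt += is_alt
        markers.append((p, depth - n_alt, n_alt))
    return markers


print(estimate_freemix(simulate(n_markers=1000, depth=30, alpha=0.1)))
```

The grid search keeps the sketch short; a real tool would use a proper optimizer and far more markers.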
The genotype information you saw in the resource file comes from the samples in the reference panel (neither the intended sample nor the contaminating sample), and serves to estimate the allele frequency at each marker.
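As an illustration of how panel genotypes turn into a per-marker allele frequency, here is a minimal sketch of the plain pooled estimate: alt-allele dosages (0/1/2) of the panel samples are summed and divided by twice the number of called samples. The marker IDs and dosages are made up, and VB2's resource files summarise the panel through an SVD rather than storing a simple frequency table like this.

```python
from typing import Optional, Sequence


def alt_allele_frequency(dosages: Sequence[Optional[int]]) -> float:
    """Pooled alt-allele frequency from 0/1/2 dosages; None marks a missing call."""
    called = [d for d in dosages if d is not None]
    if not called:
        raise ValueError("no called genotypes at this marker")
    return sum(called) / (2.0 * len(called))


# Hypothetical reference panel: marker -> dosages of five panel samples.
panel = {
    "chr1:1000": [0, 1, 2, 1, 0],
    "chr1:2000": [0, 0, None, 1, 0],
}
freqs = {marker: alt_allele_frequency(d) for marker, d in panel.items()}
print(freqs)  # {'chr1:1000': 0.4, 'chr1:2000': 0.125}
```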

Would you expect FREEMIX value to be higher due to the ONT error rate?

I don't have a direct answer to this question. Both sequencing errors and alignment errors for ONT reads will be very different from those for short reads (e.g. different aligner parameter configurations could change whether the aligner prefers an indel or a SNP error, and the indel errors that dominate ONT reads are rare in Illumina reads). This could lead to either a loss or a gain of heterozygosity, depending on various conditions. If possible, choose markers in regions that ONT reads can pass through easily.
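One hypothetical way to act on that suggestion is to drop markers whose surrounding reference sequence contains long homopolymer runs, since homopolymers are a well-known source of ONT indel errors. This is a minimal sketch, not something VB2 does: `flank` and `max_run` are illustrative values, positions are assumed 0-based, and the reference string would come from your own FASTA.

```python
def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base in seq."""
    best = run = 0
    prev = ""
    for base in seq.upper():
        run = run + 1 if base == prev else 1
        prev = base
        best = max(best, run)
    return best


def keep_marker(reference: str, pos0: int, flank: int = 5, max_run: int = 4) -> bool:
    """Keep a marker (0-based pos0) if no homopolymer longer than max_run lies within +/- flank."""
    start = max(0, pos0 - flank)
    window = reference[start:pos0 + flank + 1]
    return longest_homopolymer(window) <= max_run


print(keep_marker("ACGTACGTACGTACGT", 8))    # True
print(keep_marker("ACGTAAAAAAACGTACGT", 6))  # False: a run of 7 A's sits next to the marker
```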

How do you think the program will perform with tumours? What is the expected impact of CN aberrations in the statistics?

VB2 is implemented under the assumption of diploid genomes. If you can detect and skip the regions with CN aberrations, the assumption will still hold.
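Skipping CN-aberrant regions amounts to removing markers that fall inside a set of intervals. This is a minimal sketch, assuming the aberrant intervals come from a separate copy-number caller as half-open `(chrom, start, end)` tuples in the same coordinate system as the `(chrom, pos)` markers; all names and coordinates are hypothetical, and rebuilding the VB2 resource for the reduced marker set is not shown.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(intervals):
    """Merge overlapping [start, end) intervals per chromosome for binary search."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        merged = []
        for start, end in sorted(ivs):
            if merged and start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def in_aberration(index, chrom, pos):
    """True if pos lies inside any merged interval on chrom."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < ends[i]


cna = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
markers = [("chr1", 99), ("chr1", 100), ("chr1", 250), ("chr1", 300), ("chr2", 10), ("chr3", 5)]
index = build_index(cna)
kept = [m for m in markers if not in_aberration(index, *m)]
print(kept)  # [('chr1', 99), ('chr1', 300), ('chr3', 5)]
```

Merging first keeps the lookup a single binary search per marker even when the caller reports overlapping segments.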

Fan
