SKAT/CMC: Missing covariates are not imputed, but dropped #144

katyaorlova · 2023-02-06T20:26:39Z

Thank you for creating and maintaining this software.

In the wiki, you state that

Note: Missing data in the covariate file can be labeled by any non-numeric value (e.g. NA). They will be automatically imputed to the mean value in the data file.

However, samples with missing covariates are simply dropped from my analysis, per the .log file when running SKAT, CMC, FamCMC, FamSKAT:

[WARN] Total [ 63 ] samples are dropped from VCF file due to missing covariate.

How can I ensure that samples with missing covariates are not dropped?

For reference, here's a simplified version of my code when running FamSKAT + FamCMC:

```
rvtest --inVcf exons.vcf.gz --pheno phenos.txt --pheno-name dft \
  --freqUpper 0.01 --impute drop \
  --covar cov.txt \
  --covar-name AgeAtExam,Sex,V7,V8,V9,WV,ChipNum,CohortNum,PC1_C12,PC2_C12,PC3_C12 \
  --geneFile refFlat_hg19.txt.gz --burden famcmc --kernel famskat \
  --kinship C1C2.kinship --numThread 3 --out output
```

(Note: I tried removing the --impute drop flag, which prevents imputation of missing genotypes, but this does not change the covariate dropping.)

Thank you in advance,
Katya

zhanxw · 2023-02-06T20:39:26Z

Can you recode the covariate file and impute the missing covariates?

katyaorlova · 2023-02-06T21:02:22Z

Yes, thank you for the quick reply; I ended up doing just that. I mostly wrote to double check whether there was a different issue that was causing this in my code, but it sounds like it is a default setting to drop samples with NA covariates.

Here's the code if anyone wants to save time:

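```r
# `cov` is the covariate table loaded as a data.frame
cols_to_impute <- c("V7", "V8", "V9")

for (col_name in cols_to_impute) {
  col <- cov[, col_name]
  col_mean <- mean(col, na.rm = TRUE)    # column mean, ignoring NAs
  cov[is.na(col), col_name] <- col_mean  # fill missing entries with the mean
}
```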
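
If the covariate file is prepared outside R, the same mean imputation can be sketched in shell with awk. This is only a rough sketch and not part of rvtests: the function name `impute_covar_mean` and the output file name `cov.imputed.txt` are hypothetical, and it assumes a whitespace-delimited covariate file with a header row, where any non-numeric entry in the named columns counts as missing. The output is written tab-delimited.

```shell
# Hypothetical helper: impute_covar_mean FILE COL1,COL2,...
# Prints FILE with non-numeric entries in the named columns replaced
# by that column's mean. The file is read twice: once to compute the
# means, once to write the imputed rows.
impute_covar_mean() {
  awk -v cols="$2" '
    function isnum(x) {
      return x ~ /^[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?$/
    }
    BEGIN {
      OFS = "\t"
      n = split(cols, want, ",")
      for (i = 1; i <= n; i++) target[want[i]] = 1
    }
    # Pass 1: locate the named columns in the header, then sum numeric values.
    NR == FNR {
      if (FNR == 1) { for (i = 1; i <= NF; i++) if ($i in target) pick[i] = 1; next }
      for (i in pick) if (isnum($i)) { sum[i] += $i; cnt[i]++ }
      next
    }
    # Pass 2: copy the header, then fill missing entries with the column mean.
    FNR == 1 { $1 = $1; print; next }
    {
      for (i in pick)
        if (!isnum($i) && cnt[i] > 0) $i = sprintf("%.6g", sum[i] / cnt[i])
      $1 = $1   # rebuild the record so every line uses the tab separator
      print
    }
  ' "$1" "$1"
}

# Example call (file names are placeholders):
#   impute_covar_mean cov.txt V7,V8,V9 > cov.imputed.txt
```

The imputed file would then be passed to `--covar` in place of the original. Only continuous covariates should be listed; a mean is not meaningful for coded categories such as Sex or ChipNum.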

Thanks again,
Katya
