SKAT/CMC: Missing covariates are not imputed, but dropped #144

Open
katyaorlova opened this issue Feb 6, 2023 · 2 comments
Comments

@katyaorlova

Thank you for creating and maintaining this software.

In the wiki, you state that

Note: Missing data in the covariate file can be labeled by any non-numeric value (e.g. NA). They will be automatically imputed to the mean value in the data file.

However, samples with missing covariates are simply dropped from my analysis, per the .log file when running SKAT, CMC, FamCMC, FamSKAT:

[WARN] Total [ 63 ] samples are dropped from VCF file due to missing covariate.

How can I ensure that samples with missing covariates are not dropped?

For reference, here's a simplified version of my code when running FamSKAT + FamCMC:
rvtest --inVcf exons.vcf.gz --pheno phenos.txt --pheno-name dft --freqUpper 0.01 --impute drop --covar cov.txt --covar-name AgeAtExam,Sex,V7,V8,V9,WV,ChipNum,CohortNum,PC1_C12,PC2_C12,PC3_C12 --geneFile refFlat_hg19.txt.gz --burden famcmc --kernel famskat --kinship C1C2.kinship --numThread 3 --out output;
(Note, I tried removing the --impute drop flag, which prevents imputation of missing genotypes, but this doesn't alter covariate dropping)
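Before running the tests, it can help to check which samples have a missing covariate, since those are the ones the `[WARN]` line appears to refer to. The following is a minimal R sketch, not part of rvtests. The inline table is a hypothetical stand-in for `cov.txt`, and it assumes a whitespace-delimited file with a header row and missing values coded as `NA` or `.`.

```r
# Hypothetical stand-in for cov.txt: whitespace-delimited with a header row.
cov_text <- "fid iid AgeAtExam V7 V8
f1 s1 54 0.12 1.5
f2 s2 NA 0.40 NA
f3 s3 61 NA 2.1
f4 s4 47 0.33 1.9"

# Treat NA and "." as missing (an assumption about how missing values are coded).
cov <- read.table(text = cov_text, header = TRUE, na.strings = c("NA", "."),
                  stringsAsFactors = FALSE)

# Covariates that would be passed to --covar-name (shortened for illustration).
covar_names <- c("AgeAtExam", "V7", "V8")

# Number of missing values in each covariate column.
missing_per_column <- colSums(is.na(cov[, covar_names]))

# IDs of samples with at least one missing covariate.
incomplete <- !complete.cases(cov[, covar_names])
samples_at_risk <- cov$iid[incomplete]

print(missing_per_column)
print(samples_at_risk)
```

If the length of `samples_at_risk` for the real file matches the count in the log, missing covariates are the likely explanation for the dropped samples.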

Thank you in advance,
Katya

@zhanxw
Owner

zhanxw commented Feb 6, 2023 via email

@katyaorlova
Author

katyaorlova commented Feb 6, 2023

Yes, thank you for the quick reply; I ended up doing just that. I mostly wrote to double check whether there was a different issue that was causing this in my code, but it sounds like it is a default setting to drop samples with NA covariates.

Here's the code if anyone wants to save time:

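```r
cols_to_impute <- c("V7", "V8", "V9")

# cov is the covariate data frame, with missing values already read in as NA
for (col_name in cols_to_impute) {
  col <- cov[, col_name]
  col_mean <- mean(col, na.rm = TRUE)
  # Replace each missing value with the column mean
  cov[is.na(col), col_name] <- col_mean
}
```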

Thanks again,
Katya
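For anyone who wants the same workaround as a reusable step, here is a hedged sketch that wraps the mean imputation in a function and writes the result back out for use with `--covar`. The function name `impute_column_means`, the toy data frame, and the temporary output path are all hypothetical. Mean imputation is only sensible for continuous covariates, so categorical columns such as `Sex` would need a different approach.

```r
# Replace NA in the named numeric columns with that column's mean.
impute_column_means <- function(df, cols) {
  for (col_name in cols) {
    col <- df[[col_name]]
    if (!is.numeric(col)) {
      stop("Column is not numeric: ", col_name)
    }
    df[[col_name]][is.na(col)] <- mean(col, na.rm = TRUE)
  }
  df
}

# Toy covariate table (hypothetical values) with a few missing entries.
cov <- data.frame(
  iid = c("s1", "s2", "s3", "s4"),
  V7  = c(0.1, NA, 0.3, 0.5),
  V8  = c(2, 4, NA, NA),
  stringsAsFactors = FALSE
)

cov_imputed <- impute_column_means(cov, c("V7", "V8"))

# Write a space-delimited file with a header, without quotes or row names.
out_path <- tempfile(fileext = ".txt")  # hypothetical output location
write.table(cov_imputed, out_path, quote = FALSE, row.names = FALSE, sep = " ")
```

The function returns a modified copy rather than changing `cov` in place, so the original table stays available for comparison.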
