
How do I identify the REF from a VCF #1170

Open
hyanwong opened this issue Jan 19, 2024 · 4 comments · May be fixed by #1175

Comments

@hyanwong

When I use cyvcf2, I can access a `variant.REF` value, but I can't find any obvious reference to the word "REF" in the sgkit docs about inspecting a VCF that has been read in with `vcf_to_zarr`. I assume that the first entry in the `variant_allele` array is the VCF REF allele: is that guaranteed, and if so, can it be documented?

@tomwhite
Collaborator

Yes, it is guaranteed. The format is documented here: https://github.com/pystatgen/vcf-zarr-spec/blob/main/vcf_zarr_spec.md, where it says:

> Note that the REF and ALT fields are combined into a single Zarr array.
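A minimal sketch of what the REF-first convention means in practice, assuming the combined array has shape (variants, alleles) with REF in column 0 and the ALT alleles after it. The array below is a hypothetical in-memory stand-in for `ds["variant_allele"].values`, not data read by sgkit, and the `""` padding for variants with fewer ALT alleles is also an assumption here:

```python
import numpy as np

# Hypothetical stand-in for ds["variant_allele"].values: one row per variant,
# REF in column 0, ALT alleles after it, padded with "" (assumed padding value).
variant_allele = np.array(
    [
        ["A", "G", ""],
        ["C", "T", "TA"],
        ["G", "", ""],
    ],
    dtype=object,
)

ref = variant_allele[:, 0]   # REF allele of every variant
alt = variant_allele[:, 1:]  # ALT alleles of every variant (may include padding)


def alt_list(row):
    """Return the non-padding ALT alleles for one variant row."""
    return [a for a in row if a != ""]


print(ref.tolist())                 # ['A', 'C', 'G']
print([alt_list(r) for r in alt])   # [['G'], ['T', 'TA'], []]
```

Slicing column 0 recovers the equivalent of cyvcf2's `variant.REF` for every variant at once, and the remaining columns play the role of `variant.ALT` once padding is dropped.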

@hyanwong
Author

Thanks. Is it worth noting that in the sgkit docs too? On my nitpicky reading, the spec doesn't say that they are combined with REF first, either (just that they are "combined").

@tomwhite
Collaborator

Very true! Are you interested in contributing a patch to fix these issues?

@hyanwong
Author

I'm happy to do so.

@hyanwong hyanwong linked a pull request Jan 22, 2024 that will close this issue